suitecake (@suitecake.bsky.social) reply parent
The finding there was that creating a custom wedding cake is a form of speech; presumably wouldn't apply here
suitecake (@suitecake.bsky.social) reply parent
Not a great comparison; Powell has much less to lose. Starmer getting on Trump's bad side could have very negative impacts, from the economy to defense. For better or worse, Starmer's first responsibility is to his people, and that sometimes means having to sit and eat some bullshit.
suitecake (@suitecake.bsky.social) reply parent
Folks are missing the point here I think; it's not an argument against offloading cognitive tasks, it's an argument against offloading ALL cognitive tasks. Offloading some cognitive tasks can free up time to pursue higher order cognitive tasks (which would exercise that critical thinking muscle)
suitecake (@suitecake.bsky.social) reply parent
Long-termism is actually quite prominent in the AI community, and a lot of research goes into positive social impact (including de-biasing research, see for example www.anthropic.com/research/eva...). (though they really should be doing more about alignment overall)
suitecake (@suitecake.bsky.social) reply parent
That's not a full and accurate set of takeaways from the studies, and these are just two studies that were close at hand and are not exhaustive. There are more; I encourage you to do some research. Effective AI use improves critical thinking skills and accelerates software dev
suitecake (@suitecake.bsky.social) reply parent
Claude Opus 4 routinely one-shots 200-line Python scripts that actually do what I want, often with more sophistication than I had in mind. When was the last time you tried using them for significant work? Hallucinations are far better than they were (while not hitting zero).
suitecake (@suitecake.bsky.social) reply parent
Two studies demonstrating significant productivity gains: 1. www.science.org/doi/10.1126/... 2. academic.oup.com/qje/article/...
suitecake (@suitecake.bsky.social) reply parent
Alternately: there's genuine productivity gains here for those who take them. (This is the actual answer, mostly)
suitecake (@suitecake.bsky.social) reply parent
I think it's facially implausible that the average American uses 100,000 queries per day
suitecake (@suitecake.bsky.social) reply parent
Per capita beef consumption in the US is estimated at somewhere around 60 pounds per person per year. Assuming each hamburger is 8 oz (generous), that's roughly 1 hamburger per American every 3 days, which works out to (pessimistically) 100,000 ChatGPT queries per day. I'm not using that many.
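To spell out the arithmetic in that post, here's a minimal back-of-the-envelope sketch in Python. The 300,000-queries-per-burger figure is an assumed value matching the "hundreds of thousands of ChatGPT queries" comparison used later in the thread, not a measured constant.

```python
# Back-of-the-envelope check of the burger-vs-query comparison above.
BEEF_LB_PER_YEAR = 60         # estimated US per-capita beef consumption
BURGER_LB = 8 / 16            # one generous 8 oz patty = 0.5 lb
QUERIES_PER_BURGER = 300_000  # assumption: "hundreds of thousands" of queries per burger

burgers_per_year = BEEF_LB_PER_YEAR / BURGER_LB                # ~120 burgers
days_per_burger = 365 / burgers_per_year                       # ~3 days per burger
queries_per_day = QUERIES_PER_BURGER * burgers_per_year / 365  # ~99,000 queries

print(f"~{burgers_per_year:.0f} burgers/year, i.e. one every {days_per_burger:.1f} days")
print(f"water-equivalent of ~{queries_per_day:,.0f} ChatGPT queries per day")
```

With those assumptions it lands at roughly 100,000 queries per day, which is the "pessimistic" figure quoted in the post.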
suitecake (@suitecake.bsky.social) reply parent
Ah, I inadvertently left out half of my reasoning: pair that with LLMs' capacity for detecting mistakes in their own output via Extended Thinking, and it's a demonstration of what you're talking about. But that's getting fuzzy and remote; simplest would be for you to test it yourself. They can
suitecake (@suitecake.bsky.social) reply parent
They already can; here's a study from 2024 demonstrating a method for improving their existing indirect reasoning ability: arxiv.org/abs/2402.03667
suitecake (@suitecake.bsky.social) reply parent
For studies demonstrating LLM utility for medical diagnostics and decision-making, see: 1. www.nature.com/articles/s41... 2. www.nature.com/articles/s43... Offloading thinking can enable higher-order thinking; researchers have assistants and judges have clerks; agree ofc there are failure modes
suitecake (@suitecake.bsky.social) reply parent
It's not zero, but it is absolutely swamped by orders of magnitude compared to other common consumption like beef
suitecake (@suitecake.bsky.social) reply parent
Agree!
suitecake (@suitecake.bsky.social) reply parent
The GPT-4 responses to questions 1 and 2 were pre-generated, but not for 3 and 4. No separate control group; participants' answers before AI assistance are the control. Agree that the study has limitations and shouldn't in isolation heavily swing someone's priors
suitecake (@suitecake.bsky.social)
I've long lurked but not posted/replied much on whatever the term is for X and BlueSky, and after replying on some stuff yesterday, logging off for a couple hours and coming back to dozens of notifications, yeah. I see the appeal
suitecake (@suitecake.bsky.social) reply parent
It shows both: "Here we show that physicians are willing to modify their clinical decisions based on GPT-4 assistance, leading to improved accuracy scores from 47% to 65% in the white male patient group and 63% to 80% in the Black female patient group."
suitecake (@suitecake.bsky.social) reply parent
I've been hearing a whole lot of researchers in a variety of domains praise Deep Research as a significant aid to their work. LLMs making folks dumber is a skill issue. Used well, they make you smarter
suitecake (@suitecake.bsky.social) reply parent
Nah, he's just neoliberal. I'm presumably on that same blocklist.
suitecake (@suitecake.bsky.social) reply parent
Sure thing, here are two: 1. www.nature.com/articles/s41... 2. www.nature.com/articles/s43...
suitecake (@suitecake.bsky.social) reply parent
It wasn't obvious, which is why I asked. I was trying to figure out where they were coming from; there were a few options (and some people really are anti-energy production!)
suitecake (@suitecake.bsky.social) reply parent
www.seangoedecke.com/water-impact...
suitecake (@suitecake.bsky.social) reply parent
There will always be people saying hyperbolic shit to get clicks though
suitecake (@suitecake.bsky.social) reply parent
I'm talking in clinical environments, appropriately prompted, with an appropriate level of skepticism
suitecake (@suitecake.bsky.social) reply parent
Seems analogous to human cognitive bias to me. We've got some weird ones; evolutionary residue, totally batshit irrational, and very difficult to disentangle ourselves from
suitecake (@suitecake.bsky.social) reply parent
So it can't solve problems, except it can, but that isn't reasoning? Making sure I'm following
suitecake (@suitecake.bsky.social) reply parent
It's able to successfully solve multi-step problems by deriving valid and salient intermediate steps. That fits a sensible, coherent and common definition of "reasoning." As for why they have a jagged graph of capabilities relative to humans, I don't see the relevance re: whether it can reason
suitecake (@suitecake.bsky.social) reply parent
LLMs have diagnostic utility too, and quite a lot of studies are suggesting serious value here (though with how new everything is, we're still waiting in many cases on peer review and replication). And beware reductionism here; we humans are "just" electrical signals in a fatty, organic substrate.
suitecake (@suitecake.bsky.social) reply parent
We absolutely need much more energy infrastructure investment than we currently have, agreed. And don't forget about the efficiency improvements over time in all this.
suitecake (@suitecake.bsky.social) reply parent
There have been quite a few studies demonstrating significant value for medical diagnostics from LLMs specifically, not just other kinds of models
suitecake (@suitecake.bsky.social) reply parent
On a skim, there's some stuff I like about that article (equity and interpretability were and continue to be major concerns), but I wish the author would spend less time dunking on the most breathless extremes
suitecake (@suitecake.bsky.social) reply parent
How does this even apply??
suitecake (@suitecake.bsky.social) reply parent
My following someone doesn't mean I like and support everything they've ever said
suitecake (@suitecake.bsky.social) reply parent
No, I'm familiar with English and know it can mean "new," which is the sense in which I meant it; i.e., not contained directly in training data. I'm not interested in getting into an argument about semantics; if you want that gotcha as an excuse to dip, have at it
suitecake (@suitecake.bsky.social) reply parent
Yes it is
suitecake (@suitecake.bsky.social) reply parent
Depends on your definition of novel. By novel, I just mean "a problem that isn't in its training set such that it can simply quote text as the answer." Original context here is that LLMs are capable of doing this and birds can't.
suitecake (@suitecake.bsky.social) reply parent
I support this (with, y'know, "Do No Harm" as part of that do whatever you want vibe)
suitecake (@suitecake.bsky.social) reply parent
I mean, what exactly are YOU doing here?
suitecake (@suitecake.bsky.social) reply parent
If you mean unassisted and autonomously, that's quite a high bar! By "novel problems," I simply mean "problems not contained directly in its training set." So, solving for X in an algebraic problem that does not exist as-is in a training set would count for me as novel. Birds can't do that; LLMs can
suitecake (@suitecake.bsky.social) reply parent
Full sentences? Prior to today, it's been quite awhile since I've jumped in comments re: AI. It rankles me when I see disinformation, and there's a LOT of disinformation about AI on BlueSky.
suitecake (@suitecake.bsky.social) reply parent
This was a fantastic listen and a much deeper and more thoughtful level of analysis than I generally see re: AI. Given the serious near-term alignment risks (biosecurity being a big one), it's encouraging to see people calling it out and taking it seriously.
suitecake (@suitecake.bsky.social) reply parent
Never before in my life accused of this, now it's twice in one day. Why do you think I'm using AI? I'm not even saying "delve," or using em dashes.
suitecake (@suitecake.bsky.social) reply parent
I encourage you to go test this out yourself. I promise you that they are capable of solving novel problems, unless you're operating from an extremely narrow definition of "novel." As far as whether it's reasoning, comes down to what you think reasoning is. I don't believe it requires qualia
suitecake (@suitecake.bsky.social) reply parent
Sure thing. For a commission of this length, it'll be $450 for the lot. DM me for my Venmo
suitecake (@suitecake.bsky.social) reply parent
FrontierMath is specifically designed to be extremely difficult because all the other math benchmarks got so saturated as to be meaningless. But also, my understanding is its question set is unpublished. How do you square that with the >0% success rate from models in the last month?
suitecake (@suitecake.bsky.social) reply parent
First time I've been accused of that. I guess all I can do is encourage you to follow up this discussion by reading up on the scientific consensus.
suitecake (@suitecake.bsky.social) reply parent
Can't do that, but would you prefer a sonnet about tangerines? :P
suitecake (@suitecake.bsky.social) reply parent
It can. It was capable of this pre-2024, but Chain of Thought significantly increased its ability to conduct multi-step problem-solving. This 2022 paper was influential: arxiv.org/abs/2201.11903 But you can also just test this yourself; every frontier model has been able to do it for a while.
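For context on the cited paper, here's a minimal sketch of what chain-of-thought prompting looks like. The example problems are adapted from Wei et al. (2022); no particular model or API is assumed, and the snippet only constructs and prints the prompts.

```python
# Minimal illustration of chain-of-thought prompting (Wei et al., 2022,
# arxiv.org/abs/2201.11903): the few-shot exemplar includes intermediate
# reasoning steps, nudging the model to work multi-step problems step by
# step rather than jumping straight to an answer.

direct_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: 11\n\n"
    "Q: The cafeteria had 23 apples. They used 20 to make lunch and bought "
    "6 more. How many apples do they have?\n"
    "A:"
)

cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. They used 20 to make lunch and bought "
    "6 more. How many apples do they have?\n"
    "A:"
)

# Same target question, two prompting styles; the chain-of-thought version
# is the one associated with better multi-step accuracy in the paper.
print(direct_prompt)
print("---")
print(cot_prompt)
```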
suitecake (@suitecake.bsky.social) reply parent
Scientific consensus is that we aren't at risk of making the whole planet uninhabitable, but I agree that we need to throw our weight behind addressing climate change, for humanitarian reasons if nothing else. It's the globally underserved who are doing and will continue to do most of the suffering
suitecake (@suitecake.bsky.social) reply parent
There's a lot of slop, no argument here, and it's having negative impacts all over the place. But there's a lot of value in AI too, and the energy is much less than you probably think.
suitecake (@suitecake.bsky.social) reply parent
Why not? There's massive value in a variety of domains, e.g. medical diagnostics. Their prospective use as tutors is a massive opportunity for educational equity (still needs to be evaluated). Possible upside is massive (not to say it will be easy, or that there aren't scary possible futures here)
suitecake (@suitecake.bsky.social) reply parent
Are you anti-energy production in general? Nuclear energy produces fewer greenhouse gasses than even wind and solar. It's not to say there are _no_ environmental impacts, but on balance, nuclear is one of the best we have.
suitecake (@suitecake.bsky.social) reply parent
Birds aren't capable of solving novel problems requiring multi-step problem solving and explaining the line of thinking and result.
suitecake (@suitecake.bsky.social) reply parent
Environmental impact of AI is far, far lower than is commonly believed. One hamburger uses as much water as hundreds of thousands of ChatGPT queries.
suitecake (@suitecake.bsky.social) reply parent
It's highly misleading to share this as representative of AI capabilities in general.
suitecake (@suitecake.bsky.social) reply parent
Agree that this and Google AI's overview are both garbage, but: 1. Meta's AI models have been well behind SotA for a while now, so this isn't surprising 2. Much like Google's AI overview, this is presumably a micro model relative to standard LLMs (Claude, 4o, Gemini 2.5), hurting quality
suitecake (@suitecake.bsky.social) reply parent
Seems weird to blame the speaker here rather than the platformer. Why SHOULDN'T he speak his mind?
suitecake (@suitecake.bsky.social) reply parent
Valid, but that hasn't been my experience
suitecake (@suitecake.bsky.social) reply parent
The switch between God as responsible for all the good things and not responsible for all the bad things never ceases to be jarring. Hyperagentic, except for when he's completely not
suitecake (@suitecake.bsky.social) reply parent
I take it you're not open to trying it for coding? Anecdotally, Claude routinely one-shots smaller scripts for me (and routinely at a better level of quality than I would bother with). I haven't tried it for larger contexts, and I hear it's not great there, but on the small scale it seems great.
suitecake (@suitecake.bsky.social) reply parent
Ah, makes sense. Microsoft Copilot is basically invisible to me now as I've learned to just ignore it, but now that you mention it, it's obnoxious AF. As for commercials, that's surprising. Like, on TV? We talking major firms (OpenAI, Anthropic, etc)?
suitecake (@suitecake.bsky.social) reply parent
Google's AI overview is the worst of the bunch. It's incredible just how bad it is. It's weird because their Gemini model is far better; I have to assume it's a compute issue. Why they've even bothered, I have no idea. Seems like a huge PR fail for them.
suitecake (@suitecake.bsky.social) reply parent
How is it being foisted? (This probably has the cadence of disagreement, but I'm not, I just haven't seen it myself)
suitecake (@suitecake.bsky.social) reply parent
Kudos for saying so. Many would not
suitecake (@suitecake.bsky.social) reply parent
Capabilities are getting better over time, and fast. Even if it plateaued off today, a LOT of white collar work is at risk when it's fully exploited. For examples where AI is excelling at heretofore human work, see: medical diagnostics, copy-editing, transcribing and translation.
suitecake (@suitecake.bsky.social) reply parent
At this point, I'm pretty sure they're trolling. Either that, or they're too partisan-brained to reason with. Weird to see it on the blue side of things, but here we are.
suitecake (@suitecake.bsky.social) reply parent
Of course it can, just as much as it can explain the theme of moral responsibility in The Brothers Karamazov. It might hallucinate, but there's a long history of getting AIs to talk about or even regurgitate their system prompts. Whether LLMs have qualia (they obviously don't) is irrelevant
suitecake (@suitecake.bsky.social) reply parent
Yeah, I don't think that. At all.
suitecake (@suitecake.bsky.social) reply parent
This has the cadence of disagreeing, but I don't think we do
suitecake (@suitecake.bsky.social) reply parent
I'm not distracting. YOU are getting distracted. That's a you problem. The rest of us can walk and chew bubblegum.
suitecake (@suitecake.bsky.social) reply parent
It's not a good sign if your best effort here is "don't believe what your eyes and ears tell you; here, read the transcript."
suitecake (@suitecake.bsky.social) reply parent
Yeah, Trump is a lunatic, no argument here
suitecake (@suitecake.bsky.social) reply parent
And yes, people who are in decline are capable of bursts of lucidity. It's not an all or nothing affair. Some days a grandparent doesn't recognize their grandchildren, but the next day they may.
suitecake (@suitecake.bsky.social) reply parent
Seems fine for it to be a story? If both Trump and Biden are senile, why can't we just say both things are true? I for one want to know what exactly was going on in the Biden admin, and what they were doing to manage his obvious decline.
suitecake (@suitecake.bsky.social) reply parent
What we all saw was not a stutter. The overseas travel that purportedly caused jet lag was from 12 days prior. He had moments of salience here and there; I remember him being pretty good on abortion. But it was overwhelmingly a sad thing to watch.
suitecake (@suitecake.bsky.social) reply parent
C'mon now. We all saw the debate. We've all seen the weird, stumbling answers at press conferences and interviews. The Biden of 2024 was very obviously not the Biden of 2015, or even 2020. Agree that Trump is senile. But the decline is less obvious because he's never been that impressive. Biden was
suitecake (@suitecake.bsky.social) reply parent
I voted for Hillary, voted for Biden, voted for Harris, and hate everything Trump is and stands for. I would have happily voted for Biden a second time over Trump. He is, nevertheless, senile.
suitecake (@suitecake.bsky.social) reply parent
Joe Biden absolutely is senile. I'm not saying it as an insult, it's just...true. He clearly declined cognitively, and he's an old man. It's what the word literally means
More Abstract Popehat (@kenwhite.bsky.social) reposted
I enthusiastically aid and abet terrorism.
suitecake (@suitecake.bsky.social) reply parent
I'll take your word for it; I don't recall seeing that myself
suitecake (@suitecake.bsky.social) reply parent
I don't think I've ever met someone who thinks Trump is bad because he is orange.
suitecake (@suitecake.bsky.social) reply parent
Maybe we swim in different circles; I don't surround myself with people who have shallow political views, but I do have a lot of friends who enjoy mockery
suitecake (@suitecake.bsky.social) reply parent
No it doesn't?
suitecake (@suitecake.bsky.social) reply parent
I'm talking about how fucked the prices are, not whether walkable Canadian neighborhoods in cities exist (they of course do)
suitecake (@suitecake.bsky.social) reply parent
The housing market in Canada for walkable cities is even more fucked than it is in the US
suitecake (@suitecake.bsky.social) reply parent
Important subject, but dramatically underestimates the existential risk of AGI itself
suitecake (@suitecake.bsky.social)
passing the ball to the ref is like calling your teacher mom
suitecake (@suitecake.bsky.social) reply parent
He's mine and he's great
suitecake (@suitecake.bsky.social) reply parent
If that's how it actually plays out, I'm all for it
suitecake (@suitecake.bsky.social) reply parent
Scorn is no good, but IMO we should view a desire to make a sacrifice to accomplish something as a limited resource, and make sure we spend that resource in the most consequential way. Catharsis can be gotten too cheaply
suitecake (@suitecake.bsky.social) reply parent
It's back up.
suitecake (@suitecake.bsky.social) reply parent
It was. That's what we have to come to terms with. Tens of millions of Americans know exactly who he is and voted for him anyway. DC doesn't fuck with that shit though. No love for Trump here.
suitecake (@suitecake.bsky.social) reply parent
Blegh. What're the odds the Eleventh Circuit resolves this before inauguration? What's their likely finding?
suitecake (@suitecake.bsky.social) reply parent
They're still there, at least in Firefox, in the browser, as of now. Shown below: mousing over the date of the tweet
suitecake (@suitecake.bsky.social) reply parent
Do we have reason to believe this is real and not made up?
suitecake (@suitecake.bsky.social) reply parent
/Claude I find increasingly that questions I have about the world and how to proceed are well-served by asking AI, and the limiter is generally my mindfulness about that fact, as well as figuring out the right way to ask the question. Time to build up my Anki decks
suitecake (@suitecake.bsky.social) reply parent
I chose these subjects because they: * Have frequent practical applications * Serve as building blocks for more complex knowledge * Help with both personal development and helping others * Support better decision-making * Aid in understanding and navigating the modern world 9/n