Hacker News
Bullshit Machines
kerblang
> How did he reach that conclusion? Basically, he asked “Are you conscious?”, the machine responded “Yes”, and that was that.
Oh, come on now. This is referring to Blake Lemoine, and while I doubt his conclusions, he wasn't being as simplistic as all that. He's not completely stupid.
damnitbuilds
See what they did there?
simianwords
> LLMs operate in the plane of words, not in the world of physical phenomena that science investigates. They don’t reason, synthesize evidence, or draw upon the previous literature. They can generate text that looks like a paper but mistaking this for science is a cargo-cult fallacy.
This is clearly wrong.
djhn
Plane of words: broadly correct. Everything is flattened to tokens and token sequences, and the training data is dominated by text tokens.
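A tiny illustration of that flattening, as a sketch only (tiktoken and the encoding name are one arbitrary choice, not tied to any particular model discussed here):

    # Everything the model sees is a sequence of integer token IDs,
    # not the physical situation the sentence describes.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # assumption: any BPE encoding illustrates the point
    tokens = enc.encode("Water boils at 100 degrees Celsius at sea level.")
    print(tokens)              # a short list of integers
    print(enc.decode(tokens))  # round-trips back to the original string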
Reasoning: CoT tokens are better described as intermediate tokens, and they are largely disconnected from the end result. Including them improves the end result (user satisfaction), but that does not imply reasoning. See for example Turpin 2023, Mirzadeh 2024, Pournemat 2025, Palod 2025.
Synthesising evidence: You can achieve SOTA summaries with LLMs, but this involves, for example, using a harness to generate dozens of summaries with different models, separately using a vector embedding model to compare each result to the original, and selecting the best match. That is not how most people use LLMs for summaries. While this is slowly being RLVR'd into models during post-training, a naive one-shot summary still significantly underperforms these more elaborate methods.
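A minimal sketch of that select-the-best-candidate step, assuming the candidate summaries have already been generated by different models (the library and model name are illustrative choices, not what any particular harness uses):

    # Pick the candidate summary whose embedding is closest to the source text.
    from sentence_transformers import SentenceTransformer, util

    def pick_best_summary(source_text, candidates):
        model = SentenceTransformer("all-MiniLM-L6-v2")   # assumption: any sentence-embedding model works here
        source_emb = model.encode(source_text, convert_to_tensor=True)
        cand_embs = model.encode(candidates, convert_to_tensor=True)
        scores = util.cos_sim(source_emb, cand_embs)[0]    # cosine similarity of each candidate to the source
        return candidates[int(scores.argmax())]

    # Usage: the candidates would come from prompting several different LLMs.
    # best = pick_best_summary(long_document, [summary_a, summary_b, summary_c])

Embedding similarity is only a rough proxy for summary quality; the point of the sketch is the selection loop, not the particular metric.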
djhn
The Erdős problems have turned out to be largely a matter of brute force or of finding older results.
The Feb 2026 GPT-5.2 theoretical physics paper was the result of a “dialogue between physicists and LLMs”, was called “grad student level” by experts in the field, and used a “custom harnessed” “internal OpenAI” model with “20 hours of reasoning” (quotes from the OpenAI blog).
The Matthew Schwartz physics paper with Claude this March involved “51,248 messages across 270 sessions, producing over 110 draft versions and consuming 36 million tokens”, and the actual contribution was Schwartz finding an error in Claude’s solution.
BrandoElFollito
Apparently they did not have enough time to study how to effectively convey that information.
This is the funniest, most useless "science" site I've seen.