Hacker News
Taming LLMs: Using Executable Oracles to Prevent Bad Code
dktoao
Wouldn't that just be called inventing a new language, with all the overhead of the languages we already have? Are we getting to the point where making LLMs productive while also writing good code requires so much overhead, and so many additional procedures and tools, that we might as well write the code ourselves? Hmmm...
virgilp
LLMs just take this to the extreme. You can no longer rely on human code reviews (well, you can, but then you give away all the LLM advantages), so if you take "human judgement" out *of validation*[1], you have to resort to very sophisticated automated validation. That's it - it's not about "inventing a new language", it's about being much more thorough (and innovative, and efficient) in the validation process.
[1] Never from design or specification - you shouldn't outsource those to AI; I don't think we're close to an AI that can do that even moderately effectively without human help.
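A minimal sketch of what such an automated validation gate could look like: instead of a human reviewing generated code, the candidate is accepted only if it satisfies executable properties. The function names and the `my_sort` task are illustrative, not from the thread.

```python
# Executable-oracle sketch: accept an LLM-generated `my_sort` only if
# it passes property checks, with no human in the loop.

def oracle_sort(candidate_src: str) -> bool:
    """Return True iff the candidate source defines a `my_sort`
    that passes the executable properties below."""
    namespace: dict = {}
    exec(candidate_src, namespace)      # load the generated code
    my_sort = namespace["my_sort"]

    cases = [[], [1], [3, 1, 2], [5, 5, 4], list(range(10, 0, -1))]
    for case in cases:
        out = my_sort(list(case))
        # Property 1: output is ordered.
        if any(a > b for a, b in zip(out, out[1:])):
            return False
        # Property 2: output is a permutation of the input.
        if sorted(case) != sorted(out):
            return False
    return True

# A correct candidate passes; a subtly wrong one is rejected.
good = "def my_sort(xs):\n    return sorted(xs)\n"
bad = "def my_sort(xs):\n    return xs\n"
```

The point is that the oracle is code, so it scales with generation volume in a way a human reviewer can't.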
seanw444
voxaai
RS-232
mapontosevenths
That said, with both the test-driven development this post describes and the reviewer model (it's best to do both), you have to provide an escape hatch for the model. If you let the model get inescapably stuck on an impossible test or constraint, it will just start deleting tests or rewriting the entire codebase in Rust or something.
My escape hatch is "expert advice": when the weak LLM is stuck, I let it phone a friend and ask a smarter LLM for assistance. It has since stopped going crazy and replacing all my tests with gibberish... mostly.
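A rough sketch of that "phone a friend" escape hatch, with the model calls stubbed out. Names like `weak_model` and `strong_model`, and the retry budget, are illustrative assumptions, not the commenter's actual setup.

```python
# Escape-hatch sketch: after a few failed attempts by the weak model,
# escalate to a stronger model for a hint instead of letting the weak
# model mutate or delete the tests.

from typing import Callable

MAX_WEAK_ATTEMPTS = 3  # illustrative retry budget

def solve_with_escape_hatch(
    task: str,
    weak_model: Callable[[str], str],
    strong_model: Callable[[str], str],
    passes_tests: Callable[[str], bool],
) -> str:
    """Try the weak model a few times; if it stays stuck, ask the
    strong model for advice and retry once with that hint."""
    for _ in range(MAX_WEAK_ATTEMPTS):
        candidate = weak_model(task)
        if passes_tests(candidate):
            return candidate
    # Escape hatch: get expert advice rather than deleting tests.
    hint = strong_model(f"Advise on this stuck task: {task}")
    candidate = weak_model(f"{task}\nExpert hint: {hint}")
    if passes_tests(candidate):
        return candidate
    raise RuntimeError("escalation failed; needs human review")

# Stub demo: the weak model only succeeds once it sees the hint.
weak = lambda t: "fixed" if "hint" in t.lower() else "broken"
strong = lambda t: "use the hint"
result = solve_with_escape_hatch("task", weak, strong,
                                 lambda c: c == "fixed")
```

The final `RuntimeError` is the second escape hatch: if even the smarter model can't unstick things, a human gets pulled in rather than letting the loop run forever.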
sanxiyn
https://www.anthropic.com/engineering/harness-design-long-ru...