Hacker News
Ornith-1.0: Self-scaffolding LLMs for agentic coding
SwellJoe
Balinares
Balinares
It was fairly good at diagnosing the bugs once informed of their symptoms. However, if I mischaracterized the symptoms, it would weigh my input too heavily and reject its own (correct) hunch about the root cause.
So it's an interesting one. There's definitely some latent capability in there that arguably exceeds Qwen 3.6, which is absolutely no small feat. But that capability seems to come in a somewhat erratic package.
It's probably worth benchmarking it unquantized if you can. I've come to suspect that quantization damages small models more than perplexity and KL divergence numbers reflect.
I'll also give the 9B weights a shot when I can.
juliangoldsmith
In my brief tests, Ornith 35B performed quite well. It won't replace DeepSeek V4 Flash for me, but if it were fast and cheap enough, it might.
I don't remember being super impressed with Ornith 9B, but I could see it being on par with Qwen 3.5 35B.
nzach
If that is the case, isn't this just a fancy way to perform prompt optimization?