Hacker News
ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math
Havoc
LM studio doesn't let me actually run this yet though: "Unsupported safetensors format: null"
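That error suggests the loader read the file's JSON header and found a missing or null `format` field. A minimal sketch of inspecting a safetensors header, per the safetensors file layout (an 8-byte little-endian length prefix followed by a JSON header); the demo file and metadata here are illustrative, not taken from the actual model:

```python
import json
import struct

def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file.

    Layout: the first 8 bytes are a little-endian u64 giving the
    header length N, followed by N bytes of JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Build a tiny illustrative file to demonstrate the layout.
demo_header = {
    "__metadata__": {"format": "pt"},
    "w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
}
blob = json.dumps(demo_header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)))
    f.write(blob)
    f.write(b"\x00" * 8)  # tensor data region

meta = read_safetensors_header("demo.safetensors").get("__metadata__", {})
# A missing or null value here is the kind of thing a loader could
# report as 'Unsupported safetensors format: null'.
print(meta.get("format"))
```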
throwaw12
I think this is very important for it to eventually become a viable replacement for coding models, because most of the time coding harnesses rely on tool calls to gather context and then write a solution.
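That harness pattern (the model proposes tool calls, the harness executes them and feeds the results back until the model answers) can be sketched roughly as follows; the `llm` callable and the toy `read_file` tool are placeholders, not any particular vendor's API:

```python
# Minimal agentic tool-call loop sketch. `llm` stands in for any chat
# model; the single tool here is a toy stand-in for real context-gathering.

def read_file(path: str) -> str:
    # Toy stand-in: a real harness would read from disk.
    return {"main.py": "print('hello')"}.get(path, "")

TOOLS = {"read_file": read_file}

def run_agent(llm, task: str, max_steps: int = 8):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(messages)  # model decides: call a tool or answer
        if reply.get("tool_call"):
            name, args = reply["tool_call"]
            result = TOOLS[name](**args)  # harness executes the call...
            messages.append({"role": "tool", "content": result})  # ...and feeds it back
        else:
            return reply["content"]  # final answer ends the loop
    return None

# Scripted fake model: first gathers context, then answers.
script = iter([
    {"tool_call": ("read_file", {"path": "main.py"})},
    {"content": "main.py just prints 'hello'."},
])
print(run_agent(lambda msgs: next(script), "What does main.py do?"))
```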
I am hopeful that one day we can replace the Claude and OpenAI models with local SOTA LLMs.
2ndorderthought
It is more finicky than Claude, but if you hand-hold it a bit it's crazy.
gchamonlive
So yeah, while it's true that qwen3.6 is good for agentic coding, it's not very good at exploring the codebase and coming up with plans. Today you need to pair it with a model capable of ingesting the whole context and producing a detailed plan, and even then the implementation might take 10x the time it'd take Sonnet or Gemini 3 to crunch through the plan.
2ndorderthought
No, I am not saying this model is a drop-in Claude replacement. But I think in 2 years we might be really surprised by what can be done on a desktop with commodity hardware, no internet connection, and a few models that each cover a subset of tasks.
Really happy to see AMD put their hat in the ring. It's a good day for AMD investors. I know a lot of AI bros will scoff at this, but completing your first training run is a big deal for a new lab. AMD is on their way, despite Nvidia having years of runway.
zimi-24-imiz
Same thing with smol local LLMs versus the big ones in the sky: your smol local LLM will only be able to tackle projects which are no longer commercially valuable, because people expect 100x the scope and features. Which is fine as a hobby/art project.
Yes, we'll do amazing things with local LLMs in 2 years, but the big LLMs will do things beyond imagination (assembly vs C).
2ndorderthought
I think we are going to see a surge in software claiming to do everything and becoming bloated and unsustainable.
I already see 1-GPU local models one-shotting games via vibe coding. I see people doing agentic programming, granted more slowly, though also more cheaply, than 12 Claude sessions.
The difference isn't as big as it was 2 months ago. So many model releases have happened in the past 45 days, while frontier performance has stagnated and degraded. If this is a taste of what's to come, I welcome it.
adrian_b
OpenAI has released a couple of open-weights models in the past, but it does not seem to plan to release any others.
But apart from OpenAI and Anthropic, with this announcement Zyphra is the 12th company to have announced new, improved open-weights models during the last couple of months.
Half of these 12 companies have launched not only small models with fewer than 128B parameters, but also big models ranging from over 200B to over 1T parameters.
So for now there is a healthy competition and the offerings in open-weights models are very diverse and numerous.
2ndorderthought
DeepSeek is doing valuations right now.
Moonshot is just getting started, same with AMD. Mistral is still working hard at it and has a customer base.
An Egyptian company dropped their first small model this month, Horus.
There are enough geopolitics at play that I expect this to be a very different outcome from typical startup market dynamics. If anything, I worry about the big US labs' longevity. The world seems fed up with US tech, and even for US citizens it's questionable whether the frontier labs have their interests in mind, given they are risking the entire economy.