LFM2-24B-A2B: Scaling Up the LFM2 Architecture
meatmanek
Otherwise, if you have a GPU with more than like 4GB of VRAM, there are better models. Gemma4 and Qwen3.6 (or Qwen3.5 if you need the smaller dense models that haven't yet been released for 3.6) are a good place to start.
BoredomIsFun
1dom
I find Gemmas really good for a short conversation with maybe 3 or 4 exchanges of a few paragraphs each, which covers a surprisingly large number of interactions.
For anything longer form though, particularly with larger code contexts, Qwen is far more useful for me personally.
I'm not an expert in this field, but my understanding is Qwen uses a hybrid gated attention mechanism, whereas Gemma's hybrid includes a sliding window attention mechanism, which makes it look like it favours the most recent tokens a little too much at times (rough sketch below).
This is all in the context of local quantized models, I'm aware both have larger cloud variants that wouldn't suffer as much.
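To illustrate the recency point, here's a minimal sketch (plain NumPy, not how Gemma actually implements attention): a causal sliding-window mask only lets each query position attend to the last `window` key positions, so older tokens are only reachable indirectly through deeper layers.

    import numpy as np

    def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
        # Illustrative only: builds a boolean attention mask where
        # position i may attend to position j iff j is in the past
        # and within the last `window` tokens.
        i = np.arange(seq_len)[:, None]  # query positions
        j = np.arange(seq_len)[None, :]  # key positions
        causal = j <= i                  # no attending to future tokens
        recent = (i - j) < window        # only the most recent `window` tokens
        return causal & recent

    print(sliding_window_mask(6, 3).astype(int))
    # Each row shows which tokens that position can see; anything
    # older than the window drops out of direct attention.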