Hacker News
Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs
28 points by darkrishabh | 5 comments
ssgodderidge
The example model in the documentation is 4o-mini; you might want to update that to a more recent model.
As an aside, 4o-mini came out months before agent skills were released… I’m curious how well it does at choosing to load skills in the first place.
stingraycharles
It’s an artifact of the documentation being AI-generated; they usually pick GPT-4-era models without giving it further thought.
For Gemini they seem to always pick 2.5 despite 3.1 being the latest, and for Claude the 3.5-era models.
Not sure what’s preventing AI labs from ensuring this stuff is refreshed during training.