Hacker News
Knowledge Distillation of Black-Box Large Language Models (2024)
Alifatisk
|next
[-]
babelfish
|root
|parent
[-]
tough
|root
|parent
[-]
swingboy
|root
|parent
[-]
adrian_b
|root
|parent
[-]
All the results presented in these distillation papers are for very small models.
In order to gain anything, Alibaba or others would need today to use the Anthropic models to improve LLMs at least one hundred times bigger than those tested in these papers.
I assume that the number of queries to the teacher LLM grows superlinearly with the size of the student model, which would mean that billions of queries would be needed. Even for a linear growth, at least hundreds of millions of queries would be needed.
I do not see how any Claude account could do so many queries without being detected. Even if the queries would be distributed over thousands of accounts, it would still be easy for Anthropic to stop any such attempts.
dmezzetti
|next
|previous
[-]
Related paper that's a good read: https://arxiv.org/abs/1908.08962
phantompeace
|next
|previous
[-]
potus_kushner
|next
|previous
[-]
StreamCtx
|next
|previous
[-]
adrian_b
|root
|parent
[-]
If the teacher model is run by an external entity, e.g. Anthropic or OpenAI, then the number of queries to the blackbox model that is required is so great that it should be easy for the owner of the teacher LLM to detect and stop any such attempts.
duendefm
|next
|previous
[-]
Roark66
|root
|parent
|next
[-]
But we're not far.
My requirements: - a terminal app without advanced tui, not written like "a browser running in a terminal" or a game. There is no need to overcomplicated. - ability to manage prompts per model, compress context using alternate models, and minimise token costs better - like the YouTube's Sentdex's Minion mini harness (in fact I'm building on top of his as we speak). - support for agent work fanout - support for MCP, but switchable off/on depending if needed (I use a single MCP aggregator anyway so mcp tool use doesn't eat my context) - support for lsp/tree-sitter, again switchable when needed. - support for OpenAI api and written easily enough so other ones like deepinfra are easy to add.
Nice to have: - have some sort "prompt library" that would store tweaked versions of prompts for different models so it adjusted the harness as needed depending on which model we call.
That's it.
laichzeit0
|root
|parent
|next
|previous
[-]
anax32
|root
|parent
|next
|previous
[-]
nozzlegear
|root
|parent
|next
|previous
[-]
I'm doing my part!
cynicalsecurity
|root
|parent
|previous
[-]
duendefm
|root
|parent
|next
[-]
LNSY
|previous
[-]
tomhow
|root
|parent
[-]
We detached this comment from https://news.ycombinator.com/item?id=48712718 and marked it off topic.