Astro Hacker News - Knowledge Distillation of Black-Box Large Language Models (2024)

Alifatisk |next [-]

Why is this published again? Is this a reference to recent events?

babelfish |root |parent [-]

I just saw some post about it on Threads and found it interesting so decided to share!

tough |root |parent [-]

My best guess is this is a reference to the recent accusations from Anthropic of Chinese labs "distilling" on their models

swingboy |root |parent [-]

And it’s a paper from Alibaba researchers, the company/lab that Anthropic called out by name.

adrian_b |root |parent [-]

I do not find the Anthropic allegations believable.

All the results presented in these distillation papers are for very small models.

To gain anything, Alibaba or others would today need to use the Anthropic models to improve LLMs at least one hundred times bigger than those tested in these papers.

I assume that the number of queries to the teacher LLM grows superlinearly with the size of the student model, which would mean that billions of queries would be needed. Even with linear growth, at least hundreds of millions of queries would be needed.

I do not see how any Claude account could make so many queries without being detected. Even if the queries were distributed over thousands of accounts, it would still be easy for Anthropic to stop any such attempts.

dmezzetti |next |previous [-]

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Related paper that's a good read: https://arxiv.org/abs/1908.08962

phantompeace |next |previous [-]

Considering the very small difference between just SFT on the student model as compared to SFT + DPO on a proxy, doesn't it make sense to concentrate on ensuring the SFT dataset is perfect rather than worry about DPO etc? And just train directly on the student model?

potus_kushner |next |previous [-]

probably more interesting (from 01/2026) https://arxiv.org/pdf/2511.10643 "Black-Box On-Policy Distillation of Large Language Models". they got a qwen 2.5 14B model trained to GPT5 level using the described technique "Generative Adversarial Distillation (GAD)".

StreamCtx |next |previous [-]

“Relevant to anyone building failure-attribution systems for agent pipelines — black-box distillation techniques here could feed into causal attribution models without needing white-box access to the underlying model.”

adrian_b |root |parent [-]

That is easy when you can control the teacher model yourself and you want to transfer its capabilities to a smaller model.

If the teacher model is run by an external entity, e.g. Anthropic or OpenAI, then so many queries to the black-box model are required that it should be easy for the owner of the teacher LLM to detect and stop any such attempts.

duendefm |next |previous [-]

The Chinese are really going strong on destroying the American AI economy bubble. Honestly, despite the fact that I'm totally pro USA and anti China, I think we should help them crash the American AI bubble. They are controlling everything and we can't even buy a new computer nowadays while getting no benefit from this. I wish some influential programmers stimulated coders everywhere to skip Claude and ChatGPT subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.

Roark66 |root |parent |next [-]

If we programmers united we'd have a Claude Code alternative that didn't suck :-)

But we're not far.

My requirements:
- a terminal app without an advanced TUI, not written like "a browser running in a terminal" or a game. There is no need to overcomplicate it.
- ability to manage prompts per model, compress context using alternate models, and minimise token costs better, like YouTuber Sentdex's Minion mini harness (in fact I'm building on top of his as we speak).
- support for agent work fanout
- support for MCP, but switchable off/on depending on need (I use a single MCP aggregator anyway so MCP tool use doesn't eat my context)
- support for LSP/tree-sitter, again switchable when needed.
- support for the OpenAI API, written simply enough that other ones like DeepInfra are easy to add.

Nice to have:
- some sort of "prompt library" that would store tweaked versions of prompts for different models, so the harness adjusts as needed depending on which model we call.

That's it.

laichzeit0 |root |parent |next |previous [-]

The US government will do the job of destroying the American AI economy through their export controls.

anax32 |root |parent |next |previous [-]

The US "product machine" is so strong. They really know how to do frictionless signup and vendor lock-in on the corporate side.

addedGone |root |parent |next |previous [-]

"anti China", why so? have you lived there?

nozzlegear |root |parent |next |previous [-]

> skip Claude and Chatgpt subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.

I'm doing my part!

cynicalsecurity |root |parent |previous [-]

[flagged]

duendefm |root |parent |next [-]

Nvidia, Anthropic and OpenAI are controlling everything, and nothing is improving for everyone, quite the opposite. So I just hope they crash to the ground.

girvo |root |parent |next |previous [-]

Why would I care about Christian morals? In fact from what I can see of the US, you don’t have them either.

anon373839 |root |parent |next |previous [-]

[flagged]

gmerc |root |parent |previous [-]

lol Christian Morals. Epstein and his best buddy running the show tells you all about this

big-and-small |root |parent [-]

[flagged]

linolevan |next |previous [-]

Can we note that this is a 2024 paper in the title?

spacebacon |next |previous [-]

[dead]

TimXare |next |previous [-]

[dead]

modgate |next |previous [-]

[flagged]

LNSY |previous [-]

[flagged]

tomhow |root |parent [-]

Please don't post like this on HN. The guidelines make it clear we're trying for something better here. https://news.ycombinator.com/newsguidelines.html.

We detached this comment from https://news.ycombinator.com/item?id=48712718 and marked it off topic.