Hacker News
How an inference provider can prove they're not serving a quantized model
hleszek
measurablefunc
You'll never get agreement from any major companies on your proposal, because that would mean they'd have to provide a real SLA for all of their customers, and they'll never agree to that.
wongarsu
Of course, attestation is conceptually neat and wastes less compute than repeated benchmarks. It definitely has its place.
Aurornis
The last one I bookmarked has already disappeared. I think they're generally vibe coded by developers who think they're going to prove something, but then realize it's expensive to spend that money on tokens every day.
They also use limited subsets of big benchmarks to keep costs down, which increases the noise of the results. The last time someone linked to one of these sites claiming a decline in quality, it looked like a noisy, mostly flat graph with a regression line on it that sloped very slightly downward.
viraptor
You'd need to run a full, public system image with known attestation keys and return some kind of signed response with every request to do that. Which is not impossible, but the remote part seems to be completely missing from the description.
FrasiertheLion
Sorry it wasn’t clear from the post!
viraptor
That means you can't simulate the hardware in a way that would allow you to cheat (the keys/method won't match). And you can't replace the software part (the measurements won't match).
It all depends on the third party and the hardware keys not leaking, but as long as you can review the software part, you can be sure the validation of the value sent with the response is enough.
FrasiertheLion
1. An HPKE (https://www.rfc-editor.org/rfc/rfc9180.html) key is generated. This is the key that encrypts communication to the model.
2. The enclave is provisioned with a certificate. The certificate is embedded with the HPKE key, accessible only inside the enclave. The code for all this is open source and part of the measurement that is checked against by the client.
So if the provider attempts to send a different attestation or even route to a different enclave, this client-side check would fail.
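Roughly, the client-side check described here can be sketched like this (a minimal sketch: the dict structure, field names, and pinned value are all illustrative, not Tinfoil's actual API):

```python
import hashlib
import hmac

# Hypothetical pinned measurement of the open-source enclave image.
PINNED_MEASUREMENT = hashlib.sha256(b"open-source enclave image v1").hexdigest()

def verify_attestation(attestation: dict) -> bytes:
    """Return the enclave's HPKE public key only if the measurement matches."""
    if not hmac.compare_digest(attestation["measurement"], PINNED_MEASUREMENT):
        raise ValueError("enclave code does not match the pinned measurement")
    return attestation["hpke_public_key"]

# A provider routing to a different enclave would present a different
# measurement, so this check fails before any key is ever used.
good = {"measurement": PINNED_MEASUREMENT, "hpke_public_key": b"enclave-key"}
assert verify_attestation(good) == b"enclave-key"

bad = {"measurement": hashlib.sha256(b"quantized image").hexdigest(),
       "hpke_public_key": b"enclave-key"}
try:
    verify_attestation(bad)
    assert False, "should have been rejected"
except ValueError:
    pass  # rejected as expected
```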
arcanemachiner
EDIT: I read through the article, and it's a little over my head, but I'm intrigued. Does this actually work?
rhodey
Two comments so far suggest otherwise, and I guess I don't know what their deal is.
Attestation is taking off.
rhodey
Clients who want to talk to a service with attestation send a nonce and get back a document with the nonce in it. The clients have a hard-coded certificate from Intel, AMD, or AWS somewhere in them, and they check that the document carries a valid signature.
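The nonce flow described here can be sketched as follows. A real scheme verifies a signature against the vendor's (Intel/AMD/AWS) hard-coded certificate chain; here an HMAC key stands in for that hardware root of trust purely to show the shape of the protocol, and all field names are illustrative:

```python
import hashlib
import hmac
import json
import os

VENDOR_KEY = os.urandom(32)  # stand-in for the vendor's signing key

def attest(nonce: bytes) -> dict:
    """Device side: return a doc echoing the nonce, plus a 'signature'."""
    doc = {"nonce": nonce.hex(), "measurement": "abc123"}
    payload = json.dumps(doc, sort_keys=True).encode()
    return {"doc": doc,
            "sig": hmac.new(VENDOR_KEY, payload, hashlib.sha256).hexdigest()}

def verify(nonce: bytes, response: dict) -> bool:
    """Client side: check the signature and that our fresh nonce is echoed."""
    payload = json.dumps(response["doc"], sort_keys=True).encode()
    expected = hmac.new(VENDOR_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, response["sig"])
            and response["doc"]["nonce"] == nonce.hex())

nonce = os.urandom(16)
assert verify(nonce, attest(nonce))                # fresh, signed: accepted
assert not verify(os.urandom(16), attest(nonce))   # stale/replayed: rejected
```

The nonce is what makes the response fresh: without it, a provider could replay an old attestation document indefinitely.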
LoganDark
In a real attestation scheme you would do something like have the attesting device generate a hardware-backed key to be used for communications to and from it, to ensure it is not possible to use an attestation of one device to authenticate any other device or a man-in-the-middle. Usually for these devices you can verify the integrity of the hardware-backed key as well. Of course, all of this is moot if you can trick an authorized device into signing or encrypting/decrypting anything attacker-provided, which is where many systems fail.
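The key-binding idea here can be sketched like so: the signed attestation report commits to a hash of the device's communication public key (SEV-SNP and TDX reports have a report-data field the device can fill this way), so an attestation of device A cannot authenticate a man-in-the-middle presenting its own key. Field names are illustrative:

```python
import hashlib

def make_report(measurement: str, comm_pubkey: bytes) -> dict:
    # The device embeds H(pubkey) in the signed report's user-data field,
    # binding the attestation to this one communication key.
    return {"measurement": measurement,
            "report_data": hashlib.sha256(comm_pubkey).hexdigest()}

def key_matches_report(report: dict, presented_pubkey: bytes) -> bool:
    """Client side: only trust a channel key the report actually commits to."""
    return report["report_data"] == hashlib.sha256(presented_pubkey).hexdigest()

device_key = b"device-A-public-key"
report = make_report("abc123", device_key)
assert key_matches_report(report, device_key)            # genuine device key
assert not key_matches_report(report, b"mitm-public-key")  # MITM key rejected
```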
FrasiertheLion
1. The provider open-sources the code running in the enclave and pins the measurement to a transparency log such as Sigstore.
2. On each connection, the client SDK fetches the measurement of the code actually running (through a process known as remote attestation).
3. The client checks that the measurement the provider claimed to be running exactly matches the one fetched at runtime.
We explain this more in a previous blog: https://tinfoil.sh/blog/2025-01-13-how-tinfoil-builds-trust
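The three steps above can be sketched with in-memory stand-ins for the transparency log and the remote-attestation call (all names here are hypothetical, not the actual SDK API):

```python
import hashlib

TRANSPARENCY_LOG = {}  # stand-in for a log like Sigstore

def publish(tag: str, image: bytes) -> str:
    """Step 1: the provider pins the measurement of its open-source image."""
    measurement = hashlib.sha256(image).hexdigest()
    TRANSPARENCY_LOG[tag] = measurement
    return measurement

def remote_attest(running_image: bytes) -> str:
    """Step 2: stand-in for fetching the running enclave's measurement."""
    return hashlib.sha256(running_image).hexdigest()

def client_check(tag: str, running_image: bytes) -> bool:
    """Step 3: claimed and actually-running measurements must match exactly."""
    return TRANSPARENCY_LOG[tag] == remote_attest(running_image)

image = b"open-source enclave build"
publish("enclave-v1", image)
assert client_check("enclave-v1", image)                    # honest provider
assert not client_check("enclave-v1", b"quantized build")   # swap detected
```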
LoganDark
Edit: I found https://github.com/tinfoilsh/cvmimage which says AMD SEV-SNP / Intel TDX, which seems almost trustworthy.
julesdrean
A good question is how many lines of code you need to trust, at the end of the day, between these different designs.
exceptione
I am ignorant about this ecosystem, so I might be missing something obvious.
FrasiertheLion
At runtime, the client SDK (also open source: https://docs.tinfoil.sh/sdk/overview) fetches the pinned measurement from Sigstore, compares it to the attestation from the running enclave, and checks that they're equal. This previous blog post explains it in more detail: https://tinfoil.sh/blog/2025-01-13-how-tinfoil-builds-trust