Hacker News

High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

14 points by jchandra | 1 comment

vivahir215

Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
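
For a rough sense of the gap in question, here is a back-of-envelope operation-count sketch. It is not a benchmark and does not replace an end-to-end latency measurement. Everything in it is an assumption made for illustration: the function names `topk_cost` and `svd_ols_cost`, the shapes (4096 cached tokens, head dimension 128, rank 16), and the leading-order cost formulas with constants dropped. How the post actually applies OLS and SVD is not stated in this thread.

```python
# Leading-order operation counts only; constants and memory traffic are ignored.
# All names, shapes, and formulas here are illustrative assumptions.


def topk_cost(n_tokens: int, head_dim: int) -> int:
    """Assumed Top-K cost: one dot-product score per cached token, then selection.

    Scoring is about n * d multiply-adds; selection is treated as linear in n.
    """
    return n_tokens * head_dim + n_tokens


def svd_ols_cost(n_tokens: int, head_dim: int, rank: int) -> int:
    """Assumed cost of a dense SVD of an n x d matrix plus a least-squares fit.

    Dense SVD scales as n * d * min(n, d). The least-squares term counts
    forming the normal equations for `rank` regressors and d right-hand sides.
    """
    svd = n_tokens * head_dim * min(n_tokens, head_dim)
    ols = n_tokens * rank * rank + n_tokens * rank * head_dim
    return svd + ols


if __name__ == "__main__":
    n, d, r = 4096, 128, 16  # hypothetical cache length, head dim, rank
    cheap = topk_cost(n, d)
    heavy = svd_ols_cost(n, d, r)
    print(f"top-k ops:     {cheap}")
    print(f"svd + ols ops: {heavy}")
    print(f"ratio:         {heavy / cheap:.1f}x")
```

Under these assumptions the SVD term dominates, so a truncated or randomized SVD would change the picture considerably. Operation counts also say nothing about how often the summarization step runs relative to decoding, which is what a wall-clock benchmark would capture.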
