Hacker News
KVarN: Native vLLM backend for KV-cache quantization by Huawei
68 points by theanonymousone
7 comments
throwa356262
Better performance than TQ and better quality than FP16?
Am I reading this right??
v3ss0n
Why isn't this a PR for vLLM?
esafak
It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.
Edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff against that version; it's fairly straightforward.