
Doubts about KV Cache and Obtaining LogProbs #2549

Closed

aflah02 opened this issue Jan 22, 2024 · 3 comments

Comments


aflah02 commented Jan 22, 2024

Hi,
Thanks for the great library!
I need to run inference on a large number of sequences and get their log probabilities. I have approximately 100K sequences that can be binned into groups of 100 sharing a significant common prefix. For example, I have 100 sequences that all start with 'Wikipedia was built in' and have different suffixes.

Does the library automatically figure out the optimal KV cache, or can I specify it somehow?

If I build batches of, say, size 200, where one batch has 100 sequences starting with 'Wikipedia was built in' and another 100 starting with 'Google was built in', will the vLLM engine automatically optimize the KV cache to reuse the computation done for the shared prefix?

Since I only need the log probs and don't actually need the next generated token, I've set the maximum number of tokens to generate to 1. Can I somehow eliminate the generation step entirely and get only the log probs?
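For reference, a minimal sketch of this max_tokens=1 approach, assuming vLLM's offline `LLM` API and the `SamplingParams` fields `max_tokens` and `prompt_logprobs` (the model name is just a placeholder, and the exact type of the returned log-prob entries varies across versions):

```python
# Minimal sketch: score prompts with vLLM and read back per-token log
# probs. Assumes the offline LLM API; the model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# max_tokens=1 generates a single throwaway token; prompt_logprobs=0
# additionally returns the log probability of each prompt token itself.
params = SamplingParams(temperature=0.0, max_tokens=1, prompt_logprobs=0)

prompts = [
    "Wikipedia was built in 2001.",
    "Wikipedia was built in January 2001.",
]
for out in llm.generate(prompts, params):
    # out.prompt_logprobs has one entry per prompt token, mapping token
    # id -> log prob; the first entry is None (no left context).
    print(out.prompt_logprobs)
```

As far as I know there is no scoring-only entry point, so generating a single token and discarding it is the usual workaround.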

aflah02 closed this as completed Jan 22, 2024
linbeyoung commented

same question


aflah02 commented Feb 4, 2024

@linbeyoung This might be relevant for you: sgl-project/sglang#81
I don't know how this issue got closed, as I still have some of these questions. Reopening it.

aflah02 reopened this Feb 4, 2024
hmellor (Collaborator) commented Apr 4, 2024

This sounds like an issue that's solved by prefix caching: #2614
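For later readers, a minimal sketch of turning this on, assuming the `enable_prefix_caching` constructor flag from later vLLM releases (the flag name and model are assumptions on my part, not something stated in this thread; #2614 itself exposed an earlier, explicit prefix API):

```python
# Minimal sketch: enable automatic prefix caching so sequences that
# share a prefix reuse its cached KV blocks across requests.
# Flag name assumed from later vLLM releases; model is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

params = SamplingParams(temperature=0.0, max_tokens=1, prompt_logprobs=0)
outputs = llm.generate(
    [
        "Wikipedia was built in 2001.",
        "Wikipedia was built in January 2001.",  # shares the prefix above
    ],
    params,
)
```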

hmellor closed this as completed Apr 4, 2024