Tutorial for Batch Decoding and Obtaining Log Probs #81
Yes, RadixAttention can help your case a lot. We do not have this interface/tutorial ready, but I can easily make one for you. What do you need specifically? Given an input token sequence [a, b, c, d]
@merrymercy Thanks for the quick response
Do you also need the logprob for the shared prefix?
@merrymercy Nope, I don't need it for the shared prefix, only for the non-shared portions. For example, if the sentences are "Wikipedia originated in India", "Wikipedia originated in U.S.A", etc., I need it only for "India", "U.S.A", etc.
Great! This is easier. What do you do with the logprob? Do you compute the normalized logprob for selection purposes?
Yeah, I use the normalized logprobs and store them for later analysis. This example looks very relevant. If I understand correctly, something like this would populate the most likely choice from the options, right? How can I now access the logprobs as well?
Yes, it will populate the most likely choice from the options based on the normalized logprobs (sum of the logprobs divided by the number of tokens). I am working on some examples and interface updates for you to easily get the logprobs. I will upload them very soon!
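The selection rule described above (pick the choice with the highest mean per-token logprob) can be sketched in plain Python. The numbers and tokenization below are hypothetical; this is not SGLang's API, just the arithmetic:

```python
def normalized_logprob(token_logprobs):
    """Normalized logprob: sum of logprobs divided by number of tokens."""
    return sum(token_logprobs) / len(token_logprobs)

def select_choice(choice_logprobs):
    """Pick the choice whose (non-shared) tokens have the highest
    normalized logprob. `choice_logprobs` maps each choice string to
    the list of per-token logprobs of its suffix."""
    return max(choice_logprobs, key=lambda c: normalized_logprob(choice_logprobs[c]))

# Hypothetical values: suppose "India" is one token and "U.S.A" is three.
scores = {
    "India": [-0.5],
    "U.S.A": [-0.4, -0.9, -1.1],
}
best = select_choice(scores)  # "India": -0.5 beats -2.4 / 3 = -0.8
```

The normalization matters because longer choices accumulate more (negative) logprob mass; dividing by token count keeps choices of different lengths comparable.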
Thank you for taking the time! That would be really helpful!
@aflah02 Could you try this with the main branch? Does it meet your needs? https://github.com/sgl-project/sglang/blob/main/examples/usage/choices_logprob.py Output:
Thanks a lot for sharing this! I need to install from source and then try this, right?
Yes.
Hi, I see that there is a parameter that can be passed by requests to return logprobs (`sglang/python/sglang/srt/managers/router/model_rpc.py`, lines 222 to 223 at commit d3fc86a).
Is there a way that we could specify this from the Python end with
@Ja1Zhou It is possible. I can work on an interface for this later. Do you need the logprob of prompts, the logprob of generation, the logprob of selected tokens, or the logprob of top-5 tokens? |
Many thanks! Currently I would need logprobs of the top-5 (or top-n, passed as a parameter) tokens for each generated token. The scenario is essentially the same as passing the

An example would be the

One related question would be if the regex constraint is going to affect the

Thanks again for the swift reply. I would also love to look into supporting this logprobs feature!
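The top-n request above is, mathematically, a log-softmax over the vocabulary followed by taking the n largest entries. A minimal sketch of that computation (a toy logits vector, not SGLang's actual interface):

```python
import math

def top_n_logprobs(logits, n=5):
    """Log-softmax over a logits vector, then the n highest entries.

    Returns (token_id, logprob) pairs, best first. Uses the max-shift
    trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    ranked = sorted(((i, x - log_z) for i, x in enumerate(logits)),
                    key=lambda p: p[1], reverse=True)
    return ranked[:n]

# Toy 6-token vocabulary
logits = [2.0, 1.0, 0.5, -1.0, 3.0, 0.0]
top3 = top_n_logprobs(logits, n=3)  # token ids 4, 0, 1 in that order
```

A regex constraint would change which token is sampled, but the logprobs themselves are still defined over the full (or constrained, depending on the implementation) distribution at each step.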
Great! If you are interested, please go ahead. Our bandwidth is limited, so your help would be great. You can start from `sglang/test/srt/test_httpserver_decode.py`, lines 21 to 33 at commit 0147f94.
Sorry for the delay @merrymercy |
@merrymercy In your example, it doesn't seem like the exponentiated log probs sum to one. I've been running this locally with Mistral 7B:
With output:
Am I doing anything wrong? Ideally, I think the exponentiated binary log probs should sum to one.
@mlinegar They are not binary log probs. It is the log prob over the whole vocab set. The meaning of this log prob is the same as the log prob defined in the OpenAI API. For any new questions, please open a new issue.
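The distinction can be made concrete: the exponentials of whole-vocab logprobs for just two choices need not sum to one, because the rest of the vocabulary carries the missing mass. If you want probabilities over only the listed choices, renormalize. The logprob values below are hypothetical:

```python
import math

# Logprobs for two choices, each taken over the whole vocabulary
# (hypothetical values), so their exponentials need not sum to one:
choice_logprobs = {"Yes": -0.9, "No": -1.6}

def renormalize(logprobs):
    """Turn whole-vocab logprobs into probabilities that sum to one
    across just the listed choices."""
    log_z = math.log(sum(math.exp(lp) for lp in logprobs.values()))
    return {c: math.exp(lp - log_z) for c, lp in logprobs.items()}

raw_mass = sum(math.exp(lp) for lp in choice_logprobs.values())  # about 0.61
probs = renormalize(choice_logprobs)  # now sums to exactly 1.0
```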
@aflah02 Did you notice any performance improvement vs. vLLM or other libraries?
@merrymercy Yep, it's a very significant speedup over vLLM for my use case :)
We currently do not have the bandwidth to add these models. If you are interested, you can help us contribute them. Adding a new model is very easy; we use an architecture very similar to vLLM's. Here are the steps to add a new model
Thanks! I'll take a look at this |
Hi, how do I get the `last_logits` out? I don't need the logprob for every token, just the last one.
As a note for anyone coming to this issue: #1495 merged two days ago adds
Hi,
Thanks for the great library!
I have a use case which I think will benefit a lot from RadixAttention. I need to obtain log probs for around 100K sequences, which can be binned into groups of 100 sharing a prefix like 'Wikipedia originated in' and having 100 different suffixes. I do not need to generate anything; I only need the log probs for the input. Is there a tutorial for such a use case?
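The binning itself is independent of SGLang: group the sequences by shared prefix so each group reuses one cached prefix. A minimal sketch with a toy splitter (the splitter and sentences are illustrative, not part of any library API):

```python
from collections import defaultdict

def group_by_prefix(sentences, split):
    """Bin sentences by a caller-supplied prefix/suffix splitter so each
    group shares one prompt prefix (the part a radix cache can reuse)."""
    groups = defaultdict(list)
    for s in sentences:
        prefix, suffix = split(s)
        groups[prefix].append(suffix)
    return dict(groups)

sentences = [
    "Wikipedia originated in India",
    "Wikipedia originated in U.S.A",
    "Linux originated in Finland",
]
# Toy splitter: treat the last word as the suffix to score.
groups = group_by_prefix(sentences, lambda s: s.rsplit(" ", 1))
```

Submitting each group's suffixes together maximizes the chance that the shared prefix is computed once and reused for all 100 variants.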