Updates after review #3
Conversation
Force-pushed from 6529ee0 to 11d9b36: …s info, fix logprobs_random
No, here we are separately doing greedy and random sampling. Only the latter is relevant for top p/k.
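For readers following along, here is a minimal sketch of what "separately doing greedy and random sampling" can look like; the function and parameter names (`sample_tokens`, `is_greedy`) are hypothetical rather than the actual mlc-serve code, and top-p/top-k filtering is applied only in the random branch:

```python
import torch

def sample_tokens(logits: torch.Tensor, is_greedy: torch.Tensor,
                  top_p: float = 0.9, top_k: int = 40) -> torch.Tensor:
    """Sample one token per sequence; `is_greedy` marks sequences that
    use greedy decoding, and top-p/top-k only affects the other ones."""
    # Greedy branch: plain argmax, no top-p/top-k filtering.
    greedy_tokens = logits.argmax(dim=-1)

    # Random branch: apply top-k, then top-p, then sample from the result.
    filtered = logits.clone()
    k = min(top_k, filtered.size(-1))
    kth_value = torch.topk(filtered, k, dim=-1).values[..., -1, None]
    filtered[filtered < kth_value] = float("-inf")

    probs = torch.softmax(filtered, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds top_p;
    # this always keeps at least the most likely token.
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    probs = torch.zeros_like(probs).scatter_(-1, sorted_idx, sorted_probs)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    random_tokens = torch.multinomial(probs, num_samples=1).squeeze(-1)

    return torch.where(is_greedy, greedy_tokens, random_tokens)
```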
Hmm interesting, I never thought about sampling a token individually for different sequences after prefill, i.e. the first tokens in each of the n generations.
Hello @masahi! Thank you for the quick response!
The following is still not clear to me:
As I can see, you are right: "the first tokens in each of n generations are the same". It seems strange to me that we start randomizing tokens only from the second one. And yes, it is harder to implement and may not be a priority. Who else could discuss and think about it?
Ah yes, you are right: `logits_random` should be used instead of `logits`. I will fix it ASAP.
@sunggg What do you think? Currently in parallel sampling, the first tokens in each generation are the same, since we just generate one token after prefill, which is then copied into each generation.
Formally, there should be n different, independent samplings. We could do this in a fairly lightweight way by handling exactly n samples for prefill, without changing the main logic. But since it is only one token, the priority is not high.
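A small sketch of the two behaviors under discussion, assuming hypothetical function names and a single sequence's prefill logits (this is not the actual implementation): the current path samples once and copies the token into all n generations, while independent sampling would draw n samples from the same prefill distribution:

```python
import torch

def first_tokens_current(prefill_logits: torch.Tensor, n: int) -> torch.Tensor:
    """Current behavior: one sample after prefill, copied into each of
    the n parallel generations, so all first tokens are identical."""
    probs = torch.softmax(prefill_logits, dim=-1)     # shape: (vocab,)
    token = torch.multinomial(probs, num_samples=1)   # a single sample
    return token.repeat(n)                            # same token n times

def first_tokens_independent(prefill_logits: torch.Tensor, n: int) -> torch.Tensor:
    """Proposed behavior: n independent samples from the same prefill
    distribution, so the first tokens can already differ."""
    probs = torch.softmax(prefill_logits, dim=-1)
    return torch.multinomial(probs, num_samples=n, replacement=True)
```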
Thanks for the timely PR to address the comments. LGTM except the condition for logprob sampling.
Thanks @vvchernov, I'll merge this PR as it's quite mature; let's continue the discussion in the original PR octoml#82.
Merged 4c56eac into zxybazh:feature/2023-11-22/enable-mlc-server-logprobs
Interesting. I think both make sense. What is OpenAI's or vLLM's behavior? Can we match their behavior, since both approaches sound reasonable?
I don't know about OpenAI, but vLLM samples …
Updates after Masa's review #82 and some fixes.

Note: I have some doubts:
1. It seems that `logits_random` should be used instead of `logits` (see the sketch after this list).
2. For parameter `n` > 1, only one token is generated after the prefill step; why not n tokens?

cc @zxybazh, @masahi
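For doubt 1, a hedged sketch of the kind of change implied (the tensor names come from the comment above, but the function itself and its exact context are assumptions, not the actual code): the per-token logprobs should be gathered from a log-softmax of the tensor the random branch actually sampled from, i.e. `logits_random`, not `logits`:

```python
import torch

def gather_random_logprobs(logits_random: torch.Tensor,
                           tokens_random: torch.Tensor) -> torch.Tensor:
    """Return the logprob of each token drawn by the random-sampling
    branch, computed from `logits_random` rather than `logits`."""
    logprobs = torch.log_softmax(logits_random, dim=-1)  # (num_random, vocab)
    return logprobs.gather(-1, tokens_random.unsqueeze(-1)).squeeze(-1)
```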