-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unbreak zero-temperature sampling #599
Conversation
@@ -155,10 +155,9 @@ def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = Non | |||
def sample( | |||
logits, need_probs: bool, temperature: float = 1.0, top_k: Optional[int] = None | |||
): | |||
# if temperature == 0 and not need_probs: | |||
# _, idx_next = torch.topk(logits, k=1, dim=-1) | |||
# idx_next = idx_next.squeeze(dim=(0, 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noob question why this line is not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the original version only worked for one token at a time mode. So, I think it just used squeeze to get rid of some dimensions. When I enabled parallel-prefill mode to address performance issues with eager / torch.compile mode, this added another non-1 dimension, and things started to break.
By using the logits [0,1] below, Scott selects the logits for just the one token (rather than using squeeze to wipe out the dimensionality).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so very much!
@@ -155,10 +155,9 @@ def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = Non | |||
def sample( | |||
logits, need_probs: bool, temperature: float = 1.0, top_k: Optional[int] = None | |||
): | |||
# if temperature == 0 and not need_probs: | |||
# _, idx_next = torch.topk(logits, k=1, dim=-1) | |||
# idx_next = idx_next.squeeze(dim=(0, 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the original version only worked for one token at a time mode. So, I think it just used squeeze to get rid of some dimensions. When I enabled parallel-prefill mode to address performance issues with eager / torch.compile mode, this added another non-1 dimension, and things started to break.
By using the logits [0,1] below, Scott selects the logits for just the one token (rather than using squeeze to wipe out the dimensionality).
2cc047f
to
5cdd3a5
Compare
Looks like we have some failing tests here. @swolchok are you able to rebase to latest main? |
I rebased a couple times previously and decided to just wait until somebody told me trunk was actually green. I'll try again |
Fixes #581.
5cdd3a5
to
5e96e85
Compare
* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>
* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>
* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>
* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>
* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>
* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>
* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>
Fixes #581.
Tested by running
python3 torchchat.py generate stories15M --device cpu --prompt "Hello my name is" --temperature 0
; used to crash after uncommenting and now succeeds. Let me know if there's anything else I need to check.