Unbreak zero-temperature sampling #599

swolchok · 2024-04-30T22:02:37Z

Fixes #581.

Tested by running python3 torchchat.py generate stories15M --device cpu --prompt "Hello my name is" --temperature 0; used to crash after uncommenting and now succeeds. Let me know if there's anything else I need to check.

larryliu0820 · 2024-04-30T22:28:41Z

generate.py

@@ -155,10 +155,9 @@ def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = Non
 def sample(
    logits, need_probs: bool, temperature: float = 1.0, top_k: Optional[int] = None
 ):
-    # if temperature == 0 and not need_probs:
-    #     _, idx_next = torch.topk(logits, k=1, dim=-1)
-    #     idx_next = idx_next.squeeze(dim=(0, 1))


Noob question why this line is not needed?

So, the original version only worked for one token at a time mode. So, I think it just used squeeze to get rid of some dimensions. When I enabled parallel-prefill mode to address performance issues with eager / torch.compile mode, this added another non-1 dimension, and things started to break.

By using the logits [0,1] below, Scott selects the logits for just the one token (rather than using squeeze to wipe out the dimensionality).

mikekgfb

Thank you so very much!

mikekgfb · 2024-04-30T22:52:04Z

generate.py

@@ -155,10 +155,9 @@ def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = Non
 def sample(
    logits, need_probs: bool, temperature: float = 1.0, top_k: Optional[int] = None
 ):
-    # if temperature == 0 and not need_probs:
-    #     _, idx_next = torch.topk(logits, k=1, dim=-1)
-    #     idx_next = idx_next.squeeze(dim=(0, 1))


So, the original version only worked for one token at a time mode. So, I think it just used squeeze to get rid of some dimensions. When I enabled parallel-prefill mode to address performance issues with eager / torch.compile mode, this added another non-1 dimension, and things started to break.

By using the logits [0,1] below, Scott selects the logits for just the one token (rather than using squeeze to wipe out the dimensionality).

orionr · 2024-05-02T17:43:17Z

Looks like we have some failing tests here. @swolchok are you able to rebase to latest main?

swolchok · 2024-05-02T19:30:34Z

Looks like we have some failing tests here. @swolchok are you able to rebase to latest main?

I rebased a couple times previously and decided to just wait until somebody told me trunk was actually green. I'll try again

Fixes #581.

* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>

Fixes #581.

* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>

Fixes #581.

* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>

Fixes #581.

* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>

Fixes #581.

* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>

Fixes #581.

* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>

Fixes #581.

* executable README * fix title of CI workflow * markup commands in markdown * extend the markup-markdown language * Automatically identify cuda from nvidia-smi in install-requirements (#606) * Automatically identify cuda from nvidia-smi in install-requirements * Update README.md --------- Co-authored-by: Michael Gschwind <[email protected]> * Unbreak zero-temperature sampling (#599) Fixes #581. * Improve process README * [retake] Add sentencepiece tokenizer (#626) * Add sentencepiece tokenizer Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Add white space Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle white space: Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Handle control ids Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * More cleanup Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Lint Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use unique_ptr Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Use a larger runner Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Debug Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Cleanup * Update install_utils.sh to use python3 instead of python (#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them. * Fix quantization doc to specify dytpe limitation on a8w4dq (#629) Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Kimish Patel <[email protected]> * add desktop.json (#622) * add desktop.json * add fast * remove embedding * improvements * update readme from doc branch * tab/spc * fix errors in updown language * fix errors in updown language, and [skip]: begin/end * fix errors in updown language, and [skip]: begin/end * a storied run * stories run on readme instructions does not need HF token * increase timeout * check for hang un hf_login * executable README improvements * typo * typo --------- Co-authored-by: Ian Barber <[email protected]> Co-authored-by: Scott Wolchok <[email protected]> Co-authored-by: Mengwei Liu <[email protected]> Co-authored-by: Kimish Patel <[email protected]> Co-authored-by: Scott Roy <[email protected]>

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 30, 2024

swolchok requested review from mikekgfb, mergennachin and malfet and removed request for mergennachin and malfet April 30, 2024 22:03

larryliu0820 reviewed Apr 30, 2024

View reviewed changes

mikekgfb approved these changes Apr 30, 2024

View reviewed changes

swolchok force-pushed the unbreak-zero-temp-sampling branch 2 times, most recently from 2cc047f to 5cdd3a5 Compare May 1, 2024 17:20

Unbreak zero-temperature sampling

5e96e85

Fixes #581.

swolchok force-pushed the unbreak-zero-temp-sampling branch from 5cdd3a5 to 5e96e85 Compare May 2, 2024 19:31

swolchok merged commit cd85542 into pytorch:main May 2, 2024
32 checks passed

swolchok deleted the unbreak-zero-temp-sampling branch May 2, 2024 21:24

mikekgfb pushed a commit that referenced this pull request May 4, 2024

Unbreak zero-temperature sampling (#599)

9332f5a

Fixes #581.

malfet pushed a commit that referenced this pull request Jul 17, 2024

Unbreak zero-temperature sampling (#599)

e68068b

Fixes #581.

malfet pushed a commit that referenced this pull request Jul 17, 2024

Unbreak zero-temperature sampling (#599)

2d71efd

Fixes #581.

malfet pushed a commit that referenced this pull request Jul 17, 2024

Unbreak zero-temperature sampling (#599)

6dac41b

Fixes #581.

malfet pushed a commit that referenced this pull request Jul 17, 2024

Unbreak zero-temperature sampling (#599)

cb5f947

Fixes #581.

malfet pushed a commit that referenced this pull request Jul 17, 2024

Unbreak zero-temperature sampling (#599)

86c03e7

Fixes #581.

malfet pushed a commit that referenced this pull request Jul 17, 2024

Unbreak zero-temperature sampling (#599)

f79420d

Fixes #581.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unbreak zero-temperature sampling #599

Unbreak zero-temperature sampling #599

swolchok commented Apr 30, 2024

larryliu0820 Apr 30, 2024

mikekgfb Apr 30, 2024

mikekgfb left a comment

mikekgfb Apr 30, 2024

orionr commented May 2, 2024

swolchok commented May 2, 2024

Unbreak zero-temperature sampling #599

Unbreak zero-temperature sampling #599

Conversation

swolchok commented Apr 30, 2024

larryliu0820 Apr 30, 2024

Choose a reason for hiding this comment

mikekgfb Apr 30, 2024

Choose a reason for hiding this comment

mikekgfb left a comment

Choose a reason for hiding this comment

mikekgfb Apr 30, 2024

Choose a reason for hiding this comment

orionr commented May 2, 2024

swolchok commented May 2, 2024