Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbreak zero-temperature sampling #599

Merged
merged 1 commit into from
May 2, 2024

Conversation

swolchok
Copy link
Contributor

Fixes #581.

Tested by running python3 torchchat.py generate stories15M --device cpu --prompt "Hello my name is" --temperature 0; used to crash after uncommenting and now succeeds. Let me know if there's anything else I need to check.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 30, 2024
@swolchok swolchok requested review from mikekgfb, mergennachin and malfet and removed request for mergennachin and malfet April 30, 2024 22:03
@@ -155,10 +155,9 @@ def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = Non
def sample(
logits, need_probs: bool, temperature: float = 1.0, top_k: Optional[int] = None
):
# if temperature == 0 and not need_probs:
# _, idx_next = torch.topk(logits, k=1, dim=-1)
# idx_next = idx_next.squeeze(dim=(0, 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noob question why this line is not needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the original version only worked for one token at a time mode. So, I think it just used squeeze to get rid of some dimensions. When I enabled parallel-prefill mode to address performance issues with eager / torch.compile mode, this added another non-1 dimension, and things started to break.

By using the logits [0,1] below, Scott selects the logits for just the one token (rather than using squeeze to wipe out the dimensionality).

Copy link
Contributor

@mikekgfb mikekgfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so very much!

@@ -155,10 +155,9 @@ def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = Non
def sample(
logits, need_probs: bool, temperature: float = 1.0, top_k: Optional[int] = None
):
# if temperature == 0 and not need_probs:
# _, idx_next = torch.topk(logits, k=1, dim=-1)
# idx_next = idx_next.squeeze(dim=(0, 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the original version only worked for one token at a time mode. So, I think it just used squeeze to get rid of some dimensions. When I enabled parallel-prefill mode to address performance issues with eager / torch.compile mode, this added another non-1 dimension, and things started to break.

By using the logits [0,1] below, Scott selects the logits for just the one token (rather than using squeeze to wipe out the dimensionality).

@swolchok swolchok force-pushed the unbreak-zero-temp-sampling branch 2 times, most recently from 2cc047f to 5cdd3a5 Compare May 1, 2024 17:20
@orionr
Copy link
Contributor

orionr commented May 2, 2024

Looks like we have some failing tests here. @swolchok are you able to rebase to latest main?

@swolchok
Copy link
Contributor Author

swolchok commented May 2, 2024

Looks like we have some failing tests here. @swolchok are you able to rebase to latest main?

I rebased a couple times previously and decided to just wait until somebody told me trunk was actually green. I'll try again

@swolchok swolchok force-pushed the unbreak-zero-temp-sampling branch from 5cdd3a5 to 5e96e85 Compare May 2, 2024 19:31
@swolchok swolchok merged commit cd85542 into pytorch:main May 2, 2024
32 checks passed
@swolchok swolchok deleted the unbreak-zero-temp-sampling branch May 2, 2024 21:24
mikekgfb pushed a commit that referenced this pull request May 4, 2024
mikekgfb added a commit that referenced this pull request May 4, 2024
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (#599)

Fixes #581.

* Improve process README

* [retake] Add sentencepiece tokenizer (#626)

* Add sentencepiece tokenizer

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Add white space

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle white space:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle control ids

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* More cleanup

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Lint

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use unique_ptr

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use a larger runner

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Cleanup

* Update install_utils.sh to use python3 instead of python (#636)

As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.

* Fix quantization doc to specify dytpe limitation on a8w4dq (#629)

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang un hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
malfet pushed a commit that referenced this pull request Jul 17, 2024
malfet pushed a commit that referenced this pull request Jul 17, 2024
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (#599)

Fixes #581.

* Improve process README

* [retake] Add sentencepiece tokenizer (#626)

* Add sentencepiece tokenizer

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Add white space

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle white space:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle control ids

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* More cleanup

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Lint

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use unique_ptr

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use a larger runner

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Cleanup

* Update install_utils.sh to use python3 instead of python (#636)

As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.

* Fix quantization doc to specify dytpe limitation on a8w4dq (#629)

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang un hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
malfet pushed a commit that referenced this pull request Jul 17, 2024
malfet pushed a commit that referenced this pull request Jul 17, 2024
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (#599)

Fixes #581.

* Improve process README

* [retake] Add sentencepiece tokenizer (#626)

* Add sentencepiece tokenizer

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Add white space

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle white space:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle control ids

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* More cleanup

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Lint

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use unique_ptr

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use a larger runner

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Cleanup

* Update install_utils.sh to use python3 instead of python (#636)

As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.

* Fix quantization doc to specify dytpe limitation on a8w4dq (#629)

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang un hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
malfet pushed a commit that referenced this pull request Jul 17, 2024
malfet pushed a commit that referenced this pull request Jul 17, 2024
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (#599)

Fixes #581.

* Improve process README

* [retake] Add sentencepiece tokenizer (#626)

* Add sentencepiece tokenizer

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Add white space

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle white space:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle control ids

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* More cleanup

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Lint

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use unique_ptr

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use a larger runner

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Cleanup

* Update install_utils.sh to use python3 instead of python (#636)

As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.

* Fix quantization doc to specify dytpe limitation on a8w4dq (#629)

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang un hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
malfet pushed a commit that referenced this pull request Jul 17, 2024
malfet pushed a commit that referenced this pull request Jul 17, 2024
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (#599)

Fixes #581.

* Improve process README

* [retake] Add sentencepiece tokenizer (#626)

* Add sentencepiece tokenizer

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Add white space

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle white space:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle control ids

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* More cleanup

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Lint

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use unique_ptr

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use a larger runner

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Cleanup

* Update install_utils.sh to use python3 instead of python (#636)

As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.

* Fix quantization doc to specify dytpe limitation on a8w4dq (#629)

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang un hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
malfet pushed a commit that referenced this pull request Jul 17, 2024
malfet pushed a commit that referenced this pull request Jul 17, 2024
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (#599)

Fixes #581.

* Improve process README

* [retake] Add sentencepiece tokenizer (#626)

* Add sentencepiece tokenizer

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Add white space

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle white space:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle control ids

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* More cleanup

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Lint

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use unique_ptr

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use a larger runner

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Cleanup

* Update install_utils.sh to use python3 instead of python (#636)

As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.

* Fix quantization doc to specify dytpe limitation on a8w4dq (#629)

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang un hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
malfet pushed a commit that referenced this pull request Jul 17, 2024
malfet pushed a commit that referenced this pull request Jul 17, 2024
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (#599)

Fixes #581.

* Improve process README

* [retake] Add sentencepiece tokenizer (#626)

* Add sentencepiece tokenizer

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Add white space

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle white space:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Handle control ids

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* More cleanup

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Lint

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use unique_ptr

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Use a larger runner

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Debug

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Cleanup

* Update install_utils.sh to use python3 instead of python (#636)

As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.

* Fix quantization doc to specify dytpe limitation on a8w4dq (#629)

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang un hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PRE-LAUNCH] temperature == 0 special handling
5 participants