Introduce delayed sampling mechanism #84

Merged: 1 commit, Jul 4, 2024


@lahead lahead commented Jul 3, 2024

This change introduces "delayed sampling", a mechanism that aims to reduce the CPU overhead of output token post-processing and next-token scheduling by overlapping the CPU-active part with device computation.

When delayed sampling is enabled, the first (prompt) model execution schedules model.forward() and the logits computation on the device, then immediately returns an output filled with invalid token ids without waiting for the computation to complete. The output logits are gathered and sampled only in the subsequent model execution. That execution again schedules the next model.forward() and logits computation without waiting for the results, and returns the previously collected and sampled output token ids. This process continues for the entire sequence length, so the final token is computed redundantly and discarded.
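For readers who want the control flow at a glance, here is a minimal, self-contained sketch of the one-step-delayed loop described above. It is not the code from this PR: `DelayedSamplingRunner`, `fake_forward`, `generate`, and `INVALID_TOKEN_ID` are hypothetical names, the "device" is simulated with a single worker thread, and sampling is plain greedy argmax over toy logits.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional

INVALID_TOKEN_ID = -1  # hypothetical placeholder for "no token sampled yet"
VOCAB_SIZE = 8         # toy vocabulary size, assumption for this sketch


def fake_forward(token_ids: List[int]) -> List[float]:
    """Stand-in for model.forward() plus logits computation.

    Produces toy logits whose argmax is (last token + 1) mod VOCAB_SIZE.
    """
    last = token_ids[-1]
    return [1.0 if i == (last + 1) % VOCAB_SIZE else 0.0 for i in range(VOCAB_SIZE)]


class DelayedSamplingRunner:
    """Returns the token for step N-1 while step N runs in the background."""

    def __init__(self) -> None:
        # A single worker thread plays the role of the asynchronous device.
        self._device = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None
        self._tokens: List[int] = []

    def execute_model(self, prompt: Optional[List[int]] = None) -> int:
        if self._pending is None:
            # Prompt step: schedule the work and return an invalid token id
            # immediately, without waiting for the result.
            if prompt is None:
                raise ValueError("the first call needs a prompt")
            self._tokens = list(prompt)
            self._pending = self._device.submit(fake_forward, list(self._tokens))
            return INVALID_TOKEN_ID
        # Later steps: gather and sample the logits scheduled by the previous call.
        logits = self._pending.result()
        token = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        self._tokens.append(token)
        # Schedule the next forward pass, then return the already-sampled token
        # so the caller's CPU-side work overlaps with the device work.
        self._pending = self._device.submit(fake_forward, list(self._tokens))
        return token

    def shutdown(self) -> None:
        # The last scheduled forward pass is redundant; its result is dropped.
        self._device.shutdown(wait=True)
        self._pending = None


def generate(prompt: List[int], num_tokens: int) -> List[int]:
    runner = DelayedSamplingRunner()
    outputs = [runner.execute_model(prompt)]
    for _ in range(num_tokens):
        outputs.append(runner.execute_model())
    runner.shutdown()
    return outputs
```

In this sketch, producing `num_tokens` real tokens takes `num_tokens + 1` calls: the first call yields only the invalid placeholder, and the forward pass scheduled by the final call is never consumed, which mirrors the redundant last token mentioned in the description.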

Please review @madamczykhabana, @kzawora-intel


@madamczykhabana madamczykhabana left a comment


LGTM

@madamczykhabana madamczykhabana merged commit 77e1ab8 into HabanaAI:habana_next Jul 4, 2024
tzielinski-habana pushed a commit that referenced this pull request Aug 21, 2024
tzielinski-habana pushed a commit that referenced this pull request Aug 22, 2024
tzielinski-habana pushed a commit that referenced this pull request Aug 22, 2024
tzielinski-habana pushed a commit that referenced this pull request Aug 22, 2024
tzielinski-habana pushed a commit that referenced this pull request Aug 29, 2024
tzielinski-habana pushed a commit that referenced this pull request Aug 29, 2024
tzielinski-habana pushed a commit that referenced this pull request Aug 29, 2024