Introduce delayed sampling mechanism #84

lahead · 2024-07-03T09:06:30Z

This change introduces a mechanism called "delayed sampling", which aims to minimize the CPU overhead of output-token post-processing and next-token scheduling by overlapping this CPU-side work with device computation.

When delayed sampling is enabled, the first (prompt) model execution schedules model.forward() and the logits computation on the device and immediately returns an output filled with invalid token ids, without waiting for the computation to complete. The output logits are gathered and sampled only in the subsequent model execution. That execution in turn schedules the next model.forward() and logits computation without waiting for their results, and returns the token ids sampled from the previous step instead. This process repeats for the entire sequence length; the last token is computed redundantly and discarded.
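A minimal sketch of the control flow described above, assuming a single sequence and greedy sampling. The device is simulated with a one-worker `ThreadPoolExecutor` so that submitted work runs asynchronously; `device_forward_and_logits`, `sample`, `DelayedSamplingRunner` and `INVALID_TOKEN_ID` are hypothetical names used for illustration, not the API introduced by this PR.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional

INVALID_TOKEN_ID = -1  # hypothetical placeholder returned before any token is sampled
VOCAB_SIZE = 10


def device_forward_and_logits(token_ids: List[int]) -> List[float]:
    # Hypothetical stand-in for model.forward() + logits computation on the
    # device: this toy "model" always predicts (last token + 1) mod VOCAB_SIZE.
    logits = [0.0] * VOCAB_SIZE
    logits[(token_ids[-1] + 1) % VOCAB_SIZE] = 1.0
    return logits


def sample(logits: List[float]) -> int:
    # CPU-side post-processing: greedy (argmax) sampling.
    return max(range(len(logits)), key=lambda i: logits[i])


class DelayedSamplingRunner:
    """Each execute() returns the token sampled from the *previous* step."""

    def __init__(self, device: ThreadPoolExecutor, prompt: List[int]) -> None:
        self.device = device
        self.seq = list(prompt)
        self.pending: Optional[Future] = None  # logits scheduled by the last call

    def execute(self) -> int:
        if self.pending is None:
            # Prompt step: nothing has been computed yet.
            token = INVALID_TOKEN_ID
        else:
            # Gather the previous step's logits (blocks only if not ready) and sample.
            token = sample(self.pending.result())
            self.seq.append(token)
        # Schedule the next forward + logits and return without waiting for it.
        self.pending = self.device.submit(device_forward_and_logits, list(self.seq))
        return token


with ThreadPoolExecutor(max_workers=1) as device:
    runner = DelayedSamplingRunner(device, prompt=[3])
    outputs = [runner.execute() for _ in range(4)]
    # The forward scheduled by the last execute() is never sampled: it is the
    # redundant computation that gets discarded.

print(outputs)  # [-1, 4, 5, 6]
```

Outputs lag the device by one step: the first call yields only the placeholder, and whatever the caller does with a returned token (detokenization, stop checks, scheduling the next batch) runs on the CPU while the already-submitted forward pass executes on the device.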

Please review, @madamczykhabana, @kzawora-intel.

madamczykhabana

LGTM

Co-authored-by: Krzysztof Laskowski <[email protected]>

Introduce delayed sampling mechanism

34d8a08

madamczykhabana approved these changes Jul 4, 2024


madamczykhabana merged commit 77e1ab8 into HabanaAI:habana_next Jul 4, 2024

tzielinski-habana pushed a commit that referenced this pull request Aug 21, 2024

Introduce delayed sampling mechanism (#84)

cf0057f

Co-authored-by: Krzysztof Laskowski <[email protected]>

tzielinski-habana pushed a commit that referenced this pull request Aug 22, 2024

Introduce delayed sampling mechanism (#84)

d268c7e

Co-authored-by: Krzysztof Laskowski <[email protected]>

tzielinski-habana pushed a commit that referenced this pull request Aug 22, 2024

Introduce delayed sampling mechanism (#84)

03f9c0f

Co-authored-by: Krzysztof Laskowski <[email protected]>

tzielinski-habana pushed a commit that referenced this pull request Aug 22, 2024

Introduce delayed sampling mechanism (#84)

0465fb5

Co-authored-by: Krzysztof Laskowski <[email protected]>

tzielinski-habana pushed a commit that referenced this pull request Aug 29, 2024

Introduce delayed sampling mechanism (#84)

a481237

Co-authored-by: Krzysztof Laskowski <[email protected]>

tzielinski-habana pushed a commit that referenced this pull request Aug 29, 2024

Introduce delayed sampling mechanism (#84)

23c8e20

Co-authored-by: Krzysztof Laskowski <[email protected]>

tzielinski-habana pushed a commit that referenced this pull request Aug 29, 2024

Introduce delayed sampling mechanism (#84)

981a3f0

Co-authored-by: Krzysztof Laskowski <[email protected]>
