Remove hardcoded value from softmax in flat_pa #280
Conversation
looks good, do we have a noticeable throughput loss with this fix?
We had one without the .max() workaround. With it there is a slight drop, but I don't have exact numbers at hand.
I tested this change on mixtral and accuracy is still fine. Merging then.
This reverts commit 35a4a98.
This PR removes the hardcoded value used to normalize softmax in flat_pa. The current approach is to use the global maximum, as it is very easy to compute, but it has the drawback that other samples in a batch might slightly affect numerical stability.

This is a first step toward eliminating some of the INF/NaN issues we see in certain configurations, and it is by no means a complete solution. This needs to be revised in the future.
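For context, here is a minimal sketch of the two normalization strategies on a plain softmax over attention scores. The function names and the constant are hypothetical, and the real flat_pa kernel computes the reduction block-wise, so this is only illustrative:

```python
import torch

# Illustrative "before": normalize with a fixed, hardcoded constant.
def softmax_hardcoded(scores: torch.Tensor, norm_const: float = 10.0) -> torch.Tensor:
    # If scores run far above the constant, exp() can overflow to INF.
    w = (scores - norm_const).exp()
    return w / w.sum(dim=-1, keepdim=True)

# Illustrative "after": normalize with the global maximum of the scores.
def softmax_global_max(scores: torch.Tensor) -> torch.Tensor:
    # scores.max() reduces over the whole tensor, so an extreme score in one
    # sample shifts the normalizer applied to every other sample in the batch.
    w = (scores - scores.max()).exp()
    return w / w.sum(dim=-1, keepdim=True)
```

Subtracting a true per-row maximum would isolate samples from one another, which is why the description flags the global max as only a first step.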