
Remove hardcoded value from softmax in flat_pa #280

Merged
merged 3 commits into habana_main on Sep 12, 2024

Conversation

madamczykhabana

This PR removes the hardcoded value used to normalize softmax in flat_pa. The current approach is to use the global maximum, since it is very cheap to compute, but it has the drawback that other samples in a batch may slightly affect a given sample's numerical stability.

This is a first step toward eliminating some of the INF/NaN issues we see in certain configurations; it is by no means a complete solution and will need to be revised in the future.
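For context, a minimal PyTorch sketch of the idea; the function name, signature, and the hardcoded-constant placeholder are illustrative assumptions, not the actual flat_pa code:

```python
import torch

def softmax_with_global_max(attn: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch -- not the actual flat_pa kernel.
    # Old approach (illustrative): attn = (attn - SOME_HARDCODED_CONSTANT).exp()
    # New approach: subtract the global maximum of the whole tensor.
    global_max = attn.amax()                      # one scalar over all samples
    attn = (attn - global_max).exp()              # largest logit becomes exp(0) = 1
    return attn / attn.sum(dim=-1, keepdim=True)  # standard softmax denominator
```

This makes the trade-off in the description concrete: the global max is a single cheap reduction, but it couples samples, since an outlier logit in one sample shifts the normalization applied to every other sample in the batch. A per-sample maximum (attn.amax(dim=-1, keepdim=True)) would decouple them at some extra cost.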

@kzawora-intel left a comment

looks good, do we have a noticeable throughput loss with this fix?

@madamczykhabana (Author)

> looks good, do we have a noticeable throughput loss with this fix?

We did before the workaround of using .max(). With it there is a slight drop, but I don't have exact numbers at hand.
Also, we're still waiting for Mixtral results (already scheduled by @szutenberg).

@szutenberg

I tested this change on Mixtral and accuracy is still fine. Merging then.

@szutenberg szutenberg merged commit 35a4a98 into habana_main Sep 12, 2024
13 checks passed
xuechendi added a commit to xuechendi/vllm-fork that referenced this pull request Sep 12, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 13, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 20, 2024
@kzawora-intel kzawora-intel added the habana Issues or PRs submitted by Habana Labs label Nov 8, 2024
Labels
habana Issues or PRs submitted by Habana Labs
3 participants