Commit
Remove hardcoded value from softmax in flat_pa (HabanaAI#280)
This PR removes the hardcoded value used to normalize softmax in flat_pa. The current approach uses the global maximum, since it is very easy to compute, but it has the drawback that other samples in a batch might slightly affect numerical stability. This is a first step toward eliminating some of the INF/NaN issues we see in certain configurations, and it is by no means a complete solution. This needs to be revised in the future.
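To illustrate the trade-off described above, here is a minimal pure-Python sketch, not the actual flat_pa code. The function names (`softmax_with_shift`, `batch_softmax_global_max`, `batch_softmax_row_max`) are hypothetical, and returning NaN on a zero denominator is an assumption made to mimic what floating-point tensor division would likely produce. Softmax is mathematically unchanged by subtracting any constant before exponentiating, so the choice of shift only affects numerical behavior.

```python
import math

def softmax_with_shift(row, shift):
    """Softmax of one row after subtracting an arbitrary shift value."""
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    if total == 0.0:
        # Assumption: mimic tensor-style 0/0 by returning NaN instead of raising.
        return [float("nan")] * len(row)
    return [e / total for e in exps]

def batch_softmax_global_max(batch):
    """Shift every row by the single maximum over the whole batch."""
    global_max = max(max(row) for row in batch)
    return [softmax_with_shift(row, global_max) for row in batch]

def batch_softmax_row_max(batch):
    """Shift each row by its own maximum (reference behavior for comparison)."""
    return [softmax_with_shift(row, max(row)) for row in batch]

# Rows with moderately different scales: both variants agree.
batch = [[1.0, 2.0, 3.0], [101.0, 102.0, 103.0]]
global_result = batch_softmax_global_max(batch)
row_result = batch_softmax_row_max(batch)

# Rows with very different scales: the large row drives the global maximum,
# every exponential in the small row underflows to 0.0, and the row becomes NaN.
skewed = [[-800.0, -799.0], [0.0, 1.0]]
skewed_result = batch_softmax_global_max(skewed)
```

In the second case the per-row variant still returns a valid distribution for the first row, which is the sense in which other samples in a batch can affect stability when a global maximum is used.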