[ROCm] torch.sum optimization by increasing min_values_per_thread (#1591)

Follow-up to pytorch#135397. AMD GPUs perform better with fewer thread blocks, so increase min_values_per_thread as well. This helped improve [CvT](https://github.com/facebookresearch/FAMBench/tree/main/benchmarks/cvt) benchmark performance on MI300X.

Co-author: @carlobertolli
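Raising min_values_per_thread reduces the block count because the minimum per-thread workload caps how many thread blocks (CTAs) can split a single output's input. The Python sketch below is a simplified, hypothetical model of that kind of reduction launch heuristic. The names `ctas_per_output`, `grid_x`, and `target_grid_size`, and all of the numbers, are illustrative assumptions. They are not PyTorch's code and not the values changed in this commit.

```python
def div_up(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return (a + b - 1) // b


def ctas_per_output(values_per_thread: int, grid_x: int, target_grid_size: int,
                    min_values_per_thread: int, max_values_per_thread: int) -> int:
    """Hypothetical model: number of thread blocks that split one output's input.

    Simplified illustration only; not PyTorch's actual implementation.
    """
    # Enough blocks to reach the (assumed) target grid size for device occupancy.
    by_occupancy = div_up(target_grid_size, grid_x)
    # Cap the split so each thread still handles at least min_values_per_thread values.
    by_min_work = div_up(values_per_thread, min_values_per_thread)
    # Force enough splitting that no thread handles more than max_values_per_thread values.
    by_max_work = div_up(values_per_thread, max_values_per_thread)
    return max(min(by_occupancy, by_min_work), by_max_work)


# Same reduction, two hypothetical min_values_per_thread settings.
print(ctas_per_output(4096, 1, 1216, 16, 256))   # 256 blocks per output
print(ctas_per_output(4096, 1, 1216, 128, 256))  # 32 blocks per output
```

In this model, raising the minimum from 16 to 128 shrinks the split from 256 to 32 blocks per output. Each thread does more work, and fewer partial results need to be combined across blocks.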