Fix performance bugs in scalar reductions (#509) #543

Merged: 1 commit, Aug 17, 2022

Commits on Aug 17, 2022

  1. Fix performance bugs in scalar reductions (nv-legate#509)

    * Unify the template for device reduction tree and do some cleanup
    
    * Fix performance bugs in scalar reduction kernels:
    
      * Use unsigned 64-bit integers instead of signed integers wherever
        possible; CUDA hasn't added an atomic intrinsic for the latter yet.
    
      * Move reduction buffers from zero-copy memory to framebuffer. This
        makes the slow atomic update code path in reduction operators
        run much more efficiently.
    
    * Use the new scalar reduction buffer in binary reductions as well
    
    * Use only the RHS type in the reduction buffer as we never call apply
    
    * Minor clean up per review
    
    * Rename the buffer class and method to make the intent explicit
    
    * Flip the polarity of reduce's template parameter
    magnatelee authored and marcinz committed Aug 17, 2022
    Commit f38c829
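
The commit message's point about preferring unsigned 64-bit integers can be illustrated with a small host-side C++ sketch. It is not the PR's CUDA code: `std::atomic<std::uint64_t>` stands in for a device-side 64-bit unsigned atomic, and the names `atomic_add_signed`, `load_signed`, and `reduce_sum` are hypothetical. The sketch assumes the reduction is a sum, where unsigned arithmetic wraps modulo 2^64 and therefore produces the same bit pattern as two's-complement signed addition.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: add a signed 64-bit value into an accumulator stored
// as unsigned 64-bit. The conversion to uint64_t is defined modulo 2^64, so
// the accumulated bits match what signed two's-complement addition yields.
inline void atomic_add_signed(std::atomic<std::uint64_t>& acc, std::int64_t value)
{
  acc.fetch_add(static_cast<std::uint64_t>(value), std::memory_order_relaxed);
}

// Hypothetical helper: reinterpret the accumulated bits as a signed value.
inline std::int64_t load_signed(const std::atomic<std::uint64_t>& acc)
{
  const std::uint64_t bits = acc.load(std::memory_order_relaxed);
  std::int64_t result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}

// Sum an array of signed values through the unsigned accumulator.
inline std::int64_t reduce_sum(const std::int64_t* values, std::size_t count)
{
  std::atomic<std::uint64_t> acc{0};
  for (std::size_t i = 0; i < count; ++i) atomic_add_signed(acc, values[i]);
  return load_signed(acc);
}
```

On the device, the same idea would route a signed sum through an unsigned 64-bit atomic add instead of a slower compare-and-swap loop. Whether the PR does exactly this for any given reduction operator is not stated in the commit message.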