Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds API for non-caching-store and load #1476

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

ranapratap55
Copy link

RCCL provides "low latency" protocols for communication between agents, where the entire message consisting of data and flags is packed into a single L2 cache line. This is usually accomplished using atomic relaxed instructions in LLVM. But the 128-byte version of this protocol (LL-128) requires 128-bit load or store instructions that bypass the cache and are not broken up into multiple instructions. The nontemporal builtin is not always suitable for this use case.

The proposed approach is to provide a C++ function template that encapsulates an inline assembly call. This asm is intended to use the appropriate load/store parameters for each combination of data size and architecture.

@ranapratap55 ranapratap55 changed the title Adds API for non-caching-store Adds API for non-caching-store and load Jan 2, 2025
@ranapratap55
Copy link
Author

ranapratap55 commented Jan 2, 2025

created a new PR with both load and store. Added test cases for byte, 2-byte, 4, 8, 16.

closed the old pr #1289

@wenkaidu
Copy link
Collaborator

wenkaidu commented Jan 2, 2025

@ranapratap55 can you clarify if the non-caching load and store can work over coarse grained device memory, which is the memory type used in your tests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants