[FEA] Improve GDS spill performance on NVIDIA EGX #2592
Labels
epic: Issue that encompasses a significant feature or body of work
performance: A performance related task/issue
Is your feature request related to a problem? Please describe.
GDS spill does not perform as well on EGX clusters as it does on DGX. There are a couple of potential causes tied to hardware configuration:
Describe the solution you'd like
One possibility is to combine GDS spill with host memory. Device buffers would be spilled to host memory first; once the host memory spill storage limit is reached, further buffers would be spilled through GDS to disk.
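As a rough illustration of the proposed policy, here is a minimal Python sketch. It is a toy model, not the plugin's code: the class `TieredSpillStore`, its methods, and the `host_limit_bytes` parameter are hypothetical names, and a plain dict stands in for the GDS-backed disk store. It also assumes one reading of the proposal, namely that a buffer which does not fit under the host limit goes straight to disk rather than evicting older host buffers.

```python
class TieredSpillStore:
    """Toy model of a host-first spill path with a disk (GDS) fallback.

    All names here are hypothetical; this is not the RAPIDS plugin API.
    """

    def __init__(self, host_limit_bytes: int):
        self.host_limit_bytes = host_limit_bytes
        self.host_used_bytes = 0
        self.host = {}  # buffer_id -> bytes held in host memory
        self.disk = {}  # stand-in for buffers written to disk through GDS

    def spill(self, buffer_id: str, data: bytes) -> str:
        """Spill one device buffer and return the tier it landed in."""
        if self.host_used_bytes + len(data) <= self.host_limit_bytes:
            # Host tier still has room: take the faster path.
            self.host[buffer_id] = data
            self.host_used_bytes += len(data)
            return "host"
        # Host limit would be exceeded: fall back to the disk tier.
        self.disk[buffer_id] = data
        return "disk"

    def read(self, buffer_id: str) -> bytes:
        """Fetch a spilled buffer from whichever tier holds it."""
        if buffer_id in self.host:
            return self.host[buffer_id]
        return self.disk[buffer_id]


store = TieredSpillStore(host_limit_bytes=10)
print(store.spill("a", b"12345678"))  # fits under the host limit
print(store.spill("b", b"1234"))      # would exceed the limit
```

An alternative design would evict the oldest host buffers to disk to make room for new ones; which policy is better likely depends on the access pattern and is not decided in this issue.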
Describe alternatives you've considered
We can continue trying to improve GDS spill performance, but in a head-to-head comparison GDS is probably never going to be faster than host memory.
Additional context
Current spilling logic is mostly in RapidsHostMemoryStore and RapidsGdsStore.