support CUDA async memory resource in JNI #9201

rongou · 2021-09-09T02:52:34Z

CUDA 11.2 introduced a stream-ordered memory allocator that can potentially resolve memory fragmentation issues. See https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-1/

jlowe

It would be nice to have at least a smoke test of the new allocator type in RmmTest that sets up the allocator, allocates and frees memory to exercise it. Bonus points if it also sets up the allocator with a small limit and verifies it gets an OOM if it tries to allocate just beyond that size.

java/src/main/java/ai/rapids/cudf/Rmm.java

java/src/main/native/src/RmmJni.cpp

rongou

Added a smoke test, which is skipped if the CUDA version is below 11.2.

java/src/main/java/ai/rapids/cudf/Rmm.java

java/src/main/native/src/RmmJni.cpp

abellina · 2021-09-09T21:52:19Z

I just filed a nearly identical PR to this one: #9208. The main difference is that mine wraps the async allocator with limiting_resource_adaptor.

I did not know @rongou's PR was up, for some reason. I closed mine in favor of @rongou's PR, since folks already spent time reviewing his.
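To illustrate the wrapping idea mentioned above, here is a minimal sketch of a limiting adaptor layered over an upstream resource. It is a toy model in plain Java, not RMM's limiting_resource_adaptor or the cuDF JNI code: `MemoryResource`, `FakeUpstream`, `LimitingAdaptor`, and `LimitExceededException` are all hypothetical names, and handles are just counters rather than device pointers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for illustration; not RMM's or cuDF's actual API.
interface MemoryResource {
  long allocate(long bytes);    // returns an opaque handle
  void deallocate(long handle);
}

// Hypothetical exception used by the toy adaptor below.
class LimitExceededException extends RuntimeException {
  LimitExceededException(String message) { super(message); }
}

// Stand-in for an upstream allocator (for example, an async resource).
class FakeUpstream implements MemoryResource {
  private long nextHandle = 1;
  public long allocate(long bytes) { return nextHandle++; }
  public void deallocate(long handle) { /* nothing to release in the toy */ }
}

// Wraps any upstream and rejects requests that would exceed a byte limit.
class LimitingAdaptor implements MemoryResource {
  private final MemoryResource upstream;
  private final long limit;
  private long used = 0;
  private final Map<Long, Long> sizes = new HashMap<>();

  LimitingAdaptor(MemoryResource upstream, long limit) {
    this.upstream = upstream;
    this.limit = limit;
  }

  public long allocate(long bytes) {
    // Compare against the remaining budget to avoid overflow in used + bytes.
    if (bytes < 0 || bytes > limit - used) {
      throw new LimitExceededException(
          "requested " + bytes + " bytes with " + (limit - used) + " remaining");
    }
    long handle = upstream.allocate(bytes);
    sizes.put(handle, bytes);
    used += bytes;
    return handle;
  }

  public void deallocate(long handle) {
    Long bytes = sizes.remove(handle);
    if (bytes == null) {
      throw new IllegalArgumentException("unknown handle " + handle);
    }
    upstream.deallocate(handle);
    used -= bytes;
  }

  long getUsed() { return used; }
}
```

A smoke test in the spirit requested earlier in the review would set a small limit, allocate up to it, check that one more byte is rejected, then free and allocate again.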

java/src/main/native/src/RmmJni.cpp

java/src/test/java/ai/rapids/cudf/RmmTest.java

jlowe · 2021-09-10T13:08:47Z

Note that we now support Java in the CI, so Java PRs should not skip ci.

jlowe · 2021-09-10T13:09:06Z

rerun tests

rongou · 2021-09-10T16:37:09Z

rerun tests

rongou · 2021-09-10T19:02:41Z

rerun tests

jlowe · 2021-09-13T13:10:14Z

rerun tests

jrhemstad · 2021-09-13T14:10:13Z

fyi, you could also experiment with using cuda_async_resource as the upstream for arena. That might give you best of both worlds of what you're looking for.

codecov · 2021-09-13T14:35:53Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@4349232).
The diff coverage is n/a.

❗ Current head 6e8cf91 differs from pull request most recent head 5d91944. Consider uploading reports for the commit 5d91944 to get more accurate results

@@               Coverage Diff               @@
##             branch-21.10    #9201   +/-   ##
===============================================
  Coverage                ?   10.82%           
===============================================
  Files                   ?      115           
  Lines                   ?    19166           
  Branches                ?        0           
===============================================
  Hits                    ?     2074           
  Misses                  ?    17092           
  Partials                ?        0

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4349232...5d91944.

rongou · 2021-09-13T14:58:56Z

@gpucibot merge

rongou · 2021-09-13T15:01:05Z

> fyi, you could also experiment with using cuda_async_resource as the upstream for arena. That might give you best of both worlds of what you're looking for.

@jrhemstad Yeah that's something we can try if it turns out small allocations are too expensive with async.

abellina · 2021-09-13T15:47:50Z

@jrhemstad filed rapidsai/rmm#868, which we need to fix before we start using the async allocator. He thought it was a quick fix that could be included in 21.10. FYI @sameerz

jlowe · 2021-09-13T19:03:07Z

> you could also experiment with using cuda_async_resource as the upstream for arena

It seems that circumvents the fragmentation-solving feature we want from the async allocator. If arena only allocates large chunks from the async allocator, won't we still have fragmentation within the arena blocks that the async allocator cannot solve since the async allocator will be unaware of the sub-utilization of the allocations it sees?
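The concern can be shown with a deliberately simplistic model. The sketch below is a toy in plain Java, not RMM's arena: `ToyArena`, its 1 MiB block size, and its bump-pointer strategy are assumptions made only to show that the upstream allocator sees whole blocks, while frees inside a block stay invisible to it.

```java
// Hypothetical toy model; not RMM's arena implementation.
class ToyArena {
  static final long BLOCK_BYTES = 1L << 20; // assumed size of each upstream request

  private long upstreamBytes = 0;           // what the upstream allocator sees as in use
  private long liveBytes = 0;               // what callers actually still hold
  private long bumpOffset = BLOCK_BYTES;    // "full" so the first allocation fetches a block

  // Bump-allocates inside the current block, fetching a new block when it is full.
  void allocate(long bytes) {
    if (bytes <= 0 || bytes > BLOCK_BYTES) {
      throw new IllegalArgumentException("toy arena only serves 1.." + BLOCK_BYTES + " bytes");
    }
    if (bumpOffset + bytes > BLOCK_BYTES) {
      upstreamBytes += BLOCK_BYTES;         // the only event the upstream observes
      bumpOffset = 0;
    }
    bumpOffset += bytes;
    liveBytes += bytes;
  }

  // Freed space stays inside the arena; the upstream is never told about it.
  void free(long bytes) {
    liveBytes -= bytes;
  }

  long getUpstreamBytes() { return upstreamBytes; }
  long getLiveBytes() { return liveBytes; }
}
```

For example, after 1024 allocations of 1 KiB followed by 1023 frees, this toy reports 1 MiB held from the upstream and only 1 KiB live. A real arena reuses freed space internally, so the gap is usually smaller, but the upstream still has no view of it.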

rongou · 2021-09-14T17:03:56Z

The per-thread arenas are just caches for small allocations. If cuda async proves to be slow for small allocations, we can use the arena allocator to speed these up, as a typical job has tons of small allocations. The number of free blocks is now capped in each per-thread arena, so in theory it shouldn't cause too much additional fragmentation. If/when we decide to try this, we can probably further tweak the algorithm.
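A capped cache of free blocks can be sketched as follows. This is a toy in plain Java under stated assumptions, not the arena algorithm in RMM: `CappedBlockCache`, the fixed block size implied by the block IDs, and the cap of 4 in the per-thread instance are all hypothetical.

```java
import java.util.ArrayDeque;

// Hypothetical toy; names and the cap value are assumptions, not RMM's.
class CappedBlockCache {
  // One cache per thread, so small allocations avoid cross-thread contention.
  static final ThreadLocal<CappedBlockCache> PER_THREAD =
      ThreadLocal.withInitial(() -> new CappedBlockCache(4));

  private final int maxFreeBlocks;
  private final ArrayDeque<Long> freeBlocks = new ArrayDeque<>();
  private long nextBlockId = 1;
  private int upstreamAllocations = 0;
  private int returnedUpstream = 0;

  CappedBlockCache(int maxFreeBlocks) {
    this.maxFreeBlocks = maxFreeBlocks;
  }

  // Reuses a cached block when one is available, otherwise asks the upstream.
  long acquire() {
    Long cached = freeBlocks.pollFirst();
    if (cached != null) {
      return cached;
    }
    upstreamAllocations++;
    return nextBlockId++;
  }

  // Keeps at most maxFreeBlocks cached; anything beyond the cap goes back upstream.
  void release(long block) {
    if (freeBlocks.size() < maxFreeBlocks) {
      freeBlocks.addFirst(block);
    } else {
      returnedUpstream++;
    }
  }

  int getUpstreamAllocations() { return upstreamAllocations; }
  int getReturnedUpstream() { return returnedUpstream; }
  int getCachedBlocks() { return freeBlocks.size(); }
}
```

The cap bounds how much memory each thread can hold back from the upstream allocator, which is the property the comment above relies on to limit additional fragmentation.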

support cuda async memory resource in jni

0531784

rongou added the following labels Sep 9, 2021: feature request (New feature or request), 3 - Ready for Review (Ready for review by team), RMM, Performance (Performance related issue), Java (Affects Java cuDF API.), Spark (Functionality that helps Spark RAPIDS), non-breaking (Non-breaking change)

rongou requested review from jlowe and abellina September 9, 2021 02:52

rongou self-assigned this Sep 9, 2021

rongou requested a review from a team as a code owner September 9, 2021 02:52

jlowe reviewed Sep 9, 2021

Merge remote-tracking branch 'upstream/branch-21.10' into cuda-async-mr

445656f

review feedback

75ac47b

rongou commented Sep 9, 2021

abellina mentioned this pull request Sep 9, 2021

Add access to cuda_async_memory_resource #9208

Closed

abellina requested changes Sep 9, 2021

rongou added 2 commits September 9, 2021 17:17

add hard limit

989a43b

Merge remote-tracking branch 'upstream/branch-21.10' into cuda-async-mr

5d91944

abellina approved these changes Sep 10, 2021

jlowe approved these changes Sep 10, 2021

jlowe changed the title from "support cuda async memory resource in jni [skip ci]" to "support CUDA async memory resource in JNI" Sep 10, 2021

rongou mentioned this pull request Sep 10, 2021

Add CUDA async memory resource as an option NVIDIA/spark-rapids#3447

Merged

rapids-bot bot merged commit c6ddd46 into rapidsai:branch-21.10 Sep 13, 2021

abellina mentioned this pull request Sep 22, 2021

[FEA] [Java] Add a way to allocate via cudaMalloc for device memory buffers #9270

Closed

rongou deleted the cuda-async-mr branch November 23, 2021 17:24
