[FEA] Need a Python interface for accessing the C++ cuda_async_view_memory_resource #1611

Open · lilohuang opened this issue Jul 15, 2024 · 2 comments
Labels: ? - Needs Triage, feature request

lilohuang commented Jul 15, 2024

Hi @leofang and all,

Currently, the librmm library provides a CudaAsyncMemoryResource Python interface (https://docs.rapids.ai/api/rmm/stable/python_api/#rmm.mr.CudaAsyncMemoryResource), which lets users access the C++ cuda_async_memory_resource class. However, there is no equivalent Python interface for accessing the C++ cuda_async_view_memory_resource class.

import rmm
import cupy
from rmm.allocators.cupy import rmm_cupy_allocator

# CudaAsyncMemoryResource creates its own CUDA memory pool via cudaMemPoolCreate
rmm.mr.set_current_device_resource(rmm.mr.CudaAsyncMemoryResource(1024, 1024))
cupy.cuda.set_allocator(rmm_cupy_allocator)

The CudaAsyncMemoryResource always creates a new memory pool through cudaMemPoolCreate (https://github.com/rapidsai/rmm/blob/branch-24.08/include/rmm/mr/device/cuda_async_memory_resource.hpp#L107), which is problematic when integrating librmm and cuDF into an existing GPU-accelerated application.

Such an application needs to use the default CUDA memory pool for stream-ordered allocation and deallocation through cudaMallocAsync and cudaFreeAsync. If librmm always creates a separate pool, the memory held by that pool can only be used by librmm itself (or by CuPy and cuDF through it), wasting GPU memory that the rest of the application cannot touch.
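
For context, here is roughly what such an application already does today with cuda-python (a minimal sketch; the allocation size and error handling are illustrative only):

from cuda import cudart

# Stream-ordered allocation and deallocation served by the default memory pool
err, stream = cudart.cudaStreamCreate()
err, ptr = cudart.cudaMallocAsync(1024, stream)
err, = cudart.cudaFreeAsync(ptr, stream)
err, = cudart.cudaStreamSynchronize(stream)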

We hope that librmm can provide a CudaAsyncViewMemoryResource Python interface, similar to the one shown below, to access the C++ cuda_async_view_memory_resource class. This would allow us to pass the default memory pool handle (obtained from cudaDeviceGetDefaultMemPool) to librmm:

import rmm
import cupy
from cuda import cudart
from rmm.allocators.cupy import rmm_cupy_allocator

# Obtain the device's default memory pool (device 0 here)
err, pool_handle = cudart.cudaDeviceGetDefaultMemPool(0)

# Proposed API: wrap the existing pool instead of creating a new one
rmm.mr.set_current_device_resource(rmm.mr.CudaAsyncViewMemoryResource(pool_handle))
cupy.cuda.set_allocator(rmm_cupy_allocator)

Alternatively, librmm could introduce rmm.mr.CudaAsyncDefaultMemoryResource(), which would automatically obtain the pool handle from cudaDeviceGetDefaultMemPool:

import rmm
import cupy
from rmm.allocators.cupy import rmm_cupy_allocator

# Proposed API: a view of the default pool, with the handle obtained internally
rmm.mr.set_current_device_resource(rmm.mr.CudaAsyncDefaultMemoryResource())
cupy.cuda.set_allocator(rmm_cupy_allocator)

Thanks,
Lilo

wence- commented Jul 17, 2024

Thanks, this should be quite doable. We need to think about what type pool_handle should have in Python: probably a cuda-python cudart.cudaMemPool_t?
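
For example, cuda-python already hands back that type when querying the default pool (a sketch assuming device 0, error handling omitted):

from cuda import cudart

err, pool = cudart.cudaDeviceGetDefaultMemPool(0)
assert isinstance(pool, cudart.cudaMemPool_t)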

lilohuang commented

@wence- The cuda-python cudart.cudaMemPool_t looks like a viable option to me. Thanks.
