Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize C interface for stream exchange #65

Open
kkraus14 opened this issue Mar 17, 2021 · 4 comments
Open

Standardize C interface for stream exchange #65

kkraus14 opened this issue Mar 17, 2021 · 4 comments

Comments

@kkraus14
Copy link

Following up on #57 where we figured out the correct stream exchange and synchronization semantics and a Python interface for doing so, we need to do the same for a C interface.

TLDR from #57:

  • Frameworks aren't necessarily equipped to share streams with lifetime controls outside of their own internal execution, so adding a stream member to the dlpack struct won't suffice
  • Consumer can optionally provide a stream to the producer when asking for the dlpack object
  • Producer is responsible for guaranteeing that the data in the dlpack object they produce is safe to operate on in the stream provided by the Consumer

cc @tqchen @harrism @jrhemstad @rgommers @leofang @oleksandr-pavlyk @szha @veritas9872

@leofang
Copy link
Contributor

leofang commented Oct 2, 2022

During the 20220930 ABI change meeting we (@tqchen @kkraus14 @seberg and I) briefly discussed this issue. We don't have a concrete proposal yet, but something along this line could be a starter:

/*
 * Each producer library implements this function, which is similar to __dlpack__ in Python.
 * Assuming we have a way for the consumer to query DLDevice (similar to __dlpack_device__ in Python),
 * by calling this function the consumer offers a "stream" object to the producer to properly return a PyCapsule.
 */
PyObject* to_dlpack(const void* stream);

@seberg
Copy link
Contributor

seberg commented Oct 4, 2022

I guess you mean DLVManagedTensor * as a return value. I think there are basically two approaches to this API. The above, or moving the DLVManagedTensor to be a consumer problem:

to_dlpack(SelfType self, DLVManagedTensor *in_tensor, const void *stream)

where the caller/consumer passes in in_tensor. That decouples the DLVManagedTensor structs lifetime from the data lifetime (which it currently is not).
That might be inconvenient (if the consumer always wants to keep the struct around) or convenient (It removes the need for the capsule rename dance. NumPy still needs its internal capsule, but it already has needs that now).

@leofang
Copy link
Contributor

leofang commented Oct 4, 2022

I was assuming we need to return a PyCapsule too, with the C interface. But yeah if we wanna facilitate pure C-to-C exchange we don't need to involve any Python objects.

@mzient
Copy link

mzient commented Oct 15, 2024

I've run into an issue with steam semantics and object deletion.

  1. The consumer calls __dlpack__, providing a (non-default) stream handle S.
  2. The producer synchronizes with the stream and produces a DLManagedTensor.
  3. The consumer (in this case: PyTorch) schedules some work on S and calls the deleter (the GPU work is still happening on S)
  4. The producer can't synchronize properly before recycling the memory (i.e. with cudaFreeAsync) because the liveness of stream S is not guaranteed by DLPack contract.

There are several ways to solve the issue.

  1. Require that the stream handle passed to __dlpack__ remains valid until the deleter is called.
    pros:
  • most frameworks will likely guarantee that anyway
  • no API/ABI change (it's a documentation-level solution)
    cons:
  • we might run into issues with destruction order when the objects are collected by Python's GC
  1. Pass the stream to the deleter:
    pros:
  • foolproof
    cons:
  • API break, old code will be stuck with old behavior
  1. Require that the caller effectively calls cudaDeviceSyncrhonize before destroying the memory
    pros:
  • foolproof
    cons:
  • poor performance
  • false dependency on unrelated streams, resulting in keeping memory alive for longer than needed.

This needs to be addressed somehow (perhaps we should be more restrictive about semantics in newer DLPack versions?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants