Support complex numbers in DLPack protocol #50
Thank you Peter for bringing this up! This has been missing for a long time. There are essentially two different memory layouts for storing NDArrays of complex numbers. It would be great if we took both representations into consideration :-)
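The two layouts can be sketched with NumPy, whose native complex dtypes use the interleaved form (this is an illustration of the layouts under discussion, not part of the original comment):

```python
import numpy as np

# Layout 1: interleaved ("array of structs") -- what NumPy, C99 _Complex,
# and most libraries use: each element stores (re, im) side by side.
z = np.array([1 + 2j, 3 + 4j, 5 + 6j], dtype=np.complex64)

# Viewing the same buffer as float32 exposes the interleaving:
# [re0, im0, re1, im1, re2, im2]
interleaved = z.view(np.float32)

# Layout 2: split ("struct of arrays") -- all real parts in one buffer,
# all imaginary parts in another.
re = z.real.copy()  # [1, 3, 5]
im = z.imag.copy()  # [2, 4, 6]
```

Same logical array in both cases; only the memory order differs.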
I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice? Could you name a few notable examples? It violates the principle we'd learn in CS 101 of preserving memory contiguity to maximize memory-load performance. HPC people certainly wouldn't do this at all. In practice, AFAIK none of the programming languages (C/C++/Fortran/Python/etc.) or libraries (CUDA/HIP/NumPy/PyTorch/CuPy/etc.) follows the 2nd approach; complex numbers are always implemented as an (aligned) struct of two real numbers, so that a complex array is an array of structs (AoS). In many other applications structs of arrays (SoA) are preferred, yes, but not in this case, due to the memory-load requirement and the arithmetic intensity associated with complex numbers.
@leofang Yes, I know of at least one real software/hardware stack that uses an SoA-ish representation for complex numbers. The hardware in question makes arbitrary indexing expensive (e.g., an offset of one float), but indexing over tiles of numbers cheap (e.g., a vector length's worth of floats). So on that hardware you should use tiles of real and imaginary numbers (i.e., pretty much a structure of arrays) for alignment reasons. That said, there's no DLPack support on that hardware yet, but it's something to think about.
Thanks @leofang @hawkinsp for the great discussion. From the computation's PoV, structure of arrays is actually better in many cases. This is due to the presence of vector/tensor instructions. Most modern architectures have vector units for floating-point numbers, but not necessarily for complex numbers. Think about a complex multiplication, (a + bi)(c + di) = (ac − bd) + (ad + bc)i.
If we want to make the best use of a vector unit to carry out the above computation, we will inevitably need to put the real and imaginary parts into different vectors (so the operation can be vectorized). In a typical CPU setting, the implementation of such AoS complex operations is more involved, and usually requires loading two vectors that have mixed real and imaginary parts and shuffling them to obtain the real and imaginary vectors. Storing data as SoA actually simplifies the implementation for both libraries and compilers, as the parts are already stored separately. This becomes a real concern when we start to support accelerators that operate on vectors or even tiles of matrices (@hawkinsp's example) and where vector shuffle instructions are not necessarily available. From a locality point of view, storing data as SoA does not impose much of a burden, because accesses to both the real and imaginary parts are still local to the cache. The reality is that SoA removes the need for vector shuffling, thus achieving better memory-load performance. So while from the conventional PoV AoS is certainly the more common storage format, SoA does bring benefits in most cases from the computational PoV.
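A minimal sketch of why the split layout vectorizes naturally: with separate real and imaginary arrays, an element-wise complex multiply is just independent element-wise products and sums, with no shuffling. Plain Python lists stand in for vector registers here; this is an illustration, not production code:

```python
def complex_mul_soa(a_re, a_im, b_re, b_im):
    """(a_re + i*a_im) * (b_re + i*b_im), element-wise, on split arrays.

    Each comprehension below maps directly onto vector multiply/add
    instructions; no de-interleaving of operands is required.
    """
    c_re = [ar * br - ai * bi
            for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im)]
    c_im = [ar * bi + ai * br
            for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im)]
    return c_re, c_im

# (1 + 2i) * (3 + 4i) = -5 + 10i
c_re, c_im = complex_mul_soa([1.0], [2.0], [3.0], [4.0])
```

In the interleaved layout, the same computation would first need shuffle instructions to separate the real and imaginary lanes.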
Matlab is the one I am familiar with, because it was annoying to convert to/from interleaved complex when I was working with C MEX files several years ago. However, it looks like they now also support the interleaved representation.
Thanks @hawkinsp @tqchen @grlee77 for the replies! On the special hardware/software stack that uses SoA: is that a real thing or still a project on paper? I'd love to learn more specifics (say, the name) if possible 🙂 But while it makes sense to consider the benefit of vectorization instructions in such a setup, in more complex situations, such as implementing mathematical functions for complex numbers, it's much more difficult than with AoS as far as I understand. Let me reiterate that the vast majority of languages, libraries, and frameworks out there support the AoS storage by default. @grlee77's Matlab example is a perfect addition to the examples I gathered above: Matlab is moving to abandon SoA in favor of AoS. Adding support for such a common case should be low-hanging fruit, and I don't see why this has to be a blocker to moving on and closing this issue. For example, we could just add a new entry for complex types. If there is a serious need to support SoA complex types, we can always revisit and revise the DLPack standard in the future, though it's nontrivial to me at the moment how to support such a thing in DLPack... (another justification for leaving it aside for now.)
In terms of complexity, I don't think math functions will be more complex in either form. The SoA approach would simply correspond to loading from both arrays, computing, then storing. Again, in such cases SoA is more friendly to vectorization. The use of SoA is quite real in many projects. @hawkinsp might be able to say more, but I believe this is the scheme used by JAX. Technically, this would apply to all specialized accelerators that do not have vector shuffle instructions as well (e.g. TPU). Additionally, given that there is an increasing number of floating-point representations (e.g. float32, float64, bfloat16, posit), choosing the SoA representation might help simplify things a bit, so that we do not have to grow a corresponding complex-number type per floating-point data type. The above issues are only going to become a bigger problem as more specialized accelerators are produced. So if we really want to consider support for more hardware, I would give SoA serious consideration. Standardization needs buy-in from the frameworks and hardware vendors, so it would be great to have a conversation about it.
@tqchen Sorry I dropped the ball. Performance of the SoA/AoS differences aside, in practice how would you adopt DLPack for SoA? As I said earlier, I think DLPack is completely capable of describing AoS, given that vector types can already be handled natively. Regardless, I think we'll need at least two more types added.

@rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of the Array API standard, especially if you're going to add DLPack support to NumPy.
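To illustrate the "vector types" point above: DLPack's `DLDataType` carries `code`, `bits`, and `lanes` fields, so an interleaved complex64 could in principle be described today as a 2-lane float32 vector, with no new type code. The namedtuple below merely mirrors the C struct for illustration; `kDLFloat = 2` matches `dlpack.h`:

```python
from collections import namedtuple

# Python stand-in for DLPack's C struct:
#   typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } DLDataType;
DLDataType = namedtuple("DLDataType", ["code", "bits", "lanes"])

kDLFloat = 2  # floating-point type code from dlpack.h

# AoS complex64 described without a dedicated complex code:
# a 2-lane vector of float32, i.e. adjacent (re, im) pairs.
complex64_as_vector = DLDataType(code=kDLFloat, bits=32, lanes=2)

# Per-element size: bits * lanes / 8 = 8 bytes, matching NumPy's
# complex64 itemsize.
itemsize = complex64_as_vector.bits * complex64_as_vector.lanes // 8
```

The drawback of this encoding is that it is ambiguous: a consumer cannot tell a genuine 2-lane float vector apart from a complex number, which is one motivation for a dedicated type code.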
Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for next version I'd say), but yes would be nice to keep progressing this now that we have some good momentum.
Adding two distinct data types seems to make sense if there are two relatively common representations with their own advantages. And then if the consumer doesn't support the format the producer offers, it gets a choice of doing the copy (perhaps while emitting a warning, for efficiency) or raising an error. Making a choice of AoS vs. SoA only seems to make sense if there's a high chance that existing libraries will change their implementation (which seems unlikely).
Thanks, @rgommers.
Yes, especially describing AoS is so simple and can be immediately used for NumPy/CuPy/Numba/etc that I just don't see why we have to block it before we figure out a way to describe SoA in DLPack. Let's move with the momentum 🚀
@rgommers So do you have suggestions for better names? 🙂
The names in gh-58 (…)
Thanks, it would be great to discuss how we could handle different underlying data types, though. Do we want to support complex numbers that are backed by …?
@tqchen I think you brought this up earlier, but I still don't understand your question. Complex numbers can already be supported in exactly the same way as integers and floats with different bit widths. The only (obvious) requirement is that both the real and imaginary parts need to have the same format.
Ah, in this case we should avoid potential ambiguity and prefer using fp16 for complex32, for consistency. If guided by the requirement of zero copy, it seems necessary to me to define an additional …
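A sketch of what such an added entry could look like. DLPack did later gain a `kDLComplex` type code (value 5 in `dlpack.h`), with `bits` covering both parts; the `complex32` entry here is hypothetical, following the fp16 choice discussed above:

```python
from collections import namedtuple

# Python stand-in for DLPack's DLDataType struct, for illustration only.
DLDataType = namedtuple("DLDataType", ["code", "bits", "lanes"])

kDLComplex = 5  # complex type code later added to dlpack.h

# bits counts the WHOLE element (real + imaginary parts together):
complex64  = DLDataType(code=kDLComplex, bits=64,  lanes=1)  # 2 x float32
complex128 = DLDataType(code=kDLComplex, bits=128, lanes=1)  # 2 x float64

# Hypothetical fp16-backed complex32, per the consistency argument above:
complex32  = DLDataType(code=kDLComplex, bits=32,  lanes=1)  # 2 x float16
```

With this convention the component width is always `bits // 2`, so no separate type code per backing float format is needed.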
Sounds good. Will create a new issue after #58 is in. |
The DLPack protocol doesn't seem to have a representation of complex numbers. I don't know whether this is intentional (e.g., is the intent that clients use a structure-of-arrays representation of the real and imaginary parts?) or whether it is a missing feature.
(I therefore chose to leave this case unimplemented when adding DLPack support to JAX, but it would be nice to add it once the protocol is fixed.)