
Support complex numbers in DLPack protocol #50

Closed
hawkinsp opened this issue Jan 31, 2020 · 18 comments · Fixed by #58

Comments

@hawkinsp

The DLPack protocol doesn't seem to have a representation of complex numbers. I don't know whether this is intentional (e.g., is the intent that clients use a structure-of-arrays representation of the real and imaginary parts?) or whether it is a missing feature.

(I therefore chose to leave this case unimplemented when adding DLPack support to JAX, but it would be nice to add it once the protocol is fixed.)

@junrushao (Member)

Thank you, Peter, for bringing this up! This has been missing for a long time.

There are essentially two different memory layouts to store NDArrays of complex numbers.

  1. For each complex number, the real and imaginary parts are placed contiguously in memory. For example, a single-precision complex number can be seen as a pair (f32, f32) occupying 64 bits.
  2. A complex tensor is a tuple of two tensors, one representing the real part and the other the imaginary part.

It would be great if we took both representations into consideration :-)
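For concreteness, a minimal C sketch of the two layouts (the type names are illustrative, not part of any standard):

```c
/* Layout 1: interleaved / array-of-structs (AoS).
 * Memory for 3 elements: re0 im0 re1 im1 re2 im2 */
typedef struct { float re, im; } complex64_aos;

/* Layout 2: split / struct-of-arrays (SoA).
 * Memory: re0 re1 re2 ... in one buffer, im0 im1 im2 ... in another. */
typedef struct { float *re; float *im; } complex64_soa;
```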

@leofang (Contributor) commented Dec 28, 2020

> It would be great if we took both representations into consideration :-)

I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice? Could you name a few notable examples? It violates the principle we all learn in CS 101: preserve memory contiguity to maximize memory-load performance. HPC people certainly wouldn't do this at all.

In practice, AFAIK none of the programming languages (C/C++/Fortran/Python/etc.) or libraries (CUDA/HIP/NumPy/PyTorch/CuPy/etc.) follows the 2nd approach; complex numbers are always implemented as an (aligned) struct of two real numbers, so that a complex array is an array of structs (AoS).

In many other applications structs of arrays (SoA) are preferred, yes, but not in this case due to the memory load requirement and the arithmetic intensity associated with complex numbers.

@hawkinsp (Author)

@leofang Yes, I know of at least one real software/hardware stack that uses an SoA-ish representation for complex numbers. The hardware in question makes arbitrary indexing expensive (e.g., an offset of one float), but indexing over tiles of numbers cheap (e.g., a vector length's worth of floats). So on that hardware you should use tiles of real and imaginary numbers (i.e., pretty much a structure of arrays) for alignment reasons.

That said, there's no DLPack support on that hardware, but it's something to think about.

@tqchen (Member) commented Dec 28, 2020

Thanks @leofang @hawkinsp for the great discussion. From a computational point of view, structure-of-arrays is actually better in many cases. This is due to the presence of vector/tensor instructions: most modern architectures have vector units for floating-point numbers, but not necessarily for complex numbers.

Think about a complex multiplication operation as follows:

c.real = a.real * b.real - a.img * b.img
c.img = a.real * b.img + a.img * b.real

If we want to make the best use of a vector unit to carry out the above computation, we will inevitably need to put the real and imaginary parts into different vectors (so the operation can be vectorized). In a typical CPU setting, the implementation of such AoS complex operations is more involved, and usually requires loading two vectors with mixed real and imaginary parts and shuffling them to obtain the real and imaginary vectors. Storing data as SoA actually simplifies the implementation for both libraries and compilers, as the parts are already stored separately.
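For illustration, a scalar sketch of the SoA version (function and argument names are made up for this example). Each statement touches only contiguous buffers holding a single part, so a vectorizing compiler can map the loop onto plain vector loads, multiplies, and stores, with no shuffles:

```c
/* c = a * b over split (SoA) complex buffers of length n.
 * Output buffers are assumed distinct from the inputs. */
void cmul_soa(const float *ar, const float *ai,   /* a: real, imag */
              const float *br, const float *bi,   /* b: real, imag */
              float *cr, float *ci, int n) {      /* c: real, imag */
    for (int i = 0; i < n; ++i) {
        cr[i] = ar[i] * br[i] - ai[i] * bi[i];
        ci[i] = ar[i] * bi[i] + ai[i] * br[i];
    }
}
```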

This can be a real concern when we start to support accelerators that operate on vectors or even tiles of matrices (@hawkinsp's example) and where a vector shuffle instruction is not necessarily available.

From a locality point of view, storing data as SoA does not impose much of a burden, because accesses to both the real and imaginary parts are still local to the cache. The reality is that SoA removes the need for vector shuffling, thus achieving better memory-load performance.

So while AoS is certainly the more commonly used storage format by convention, SoA does bring benefits in many cases from a computational point of view.

@grlee77 commented Dec 28, 2020

> I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice?

Matlab is the one I am familiar with, because it was annoying to convert to/from interleaved complex when I was working with C MEX files several years ago. However, it looks like now they also support the interleaved representation.

@leofang (Contributor) commented Dec 30, 2020

Thanks @hawkinsp @tqchen @grlee77 for the replies!

On the special hardware/software stack that uses SoA: is that a real thing, or still a project on paper? I'd love to learn more specifics (say, the name) if possible 🙂 While it makes sense to consider the benefit of vector instructions in such a setup, in more involved situations, such as implementing mathematical functions for complex numbers, SoA is much harder to work with than AoS, as far as I understand.

Let me reiterate that the vast majority of languages, libraries, and frameworks out there support the AoS storage by default. @grlee77's Matlab example is a perfect addition to the examples I gathered above: Matlab is abandoning SoA in favor of AoS. Adding support for such a common case should be low-hanging fruit, and I don't see why this has to block moving on and closing this issue. For example, we could just add a new entry kDLComplex to DLDataTypeCode, and it would be straightforward to define DLDataType to match single complex and double complex (as in C99); see the sketch below.
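As a rough sketch of what that addition could look like, following the conventions already in dlpack.h (the enum value shown here is illustrative):

```c
typedef enum {
  kDLInt = 0U,
  kDLUInt = 1U,
  kDLFloat = 2U,
  /* ...other existing codes... */
  kDLComplex = 5U,  /* proposed: interleaved (AoS) complex, as in C99 */
} DLDataTypeCode;

/* C99 float complex:  { .code = kDLComplex, .bits = 64,  .lanes = 1 } */
/* C99 double complex: { .code = kDLComplex, .bits = 128, .lanes = 1 } */
```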

If there is a serious need to support SoA complex types, we can always revisit and revise the DLPack standard in the future, though it's nontrivial to me at the moment how to support such a thing in DLPack... (another justification for leaving it aside for now).

@tqchen (Member) commented Dec 31, 2020

In terms of complexity, I don't think math functions will be more complex in either form. The SoA approach would simply correspond to loading from both arrays, computing, then storing. Again, in such cases SoA is more friendly to vectorization.

The use of SoA is quite real in many projects. @hawkinsp might be able to say more, but I believe this is the scheme used by JAX. Technically, this would also apply to all specialized accelerators that do not have vector shuffle instructions (e.g. TPU).

Additionally, given the increasing number of floating-point representations (e.g. float32, float64, bfloat16, posit), choosing an SoA representation might help simplify things a bit, so that we do not have to grow corresponding complex-number support for each floating-point data type.

The above issues are only going to become bigger problems as more specialized accelerators are produced. So if we really want to consider support for more hardware, I would give SoA serious consideration.

Standardization requires buy-in from the frameworks and the hardware vendors. In this case, it would be great to have a conversation about it.

@leofang (Contributor) commented Jan 13, 2021

@tqchen Sorry I dropped the ball.

Performance differences between SoA and AoS aside, how would you adopt DLPack for SoA in practice? As I said earlier, I think it's completely capable of describing AoS, given that vector types can be natively handled using lanes, but I still don't see a straightforward way to include SoA. Perhaps you would need to tweak shape (and/or strides) together with lanes in order to achieve it?
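For reference, a sketch of how lanes can describe the interleaved (AoS) case today; this is an assumed usage, not an established convention:

```c
#include <dlpack/dlpack.h>

/* Describe an interleaved complex64 array as a 2-lane f32 vector type:
 * each logical element is a (real, imag) pair of 32-bit floats. */
DLDataType interleaved_complex64 = {
    .code  = kDLFloat,  /* lane element type */
    .bits  = 32,        /* bits per lane */
    .lanes = 2          /* two lanes: real and imaginary parts */
};
```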

Regardless, I think we'll need at least two more types added to DLDataTypeCode, say kDLComplexSoA and kDLComplexAoS. (The naming is admittedly poor, but you get the point 😂) The reason is that SoA and AoS cannot be exchanged zero-copy. So effectively what I was saying above is that adding kDLComplexAoS is straightforward and can be done as of today (it's literally a one-line PR I could do right away).

@rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of the Array API standard, especially if you're gonna add DLPack support to NumPy.

@rgommers (Contributor)

> @rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of the Array API standard, especially if you're gonna add DLPack support to NumPy.

Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for the next version, I'd say), but yes, it would be nice to keep progressing this now that we have some good momentum.

> Regardless, I think we'll need at least two more types added to DLDataTypeCode, say kDLComplexSoA and kDLComplexAoS.

Adding two distinct data types seems to make sense if there are two relatively common representations, each with its own advantages. Then, if the consumer doesn't support the format the producer offers, it gets a choice between doing the copy (perhaps while emitting a warning about the performance cost) or raising an error.

Settling on a single choice of AoS vs. SoA seems to make sense only if there's a high chance that existing libraries will change their implementation (which seems unlikely).

@leofang (Contributor) commented Jan 13, 2021

Thanks, @rgommers.

> With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of the Array API standard, especially if you're gonna add DLPack support to NumPy.

> Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for the next version, I'd say), but yes, it would be nice to keep progressing this now that we have some good momentum.

Yes. Describing AoS is so simple, and could be used immediately by NumPy/CuPy/Numba/etc., that I just don't see why we have to block it until we figure out a way to describe SoA in DLPack. Let's move with the momentum 🚀

> kDLComplexSoA and kDLComplexAoS

@rgommers So do you have suggestions for better names? 🙂

@rgommers (Contributor)

The names in gh-58 (kDLComplex and kDLComplexS) seem reasonable to me, probably better than appending SoA and AoS.

@tqchen (Member) commented Jan 24, 2021

Thanks. It would be great to discuss how we could handle different underlying data types, though. Do we want to support complex numbers backed by f32, f64, or (f16, bf16) in the future?

@leofang (Contributor) commented Jan 24, 2021

@tqchen I think you brought this up earlier, but I still don't understand your question. Complex numbers can already be supported in exactly the same way as integers and floats with different bit widths. The only (obvious) requirement is that the real and imaginary parts need to have the same format.
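A sketch of that scheme, assuming the kDLComplex code proposed above, with bits giving the total element width (by analogy with kDLFloat):

```c
/* { code, bits, lanes } */
DLDataType complex64  = { kDLComplex, 64,  1 };  /* 2 x f32 */
DLDataType complex128 = { kDLComplex, 128, 1 };  /* 2 x f64 */
/* { kDLComplex, 32, 1 } would be ambiguous between 2 x f16
 * and 2 x bf16 -- the question discussed below. */
```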

@rgommers (Contributor)

f32 and f64 definitely, and that seems straightforward. An 8-byte kDLComplex being backed by f32 seems unambiguous (and the same for a 16-byte kDLComplex). For a 4-byte one, it's unclear whether it would use f16 or bf16 given the current DLPack structure. As far as I can tell, that cannot be expressed right now, and given how rare complex32 still is, it's probably worth putting out of scope for now.

@leofang (Contributor) commented Jan 24, 2021

Ah, in this case we should avoid potential ambiguity and prefer fp16 for complex32, for consistency. If guided by the requirement of zero copy, it seems necessary to me to define an additional kDLBComplex type for complex numbers backed by bfloats.
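A hypothetical sketch of that distinction (kDLBComplex is only a suggestion here, not an existing DLPack code):

```c
/* A separate type code would disambiguate the two 32-bit complex layouts: */
DLDataType complex32_f16  = { kDLComplex,  32, 1 };  /* 2 x float16  */
DLDataType complex32_bf16 = { kDLBComplex, 32, 1 };  /* 2 x bfloat16 */
```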

@tqchen (Member) commented Feb 6, 2021

Thanks @leofang @rgommers. I would suggest we move on with kDLComplex for now, then. The SoA convention needs a bit more thought, as so far we can only see it being well-defined for compact formats. Perhaps worth opening another thread for that.

@leofang (Contributor) commented Feb 7, 2021

Sounds good. Will create a new issue after #58 is in.

@leofang (Contributor) commented Feb 8, 2021

> Sounds good. Will create a new issue after #58 is in.

See #60.
