Support complex numbers in DLPack protocol #50
Thank you Peter for bringing this up! This has been missing for a long time. There are essentially two different memory layouts for storing NDArrays of complex numbers. It would be great if we took both representations into consideration :-)
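The two layouts can be sketched with NumPy, whose native complex dtypes use the interleaved form (this is an illustration of the layouts under discussion, not part of the original comment):

```python
import numpy as np

# Layout 1: interleaved ("array of structs") -- what NumPy, C99 _Complex,
# and most libraries use: each element stores (re, im) side by side.
z = np.array([1 + 2j, 3 + 4j, 5 + 6j], dtype=np.complex64)

# Viewing the same buffer as float32 exposes the interleaving:
# [re0, im0, re1, im1, re2, im2]
interleaved = z.view(np.float32)

# Layout 2: split ("struct of arrays") -- all real parts in one buffer,
# all imaginary parts in another.
re = z.real.copy()  # [1, 3, 5]
im = z.imag.copy()  # [2, 4, 6]
```

Same logical array in both cases; only the memory order differs.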
I am sorry, but is the second layout (splitting a complex array into two real arrays) ever seen in practice? Could you name a few notable examples? It violates the principle we'd learn in CS 101 of preserving memory contiguity to maximize memory-load performance. HPC people certainly wouldn't do this at all. In practice, AFAIK none of the programming languages (C/C++/Fortran/Python/etc.) or libraries (CUDA/HIP/NumPy/PyTorch/CuPy/etc.) follows the 2nd approach; complex numbers are always implemented as an (aligned) struct of two real numbers, so that a complex array is an array of structs (AoS). In many other applications structs of arrays (SoA) are preferred, yes, but not in this case, due to the memory-load requirement and the arithmetic intensity associated with complex numbers.
@leofang Yes, I know of at least one real software/hardware stack that uses an SoA-ish representation for complex numbers. The hardware in question makes arbitrary indexing expensive (e.g., an offset of one float), but indexing over tiles of numbers cheap (e.g., a vector length's worth of floats). So on that hardware you should use tiles of real and imaginary numbers (i.e., pretty much a structure of arrays) for alignment reasons. That said, there's no DLPack support on that hardware yet, but it's something to think about.
Thanks @leofang @hawkinsp for the great discussion. From the computation's PoV, structure of arrays is actually better in many cases. This is due to the presence of vector/tensor instructions. Most modern architectures have vector units for floating-point numbers, but not necessarily for complex numbers. Think about a complex multiplication, (a + bi)(c + di) = (ac − bd) + (ad + bc)i.
If we want to make the best use of a vector unit to carry out the above computation, we will inevitably need to put the real and imaginary parts into different vectors (so the operation can be vectorized). In a typical CPU setting, the implementation of such AoS complex operations is more involved, and usually requires loading two vectors that have mixed real and imaginary parts and shuffling them to obtain the real and imaginary vectors. Storing data as SoA actually simplifies the implementation for both libraries and compilers, as the parts are already stored separately. This becomes a real concern when we start to support accelerators that operate on vectors or even tiles of matrices (@hawkinsp's example) and where vector shuffle instructions are not necessarily available. From a locality point of view, storing data as SoA does not impose much of a burden, because accesses to both the real and imaginary parts are still local to the cache. The reality is that SoA removes the need for vector shuffling, thus achieving better memory-load performance. So while from the conventional PoV AoS is certainly the more common storage format, SoA does bring benefits in most cases from the computational PoV.
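A minimal sketch of why the split layout vectorizes naturally: with separate real and imaginary arrays, an element-wise complex multiply is just independent element-wise products and sums, with no shuffling. Plain Python lists stand in for vector registers here; this is an illustration, not production code:

```python
def complex_mul_soa(a_re, a_im, b_re, b_im):
    """(a_re + i*a_im) * (b_re + i*b_im), element-wise, on split arrays.

    Each comprehension below maps directly onto vector multiply/add
    instructions; no de-interleaving of operands is required.
    """
    c_re = [ar * br - ai * bi
            for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im)]
    c_im = [ar * bi + ai * br
            for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im)]
    return c_re, c_im

# (1 + 2i) * (3 + 4i) = -5 + 10i
c_re, c_im = complex_mul_soa([1.0], [2.0], [3.0], [4.0])
```

In the interleaved layout, the same computation would first need shuffle instructions to separate the real and imaginary lanes.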
Matlab is the one I am familiar with, because it was annoying to convert to/from interleaved complex when I was working with C MEX files several years ago. However, it looks like they now also support the interleaved representation.
Thanks @hawkinsp @tqchen @grlee77 for the replies! On the special hardware/software stack that uses SoA: is that a real thing or still a project on paper? I'd love to learn more specifics (say, the name) if possible 🙂 But while it makes sense to consider the benefit of vectorization instructions in such a setup, in more complex situations, such as implementing mathematical functions for complex numbers, it's much more difficult than with AoS as far as I understand. Let me reiterate that the vast majority of languages, libraries, and frameworks out there support the AoS storage by default. @grlee77's Matlab example is a perfect addition to the examples I gathered above: Matlab is moving to abandon SoA in favor of AoS. Adding support for such a common case should be low-hanging fruit, and I don't see why this has to be a blocker to moving on and closing this issue. For example, we could just add a new entry for complex types. If there is a serious need to support SoA complex types, we can always revisit and revise the DLPack standard in the future, though it's nontrivial to me at the moment how to support such a thing in DLPack... (another justification for leaving it aside for now.)
In terms of complexity, I don't think math functions will be more complex in either form. The SoA approach would simply correspond to loading from both arrays, computing, then storing. Again, in such cases SoA is more friendly to vectorization. The use of SoA is quite real in many projects. @hawkinsp might be able to say more, but I believe this is the scheme used by JAX. Technically, this would apply to all specialized accelerators that do not have vector shuffle instructions as well (e.g. TPU). Additionally, given that there is an increasing number of floating-point representations (e.g. float32, float64, bfloat16, posit), choosing the SoA representation might help simplify things a bit, so that we do not have to grow a corresponding complex-number type per floating-point data type. The above issues are only going to become a bigger problem as more specialized accelerators are produced. So if we really want to consider support for more hardware, I would give SoA serious consideration. Standardization needs buy-in from the frameworks and hardware vendors, so it would be great to have a conversation about it.
@tqchen Sorry I dropped the ball. Performance of the SoA/AoS differences aside, in practice how would you adopt DLPack for SoA? As I said earlier, I think DLPack is completely capable of describing AoS, given that vector types can already be handled natively. Regardless, I think we'll need at least two more types added.

@rgommers With data-apis/array-api#105 in mind, I'd like to see this issue resolved asap for the benefit of the Array API standard, especially if you're going to add DLPack support to NumPy.
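To illustrate the "vector types" point above: DLPack's `DLDataType` carries `code`, `bits`, and `lanes` fields, so an interleaved complex64 could in principle be described today as a 2-lane float32 vector, with no new type code. The namedtuple below merely mirrors the C struct for illustration; `kDLFloat = 2` matches `dlpack.h`:

```python
from collections import namedtuple

# Python stand-in for DLPack's C struct:
#   typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } DLDataType;
DLDataType = namedtuple("DLDataType", ["code", "bits", "lanes"])

kDLFloat = 2  # floating-point type code from dlpack.h

# AoS complex64 described without a dedicated complex code:
# a 2-lane vector of float32, i.e. adjacent (re, im) pairs.
complex64_as_vector = DLDataType(code=kDLFloat, bits=32, lanes=2)

# Per-element size: bits * lanes / 8 = 8 bytes, matching NumPy's
# complex64 itemsize.
itemsize = complex64_as_vector.bits * complex64_as_vector.lanes // 8
```

The drawback of this encoding is that it is ambiguous: a consumer cannot tell a genuine 2-lane float vector apart from a complex number, which is one motivation for a dedicated type code.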
Thanks for the ping, I had missed this thread. I think there's some time (6-12 months for next version I'd say), but yes would be nice to keep progressing this now that we have some good momentum.
Adding two distinct data types seems to make sense if there are two relatively common representations with their own advantages. And then if the consumer doesn't support the format the producer offers, it gets a choice of doing the copy (perhaps while emitting a warning, for efficiency) or raising an error. Making a choice of AoS vs. SoA only seems to make sense if there's a high chance that existing libraries will change their implementation (which seems unlikely).
Thanks, @rgommers.
Yes, especially describing AoS is so simple and can be immediately used for NumPy/CuPy/Numba/etc that I just don't see why we have to block it before we figure out a way to describe SoA in DLPack. Let's move with the momentum 🚀
@rgommers So do you have suggestions for better names? 🙂
The names in gh-58 (…)
Thanks, it would be great to discuss how we could handle different underlying data types, though. Do we want to support complex numbers that are backed by …?
@tqchen I think you brought this up earlier, but I still don't understand your question. Complex numbers can already be supported in exactly the same way as integers and floats with different bit widths. The only (obvious) requirement is that both the real and imaginary parts need to have the same format.
Ah, in this case we should avoid potential ambiguity and prefer using fp16 for complex32, for consistency. If guided by the requirement of zero copy, it seems necessary to me to define an additional …
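A sketch of what such an added entry could look like. DLPack did later gain a `kDLComplex` type code (value 5 in `dlpack.h`), with `bits` covering both parts; the `complex32` entry here is hypothetical, following the fp16 choice discussed above:

```python
from collections import namedtuple

# Python stand-in for DLPack's DLDataType struct, for illustration only.
DLDataType = namedtuple("DLDataType", ["code", "bits", "lanes"])

kDLComplex = 5  # complex type code later added to dlpack.h

# bits counts the WHOLE element (real + imaginary parts together):
complex64  = DLDataType(code=kDLComplex, bits=64,  lanes=1)  # 2 x float32
complex128 = DLDataType(code=kDLComplex, bits=128, lanes=1)  # 2 x float64

# Hypothetical fp16-backed complex32, per the consistency argument above:
complex32  = DLDataType(code=kDLComplex, bits=32,  lanes=1)  # 2 x float16
```

With this convention the component width is always `bits // 2`, so no separate type code per backing float format is needed.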
Sounds good. Will create a new issue after #58 is in. |
The DLPack protocol doesn't seem to have a representation of complex numbers. I don't know whether this is intentional (e.g., is the intent that clients use a structure-of-arrays representation of the real and imaginary parts?) or whether it is a missing feature.
(I therefore chose to leave this case unimplemented when adding DLPack support to JAX, but it would be nice to add it once the protocol is fixed.)