-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numpy Data Indexing Convention #75
Comments
My use-cases for pynrrd are pretty much identical to nibabel; so, the existing situation which matches both data-on-disk and the nibabel choice, works very well for me.
It's not just the arrays. Images can have associated frame-of-reference characteristics which also must be accounted for in any reordering. For example, diffusion images may have associated acquisition gradients in a specific frame of reference. In FSL-style Nifti DWI, gradients are always reoriented so that the reference is the image plane, but even that can get dicey very quickly!. NRRD DWI is more flexible but also more complicated, because the gradients may be stored relative to the |
Thanks for starting the discussion by the way. This is valuable to have discussed and documented in any case. |
Yeah, the fix is not as 'easy' as I thought it would be. My current thoughts are that we keep pynrrd the same but include documentation similar to what other Python libraries have. It'll be up to the user to transpose the data if they desire. This will probably be what I do because there are a lot of libraries where you end up transposing anyway to get it to work correctly. Edit: Upon further thought, most of the fields would be easy to flip. I'm envisioning sometime in the future having an You bring up a good point in terms of NRRD DWIs. If anyone has any ideas, I'm open to hearing them for support of DWIs and other 'frame-of-reference characteristics'. |
Thanks for inviting me to the conversation, sadly my knowledge of medical imaging is limited. But I think I can help sharing how I used it so far. When I was using only python, I used the this code, after reading the nrrd and before applying any calculations or method to the matrix:
But then they asked me to port everything to Slicer 3D, I basically removed those lines to get the same result (I was still loading some files using pynrrd). Most libraries I use are a wrap around C/C++ so it makes sense on using its orientation, but when working with nrrd I was just following the info on the header and acting accordingly. Something like:
I would break it in frames(24), and change the volume with the first code with pure python, but on Slicer those steps are not needed. In any case I like the idea of having the possibility to use both, because it could be more friendly for people coming from other libraries or mathlab, as long as it is easy to identify on code what is happening. |
Thanks for your input! It's helpful to see how pynrrd is being used in the "field". My current thoughts are to keep pynrrd the same but to include some documentation about C-order vs Fortran order because it can be very confusing to think about. Plus, a lot of the information out there can be sort of misleading in terms of how to deal with it. As a summary, pynrrd currently uses Fortran-order indexing because that is what the NRRD specifications say to have it as. Thus, a 3D volume would be organized such that you would index it as (x, y, z). Most Python libraries out there, including Numpy being the most widely used, use C-order indexing where a 3D volume would be indexed as (z, y, x). The one exception is Nibabel (used for loading Nifti files) which uses Fortran-ordering. Since the other libraries use C-order, I want to show how simple it is to "convert" between the two. In this case, everything is the same except for I had to flip the spacing around.
|
My problem is that I have a 3D array that I want to keep the shape of, and that I need to be C-contiguous, but I want to use pynrrd for saving and loading. As far as I know, it is impossible to get something C-contiguous directly from nrrd.read() (does not matter how I save the data). Would be great if this could be handled within pynrrd instead. Note that converting a C-contiguous array with numpy.asfortranarray() before saving with nrrd.write() seems significantly faster than writing the C-contiguous array directly. |
Thanks for the input @AAAArcus. It's always helpful to get second opinions... My inclination at this point is to not support adding any new functionality, but documenting the current functionality and mentioning ways to convert between the two. I'm happy to hear arguments/comments/suggestions for otherwise though! As I mentioned, the NRRD specification itself specifies that Fortran order should be used (see here). Thus, I think it makes sense to read the metadata information in Fortran order as well. For example, it makes much more sense that the However, from your comment, it seems that you want the data to be C-order in memory whilst keeping the shape the same, which is different from what I am talking about. So, if I'm understanding correctly, you want your data to have a shape of (x, y, z) but actually be accessing the data in the reverse order (z, y, x) under the hood. In your case, the reason why it doesn't matter how you save the data, is that the Numpy data is converted to Fortran order before being saved (see here). Then, when the data is loaded with So, let me make a few points... Personally, I'm surprised that converting to a Fortran array using In terms of reading a C-contiguous array, what do you propose for adding this feature to the library? Is it worth adding the feature when it can be done with one additional function call in the Numpy library? Also, is your use case for requiring C-contiguous data common in your opinion? We're discussing two different problems, yours is related to how the data is saved in memory and mine is how the data is handled in terms of the axes and such. |
Hi, Interesting discussion (and I like the project as well) and maybe I'm a bit late but I had a brief discussion with @AAAArcus about using pynrrd in another project. I was a bit surprised about the F-order indexing at first. My thought is that you should move to C-order indexing since this is what numpy uses (https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html). What you are doing now is that you read a C-contiguous array (from an NRRD-file) and flag it as a Fortran-array. Basically tricking numpy into using Fortran-order. As for the discussion about NRRD using Fortran-order, I think you got it wrong. NRRD uses Fortran-order only in the data fields used for meta-data, the same would go for OpenCV (width, height, depth) and the likes. The image data in itself is specified in C-contiguous order which makes sense since the reference implementation of NRRD is originally written in C. Also, if you stick with the Fortran-ordering I think it's important to be very clear about this. For instance, right now the |
Hello @simeks You're not late to the conversation since nothing has been decided on what to do yet. I'm always open to comments and suggestions. I want to say that your comment was very enlightening to me. I've spent a lot of time reading about Fortran and C-ordering and it can easily get confusing. Yes, after further thought you are correct that the data is stored as C-contiguous. The current approach for reading the array is to reshape it into Fortran order like you mentioned. Saving the data is to convert the data to bytes in Fortran order once again. The confusion for me lies mainly in how Numpy handles ordering. In certain cases Numpy uses the order parameter to indicate the memory ordering, or how the data is laid out in memory. However, there are a few functions where the order parameter indicates the index ordering. My thoughts are now that maybe using the What are your thoughts? Here's the current lines of code that are causing problems and that we need to rethink. reader.py
writer.py
Edit: To be clear, this is what I'm kinda envisioning. Curious to see if there are any problems with this method or maybe a more optimal approach. reader.py
writer.py
|
Glad to help. This ordering business has always been confusing for me as well. I think one source of confusion is the fact that index ordering and memory ordering are inherently the same. You flip one and the other follows.
The idea with |
Sounds good. You're welcome to make a PR if you'd like. Otherwise, it'll be awhile before I can get some time. We should also document the new argument to prevent confusion in the future. |
Sure, I should be able to take a look at it tomorrow. |
Thanks for the feedback @addisonElliott. Sorry for being inactive, but it seems we have some progress! It's easy to get confused with contiguous here and order there (I'm still a bit confused...), so it would be great if we could make it less confusing for the user of pynrrd. Thanks for getting involved @simeks! |
Please provide your feedback for this idea. I'm interested in starting a discussion and getting your input on the issue
Background
A vast majority of Python numerical libraries use what is called C-ordered indexing rather than Fortran-ordered indexing. In simple terms, C-ordered indexing is where the dimensions go from slowest varying to fastest varying. Fortran-ordered indexing is exactly the opposite. I've cited some sources that talk about this and discuss it in much greater detail than I am going to.
Long story short, let's say you have a 4D image with x, y, z, and time.
Here's how you would index it with C-order:
data[t, z, y, x]
Here's how you would index it with Fortran-order:
data[x, y, z, t]
MATLAB & Fortran use Fortran-order, obviously and the benefit is that it is intuitive to have the dimensions in that order. C-order is used in C/C++ because you typically declare multi-dimensional arrays as such:
Issue
NRRD specifications use Fortran-order (see here). I've tested with 3D Slicer and it adheres to the standard where it stores the data shape as X, Y, Z.
However, most of the Python community seems to be using C-order and I don't think pynrrd appropriately handles it.
List of Python libraries that use C-order:
The only exception I found was Nibabel which uses Fortran ordering.
Sources & Additional Information:
If we read in a NRRD file from 3D Slicer that has the shape being (x, y, z). The shape will be the same when read into Numpy. If we were using C-ordering, the shape should be (z, y, x).
Solution
It's an easy fix but I'm more interested in what would be the best approach for users of pynrrd. I'm thinking of adding some sort of parameter/default option that can be set to transpose the data array.
The downside is that all of the dimension related arrays will need to be flipped to appropriately correspond to the right axes.
@ihnorton
@tashrifbillah
@jcnils
The text was updated successfully, but these errors were encountered: