Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

byteskip -1 fails #70

Closed
tashrifbillah opened this issue Nov 5, 2018 · 8 comments
Closed

byteskip -1 fails #70

tashrifbillah opened this issue Nov 5, 2018 · 8 comments

Comments

@tashrifbillah
Copy link
Contributor

When I set byteskip: -1 in the header, the data file should load from the end. Instead, I see the following error message.

May be we need to add a condition to check if byteskip is -1, and then rewrite line 383.

@ihnorton , UKFTractography ran well, generated .vtk file fine. We just need to make this fix to run other Pynrrd dependent CLIs.

In [2]: nrrd.read('EddyCorrect-DWI.nhdr')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-d45a7d091315> in <module>()
----> 1 nrrd.read('EddyCorrect-DWI.nhdr')

~/anaconda3/lib/python3.6/site-packages/nrrd/reader.py in read(filename, custom_field_map)
    427     with open(filename, 'rb') as fh:
    428         header = read_header(fh, custom_field_map)
--> 429         data = read_data(header, fh, filename)
    430 
    431         return data, header

~/anaconda3/lib/python3.6/site-packages/nrrd/reader.py in read_data(header, fh, filename)
    381         # Byte skip is applied AFTER the decompression. Skip first x bytes of the decompressed data and parse it using
    382         # NumPy
--> 383         data = np.frombuffer(decompressed_data[byte_skip:], dtype)
    384 
    385     # Close the file

ValueError: buffer size must be a multiple of element size

@addisonElliott , any idea/commit would be appreciated.

@addisonElliott
Copy link
Collaborator

addisonElliott commented Nov 6, 2018

NRRD documentation states the following about byte skip (reference):

If skip is greater than or equal to zero, it tells how many bytes to skip in a data file in order to get to the beginning of the data. As an idiom copied from the MetaImage file format, the value of skip can be -1. This is valid only with raw encoding. The action of this byte skip is to fseek() backwards from the end of the data file, to the beginning of the data. The distance to seek is calculated from the nrrd type and axis sizes. This is a useful trick for getting at binary data in other formats with unknown (or variable) length binary headers, such as DICOM, TIFF, and BMP, but only if the data is uncompressed, and only if the end of the data is contiguous with the end of the file (which can fail to be the case in DICOM and TIFF).

Therefore, line 383 shouldn't be edited because that only occurs for compressed data. Rather, line 353 is more appropriate.

The pseudocode should be something like:

if byte_skip == -1:
    fh.seek(-1 * sizeof(data_type) * np.prod(shape), os.SEEK_END)
else:
    fh.seek(byte_skip, os.SEEK_CUR)

With this in mind, maybe it would be a good idea to check if byte_skip is negative for anything besides raw encoded data and throw an error. Furthermore, even throw an error if the byte skip is negative and not equal to -1, which is undefined. Can even apply this to line skip which can only be >= 0.

This is a good catch, I never noticed this about the NRRD documentation. Pull requests are welcome! (Not sure when I would have time to create a PR on this).

@tashrifbillah
Copy link
Contributor Author

I'll try to work on this PR this week.

Just a heads up, even if data is gzip encoded, it was correctly loaded by Slicer (may be itk/vtk takes care of it) when byteskip is set to -1.

@addisonElliott
Copy link
Collaborator

Oh okay, wasn't aware of this, then I think we should add support for byteskip = -1 for raw and compressed data. Maybe even do this for ASCII as well?

Slicer uses VTK/ITK to load NRRD's I believe which uses Teem's implementation of NRRD format. Maybe they support byteskip = -1 for compressed data then?

@tashrifbillah
Copy link
Contributor Author

@addisonElliott , I have been working on this for a while. It looks like you have put a comment before _READ_CHUNKSIZE. If I follow that comment and try to read backward chunk by chunk, it probably needs a nontrivial programming. Given that, do you think it's okay to uncompress the whole gzipped data at once?

To avoid seeking from end, we have to set byte_skip for the first chunk:

# whole data - just image data + 1
byte_skip= decompobj.decompress(fh.read( )) - dtype.itemsize * total_data_points +1

and recondition Ln 389.

@addisonElliott
Copy link
Collaborator

I probably wouldn't revert the changes of decompressing chunk by chunk because this was likely an issue in the past someone had for large NRRD files.

With that in mind, does Slicer/VTK apply the byteskip for compressed data before or after decompressing? I thought it was applied after decompression in which case the chunk size is irrelevant, you just change the index into decompressed data.

In other words, why aren't you doing something like:

if byteskip == -1:
    data = np.fromstring(decompressed_data[-dtype.itemsize * total_data_points + 1:], dtype)

@tashrifbillah
Copy link
Contributor Author

Sorry, if it was not clear, I exactly meant that.
I used the following:

data = np.frombuffer(decompressed_data[-total_data_points*dtype.itemsize: ], dtype)

It works and I'll submit a PR soon.

@tashrifbillah
Copy link
Contributor Author

@addisonElliott , once I am done, shall I run

python -m unittest discover -v nrrd/tests

Before submitting a PR?

@addisonElliott
Copy link
Collaborator

Preferably, yes. It saves effort that Travis CI has to do and I don't get spammed with messages. I say this from experience where I did 15 more pushes to try and fix my tests...

This was referenced Nov 10, 2018
addisonElliott pushed a commit that referenced this issue Nov 12, 2018
1. Fixed byte_skip -1 reading to work on compressed data (issue #70)
2. Add error checking for line_skip and byte_skip < -1
3. Added tests for error checking and byte_skip functionality
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants