Add NPZ #308

Closed
vincentsarago opened this issue Dec 3, 2020 · 6 comments · Fixed by #309

Comments

@vincentsarago
Member

if img_format == "NPY":
    # If mask is not None we add it as the last band
    if mask is not None:
        mask = numpy.expand_dims(mask, axis=0)
        tile = numpy.concatenate((tile, mask))
    bio = BytesIO()
    numpy.save(bio, tile)
    bio.seek(0)
    return bio.getvalue()

ref: https://numpy.org/doc/stable/reference/generated/numpy.savez.html#numpy.savez
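For comparison, an NPZ branch could look roughly like the sketch below. The helper name `encode_npz` is hypothetical and not part of any library; the array names `data` and `mask` are only an assumption about how the archive might be laid out.

```python
from io import BytesIO
from typing import Optional

import numpy


def encode_npz(tile: numpy.ndarray, mask: Optional[numpy.ndarray] = None) -> bytes:
    """Hypothetical helper: serialize a tile (and optional mask) to NPZ bytes."""
    arrays = {"data": tile}
    if mask is not None:
        # The mask keeps its own shape and dtype instead of becoming an extra band.
        arrays["mask"] = mask
    bio = BytesIO()
    numpy.savez_compressed(bio, **arrays)
    return bio.getvalue()


tile = numpy.zeros((3, 256, 256), dtype=numpy.uint8)
mask = numpy.full((256, 256), 255, dtype=numpy.uint8)
payload = encode_npz(tile, mask)

# Reading the payload back: the archive exposes its arrays by name.
with numpy.load(BytesIO(payload)) as npz:
    names = sorted(npz.files)
    decoded_mask = npz["mask"]
```

A client would then read `data` and `mask` by name rather than slicing the last band off the tile.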

@vincentsarago vincentsarago changed the title from "Add NPZ!" to "Add NPZ" Dec 3, 2020
@kylebarron
Member

What's the purpose of this? So that you save data and mask as separate arrays?

@vincentsarago
Member Author

@kylebarron I first thought that the default .npy wasn't compressed and that npz meant NPY + compression... but I was wrong.

import numpy as np

arr = np.random.randint(0, 255, size=(3, 512, 512), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
np.save("test.npy", arr)
np.savez_compressed("test.npz", data=arr)

There is no difference in size for the one saved with savez_compressed.

BUT I see one advantage of using savez_compressed: it can save multiple arrays in one file natively.

np.savez_compressed("test.npz", data=arr, mask=mask)
arrz = np.load("test.npz")
print(arrz.files)
# ['data', 'mask']

data, mask = arrz["data"], arrz["mask"]

This will be less hacky than appending the mask as the last band:

if mask is not None:
    mask = numpy.expand_dims(mask, axis=0)
    tile = numpy.concatenate((tile, mask))

This was referenced Dec 3, 2020
@kylebarron
Member

.npy isn't compressed. np.savez creates an uncompressed zip archive. np.savez_compressed creates a compressed zip archive. Your file sizes aren't different because you're creating random data, which is essentially incompressible. If you use np.ones((3, 512, 512), dtype=np.uint8), you'll see the .npy and uncompressed .npz will be the same size as now, and the compressed .npz should be very small.
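A quick way to check this is to write each format to an in-memory buffer and compare byte counts. This is only a sketch: `saved_size` is a hypothetical helper, and the exact numbers will vary with the NumPy and zlib versions.

```python
from io import BytesIO

import numpy as np


def saved_size(save, arr):
    """Return the number of bytes `save` writes for `arr`."""
    bio = BytesIO()
    save(bio, arr)
    return len(bio.getvalue())


rng = np.random.default_rng(0)
arrays = {
    "random": rng.integers(0, 255, size=(3, 512, 512), dtype=np.uint8),
    "ones": np.ones((3, 512, 512), dtype=np.uint8),
}

results = {}
for name, arr in arrays.items():
    results[name] = {
        "npy": saved_size(np.save, arr),
        "npz": saved_size(np.savez, arr),
        "npz_compressed": saved_size(np.savez_compressed, arr),
    }
    print(name, results[name])
```

For the random array all three sizes should come out close to each other, while for the constant array only the compressed archive shrinks.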

BUT I see one advantage of using savez_compressed: it can save multiple arrays in one file natively.

In general, I don't think it's a big deal to return the mask as the last band, especially when you have an API like titiler's with the return_mask param. The one situation where a mask as a separate file might be useful is where you really want the mask to be a uint8 and the main data is something else like float32. But since the mask consists of two values, it should compress very well regardless.
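To illustrate the dtype caveat, here is a small sketch with made-up array shapes: concatenating a uint8 mask onto float32 data promotes the mask band to float32, so the reader has to cast it back.

```python
import numpy

# Illustrative shapes only; real tiles would be larger.
tile = numpy.random.default_rng(0).random((3, 4, 4), dtype=numpy.float32)
mask = numpy.full((4, 4), 255, dtype=numpy.uint8)

# Same approach as the NPY branch: the mask becomes the last band.
stacked = numpy.concatenate((tile, numpy.expand_dims(mask, axis=0)))
print(stacked.shape, stacked.dtype)

# On the reading side the mask band has to be split off and cast back.
data, mask_band = stacked[:-1], stacked[-1]
restored_mask = mask_band.astype(numpy.uint8)
```

The values 0 and 255 are exactly representable in float32, so the round trip is lossless here; the cost is only the wider dtype before compression.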

@vincentsarago
Member Author

ok ok thanks @kylebarron

so summary

  • np.save: creates a .npy file without compression
  • np.savez: creates a .npz file without compression but allows multiple arrays
  • np.savez_compressed: creates a .npz file with compression and allows multiple arrays
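The difference also shows up on the reading side. As a minimal sketch (the array name `data` is just an example), `np.load` returns a plain array for .npy and a dict-like archive for .npz:

```python
from io import BytesIO

import numpy as np

arr = np.arange(6, dtype=np.uint8).reshape(2, 3)

npy_buf, npz_buf = BytesIO(), BytesIO()
np.save(npy_buf, arr)
np.savez(npz_buf, data=arr)
npy_buf.seek(0)
npz_buf.seek(0)

from_npy = np.load(npy_buf)  # a plain ndarray
from_npz = np.load(npz_buf)  # a dict-like archive; arrays are looked up by name
print(type(from_npy).__name__, from_npz.files)
```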

@kylebarron
Member

kylebarron commented Dec 3, 2020

Note that you can also save multiple arrays to an .npy file, in a list-like fashion. They're just unnamed in comparison to savez. This also means that the reader doesn't need to have a dependency on a Zip archive opener.

https://numpy.org/doc/stable/reference/generated/numpy.save.html

Any data saved to the file is appended to the end of the file.

with open('test.npy', 'wb') as f:
    np.save(f, np.array([1, 2]))
    np.save(f, np.array([1, 3]))
with open('test.npy', 'rb') as f:
    a = np.load(f)
    b = np.load(f)
print(a, b)
# [1 2] [1 3]

@kylebarron
Member

Note that you can also save multiple arrays to an .npy file, in a list-like fashion

This also goes to my point about the mask being a different dtype... if you want the mask to always be uint8, you can just save them as two different arrays in sequence in the .npy file
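A sketch of that idea with an in-memory buffer, assuming float32 data and a uint8 mask (the shapes are made up for illustration):

```python
from io import BytesIO

import numpy as np

data = np.linspace(0, 1, 3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
mask = np.full((4, 4), 255, dtype=np.uint8)

# Write both arrays back to back into one buffer.
bio = BytesIO()
np.save(bio, data)
np.save(bio, mask)  # appended after the first array
payload = bio.getvalue()

# Read them back in the same order; each np.load consumes one array.
reader = BytesIO(payload)
data_out = np.load(reader)
mask_out = np.load(reader)
print(data_out.dtype, mask_out.dtype)
```

Each array keeps its own dtype, but the reader has to know the order in which they were written, since nothing in the file names them.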
