This is a follow-on to #131 (and an updated version of #132).
While comparing the source and target zarr stores from my regression tests, I noticed that the fill_value differs between the source and target data. My guess is that it is not preserved by the rechunk, and this can lead to output stores that are much larger than they need to be.
The script below is an updated version of my previous test script. It creates a degenerate case in which almost all of the data being rechunked has the same value.
If you run the script, you will see that the fill_value in "foo/bar/.zarray" changes from "fill_value": 1.0 in the source store to "fill_value": null in the target store. The size on disk of the two stores also differs by well over an order of magnitude.
Thanks,
Matt
❯ du -hs *
36K source.zarr
3.1M target.zarr
Here's a script that demonstrates the issue.
import shutil

import zarr
from rechunker import rechunk


def run_create_input_store():
    # Start from a clean output directory.
    shutil.rmtree('testoutput/', ignore_errors=True)
    store = zarr.DirectoryStore('testoutput/source.zarr')
    root = zarr.group(store=store, overwrite=True)
    foo = root.create_group('foo')
    root.attrs['description'] = 'root description'
    foo.attrs['description'] = 'foo description'
    # An array of ones with a single element changed.
    bar = foo.ones('bar', shape=(10000, 10000))
    bar[5000, 5000] = 3
    bar.attrs['description'] = 'foo description'
    zarr.consolidate_metadata(store)


def rechunkit():
    openstore = zarr.open_consolidated('testoutput/source.zarr')
    array_plan = rechunk(openstore, {'foo/bar': (1000, 1000)},
                         '1GB',
                         'testoutput/target.zarr',
                         temp_store='testoutput/temp.zarr')
    array_plan.execute()
    zarr.consolidate_metadata('testoutput/target.zarr')


if __name__ == '__main__':
    run_create_input_store()
    rechunkit()
    print('Compare the .zmetadata files in both your source.zarr and target.zarr directories')
    print('You will see that the "fill_value" in the source is 1.0 and it is null in the target.')
    source = zarr.open('testoutput/source.zarr')
    target = zarr.open('testoutput/target.zarr')
    print(source['foo']['bar'].fill_value)
    print(target['foo']['bar'].fill_value)
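For anyone who wants to check the two stores without opening them through zarr, here is a minimal sketch that reads the `.zarray` metadata directly and counts the chunk files on disk. It assumes the Zarr v2 directory layout that `DirectoryStore` produces (array metadata in `<store>/<array path>/.zarray`, chunks stored as non-dot files under the array directory). The helper names `read_fill_value`, `count_chunk_files`, and `compare_arrays` are hypothetical and not part of zarr or rechunker.

```python
import json
from pathlib import Path


def read_fill_value(store_dir, array_path):
    """Return the fill_value recorded in <store_dir>/<array_path>/.zarray.

    Assumes a Zarr v2 directory store; this helper is illustrative only.
    """
    meta_file = Path(store_dir) / array_path / ".zarray"
    with open(meta_file) as f:
        return json.load(f)["fill_value"]


def count_chunk_files(store_dir, array_path):
    """Count files under the array directory, skipping dot-files.

    Metadata files (.zarray, .zattrs) start with a dot, so whatever
    remains is assumed to be chunk data.
    """
    array_dir = Path(store_dir) / array_path
    return sum(
        1
        for p in array_dir.rglob("*")
        if p.is_file() and not p.name.startswith(".")
    )


def compare_arrays(source_dir, target_dir, array_path):
    """Collect fill_value and chunk-file count for the same array in two stores."""
    report = {}
    for label, store_dir in (("source", source_dir), ("target", target_dir)):
        report[label] = {
            "fill_value": read_fill_value(store_dir, array_path),
            "chunk_files": count_chunk_files(store_dir, array_path),
        }
    return report
```

After running the script above, calling `compare_arrays('testoutput/source.zarr', 'testoutput/target.zarr', 'foo/bar')` should show the differing fill_value along with how many chunk files each store holds. The 36K source size would be consistent with the source storing only the chunk touched by the assignment and leaving the rest to be filled from fill_value on read, though I have not verified that this fully accounts for the difference.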