More thorough store handling during combine/append #488
In [1]: from kerchunk.combine import MultiZarrToZarr
...: ppath = "/Users/mdurant/Downloads/append.parquet"
...: from fsspec.implementations.reference import LazyReferenceMapper
...: out = LazyReferenceMapper(root=ppath)
...: import re
...: import datetime
...: def fn_to_time(index, fs, var, fn):
...:     match = re.search(r'CLDPROP_D3_VIIRS_SNPP\.A(\d{4})(\d{3})\.', fn)
...:     year = int(match.group(1))
...:     day_of_year = int(match.group(2))
...:     return datetime.datetime(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)
...: import numpy as np
In [2]: MultiZarrToZarr.append(
...: ["CLDPROP_D3_VIIRS_SNPP.A2024174.011.2024178005308.json"],
...: original_refs=out,
...: coo_map={'time': fn_to_time},
...: coo_dtypes={'time': np.dtype('M8[s]')},
...: concat_dims=['time'],
...: remote_protocol="file"
...: ).translate()
/Users/mdurant/conda/envs/py310/lib/python3.10/site-packages/xarray/backends/plugins.py:80: RuntimeWarning: Engine 'gribberish' loading failed:
No module named 'gribberish'
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
Out[2]: <fsspec.implementations.reference.LazyReferenceMapper at 0x102b78250>
In [3]: import xarray as xr
In [4]: ds = xr.open_dataset(ppath, engine='kerchunk', group = "Cloud_Optical_Thickness_1621_PCL_Log_Liquid")
In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions: (time: 2, longitude: 360, latitude: 180)
Dimensions without coordinates: time, longitude, latitude
Data variables:
Mean (time, longitude, latitude) float64 ...
Pixel_Counts (time, longitude, latitude) float64 ...
Standard_Deviation (time, longitude, latitude) float64 ...
Sum (time, longitude, latitude) float64 ...
Sum_Squares (time, longitude, latitude) float64 ...
Attributes:
add_offset: 0.0
long_name: Cloud Optical Thickness Log10 for Liquid Water Clouds (1.6...
scale_factor: 1.0
units: none
valid_max: 2.176
valid_min: -2.0
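For anyone adapting the `coo_map` callable above, here is a minimal sketch of the filename-to-time step on its own, assuming the same `CLDPROP_D3_VIIRS_SNPP.A<year><day-of-year>.` filename pattern. The name `filename_to_time` and the explicit `ValueError` for non-matching names are illustrative additions, not part of kerchunk or of the session above. The callable passed to `coo_map` in the session takes `(index, fs, var, fn)`, whereas this helper takes only the filename.

```python
import datetime
import re

# Same pattern as in the session above: 4-digit year followed by 3-digit day of year.
PATTERN = re.compile(r"CLDPROP_D3_VIIRS_SNPP\.A(\d{4})(\d{3})\.")


def filename_to_time(fn):
    """Hypothetical helper: derive the observation date from a CLDPROP filename."""
    match = PATTERN.search(fn)
    if match is None:
        # Fail loudly instead of raising AttributeError on match.group
        raise ValueError(f"filename does not match expected pattern: {fn!r}")
    year = int(match.group(1))
    day_of_year = int(match.group(2))
    # Day 1 is January 1, hence the "- 1"
    return datetime.datetime(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)


when = filename_to_time("CLDPROP_D3_VIIRS_SNPP.A2024174.011.2024178005308.json")
print(when.isoformat())
```

Raising on a non-matching name is a design choice for this sketch: it surfaces a misnamed input file before anything is written to the reference store.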
Also required the following in fsspec:
--- a/fsspec/implementations/reference.py
+++ b/fsspec/implementations/reference.py
@@ -1085,7 +1085,7 @@ class ReferenceFileSystem(AsyncFileSystem):
         if self.dircache:
             return path in self.dircache
         elif isinstance(self.references, LazyReferenceMapper):
-            return path in self.references.listdir("")
+            return path in self.references.listdir()
         else:
@martindurant Could you please share the whole notebook, if possible? My append call is not working as expected. Please see the screenshot below. The version of kerchunk I installed:
CODE:
Are these changes pushed?
No, I haven't done that yet.
Also, if possible, can you share your notebook? The append() functionality is not working for me. I made the changes to fsspec as well and get the same error as below.
Notebook for reference... in your code you have used
@martindurant Am I doing anything wrong in creating the reference files? I cannot figure out why my parquet store is not being appended to.
@martindurant Can you please share your example notebook, along with the versions of kerchunk and fsspec you have installed?
I will copy here what I have later today.
Complete workflow:
kerchunk at 429d1df (append_deep branch, this PR)
I think it is working as expected, @martindurant. You can go ahead and merge all the branches. Thank you. I will do one more thorough analysis in my project and let you know if I have any questions.
Fixes #487