passing / overriding kwargs when opening datasets #52

Closed
rabernat opened this issue Sep 12, 2019 · 7 comments

Comments

@rabernat

rabernat commented Sep 12, 2019

I would like to be able to optionally override or pass additional keyword arguments to xarray when opening datasets from intake.

Consider our canonical example:

import fsspec
import xarray as xr
mapper = fsspec.get_mapper("gcs://pangeo-data/dataset-duacs-rep-global-merged-allsat-phy-l4-v3-alt")
ds = xr.open_zarr(mapper, consolidated=True)

There are many options I might want to pass to open_zarr. For example:

  • chunks=None: skip auto chunking
  • decode_cf=False: don't decode cf
  • etc. etc.

If I open this file via intake:

import intake
cat = intake.Catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean.yaml")
ds = cat["sea_surface_height"].to_dask()

there is no way to choose any of those options.

Could we make .to_dask() (itself a strange syntax...why not .to_xarray()) accept arbitrary keyword arguments which are passed to the reader function?

@martindurant
Member

Yes, you can do this, but it may not be obvious...
For a zarr source, the class takes arbitrary kwargs and passes them to xr.open_zarr. You can override any arguments passed to the source class upon instantiation. In this case, it would look like

ds = cat["sea_surface_height"](chunks=None).to_dask()

The netCDF source class explicitly has chunks as a kwarg and xarray_kwargs for any extra kwargs, like

ds = cat["netcdfthing"](chunks=None, xarray_kwargs={"decode_cf": False}).to_dask()
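The two forwarding styles described above can be sketched in plain Python. Everything here is hypothetical: `fake_open` stands in for the xarray reader, and `ZarrLikeSource` and `NetCDFLikeSource` are illustrative classes, not the actual intake-xarray implementations.

```python
def fake_open(path, **kwargs):
    """Stand-in for an xarray reader; it only records how it was called."""
    return {"path": path, "kwargs": kwargs}


class ZarrLikeSource:
    """Hypothetical source that forwards every extra kwarg to the reader."""

    def __init__(self, urlpath, **kwargs):
        self.urlpath = urlpath
        self.kwargs = kwargs

    def to_dask(self):
        return fake_open(self.urlpath, **self.kwargs)


class NetCDFLikeSource:
    """Hypothetical source that names `chunks` and collects the rest in `xarray_kwargs`."""

    def __init__(self, urlpath, chunks=None, xarray_kwargs=None):
        self.urlpath = urlpath
        self.chunks = chunks
        self.xarray_kwargs = xarray_kwargs or {}

    def to_dask(self):
        return fake_open(self.urlpath, chunks=self.chunks, **self.xarray_kwargs)


zarr_result = ZarrLikeSource("store.zarr", chunks=None, decode_cf=False).to_dask()
nc_result = NetCDFLikeSource(
    "file.nc", chunks=None, xarray_kwargs={"decode_cf": False}
).to_dask()
```

In this sketch both sources end up calling the reader with the same keyword arguments; only the way the caller spells them differs.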

> why not .to_xarray()

The container type is xarray; like all other sources, when you do read() or to_dask(), you get a version of the container that the source promises. With read(), it is in-memory, with to_dask() it is chunked.
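As a rough illustration of the read() versus to_dask() distinction, here is a toy source whose container is a plain list. The class `RangeSource` and the method name `to_chunked` are made up for this sketch (it uses lazy `range` objects in place of dask chunks) and do not reflect intake's real classes.

```python
class RangeSource:
    """Hypothetical source over the integers 0..n-1, split into fixed-size chunks."""

    container = "list"  # the kind of object this source promises to return

    def __init__(self, n, chunk_size):
        self.n = n
        self.chunk_size = chunk_size

    def read(self):
        """Load everything into memory at once."""
        return list(range(self.n))

    def to_chunked(self):
        """Return one lazy range per chunk; stands in for to_dask()."""
        return [
            range(start, min(start + self.chunk_size, self.n))
            for start in range(0, self.n, self.chunk_size)
        ]


src = RangeSource(n=10, chunk_size=4)
eager = src.read()          # one in-memory list
chunks = src.to_chunked()   # three lazy pieces covering the same data
```

Both methods describe the same data; the chunked form simply defers materializing each piece until it is needed.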

@rabernat
Author

Ok good to know.

I find the syntax a bit confusing. Why would I be "calling" the catalog entry?

Also, where would I find this information in the intake documentation?

@martindurant
Member

entry(...) -> source instance. This allows parameters to be applied, exactly as shown, whether they are explicitly given in the cat or supplied as overrides. When you do .to_dask(), you explicitly call with no args, to get the default parameters.
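A minimal sketch of the entry-to-source relationship, assuming a simple "defaults merged with overrides" rule. `ToyEntry` and `RecordingSource` are hypothetical names for illustration only; intake's real entry classes are more involved.

```python
class RecordingSource:
    """Hypothetical source that just remembers the kwargs it was built with."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def to_dask(self):
        # A real source would call the reader here; this returns the kwargs instead.
        return dict(self.kwargs)


class ToyEntry:
    """Hypothetical catalog entry holding a source class and default kwargs."""

    def __init__(self, source_cls, **defaults):
        self.source_cls = source_cls
        self.defaults = defaults

    def __call__(self, **overrides):
        # Overrides win over the defaults recorded in the catalog.
        return self.source_cls(**{**self.defaults, **overrides})

    def to_dask(self):
        # Calling the entry with no arguments yields the default source.
        return self().to_dask()


entry = ToyEntry(RecordingSource, consolidated=True, chunks="auto")
default_result = entry.to_dask()
overridden_result = entry(chunks=None).to_dask()
```

Under these assumptions, `entry.to_dask()` and `entry().to_dask()` are the same operation, and `entry(chunks=None)` changes only the one parameter that was overridden.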

@jbednar
Contributor

jbednar commented Sep 12, 2019

I'd consider that an implicit call, myself! :-)

@martindurant
Member

right

@jsignell
Member

I also can't find where this is discussed in the docs. @rabernat is there a specific place where you'd expect it to be?

@martindurant
Member

Closing this as old, and we have plans to unify the "entry" and "source" classes to avoid the confusion.
