xray raises error when opening datasets with multi-dimensional coordinate variables #457

Closed
markelg opened this issue Jul 9, 2015 · 2 comments

Comments

Contributor

markelg commented Jul 9, 2015

Hello and thank you for this great package.

I have an (OPeNDAP) dataset in which one coordinate (time24) is defined by a 2-dimensional coordinate variable. The reason is that it contains a set of forecasts that overlap in time, so the value of time24 depends on the run. Unfortunately the dataset is not public, so I can't share it for tests.

The main variable is:

float32 mean2t24(run, member, time24, lat, lon)
    long_name: Mean temperature at 2 metres since last 24 hours @ Ground or water surface

And the coordinate variables are:

int32 run(run)
    long_name: Run time for ForecastModelRunCollection
    standard_name: forecast_reference_time
    units: hours since 1981-01-01T00:00:00
    _CoordinateAxisType: RunTime

|S1 member(member, maxStrlen64)
    standard_name: realization
    _CoordinateAxisType: Ensemble

int32 time24(run, time24)
    long_name: Forecast time for ForecastModelRunCollection
    standard_name: time
    units: hours since 1981-01-01T00:00:00
    _CoordinateAxisType: Time

float32 lon(lon)
    units: degrees_east

float32 lat(lat)
    units: degrees_north

xray is currently unable to open this dataset:

ValueError: an index variable must be defined with 1-dimensional data

That is OK, since this looks difficult to support, but it would be enough if I could at least exclude the variable time24 from being read by xray, with a flag like "exclude_variable=(var1, var2, ...)". xray would then fill the coordinate with the default int64 values (0, 1, 2, 3, 4...) that it uses when a dimension has no coordinate. This would also be very useful for excluding troublesome variables (e.g. corrupt, with weird data types, or inconsistent when concatenating) that are present in many datasets. Another option would be to issue a warning instead of an error, and then fill the variable with the default values (0, 1, 2, 3, 4...).
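To make the warning-based alternative concrete, here is a minimal pure-Python sketch. It does not use xray at all: variables are modelled as a plain dict of name -> (dims, values), and the function name `fill_bad_index_variables` and the tiny example data are hypothetical, chosen only for illustration.

```python
import warnings


def fill_bad_index_variables(variables, sizes):
    """Replace index variables that are not 1-D with default integer values.

    variables: dict mapping name -> (dims, values)
    sizes: dict mapping dimension name -> length
    """
    result = {}
    for name, (dims, values) in variables.items():
        # An index variable shares its name with a dimension; it is only
        # usable as an index if it has exactly one dimension.
        if name in sizes and len(dims) != 1:
            warnings.warn(
                f"{name!r} is not 1-dimensional; using default index values"
            )
            result[name] = ((name,), list(range(sizes[name])))
        else:
            result[name] = (dims, values)
    return result


# Made-up miniature of the dataset: 2 runs, 3 forecast times per run.
sizes = {"run": 2, "time24": 3}
variables = {
    "run": (("run",), [0, 24]),
    "time24": (("run", "time24"), [[24, 48, 72], [48, 72, 96]]),
}

filled = fill_bad_index_variables(variables, sizes)
print(filled["time24"])
```

The original 2-dimensional values are discarded here, which matches the fallback described above but loses the real forecast times.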

I am looking at the code to see if I can implement this myself, but I am not sure how to proceed.

Member

shoyer commented Jul 9, 2015

This is a bug with dask.array: dask/dask#391

Member

shoyer commented Jul 15, 2015

Oops, I was confused and posted my previous comment on the wrong issue! This is not a dask bug at all.

I agree that an option like exclude_variables or perhaps drop_variables would be a nice addition to open_dataset.

If you're interested in putting together a PR, the place to put this is in open_dataset (found in xray/backends/api.py), which can pass it on to decode_cf (found in xray/conventions.py). In the latter function (prior to this line), we should simply loop over vars and exclude any variables in drop_variables.
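A rough sketch of how the argument could be threaded through, using stand-in functions rather than the real xray code. The names `open_dataset_sketch` and `decode_cf_sketch` are hypothetical, the "store" is just a dict of variable name -> dimensions, and accepting a single string for `drop_variables` is an assumption made for convenience.

```python
def decode_cf_sketch(variables, drop_variables=None):
    """Stand-in for decode_cf: skip dropped names, pass the rest through."""
    if drop_variables is None:
        drop_variables = []
    elif isinstance(drop_variables, str):
        # Assumed convenience: allow a single name as well as a sequence.
        drop_variables = [drop_variables]
    drop = set(drop_variables)
    # Loop over the variables and exclude any listed in drop_variables,
    # before any index variables would be built from them.
    return {name: var for name, var in variables.items() if name not in drop}


def open_dataset_sketch(store, drop_variables=None):
    """Stand-in for open_dataset: forwards drop_variables to the decoder."""
    return decode_cf_sketch(store, drop_variables=drop_variables)


# Variable name -> dimensions, following the listing in the issue.
store = {
    "mean2t24": ("run", "member", "time24", "lat", "lon"),
    "run": ("run",),
    "member": ("member", "maxStrlen64"),
    "time24": ("run", "time24"),
    "lon": ("lon",),
    "lat": ("lat",),
}

ds = open_dataset_sketch(store, drop_variables="time24")
print(sorted(ds))
```

For reference, the open_dataset function in current xarray (the successor to xray) accepts a drop_variables argument.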
