Add THREDDSMergedSource implementation #3
for name in cat:
    if fnmatch.fnmatch(name, patterns[0]):
        if len(patterns) == 1:
            out.append(cat[name](chunks={}))
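The review fragment above matches one glob pattern per catalog level with fnmatch. A minimal, self-contained sketch of that recursive descent follows, assuming a plain nested dict as a hypothetical stand-in for an intake catalog (match_paths and the toy catalog are illustrative names, not part of intake-thredds). Where the PR opens each final match with cat[name](chunks={}), the sketch only returns the matched names:

```python
import fnmatch
from typing import Any, Dict, List


def match_paths(tree: Dict[str, Any], patterns: List[str]) -> List[str]:
    """Descend `tree` one glob pattern per level and return the matched leaf names.

    Hypothetical stand-in for a nested THREDDS catalog: each key is an entry
    name; each value is either a nested dict (a sub-catalog) or a leaf.
    """
    out: List[str] = []
    for name, child in tree.items():
        if not fnmatch.fnmatch(name, patterns[0]):
            continue
        if len(patterns) == 1:
            # Last pattern: the matched entry itself is a result.
            out.append(name)
        elif isinstance(child, dict):
            # More patterns remain: recurse into the sub-catalog.
            out.extend(match_paths(child, patterns[1:]))
    return out


catalog = {
    "ASCAT": {"00000000": {"a_200701.nc": None, "a_200702.nc": None, "readme.txt": None}},
    "OTHER": {"00000000": {"b.nc": None}},
}
print(match_paths(catalog, ["ASCAT", "0000000*", "*.nc"]))  # ['a_200701.nc', 'a_200702.nc']
```

Consuming one pattern per level keeps the path argument aligned with the catalog hierarchy, which is why the last element ('*.nc' in the PR example) selects the files.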
This example takes a THREDDS URL and a list of path patterns to descend through, and calls the combine function on all of the datasets found, e.g.,
import intake
import intake_thredds

s = intake_thredds.source.THREDDSMergedSource(
    'http://dap.nci.org.au/thredds/catalog.xml',
    ['eMAST TERN', 'eMAST TERN - files', 'ASCAT',
     'ASCAT_v1-0_soil-moisture_daily_0-05deg_2007-2011', '00000000', '*.nc'])
s.to_dask()

results in (and this takes a while)
Using chunks={} appears to speed up xarray's combine step:
In [1]: import intake
In [2]: url = 'http://dap.nci.org.au/thredds/catalog.xml'
In [3]: paths = ['eMAST TERN', 'eMAST TERN - files', 'ASCAT', 'ASCAT_v1-0_soil-moisture_daily_0-05deg_2007-2011', '00000000', '*.nc']
In [4]: s = intake.open_thredds_merged(url, paths)
In [5]: %%time
...: ds = s.to_dask()
...:
...:
Dataset(s): 100%|██████████████████████████████| 60/60 [02:36<00:00, 2.60s/it]
CPU times: user 875 ms, sys: 235 ms, total: 1.11 s
Wall time: 2min 41s
@martindurant, I added a few changes to this PR. When you get a moment, could you take a look and let me know what you think? Currently, the data loading is dispatched to …
    data = [ds.to_dask() for ds in tqdm(_match(cat, path), desc='Dataset(s)', ncols=79)]
else:
    data = [ds.to_dask() for ds in _match(cat, path)]
self._ds = xr.combine_by_coords(data)
The auto_combine() function was deprecated in the most recent versions of xarray, so xr.combine_by_coords() is used here instead.
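For context on the switch to xr.combine_by_coords: it orders and concatenates datasets using the values of their shared dimension coordinates rather than their position in the input list. A small sketch with two hypothetical in-memory monthly datasets (make_month and the variable name soil_moisture are made up for illustration; it assumes numpy, pandas and xarray are installed):

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_month(start: str, ndays: int) -> xr.Dataset:
    """Hypothetical toy dataset standing in for one monthly daily-resolution .nc file."""
    time = pd.date_range(start, periods=ndays, freq="D")
    return xr.Dataset(
        {"soil_moisture": ("time", np.zeros(ndays))},
        coords={"time": time},
    )


# Pass the pieces out of order on purpose: combine_by_coords sorts them
# by the values of the shared "time" coordinate before concatenating.
parts = [make_month("2007-02-01", 28), make_month("2007-01-01", 31)]
merged = xr.combine_by_coords(parts)
print(merged.sizes["time"])  # 59
```

Because ordering comes from the coordinates, the order in which catalog entries are discovered does not have to match the time order of the files.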
because of the auth options? I don't know the typical use for THREDDS, so I'll leave it up to you. The metadata describes the endpoint as DAP, if I remember correctly.
I would be interested in getting this PR merged. Any progress here? @andersy005 @martindurant @larsbuntemeyer
I honestly don't remember where this was up to.
Alternatively, I'm happy to merge if there are no more comments. @aaronspring, sound good?
👍🏽 for merging this as is, and addressing issues and adding new features in separate PRs.
xref intake/intake-xarray#29 @rabernat
(A randomly chosen .nc file has 31 time points, i.e. the days of a month, I think.)
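A quick consistency check on the numbers in this thread, assuming (this is not confirmed in the PR) that the ASCAT 2007-2011 collection holds exactly one daily file per calendar month. Under that assumption there should be 60 files, matching the 60/60 progress bar, at most 31 time points per file, and 1826 days on the merged time axis:

```python
import calendar

# Assumption: one file per calendar month, 2007-01 through 2011-12,
# inferred from the dataset name and the 60-file progress bar.
months = [(year, month) for year in range(2007, 2012) for month in range(1, 13)]

# calendar.monthrange returns (weekday of day 1, number of days in the month).
days_per_file = [calendar.monthrange(y, m)[1] for y, m in months]

print(len(months))         # 60
print(max(days_per_file))  # 31
print(sum(days_per_file))  # 1826 (includes the 2008 leap day)
```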