@jhamman just presented on some updates to xbatcher, including the new data loader interfaces from #25. I tried to find a documented way of using them and don't see one. If some documentation could be added, that would be great: I've been helping some people at my work use Satpy to prepare data for their machine learning projects, and I think the data loaders could be a nice optimization. Their preparation work has always ended with saving to NetCDF or zarr. My understanding of these xbatcher interfaces is that the save-to-disk step shouldn't be needed (except for future caching functionality). Is that correct?
The pseudo-code of the most recent project I helped with looks something like this:
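(The actual snippet didn't survive in this copy of the issue; below is a hedged reconstruction of the workflow described, a `client.map` over granules where each worker resamples a Satpy `Scene` and saves patches to NetCDF. The reader name, channel, area, and `cut_patches` helper are all illustrative placeholders, not the researcher's real code.)

```python
# Hedged sketch only -- reader, channel, area, and cut_patches() are
# placeholders standing in for the researcher's actual pipeline.
from dask.distributed import Client
from satpy import Scene

client = Client()

def process_granule(filenames):
    scn = Scene(reader="abi_l1b", filenames=filenames)  # assumed reader
    scn.load(["C01"])                                   # assumed channel
    local_scn = scn.resample("my_area_def")             # assumed area definition
    # cut_patches is hypothetical: slice the scene into fixed-size windows
    for i, patch_scn in enumerate(cut_patches(local_scn, size=256)):
        patch_scn.save_datasets(writer="cf", filename=f"patch_{i:05d}.nc")

futures = client.map(process_granule, granule_file_lists)
client.gather(futures)
```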
And then they do their ML work based on those NetCDF files. Satpy is all `xarray[dask]`-based, and the actual code for the above does a lot of `client.map` work (distributed's `Client`) to do the individual pieces. I can't speak for the researcher I'm helping, but if a data loader can hand these "patches" (their term) straight to PyTorch/TensorFlow without saving to NetCDF first, that would make a really good example for a certain NASA project we're a part of.
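For what it's worth, here is a minimal sketch of what I imagine the no-NetCDF path looks like with the loaders from #25, assuming a dask-backed `xarray.Dataset` straight out of Satpy; the variable names, patch size, and the exact `MapDataset` usage are my assumptions, not documented behavior:

```python
import torch
import xbatcher
from xbatcher.loaders.torch import MapDataset

# ds: the (lazy, dask-backed) Dataset the Satpy pipeline already produces;
# "features" and "labels" are illustrative variable names.
X_gen = xbatcher.BatchGenerator(ds["features"], input_dims={"y": 256, "x": 256})
y_gen = xbatcher.BatchGenerator(ds["labels"], input_dims={"y": 256, "x": 256})

dataset = MapDataset(X_gen, y_gen)
loader = torch.utils.data.DataLoader(dataset, batch_size=4)

for X_batch, y_batch in loader:
    ...  # training step, no intermediate NetCDF files
```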