Currently only scalars and 3-vectors are handled by the translation from CDF to xarray.Dataset (here). 3-vectors are given the dimension label "dim", which I implemented just to handle the MAG B_NEC data.
The proper solution is for every dimension in the data to be given an appropriate label. I am not sure whether this information is in the original CDF files; otherwise it will just have to be hard-coded for every variable. It would make sense to do this on the server and build and send a netCDF file from there instead.
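Something like this sketch is what I have in mind for the hard-coded mapping (the `DIMENSION_LABELS` dict and its label names are made up for illustration, not the actual implementation):

```python
import numpy as np
import xarray as xr

# Hypothetical per-variable dimension labels - would be hard-coded here
# unless the labels turn out to be recoverable from the CDF metadata
DIMENSION_LABELS = {
    "B_NEC": ["NEC"],  # 3-vector in the NEC frame
}

def make_dataset(times, variables):
    """Build a Dataset, labelling each variable's trailing dimensions."""
    data_vars = {}
    for name, values in variables.items():
        # Fall back to generated labels for unmapped multi-dim variables
        extra_dims = DIMENSION_LABELS.get(
            name, [f"{name}_dim{i}" for i in range(values.ndim - 1)]
        )
        data_vars[name] = (["Timestamp"] + list(extra_dims), values)
    return xr.Dataset(data_vars, coords={"Timestamp": times})

times = np.array(
    ["2020-01-01T00:00:00", "2020-01-01T00:00:01"], dtype="datetime64[ns]"
)
ds = make_dataset(times, {
    "F": np.array([50000.0, 50001.0]),  # scalar -> 1D, just Timestamp
    "B_NEC": np.zeros((2, 3)),          # 3-vector -> labelled "NEC"
})
print(ds["B_NEC"].dims)  # ('Timestamp', 'NEC')
```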
The same applies to adding metadata (units etc. - e.g. cdf.varattsget("F") -> {'DESCRIPTION': 'Magnetic field intensity', 'UNITS': 'nT'}, and global attributes for ORIGINAL_PRODUCT_NAMES, MAGNETIC_MODELS, ...). This is particularly useful because xarray uses these attributes for plotting: http://xarray.pydata.org/en/stable/plotting.html#one-dimension
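A rough sketch of attaching those attributes - the variable-attribute dict is the varattsget("F") example above; everything else (values, global attribute contents) is made up:

```python
import numpy as np
import xarray as xr

# As returned by cdf.varattsget("F") in the example above
var_atts = {"DESCRIPTION": "Magnetic field intensity", "UNITS": "nT"}

ds = xr.Dataset({"F": (["Timestamp"], np.array([50000.0, 50001.0]))})

# Per-variable attributes go on the DataArray; xarray's plotting uses
# "units" (and "long_name") when labelling axes
ds["F"].attrs["units"] = var_atts["UNITS"]
ds["F"].attrs["description"] = var_atts["DESCRIPTION"]

# Global attributes go on the Dataset itself (contents made up here)
ds.attrs["MAGNETIC_MODELS"] = []

print(ds["F"].attrs["units"])  # nT
```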
The xarray.Dataset/netCDF output (an xarray.Dataset maps directly to a netCDF file) should probably follow the netCDF-CF conventions - this is in line with Aeolus (I think).
It would also be good to look at making the xarray.Dataset creation faster. The main slowdown is probably the pandas.to_datetime() call (the same applies to the pandas.DataFrame conversion). xarray.concat() is also very slow on very large datasets - I found that a file of a few GB took over 30 minutes to convert to an xarray.Dataset. This is further justification for building this on the server instead.
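For the timestamp conversion specifically, a fully vectorized path avoids per-element parsing. A toy comparison (using milliseconds since the Unix epoch - the real CDF epoch type and offset differ, so this is only to show the two vectorized forms):

```python
import numpy as np
import pandas as pd

# Stand-in timestamp array: integer milliseconds since the Unix epoch
ms = np.arange(0, 1_000_000, 10, dtype="int64")

# Both of these do one C-level pass over the array, rather than
# parsing element by element:
t1 = pd.to_datetime(ms, unit="ms")                          # pandas
t2 = ms.astype("datetime64[ms]").astype("datetime64[ns]")   # pure NumPy

assert (t1.values == t2).all()
print(t1[0])  # 1970-01-01 00:00:00
```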
In the (probably far) future, I think we could make use of sparse xarray so that the Lat/Lon/Rad dimensions (empty except for one point each) can be filled in, instead of just using a "flat" time series. That way we would build a "data cube", and 2D plotting and other things could be done directly. (I could be wrong here, or most likely there is some other way to achieve this.)
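One way to get something cube-like today, without sparse support, is unstacking the position coordinates into a dense (mostly-NaN) grid. A toy sketch with made-up coordinates and variable names:

```python
import numpy as np
import xarray as xr

# Toy "flat" time series where each sample carries a position
n = 6
ds = xr.Dataset(
    {"F": (["Timestamp"], np.arange(n, dtype=float))},
    coords={
        "Timestamp": np.arange(n),
        "Latitude": ("Timestamp", [0, 0, 0, 10, 10, 10]),
        "Longitude": ("Timestamp", [0, 120, 240, 0, 120, 240]),
    },
)

# Promote the position coords to a MultiIndex on the Timestamp dim,
# then unstack into a Latitude x Longitude grid; combinations with no
# sample become NaN (a dense version of the sparse-array idea)
cube = ds.set_index(Timestamp=["Latitude", "Longitude"]).unstack("Timestamp")
print(cube["F"].sel(Latitude=10, Longitude=120).item())  # 4.0
```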
Metadata is now included in the produced xarray.Dataset.
Global attributes (accessible as ds.attrs): "Sources", "MagneticModels", "RangeFilters"
Variable attributes (ds[x].attrs): "units", "description"
Multi-dimensional variables are now set up with appropriate xarray dimensions and coordinate labels.
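For reference, accessing these on a returned dataset looks like the following - the Dataset here is a hand-built stand-in with made-up values, not actual client output:

```python
import numpy as np
import xarray as xr

# Stand-in for a Dataset returned by the client, carrying the
# attribute names listed above (all values are made up)
ds = xr.Dataset(
    {"F": (["Timestamp"], np.array([50000.0]))},
    attrs={"Sources": ["SW_OPER_MAGA_LR_1B"],
           "MagneticModels": [],
           "RangeFilters": []},
)
ds["F"].attrs = {"units": "nT", "description": "Magnetic field intensity"}

print(ds.attrs["Sources"])     # ['SW_OPER_MAGA_LR_1B']
print(ds["F"].attrs["units"])  # nT
```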
See also: http://xarray.pydata.org/en/stable/faq.html#what-is-your-approach-to-metadata