Handle mask_and_scale ourselves instead of using netCDF4 #20
Conversation
This lets us use NaNs instead of masked arrays to indicate missing values.
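The decode step under discussion can be sketched as follows. This is a minimal illustration of CF-style mask-and-scale decoding with NaN for missing values, not the actual PR code; the helper name `decode_cf_values` is hypothetical.

```python
import numpy as np

def decode_cf_values(data, attributes):
    """Decode packed values per CF conventions (illustrative sketch).

    Missing values become NaN rather than entries in a numpy masked array.
    """
    data = np.asarray(data, dtype=float)
    # _FillValue is compared in packed space, before scaling
    if '_FillValue' in attributes:
        data = np.where(data == attributes['_FillValue'], np.nan, data)
    # CF order: decoded = packed * scale_factor + add_offset
    if 'scale_factor' in attributes:
        data = data * attributes['scale_factor']
    if 'add_offset' in attributes:
        data = data + attributes['add_offset']
    return data
```

Because the result is a plain float array with NaNs, downstream code needs no special handling for masked arrays.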
@@ -59,36 +60,6 @@ def sync(self):
        pass

def convert_to_cf_variable(array):
This is going to conflict with my change https://github.com/akleeman/xray/pull/21 but I'm definitely in favor of moving it to conventions.
Notes:
Failing tests: I added an "encoded_dtype" attribute to keep track of the original dtype of variables loaded from a netCDF file. Unfortunately, this means most of the round-trip tests currently fail, because no variables have "encoded_dtype" attributes until they are loaded from netCDF files. I think we will need to make some ugly trade-off to get round tripping working in both directions, but I'm not yet sure what the best option is. We could:
FWIW, I think it's OK if we don't preserve data types in the round-trip process, as long as the data itself is equivalent. I'm not entirely opposed to trying, but in general it is very hard to guarantee that serialized/unserialized data is exactly equivalent. There is somewhat of a conflict between preserving the original data (netCDF-like) and representing the data in a sane format in-memory (which should not be exactly like a netCDF). IMHO, we should focus on the latter.
if any(k in attributes for k in ['add_offset', 'scale_factor']):
    data = np.array(data, dtype=float, copy=True)
if 'add_offset' in attributes:
Once the variable has been scaled and offset should we remove those attributes (or add underscores)?
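One way to act on this suggestion is to move the already-applied attributes into a separate encoding dict, so they remain available for round-tripping without lingering as variable attributes. A minimal sketch; the helper name `pop_to_encoding` is hypothetical, not part of the PR:

```python
def pop_to_encoding(attributes, encoding, *keys):
    """Move already-applied CF attributes into an encoding dict.

    The values are kept for later serialization, but no longer appear
    among the variable's attributes. (Illustrative helper only.)
    """
    for key in keys:
        if key in attributes:
            encoding[key] = attributes.pop(key)
    return encoding
```

After decoding, one would call `pop_to_encoding(attrs, enc, 'scale_factor', 'add_offset')` so a second decode pass cannot accidentally re-apply the scaling.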
This is the logic for serializing to netCDF, so we still need these attributes around if we want to round-trip things properly.
On the other hand, I'm not so certain now that exactly faithful round-trip behavior netCDF->xray->netCDF is behavior we want (see above).
OK, the last commit implements the suggested approach in #26 and all tests pass. There is not yet a test for toggling. Please take a look when you get the chance :).
if 'scale_factor' in encoding:
    data /= encoding['scale_factor']
    attributes['add_offset'] = encoding['add_offset']
add_offset -> scale_factor
Miscellaneous bugs in mask/scale serialization also fixed.
See the new commits for more comprehensive tests of encoding/decoding.
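The serialization direction applies the inverse transform: subtract the offset before dividing by the scale, then substitute the fill value for NaNs. A sketch under those assumptions; `encode_cf_values` is a hypothetical name, not the PR's function:

```python
import numpy as np

def encode_cf_values(data, encoding):
    """Invert CF mask-and-scale decoding for serialization (sketch).

    Assumes decode was: decoded = packed * scale_factor + add_offset,
    so encode must undo the offset first, then the scale.
    """
    data = np.asarray(data, dtype=float)
    if 'add_offset' in encoding:
        data = data - encoding['add_offset']
    if 'scale_factor' in encoding:
        data = data / encoding['scale_factor']
    # NaNs are written back as the fill value, in packed space
    if '_FillValue' in encoding:
        data = np.where(np.isnan(data), encoding['_FillValue'], data)
    return data
```

Round-trip tests of the kind mentioned above would check that encode followed by decode reproduces the original values (up to dtype, per the earlier discussion).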
Datasets now have an optional constructor argument which determines whether CF-variables are converted or stored raw.
@@ -118,6 +118,7 @@ def __init__(self, variables=None, attributes=None):
    attributes : dict-like, optional
        Global attributes to save on this dataset.
    """
    self._decode_cf = decode_cf
_decode_cf should not add additional state to Dataset. Instead it should be passed on to _set_variables and _as_variable. If we want the option to use CF decoding later, we should expose set_variables as a public method.