When and where to get missing information for active storage #31

bnlawrence · 2022-10-24T10:53:09Z

I don't think that it makes sense to pass in a netCDF.Dataset instance, as on principle we don't want those hanging around as open file handles, but ...

But we need access to netcdf attributes inside the active storage (to get all the information about missing values, compression etc).

Where and when do we think we should open the dataset and get that info?

bnlawrence · 2022-10-24T10:56:49Z

It appears that we can do it in one of three places:

1. outside Active Storage, and initialise with a Dataset instance
2. when instantiating
3. or during the operation

Given the objection above, it looks like 2 is the right answer?

bnlawrence · 2022-10-24T11:00:18Z

(I think doing it at instantiation would make ncvar a required attribute, not a keyword, since these are per-variable properties.)
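A minimal sketch of option 2, for illustration only. The class name `Active`, the `opener` argument, and the `MISSING_ATTRS` tuple are hypothetical and not taken from the project; the only real library calls assumed are `netCDF4.Dataset`, `Dataset.variables`, `Variable.ncattrs()`, `Variable.getncattr()` and `Dataset.close()`. The file is opened in the constructor, the per-variable attributes are copied out, and the handle is closed before the constructor returns.

```python
from contextlib import closing

# CF attribute names that describe missing data for a variable.
MISSING_ATTRS = ("_FillValue", "missing_value", "valid_min", "valid_max", "valid_range")


def _open_netcdf(uri):
    # Imported lazily so the sketch can be read and tested without netCDF4.
    import netCDF4
    return netCDF4.Dataset(uri)


class Active:
    """Hypothetical sketch, not the project's actual class."""

    def __init__(self, uri, ncvar, opener=_open_netcdf):
        self.uri = uri
        self.ncvar = ncvar  # required, because the attributes are per-variable
        # Open, copy out the attributes, and close again straight away,
        # so no file handle outlives the constructor.
        with closing(opener(uri)) as ds:
            var = ds.variables[ncvar]
            present = set(var.ncattrs())
            self.missing = {
                name: var.getncattr(name)
                for name in MISSING_ATTRS
                if name in present
            }
```

The `opener` hook is only there so that the sketch can be exercised with a stand-in object; a real implementation could call `netCDF4.Dataset` directly.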

valeriupredoi · 2022-10-24T11:07:25Z

Unless the Active Storage device makes the metadata available for reading locally (in some way), I reckon the best way is to have just the metadata passed to the client and loaded outside the active call, since it's needed for both the active and passive cases. Note that we will need the metadata not only for missing/fill values, but also for such magnificent things as cell measures, various other attributes, fixing units, fixing units of coordinates etc. That is a whole lot of metadata, so we should think of a mechanism for passing/loading/using it that is general enough to accommodate all of those.

davidhassell · 2022-10-24T11:13:59Z

I would go for "2. when instantiating", and agree that ncvar would then be a required attribute. However, I would also allow missing data info to be optionally set at instantiation time, thereby saving opening and parsing the file if that information is already to hand (which it will be in cf-python).
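One way to read the suggestion above, as a sketch only: the function name `resolve_missing` and the `load_from_file` callable are hypothetical. The file is touched only when the caller has not supplied the missing data information.

```python
def resolve_missing(uri, ncvar, load_from_file, missing=None):
    """Return missing-data info, opening the file only when it was not supplied.

    `load_from_file(uri, ncvar)` is a hypothetical callable that opens the
    file, reads the variable's attributes, and closes the file again.
    """
    if missing is not None:
        # The caller (e.g. cf-python) already has the information: skip the file.
        return dict(missing)
    return dict(load_from_file(uri, ncvar))
```

Under this reading an empty dict means "known to have no missing data" and still avoids the file open; only `None` triggers the read.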

V - what's the use case for passing other metadata (like cell measures) to the active storage? Perhaps I have misunderstood!

valeriupredoi · 2022-10-24T11:22:07Z

@davidhassell a use case: we need to compute a mean of a variable that is masked with a cell measure (eg areacella or areacello). We can't really get a reliable mean without masking first, since the information the mask carries is lost if a statistic is computed on the unmasked data. In the same vein, data that has incorrect units first needs to be fixed (eg by applying a fixed factor to bring it to the correct units), and only then can a computation be done on it.

davidhassell · 2022-10-24T11:38:24Z

Hi V - I think that use case is out of scope, as we can't use active storage to do the work unless it's the first operation in the stack, and something like x = where(cell_measure < 1e6, np.ma.masked, x) is definitely an operation ...
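To make the ordering point concrete, here is a small dependency-free sketch with invented numbers. It uses plain lists and `None` as a stand-in for `np.ma.masked`; the helper names `mask_where` and `masked_mean` are hypothetical. The mean of the raw data, which is all a reduction pushed to storage could see, differs from the mean taken after the cell-measure mask has been applied.

```python
from statistics import fmean

MASKED = None  # stand-in for np.ma.masked in this dependency-free sketch


def mask_where(values, measure, threshold):
    """Mask each value whose cell measure is below the threshold."""
    return [MASKED if m < threshold else v for v, m in zip(values, measure)]


def masked_mean(values):
    """Mean over the unmasked values only."""
    return fmean(v for v in values if v is not MASKED)


x = [1.0, 2.0, 3.0, 100.0]
cell_measure = [2e6, 3e6, 5e6, 1e3]  # the last cell falls below 1e6

mean_raw = fmean(x)  # reduction on the raw data: 26.5
mean_masked = masked_mean(mask_where(x, cell_measure, 1e6))  # mask, then reduce: 2.0
```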

bnlawrence · 2022-10-24T12:09:50Z

Ok, so we have a consensus on 2., but I am not sure how to handle the "allow missing data" option, as there are a lot of options just for missing data alone, let alone filters and compression. Would we assume that if any keyword attributes were present then all the keywords had to be set appropriately? For the moment I'm going to ignore this; we can put that in a future version, since by default that'd preserve backwards compatibility.

bnlawrence · 2022-10-24T12:31:30Z

Actually, I'm wrong, since we get the compression and filter info from the zarr metadata, we're just left with the missing stuff, which is well posed ... so I'll put that in now.
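As a rough sketch of why the missing data part is well posed: a value can be treated as missing if it equals `_FillValue` or `missing_value`, or lies outside `valid_min`/`valid_max`. The function and keyword names below are hypothetical, each descriptor is assumed to be a scalar, and NaN fill values are not handled.

```python
def is_missing(value, fill_value=None, missing_value=None,
               valid_min=None, valid_max=None):
    """True if any of the supplied descriptors marks the value as missing."""
    if fill_value is not None and value == fill_value:
        return True
    if missing_value is not None and value == missing_value:
        return True
    if valid_min is not None and value < valid_min:
        return True
    if valid_max is not None and value > valid_max:
        return True
    return False


def mean_ignoring_missing(chunk, **missing):
    """Mean of the values in a chunk that are not flagged as missing."""
    kept = [v for v in chunk if not is_missing(v, **missing)]
    return sum(kept) / len(kept) if kept else None
```

For example, `mean_ignoring_missing([1.0, -999.0, 3.0, 1e20], fill_value=-999.0, valid_max=1000.0)` keeps only `1.0` and `3.0`.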

valeriupredoi · 2022-10-24T12:45:18Z

> For the moment I'm going to ignore this, we can put that in a future version, since by default that'd preserve backwards compatibility.

yeah my thought too about backwards compatibility - hence me wondering about a design scheme to have all this mostly preserved when we start doing more complex stuff 🍺

bnlawrence self-assigned this Oct 27, 2022

bnlawrence modified the milestones: Post-Prototype, Prototype Oct 27, 2022
