Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include AncillaryVariables and restructure Cube metadata. #3422

Merged
merged 7 commits into from
Oct 23, 2019

Conversation

lbdreyer
Copy link
Member

This includes the functionality to represent CF Ancillary variables on a cube.

This PR also includes a restructuring of dimensional metadata objects (coords, cell measures and ancillary dataset) in a hope to avoid code duplication.

Below is a rough diagram of the new structure:
IrisCoordsUML

lib/iris/coords.py Outdated Show resolved Hide resolved
lib/iris/tests/unit/coords/test_CellMeasure.py Outdated Show resolved Hide resolved
lib/iris/coords.py Outdated Show resolved Hide resolved
lib/iris/coords.py Outdated Show resolved Hide resolved
lib/iris/coords.py Outdated Show resolved Hide resolved
lib/iris/coords.py Show resolved Hide resolved
@pp-mo
Copy link
Member

pp-mo commented Oct 14, 2019

Hi @lbdreyer
Just visiting, I know this isn't ready for review yet.
But as I'm reviewing #3378, I'm seeing that changing the cube print function means you will also need to tweak the content handling in iris.experimental.representation : This provides html output for the Jupyter _repr_html_() handler -- but it does it by reprocessing the cube.__str__() output.

For convenience, I suggest you could get myself or @stephenworsley to do that, as we have both been in that area recently, and it's a bit involved.

BTW maybe that shouldn't be in iris.experimental anymore ?

@lbdreyer
Copy link
Member Author

Thanks for raising this @pp-mo. I also noticed that CellMeasure don't seem to be handled either?

I am happy for this work to be included in a separate PR, and have raised an issue to address it: #3467

lib/iris/coords.py Outdated Show resolved Hide resolved
values = values.copy()

# If the metadata is a coordinate and it has bounds, repeat the above
# with the bounds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in the former Coord code here also added "This will not realise lazy data."
I think that is still a useful hint + could usefully be added back.

lib/iris/coords.py Outdated Show resolved Hide resolved
lib/iris/cube.py Outdated
ancillary_variable_summary, cube_header = vector_summary(
[], cube_header, max_line_offset,
ancillary_variables=vector_ancillary_variables)
summary += '\n Ancillary Datasets:\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of consistency, the 'D' in Datasets should be lower case. This also applies to the 'M' in Measures in the above summary += '\n Cell Measures:\n'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue it should go the other way; we should update the other section headings (Dimension coordinates -> Dimension Coordinates) since they are headings and so should be capitalised

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I lean toward @lbdreyer here : "All Title Case" makes sense to me.

Copy link
Contributor

@stephenworsley stephenworsley Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say Ancillary Variables?

@pp-mo
Copy link
Member

pp-mo commented Oct 22, 2019

this method ... pass a 'bounds' keyword to the copy function. This reads oddly, as the local 'copy' definition, immediately below, does not support a bounds keyword.
... But all the other _DimensionMetadata methods, ... use an "override, call parent + extend" strategy instead.
... Hmmmm. Still not quite sure what I think of this yet.

After some extensive discussion !

I observed that the worst of the 'mixed' approach is that, where a Coord method "extends" the _DimensionalMetadata method to do "the same thing to the bounds", we have code which is effectively copied from one class to the other : This not only breaks DRY but puts the 'copy' a long way away where it is not obvious !
E.G. the worst case is probably the __binary_operator__ method.

I think we should either

  1. "separate" : kick all the bounds-handling code out of the _DimensionalMetadata class, and put it all in Coord
    * ( see the idea in Ancil var strict lbdreyer/iris#6 )
    * or ..
  2. "combine" : include all the bounds-handling code in _DimensionalMetadata, but only provide means to set the bounds in the Coord class.
    * ( see the idea in Merge bounds handling into _DimensionalMetadata class. lbdreyer/iris#7 )

Comparison of "separate" to "combine":
Actually I think "combine" is probably slicker, because then several methods don't need to be overriden in Coord at all.
It's certainly easier to make work, starting from where we are.
Also the "separate" approach has the problem that the _DimensionalMetadata is rather "overkill" for just storing a bounds array (and might give slower operation ?)
But "combine" is a bit weaselly, and it takes some explaining...

pp-mo and others added 2 commits October 23, 2019 12:46
* Merge bounds handling into _DimensionalMetadata class.

* Fix cube arithmetic bug.

* Code style fix.
lib/iris/cube.py Outdated Show resolved Hide resolved
@pp-mo
Copy link
Member

pp-mo commented Oct 23, 2019

Hi @lbdreyer, and anyone else watching.
I finally got round to doing some more practical checks, and checking out the testing provision.
It's now rather late to make new objections, as we really want to merge this so we can move on to subsequent tasks for the following 2-week development cycle.

So, I'm happy that there are at least some tests to exercise all the new methods.
However, I did some across some problems, and I have identified some missing testing that could usefully be included here :

  1. simple tests for cube.summary() and repr/str, showing ancillary variable with correct dimension mapping
  2. merge example showing mismatched ancillary variables preventing merge
  3. merge example identical ancillary variables shared in result cube
  4. also concatenate equivalents of those 2 merge tests

The problem of cube summary getting the dimension mappings wrong I demonstrated like this:

        >>> cube = Cube(np.zeros((2,2,3)))
        >>> cube.add_dim_coord(DimCoord(np.arange(2), long_name='t'), 0)
        >>> cube.add_dim_coord(DimCoord(np.arange(2), long_name='x'), 1)
        >>> cube.add_dim_coord(DimCoord(np.arange(3), long_name='y'), 2)
        >>> cube.add_ancillary_variable(AncillaryVariable(np.zeros((2,3))), (1,2))
        >>> print(cube)
        unknown / (unknown)                 (t: 2; x: 2; y: 3)
             Dimension coordinates:
                  t                           x     -     -
                  x                           -     x     -
                  y                           -     -     x
             Ancillary Datasets:
                  unknown                     x     x     x

Something like this should be in the above test.
However, this problem is already being addressed anyway : see "Fix cube summary of ancil vars" commit above. Nevertheless, a test exercising this will be useful..

@pp-mo
Copy link
Member

pp-mo commented Oct 23, 2019

However, I did some across some problems

It seems a considerable number of cube operations don't handle ancillary variables as they should do, of which the most worrying are ...

  • indexing (__getitem__) discards AVs
  • cube copy discards AVs
  • equality testing (__eq__) ignores them

So right now, if a cube contains AVS, cube.copy() is clearly different (and prints differently) but it tests equal !

But none of this will affect cubes that *don't * contain AVs, so in the interests of progress I think we'd better ignore this for now + defer all those problems to a new issue to tidy up the cube operations later.

lib/iris/tests/unit/cube/test_Cube.py Outdated Show resolved Hide resolved
lib/iris/tests/unit/cube/test_Cube.py Outdated Show resolved Hide resolved
@lbdreyer
Copy link
Member Author

@pp-mo I have added a test for ancillary variables in cube.summary (Note! we have very minimal testing for cube.summary in place! we should address, possibly if we also get a change to improve the html repr)

The tests for concatenate/merge made me realise that it wasn't being handled correctly (and consequently CellMeasures are also handled incorrectly when merging/concatenating). Unfortunately the work to get this done seems substantial and not something we will achieve in the next few hours. I have decided to back out the changes I previously made to _concatenate.py and _merge.py and instead fit that work into #3483

@pp-mo pp-mo merged commit 25617ca into SciTools:master Oct 23, 2019
@pp-mo
Copy link
Member

pp-mo commented Oct 23, 2019

I think what's done here is ok, though a lot remains to be dealt with.
I won't be happy if we don't fix the obvious problems in the next few days! #3483

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants