Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Assumptions made in _discontiguity_in_bounds #3345

Closed
cpelley opened this issue Jun 27, 2019 · 13 comments
Closed

BUG: Assumptions made in _discontiguity_in_bounds #3345

cpelley opened this issue Jun 27, 2019 · 13 comments
Assignees

Comments

@cpelley
Copy link

cpelley commented Jun 27, 2019

Issue1

Introduced in #3144, _discontiguity_in_bounds(lib/iris/coords.py#L1053) assumes the starting location of the bounds for 2D grids (bottom left). This means that iris 2.* plotting of data with 2D coordinates breaks with failed contiguity check (for contiguous bounds) unless the ordering of bounds is as iris currently assumes. iris 1.* wasn't subject to this contiguity function check and so worked fine for us. Here is the stack trace:

...
    qplt.pcolormesh(cube)
  File "/path/to/iris/quickplot.py", line 260, in pcolormesh
    result = iplt.pcolormesh(cube, *args, **kwargs)
  File "/path/to/iris/plot.py", line 1190, in pcolormesh
    result = _draw_2d_from_bounds('pcolormesh', cube, *args, **kwargs)
  File "/path/to/iris/plot.py", line 380, in _draw_2d_from_bounds
    atol=contig_tol)
  File "/path/to/iris/plot.py", line 360, in _check_bounds_contiguity_and_mask
    coord.name()))
ValueError: The bounds of the latitude coordinate are not contiguous and data is not masked where the discontiguity occurs. Not able to create a suitable grid to plot. You can use iris.util.find_discontiguities() to identify discontiguities in your x and y coordinate bounds arrays, and then mask them with iris.util.mask_cube()

The relevant code problem is here:

iris/lib/iris/coords.py

Lines 1053 to 1058 in b209642

if compare_axis == 'x':
upper_bounds = bounds[:, :-1, 1]
lower_bounds = bounds[:, 1:, 0]
elif compare_axis == 'y':
upper_bounds = bounds[:-1, :, 3]
lower_bounds = bounds[1:, :, 0]

So this assumes the following ordering:

3---2
|   |
0---1

The following should be equally applicable to CF:

Counter-clockwise (extended):

2---1    1---0    0---3
|   |    |   |    |   |
3---0    2---3    1---2

Clockwise:

1---2    0---1    3---0    2---3
|   |    |   |    |   |    |   |
0---3    3---2    2---1    1---0

CF specification relating is here which doesn't enforce the starting location (corner) of the bound. Neither does is enforce anticlockwise for 4-sided cells. Since iris assumes the starting location, this makes iris also sensitive to the direction even though it determines the absolute difference. This means it can be even checking x rather than y and checking y rather than x under certain starting locations/orderings.

Issue2

Assumptions are made about the coordinates bounds under the two conditionals here:

if self.ndim == 1:

elif self.ndim == 2:

This should really be constrained to something like:

if self.ndim == 1 and `self.bounds.ndim == 2:
    ...
elif self.ndim == 2 and `self.bounds.ndim == 4:
    ...
else:
    msg = 'Contiguity check on coord.ndim {} bounds.ndim {} not currently supported'
    raise RuntimeError(msg.format(self.ndim, self.bounds.ndim))

Note that a 1D coordinate with 4 bounds to each cell is a perfectly reasonable coordinate to define and iris allows us to instantiate such coordinates:

>>> iris.coords.AuxCoord(range(2), bounds=np.zeros((2, 4)))
AuxCoord(array([0, 1]), bounds=array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]]), standard_name=None, units=Unit('1'))

However, the contiguity check works under this assumption above that the bounds are 2D and proceeds 'thanks' to numpy broadcasting, even though bounds are 4D and should not be treated this way.

@cpelley cpelley changed the title #3144 introduced a bug with assuming the handedness of the coordinate system Assumed handedness of the coordinate system bug in _discontiguity_in_bounds Jun 28, 2019
@cpelley cpelley changed the title Assumed handedness of the coordinate system bug in _discontiguity_in_bounds Assumed starting location of the 2D bound in _discontiguity_in_bounds Jun 28, 2019
@cpelley cpelley changed the title Assumed starting location of the 2D bound in _discontiguity_in_bounds BUG: Assumptions made in _discontiguity_in_bounds Jun 28, 2019
@cpelley
Copy link
Author

cpelley commented Jun 28, 2019

ping @hdyson, @jkettleb

@hdyson
Copy link
Contributor

hdyson commented Jun 28, 2019

I think the diagrams make issue 1 pretty clear.

For issue 2, this has real implications for the UGrid work related to LFRic, using the 2D flexible mesh topology (the full model is 3D layered mesh, but for simplicity, let's stick to a single layer here): http://ugrid-conventions.github.io/ugrid-conventions/#2d-flexible-mesh-mixed-triangles-quadrilaterals-etc-topology

Since the latitudes of the cell centres are 1-dimensional but the nodes are 4-dimensional*, the obvious iris representation is a coordinate with the points representing the cell centres and 4-D bounds being derived from the nodes. Using an example from a C4 mesh:

print("Latitude points: ", cube.coord('latitude').points[:5])
('Latitude points: ', array([ 29.28072119,  33.30129185,  33.30129185,  29.28072119,   9.4918277 ]))
print("Latitude bounds: ", cube.coord('latitude').bounds[:5])
('Latitude bounds: ', array([[ 16.32494994,  20.94102047,  42.7342096 ,  35.26438968],
       [ 20.94102047,  22.5       ,  45.        ,  42.7342096 ],
       [ 22.5       ,  20.94102047,  42.7342096 ,  45.        ],
       [ 20.94102047,  16.32494994,  35.26438968,  42.7342096 ],
       [ -0.        ,  -0.        ,  20.94102047,  16.32494994]]))

(and a similar representation for longitude).

The complete C4 mesh does tessellate the globe, and is contiguous. Extending the example from @cpelley, it'd be great if the logic was:

if self.ndim == 1 and `self.bounds.ndim == 2:
    ...
elif self.ndim ==1 and self.bounds.ndim == 4:
    # Some UGrid-aware contiguity check
    ....
elif self.ndim == 2 and `self.bounds.ndim == 4:
    ...
else:
    msg = 'Contiguity check on coord.ndim {} bounds.ndim {} not currently supported'
    raise RuntimeError(msg.format(self.ndim, self.bounds.ndim))

* Strictly speaking, the convention is N-dimensional bounds - our use case is 4-dimensional, but maybe a more general solution is appropriate?

@bjlittle bjlittle self-assigned this Jun 28, 2019
@cpelley
Copy link
Author

cpelley commented Jul 1, 2019

Thanks for your input @hdyson
Happy to take this on, but it would be great to have confirmation from core members of the team to whether bugfixes and a potential enhancement for the above usecase above is desirable.

@pp-mo
Copy link
Member

pp-mo commented Jul 8, 2019

I don't get this bit

CF specification relating is here which doesn't enforce the starting location (corner) of the bound. Neither does is enforce anticlockwise for 4-sided cells...

From my reading, CF does enforce the starting position and order. As it says ...

In the case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude lat(n,m) and longitude lon(n,m), and the associated cells are four-sided,
then the boundary variables are given in the form latbnd(n,m,4) and lonbnd(n,m,4), where the trailing index runs over the four vertices of the cells.
Let us call the side of cell (j,i) facing cell (j,i-1) the "i-1" side, the side facing cell (j,i+1) the "i+1" side, and similarly for "j-1" and "j+1". Then we can refer to the vertex formed by sides i-1 and j-1 as (j-1,i-1).
With this notation, the four vertices are indexed as follows: 0=(j-1,i-1), 1=(j-1,i+1), 2=(j+1,i+1), 3=(j+1,i-1).
If i-j-upward is a right-handed coordinate system (like lon-lat-upward), this ordering means the vertices will be traversed anticlockwise on the lon-lat surface seen from above. ...

@pp-mo
Copy link
Member

pp-mo commented Jul 8, 2019

Re:

Assumptions are made about the coordinates bounds under the two conditionals here ...
it'd be great if the logic was:

I do agree that it is right for the code to be strict with the relationship between coordinate dims and bounds, but I believe that relationship is actually enforced anyway, in the means of creating coords, i.e. "coord.bounds.ndim == coord.ndim + 1" : E.G. here

IIRC the only flexibility is the size of the bounds dimension. I think that is arbitrary but, as I said above, plotting behaviour only works when it is as expected.
So I don't think re-checking coord.bounds.ndim is right at all.
What it maybe ought to do is allow only the two valid cases : if coord.ndim == 1 and coord,nbounds == 2: and if coord.ndim == 2 and coord,nbounds == 4 .

I don't think we should be adding any additional interpretations here, whether based on UGRID or not.
It would just be 'inventing' supposedly useful behaviour, but it is not grounded in the CF spec.

@pp-mo
Copy link
Member

pp-mo commented Jul 8, 2019

Thanks @hdyson @cpelley
More generally..
Just my personal view, but I think all this is speculation too far, and I'm not keen on adding behaviour.
@bjlittle any view on this ???

For instance ...

Note that a 1D coordinate with 4 bounds to each cell is a perfectly reasonable coordinate to define and iris allows us to instantiate such coordinates:

That's perfectly true, but there is no CF interpretation of what that should actually mean.
Therefore, I wouldn't expect plotting it to do anything sensible. I don't think we even have a good suggestion for what this might usefully do.

If you think valid cases are missing, we'd better focus on specific testcases.

@hdyson
Copy link
Contributor

hdyson commented Jul 8, 2019

Thanks @pp-mo . I'm going to be sticking to the particular use case of UGrid data, because that's where my thinking has been. @cpelley may contribute to the more general case.

UGrid specifies that corner nodes (the UGrid equivalent of the bounds) should be:

consistent with the CF-convention for bounds of p-sided cells

http://ugrid-conventions.github.io/ugrid-conventions/#2d-triangular-mesh-topology

So for the bounds we're discussing here, UGrid is effectively the same as CF.

CF specification relating is here which doesn't enforce the starting location (corner) of the bound. Neither does is enforce anticlockwise for 4-sided cells...

From my reading, CF does enforce the starting position and order. As it says ...

For our specific case, p-sided cells, CF-conventions has:

The vertices must be traversed anticlockwise in the lon-lat plane as viewed from above. The starting vertex is not specified.

So for the UGrid case, anticlockwise bounds is fine, but the starting vertex/node/bound can be arbitrary. For what it's worth, I notice the next sentence after your quote ends is:

If i-j-upward is left-handed, they will be traversed clockwise on the lon-lat surface.

Which I think means that the general case may be different?

I don't think we should be adding any additional interpretations here, whether based on UGRID or not.
It would just be 'inventing' supposedly useful behaviour, but it is not grounded in the CF spec.

I don't believe the issue @cpelley identified was originally with UGrid files. I'm hoping the solution for his issues can also avoid future issues with UGrid data too - two birds with one stone - but please don't let UGrid distract unnecessarily.

That's perfectly true, but there is no CF interpretation of what that should actually mean.

I think the "Bounds for multi-dimensional coordinate variables with p-sided cells" subsection of the CF conventions (I don't see a way to link directly to it?) covers this reasonably well? Consider a cell that extends from 0 degrees lat/lon to 1 degree lat lon - it could be represented by a cell with with latitude bounds of [0, 1] and longitude bounds of [0,1], or in the p-sided cell representation by a cell with latitude bounds of [0, 0, 1, 1] and longitude bounds of [0, 1, 1, 0]. The area defined by the cell is the same in both cases.

If the cell isn't exactly rectangular, though, the first representation is no longer valid, and only the p-sided cell representation works. This is the case for UGrid.

@cpelley
Copy link
Author

cpelley commented Jul 9, 2019

From my reading, CF does enforce the starting position and order. As it says ..

That isn't my reading. I read this as illustration of the way in which the bounds 'can be' indexed.
There is no reason I can think of why it would specifically constrain the 2D case OR that it would be written in such a way as to leave this to interpretation. The starting vertex is explicitly stated to be NOT specified for either 1D or the p-sided cell case which suggests that you read that the 2D is a special case.
Regarding the direction (clockwise/counterclockwise), you miss the next sentence in your quote which states that it can be clockwise or counterclockwise.

I do agree that it is right for the code to be strict with the relationship between coordinate dims and bounds, but I believe that relationship is actually enforced anyway...

Sorry, my demonstration of constraint has mislead the point I was trying to make. I meant bounds.shape[-1] rather than bounds.ndim. The code assumes the shape of the bounds. This isn't enforced as part of coord creation, hence the example I provided demonstrating the instantiation of a 1D coordinate with 2D bounds with 4 bounds per cell.

That is, my constraint example should have been something like:

if self.ndim == 1 and self.bounds.shape[-1] == 2:
    ...
elif self.ndim ==1 and self.bounds.shape[-1] == 4:
    # Some UGrid-aware contiguity check
    ....
elif self.ndim == 2 and `self.bounds.shape[-1] == 4:
    ...
else:
    msg = 'Contiguity check on coord.ndim {} bounds.ndim {} not currently supported'
    raise RuntimeError(msg.format(self.ndim, self.bounds.ndim))

Hopefully I worded this right this time...

If you think valid cases are missing, we'd better focus on specific testcases.

To clarify, there is one main motivating usecase that I attempted to highlight in the ticket description. This is of providing 2D points and 3D bounds (4-bounds per cell). Our usecase starts at a different corner than iris expects so forces us to reform the bounds for us to plot it. We could plot the data in iris1 but cannot with iris2. Sorry if this wasn't clear from the description.

The other issues I'm pointing out are endemic of the same problem that gives rise to my usecase, which is the lack of constraints (no checks on the shape of the bounds), assumed direction and assumed starting vertex.

The second usecase is driven by Harold in #3345 (comment) which is in supporting contiguity checks for UGRID (1D points with 2D (4-bounds per cell)). At the very least, iris should be constrained and raising a suitable exception (which I cover under issue2). At best, we would support this at the same time as making the other fixes here - I don't think this is major work.

@cpelley
Copy link
Author

cpelley commented Jan 22, 2020

bump

@cpelley
Copy link
Author

cpelley commented Jan 22, 2020

Workaround for iris bounds handling is to have a conditional in our code to convert direction based on a conditional:

clockwise = not LinearRing([(lon, lat) for lon, lat in zip(lons, lats)]).is_ccw
if clockwise:
    ...

A more generalised workaround for iris is to readjust the bounds starting index so that it too iris' expectations.

@pp-mo
Copy link
Member

pp-mo commented Jan 23, 2020

I've been looking at this + re-reading.
The spec, as usual, is not that concrete.
My re-reading leads me to think that it maybe is within spec to have clockwise cells, but there are only two specific "correct" roles of the 0,1,2,3 points. Also, I wouldn't expect the choice to change between cells but be the same for all.

But, in saying all that, I am specifically describing the "case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude lat(n,m) and longitude lon(n,m)...".
I was reading the spec as saying that if the data matches that description, then it is dealt with in that way and is not treated under the subsequently-described "p-sided cells" case.

So I don't think the reading of that is at all clear.
But in any case, we obviously only support some of the options, and at least the clockwise case should maybe be added.

From a practical point of view, we are very limited on resource here at present, so I'm not sure how to proceed with this. I'm concerned at the potential cost of having to 'rotate' each cell individually, if we really have to support that, and the coding is not obvious.
Can you possibly provide a PR to add the usages you need ?

@pp-mo
Copy link
Member

pp-mo commented Jan 23, 2020

Also, FWIW Another point I'd push back on is the idea that this "worked in Iris 1".
I can't work out what plot you think did work in this case -- I can only guess you mean pcolormesh : Can you explain ?

IIRC Iris 1 would plot a pcolormesh of data with unbounded 2d coords : however, the plot was actually not correct -- matplotlib would interprete the coordinate point arrays as corners and quietly ignore the "extra" row+column in the provided data array.
With bounded 2d coords, I think pcolormesh should fail, as pcolormesh --> _draw_2d_from_bounds --> _map_common(mode=BOUNDS_MODE) --> Coord._contiguous_bounds --> Coord._sanity_check_contiguous ?

@cpelley
Copy link
Author

cpelley commented Jan 24, 2020

Thanks @pp-mo

Can you explain ?

You're right, the plots look to me like they were using index. So "worked in Iris 1" is perhaps not strictly true.

Can you possibly provide a PR to add the usages you need ?
So I don't think the reading of that is at all clear.

I hadn't until now as I felt we needed to agree on what behavioural change would be accepted.

I don't disagree that CF isn't well worded - as usual :p. However, I think a case can be made that if you can understand how a dataset is described, then iris should be made to accommodate. Specifically, it's clear how to support 2D coord (4bounds per cell) clockwise/counterclockwise bounds with any starting index with no huge performance implication if all cells are assumed to be ordered in the same way.

In my opinion, it's on save that iris should more strictly constrain its behaviour to safely/clearly understood parts of the CF spec. An approach to achieve this might be to reform bounds on load or even on save.

I think there is potentially a difference in our motivations over making such a change in iris so I think it makes sense to take this to our own library (Issue1). On looking back at Issue2, unless I'm mistaken, looks like is a non-issue since #3153 (I must have been testing against 1.13 at the time??).

Cheers

@cpelley cpelley closed this as completed Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants