ENH: Shared data between cube slices #2549

Merged: 3 commits merged into SciTools:master on May 16, 2017

cpelley
@cpelley cpelley commented May 12, 2017

  • When a cube with realised (not lazy) data is indexed, a view of the original array is returned where possible (subject to NumPy's slicing rules).
  • When a cube with lazy (not realised) data is indexed, realising the data on one cube still does not realise the data on the other.
  • The coord copy performed when replacing the points is optimised: the points and bounds are shallow-copied before being replaced, which avoids unnecessary copies.
  • Existing behaviour is that slicing coordinates returns views of the original points and bounds (where possible). This was likely chosen on the basis that DimCoords, at least, are not writeable. The same is not true of auxiliary coordinates, however, which strengthens the case for this being a bug (i.e. one can modify AuxCoord points and bounds).
  • DimCoord slicing will now return views of its data, like AuxCoords. DimCoords will continue to realise data, unlike AuxCoords, because of the validation needed to check that the points are monotonically increasing.
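
The "where possible" caveat in the list above follows NumPy's own rules: basic slicing returns a view, while advanced (integer-list) indexing returns a copy. A minimal sketch in plain NumPy, with no Iris objects involved and illustrative variable names:

```python
import numpy as np

base = np.array([1, 2, 3, 4])

# Basic slicing (including a negative step) returns a view onto the
# same memory as the original array.
reversed_view = base[::-1]
is_view = reversed_view.base is base

# Advanced ("fancy") indexing with a list of integers returns a copy.
fancy_copy = base[[3, 2, 1, 0]]
is_copy = not np.shares_memory(fancy_copy, base)

# Writing through the view changes the original array as well.
reversed_view[0] = 40
print(base)  # [ 1  2  3 40]
```

This is why sharing data between cube slices can only ever be "where possible": some indexing operations cannot be expressed as views.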

I have put up the changes here to demonstrate the UI (tests will follow once the changes are agreed in principle).

Demonstrate the UI of this PR:

>>> cube = iris.cube.Cube([1, 2, 3, 4])
>>> cube2 = cube[::-1]
>>> cube2.data 
>>> cube2.data.base is None
True
>>> cube.share_data = True
>>> cube2 = cube[::-1]
>>> cube2.data.base is None
False
>>> cube2.data.base is cube.data
True

Replaces #2261, #1992
These changes are shown to be readily applicable when switching to dask too (link).

NOT FOR MERGING

@cpelley cpelley added this to the v1.13.0 milestone May 12, 2017
@cpelley cpelley requested a review from marqh May 12, 2017 11:34
@cpelley cpelley self-assigned this May 12, 2017
@marqh
Member

marqh commented May 12, 2017

Hi @cpelley

i'm interested to understand

======================================================================

ERROR: test_guess_bounds (iris.tests.test_coord_api.TestGuessBounds)

----------------------------------------------------------------------

Traceback (most recent call last):

  File "/home/travis/miniconda/envs/test-environment/lib/python3.5/site-packages/Iris-1.12.0.dev0-py3.5-linux-x86_64.egg/iris/tests/test_coord_api.py", line 665, in test_guess_bounds

    points[2] = 32

ValueError: assignment destination is read-only

----------------------------------------------------------------------

why is this test affected, when none of the others seems to be

more to follow soon
...

@cpelley
Author

cpelley commented May 12, 2017

...i'm interested to understand

The points array has been passed to the points setter of the DimCoord, so it has been made read-only (the other tests involving DimCoord take a copy of points). This PR makes setting the points take a view of the points supplied, making it consistent with AuxCoord.

Here is the current behaviour of master:

>>> import iris.coords          
>>> import numpy as np
>>> points = np.array([1, 2, 3, 4])
>>> auxcoord = iris.coords.AuxCoord(points)
>>> points[0] = 10
>>> auxcoord.points
array([10,  2,  3,  4])

>>> points = np.array([1, 2, 3, 4])
>>> dimcoord = iris.coords.DimCoord(points)
>>> points[0] = -1
>>> dimcoord.points
array([1, 2, 3, 4])

This PR makes DimCoord consistent with AuxCoord in taking a view of the supplied array, except that DimCoord arrays are read-only:

>>> points = np.array([1, 2, 3, 4])
>>> dimcoord = iris.coords.DimCoord(points)
>>> points[0]
1
>>> points[0] = -10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: assignment destination is read-only

The fact that AuxCoord arrays are not read-only is a bug (see the ticket description). I can happily make it read-only as part of this PR.
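
The read-only behaviour discussed here rests on NumPy's `writeable` flag. The following is a minimal sketch in plain NumPy (no Iris objects; the names are illustrative) of how a read-only view still reflects changes made to a writeable original. Note that in the session above the supplied array itself became read-only, whereas this sketch applies the flag to a separate view:

```python
import numpy as np

points = np.array([1, 2, 3, 4])

# A view shares memory with `points`; clearing its writeable flag only
# affects the view, not the original array.
locked = points.view()
locked.flags.writeable = False

try:
    locked[0] = -10
    error_message = None
except ValueError as err:
    error_message = str(err)

# The original is still writeable, and the locked view sees the change.
points[0] = 10
```

So a read-only flag protects against writes through one particular array object, but does not by itself stop the underlying values from changing.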

@cpelley
Author

cpelley commented May 12, 2017

@marqh perhaps it would clarify things if I re-introduced the tests?

@marqh
Member

marqh commented May 12, 2017

hi @cpelley

I agree with the principle of this approach. I think that providing the user with the option to opt into passing array references rather than copies brings benefit whilst protecting naive users.

I'm not sure about the name cube.share_data, as data is explicitly the data array. I'd like something that conveys the 'share data array and coordinate arrays' message, but without the level of verbosity that involves.

could share_arrays be used for this purpose?
are there further options we could explore for this behaviour flag?

@cpelley
Author

cpelley commented May 12, 2017

@marqh, as requested I have removed the changes which make the DimCoord setter for points and bounds views of their supplied arrays.

@cpelley
Author

cpelley commented May 15, 2017

Following an offline discussion: given that coordinate data views have been ruled out, and the motivation for them is the same as for this change (cube data views), I think it's safe to say that getting this into iris for this milestone is unlikely. Pulling it out for now so as not to detract from putting out the release.

@cpelley cpelley removed this from the v1.13.0 milestone May 15, 2017
@cpelley
Author

cpelley commented May 16, 2017

Based on off-line discussion:

  • share_data is being revisited, as it refers only to the cube; the suggested alternative of passing this flag down to other objects is complex and fraught with problems.
  • Setting share_data realises the data if it is lazy, so that behaviour matches expectations.

Thanks @marqh

@cpelley cpelley added this to the v1.13.0 milestone May 16, 2017
Member

@marqh marqh left a comment

I am in favour of this change

I have one minor requested change, then I aim to merge this

@@ -1521,7 +1526,7 @@ def points(self):

 @points.setter
 def points(self, points):
-    points = np.array(points, ndmin=1)
+    points = np.array(points, ndmin=1, copy=False)
Member

i think that I would prefer to not introduce this copy flag to dim_coords just yet

i think the benefit is less, so perhaps we can look at this again
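
For context on what the copy flag in the diff controls, here is a minimal sketch in plain NumPy (illustrative names, not the Iris setter itself). `np.array` copies by default, whereas `np.asarray` returns an existing ndarray without copying. The sketch uses `np.asarray` with `np.atleast_1d` rather than `copy=False` because, from NumPy 2.0 onwards, `copy=False` raises an error whenever a copy cannot be avoided (for example when a list is passed):

```python
import numpy as np

points = np.array([1.0, 2.0, 3.0])

# Default behaviour (copy=True): a new buffer is always allocated.
copied = np.array(points, ndmin=1)

# No-copy path for ndarray input: the result shares memory with `points`.
shared = np.atleast_1d(np.asarray(points))

copied_is_independent = not np.shares_memory(copied, points)
shared_is_view = np.shares_memory(shared, points)

# Modifying the original is visible through `shared` but not `copied`.
points[0] = -1.0
```

The trade-off under review is exactly this one: the no-copy path saves memory, but later changes to the caller's array become visible inside the coordinate.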

lib/iris/cube.py Outdated
# Realise the data if is hasn't already been as sharing lazy data is
# not right now possible or a usecase understood.
if self.has_lazy_data():
    self.data
Member

this will/may show the data, I would prefer

_ = cube.data

@marqh marqh merged commit 661dfc6 into SciTools:master May 16, 2017
@cpelley
Copy link
Author

cpelley commented May 17, 2017

😃 Thanks @marqh

@share_data.setter
def share_data(self, value):
    # Realise the data if is hasn't already been as sharing lazy data is
    # not right now possible or a usecase understood.
Member

@cpelley This comment doesn't make sense.

# Realise the data if is hasn't already been as sharing lazy data is
# not right now possible or a usecase understood.
if self.has_lazy_data():
    _ = self.data
Member

@bjlittle bjlittle May 23, 2017

@cpelley I really disagree that if you want to share the data, you need to realise the data here at this point in the property. This is bad practice IMHO.

You're effectively applying a side-effect here to suit your specific needs. Surely, if you do require to realise the data, then it should only occur in the code base where share_data is True and it actually needs the data at that point in time. We're now in the situation where we've thrown away any laziness, all because we've set this property .... and the kicker is that the data is loaded regardless of the value passed ... I find that totally bizarre!

Loading the data is a BIG deal, and it's not even advertised in the doc-string. Why?

... and where's the documentation to cover this change, so that users know what's happening? No doc-string, no documentation. We really need to be always conscious of our users.

Member

hello @bjlittle

I explicitly asked for this to be included, as part of my review of the code, so I am more culpable than @cpelley

I think that you are right that this should only be implemented for setting to True
I think that a better docstring would help

I have created #2584, pointed at 1.13.x, as a proposed 'bug fix' for these concerns
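
One way the "only realise when setting to True" suggestion might look is sketched below. This is a hypothetical toy class (`SketchCube`, with an assumed `loader` callable standing in for lazy data), not the Iris implementation and not necessarily what #2584 does:

```python
import numpy as np


class SketchCube:
    """Toy stand-in for a cube; hypothetical, not the Iris Cube class."""

    def __init__(self, loader):
        self._loader = loader      # callable that produces the array
        self._data = None          # None means "still lazy"
        self._share_data = False

    def has_lazy_data(self):
        return self._data is None

    @property
    def data(self):
        # Realise the data on first access.
        if self._data is None:
            self._data = self._loader()
        return self._data

    @property
    def share_data(self):
        return self._share_data

    @share_data.setter
    def share_data(self, value):
        """Setting True realises lazy data; setting False leaves it lazy."""
        value = bool(value)
        if value and self.has_lazy_data():
            _ = self.data
        self._share_data = value


cube = SketchCube(lambda: np.arange(4))
cube.share_data = False
still_lazy = cube.has_lazy_data()
cube.share_data = True
realised = not cube.has_lazy_data()
```

With this shape, assigning False has no side effect, and the docstring on the setter advertises that assigning True loads the data.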

4 participants