Cube indexing should use numpy views when possible #914

shoyer · 2014-01-09T03:18:55Z

When looking back over the PR (#900) to add biggus, I encountered some old code that really threw me for a loop: Iris currently always copies data arrays when indexing a cube (Cube.__getitem__). It never uses numpy memory views.

In my opinion, this is a poor design choice and a fix should be prioritized (maybe even for Iris 1.6, if that's when biggus will be arriving?). Memory views are essential to efficient programming with numpy and Python, and these redundant memory copies make Iris unnecessarily slow. I expect that using memory views instead of copies would provide a larger performance increase overall than switching to biggus. The fact that data operations are already being redone for biggus makes this actually the perfect time to make this switch.

Particularly problematic (for me) is that there is essentially no way to work around these extra copy operations. In contrast, if views were used by default, it's always possible to copy (or deepcopy) a cube later if you really want to mutate the state of its data.

It's true that defaulting to deep-copying is convenient for novice programmers to (ab)use mutability -- they will never run into a situation where an old code's data changes because they modified the data on a new cube. But this shouldn't be done at the price of making it impossible to write performant code. Notably, this is not the choice that the authors of numpy made, so it is also surprising behavior for experienced Python programmers.

On a similar note, the cube transpose method should also use views of the data array instead of making a copy. (In fact, it really should be modified to return a new object like np.transpose instead of doing the transpose operation in-place.)

Thoughts? This would, of course, probably end up breaking some client code, but from my (cursory) examination, slices making deep-copies is not actually mentioned in the public API. I would be very happy to elaborate on any in more detail on any of these points.

The text was updated successfully, but these errors were encountered:

rhattersley · 2014-01-10T16:39:53Z

Iris currently always copies data arrays when indexing a cube

Yeah - that's been bugging me too. I'd like to experiment with the impact of getting rid of the copies, but so far I've been too busy. 😞

This is a first attempt at removing unnecessary (and very slow) deepcopy operations with slicing cubes. See SciTools#914. Note: A few of the unit tests are failing, because they insist on checking the order (Fortran or C) of numpy arrays. I think these checks should be removed, because it is a waste of computational effort to always ensure arrays are contiguous. If some code needs to interface with external modules code that require continguous arrays, it should use np.ascontiguousarray or np.asfortranarray at the immediate level of the wrapper.

This is a first attempt at removing unnecessary (and very slow) deepcopy operations with slicing or otherwise manipulating cubes and coordinates. See SciTools#914. Note: A few of the unit tests are failing, because they insist on checking the order (Fortran or C) of numpy arrays. I think these checks should be removed, because it is a waste of computational effort to always ensure arrays are contiguous. If some code needs to interface with external modules code that require continguous arrays, it should use np.ascontiguousarray or np.asfortranarray at the immediate level of the wrapper.

bblay · 2014-01-22T16:15:12Z

Nice to see this is being thought about again.
We did discuss this, and shared coords, in the very early days of Iris.
I seem to remember it was difficult to balance risk against efficiency so we played it safe at the time.

This is a first attempt at removing unnecessary (and very slow) deepcopy operations with slicing or otherwise manipulating cubes and coordinates. See SciTools#914. Note: A few of the unit tests are failing, because they insist on checking the order (Fortran or C) of numpy arrays. I think these checks should be removed, because it is a waste of computational effort to always ensure arrays are contiguous. If some code needs to interface with external modules code that require continguous arrays, it should use np.ascontiguousarray or np.asfortranarray at the immediate level of the wrapper.

github-actions · 2021-08-05T00:45:21Z

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions · 2021-09-03T00:45:18Z

This stale issue has been automatically closed due to a lack of community activity.

If you still care about this issue, then please either:

Re-open this issue, if you have sufficient permissions, or
Add a comment pinging @SciTools/iris-devs who will re-open on your behalf.

shoyer mentioned this issue Jan 14, 2014

Remove deepcopies when slicing or manipulating cubes #939

Closed

pp-mo mentioned this issue May 6, 2016

Remove deepcopies when slicing cubes and copying coords #1992

Closed

pelson added Type: Enhancement Experience: High labels Oct 24, 2017

pp-mo added Votable Feature and removed Votable Feature labels May 1, 2019

github-actions bot added the Stale A stale issue/pull-request label Aug 5, 2021

github-actions bot closed this as completed Sep 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cube indexing should use numpy views when possible #914

Cube indexing should use numpy views when possible #914

shoyer commented Jan 9, 2014

rhattersley commented Jan 10, 2014

bblay commented Jan 22, 2014

github-actions bot commented Aug 5, 2021

github-actions bot commented Sep 3, 2021

Cube indexing should use numpy views when possible #914

Cube indexing should use numpy views when possible #914

Comments

shoyer commented Jan 9, 2014

rhattersley commented Jan 10, 2014

bblay commented Jan 22, 2014

github-actions bot commented Aug 5, 2021

github-actions bot commented Sep 3, 2021