Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cube indexing should use numpy views when possible #914

Closed
shoyer opened this issue Jan 9, 2014 · 4 comments
Closed

Cube indexing should use numpy views when possible #914

shoyer opened this issue Jan 9, 2014 · 4 comments
Labels

Comments

@shoyer
Copy link
Contributor

shoyer commented Jan 9, 2014

When looking back over the PR (#900) to add biggus, I encountered some old code that really threw me for a loop: Iris currently always copies data arrays when indexing a cube (Cube.__getitem__). It never uses numpy memory views.

In my opinion, this is a poor design choice and a fix should be prioritized (maybe even for Iris 1.6, if that's when biggus will be arriving?). Memory views are essential to efficient programming with numpy and Python, and these redundant memory copies make Iris unnecessarily slow. I expect that using memory views instead of copies would provide a larger performance increase overall than switching to biggus. The fact that data operations are already being redone for biggus makes this actually the perfect time to make this switch.

Particularly problematic (for me) is that there is essentially no way to work around these extra copy operations. In contrast, if views were used by default, it's always possible to copy (or deepcopy) a cube later if you really want to mutate the state of its data.

It's true that defaulting to deep-copying is convenient for novice programmers to (ab)use mutability -- they will never run into a situation where an old code's data changes because they modified the data on a new cube. But this shouldn't be done at the price of making it impossible to write performant code. Notably, this is not the choice that the authors of numpy made, so it is also surprising behavior for experienced Python programmers.

On a similar note, the cube transpose method should also use views of the data array instead of making a copy. (In fact, it really should be modified to return a new object like np.transpose instead of doing the transpose operation in-place.)

Thoughts? This would, of course, probably end up breaking some client code, but from my (cursory) examination, slices making deep-copies is not actually mentioned in the public API. I would be very happy to elaborate on any in more detail on any of these points.

@rhattersley
Copy link
Member

Iris currently always copies data arrays when indexing a cube

Yeah - that's been bugging me too. I'd like to experiment with the impact of getting rid of the copies, but so far I've been too busy. 😞

shoyer added a commit to TheClimateCorporation/iris that referenced this issue Jan 14, 2014
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing cubes. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
shoyer added a commit to TheClimateCorporation/iris that referenced this issue Jan 14, 2014
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
shoyer added a commit to TheClimateCorporation/iris that referenced this issue Jan 17, 2014
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
shoyer added a commit to TheClimateCorporation/iris that referenced this issue Jan 21, 2014
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
shoyer added a commit to TheClimateCorporation/iris that referenced this issue Jan 21, 2014
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
@bblay
Copy link
Contributor

bblay commented Jan 22, 2014

Nice to see this is being thought about again.
We did discuss this, and shared coords, in the very early days of Iris.
I seem to remember it was difficult to balance risk against efficiency so we played it safe at the time.

rhattersley pushed a commit to rhattersley/iris that referenced this issue Apr 28, 2016
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
rhattersley pushed a commit to rhattersley/iris that referenced this issue Apr 28, 2016
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
rhattersley pushed a commit to rhattersley/iris that referenced this issue Apr 28, 2016
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
cpelley pushed a commit to cpelley/iris that referenced this issue Dec 8, 2016
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
cpelley pushed a commit to cpelley/iris that referenced this issue Jan 9, 2017
This is a first attempt at removing unnecessary (and very slow) deepcopy operations
with slicing or otherwise manipulating cubes and coordinates. See SciTools#914.

Note: A few of the unit tests are failing, because they insist on checking the
order (Fortran or C) of numpy arrays. I think these checks should be removed,
because it is a waste of computational effort to always ensure arrays are
contiguous. If some code needs to interface with external modules code that
require continguous arrays, it should use np.ascontiguousarray or
np.asfortranarray at the immediate level of the wrapper.
@github-actions
Copy link
Contributor

github-actions bot commented Aug 5, 2021

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

@github-actions github-actions bot added the Stale A stale issue/pull-request label Aug 5, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Sep 3, 2021

This stale issue has been automatically closed due to a lack of community activity.

If you still care about this issue, then please either:

  • Re-open this issue, if you have sufficient permissions, or
  • Add a comment pinging @SciTools/iris-devs who will re-open on your behalf.

@github-actions github-actions bot closed this as completed Sep 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants