Add an in-place rotate method for slices to libcore #41670

scottmcm · 2017-05-01T08:25:15Z

A helpful primitive for moving chunks of data around inside a slice.

For example, if you have a range selected and are drag-and-dropping it somewhere else (Example from Sean Parent's talk).

(If this should be an RFC instead of a PR, please let me know.)

Edit: changed example

rust-highfive · 2017-05-01T08:25:27Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

kennytm · 2017-05-01T08:42:49Z

If the motivation is inserting an iterator in the middle of a vector, why not use .splice?

#![feature(splice)]
fn main() {
    let mut v = (0..10).collect::<Vec<_>>();
    v.splice(7..7, 100..104);
    assert_eq!(&v, &[0, 1, 2, 3, 4, 5, 6, 100, 101, 102, 103, 7, 8, 9]);
}

leonardo-m · 2017-05-01T11:20:21Z

Having an efficient rotate algorithm in the standard library of a system language is quite good, even expected.

scottmcm · 2017-05-01T16:39:56Z

@kennytm Well, that'll teach me not to only read the stable docs when picking example use cases 😅 I updated the doctest & the first post above to a different example instead.

That said, it would be nice to have a rotate method when implementing splice, as today it works by allocating a separate buffer for items that don't fit in the drain range instead of using excess capacity already in the vector. (Of course, for I:TrustedLen it can do even better than rotate.) Back when I first wrote this, I just did it the easy allocate+3*copy way, but benchmarking to propose for std showed that to actually be slower than using flipping hands for large rotations, presumably due to cache effects.

Diggsey · 2017-05-01T21:23:09Z

This would be useful for me. Currently I just use the following method:

// In-place rotation algorithm (shifts to the right)
fn rotate_slice<T>(slice: &mut [T], places: usize) {
    // Rotation can be implemented by reversing the slice,
    // splitting the slice in two, and then reversing the
    // two sub-slices.
    slice.reverse();
    let (a, b) = slice.split_at_mut(places);
    a.reverse();
    b.reverse();
}

How does this compare performance wise?

scottmcm · 2017-05-02T09:44:11Z

How does this compare performance wise?

The implementation in this PR is about 10% faster in general, 50% faster for certain mid/len combinations, and 90% faster for certain types.

The most obvious advantage is for small rotates, where it uses ptr::copy and a few stack variables. But that optimization can also apply after a few rounds of flipping, and it turns out [u64; 5*1024*1024].rotate(9199) is one of the cases that happens to be about twice as fast as the three-reverses algorithm.
[u8].reverse() has terrible performance. It's reordering single bytes, so LLVM can do some unrolling, but it's still byte load and byte store. Swapping slices, however, is always moving forward and never rearranging, so LLVM vectorizes it to deal in <16 x i8>, which is much faster. (This vectorizing can also help other small types, but is of course most noticeable for bytes.) Playground link to look at LLVM IR and ASM: https://is.gd/CS2skq
The flipping algorithm in the PR is also more memory friendly than the reversing. While both do about n swaps total, the reversing is two sequential reads over the whole slice. The PR does n/d rounds of d swaps, and in each round, puts d elements into their final location so they won't be needed to be looked at again. The other d elements that were loaded are needed again soon, often in the very next round, so when 2*d < cache < n, things go faster than using the reverses.

Raw benchmark numbers: https://gist.github.com/scottmcm/36a6d8db52870f1a60612f811b5ff69a

leonardo-m · 2017-05-02T17:06:57Z

[u8].reverse() has terrible performance.

This can be fixed with a specialization in another PR?

scottmcm · 2017-05-05T03:57:43Z

@leonardo-m I found a way: PR #41764

But reverse will always be slower than memcpy, since it's a more complex task. And that PR doesn't help things like [[u8;3]]::reverse(), which is also pretty slow.

ranma42 · 2017-05-07T17:19:16Z

src/libcore/slice/rotate.rs

+    }
+}
+
+unsafe fn ptr_swap_n<T>(a: *mut T, b: *mut T, n: usize) {


This seems to suffer from the same undef/padding issues as #41335
In this case it would be possible to overcome them by always doing ptr_swap_u8, but that would likely result in worse performance

Thanks for taking a look, @ranma42.

I expected this to be fine as ptr::swap is memcpy+memmove+memcpy. I figured that if memcpy'ing padding was invalid, then ptr::read::<(u8, u16)> would always be invalid, since it's also just memcpy. Is that logic incorrect?

Also, if this code hits those problems, does #40454 hit them too? (Ideally I'd take the swap-slices optimizations out of this and use a new ptr::swap_nonoverlapping_n-style function that just worked well.)

#40454 should be fine, because it only uses u8 values. Some of them might be undef, but as long as they are only copied and never used as condition guards, the worst that can happen is that the padding bytes are not guaranteed to be copied (as LLVM might see them as undef and use arbitrary values instead of accessing the memory).

ptr::read::<(u8, u16)> is not invalid as long as you do not try to interpret the result as another type. The problem arises, for example, if you have a (u8, u16) and read it as a u32 value (assuming that there is a 1-byte padding between the u8 and the u16). In this case, the u32 result would be undefined because of the padding.

carols10cents · 2017-05-08T14:28:15Z

ping @sfackler, looks like this might have fallen through the cracks? (pinging you on IRC too)

sfackler · 2017-05-09T03:04:17Z

The API itself seems pretty reasonable to me, though it sounds like there are still some unanswered questions in the implementation?

cc @rust-lang/libs

scottmcm · 2017-05-09T08:45:37Z

I've pushed an update removing the optimization, which I believe was the only question in the implementation.

(Hopefully #40454 will make it unnecessary, and if not it's probably better to share code with that that to have a different optimization here anyway.)

[Edit] Changed the commit message to make travis re-run, since it failed in git fetch. Is there a better way to handle that in future?

BurntSushi · 2017-05-09T11:13:08Z

The API seems reasonable to me too.

aturon · 2017-05-09T20:20:03Z

Libs team briefly discussed the API in triage meeting, seems good on that front.

scottmcm · 2017-05-10T04:59:21Z

And travis failed to git fetch in the xcode 8.2 config again: https://travis-ci.org/rust-lang/rust/jobs/230284608 Poking it...

kennytm · 2017-05-10T05:18:50Z

cc travis-ci/travis-ci#7589 (travis-ci/travis-build#1020) (#40474)

bors · 2017-05-10T11:40:05Z

☔ The latest upstream changes (presumably #41764) made this pull request unmergeable. Please resolve the merge conflicts.

scottmcm · 2017-05-10T17:27:47Z

Merge conflict with myself 😆 Rebased and made a real tracking issue.

carols10cents · 2017-05-15T15:09:31Z

sooooo sounds like everything is ready to go? wdyt @sfackler?

sfackler

Couple of little things. Looks reasonable otherwise.

sfackler · 2017-05-16T03:32:48Z

src/libcollections/slice.rs

@@ -1337,6 +1337,61 @@ impl<T> [T] {
        core_slice::SliceExt::sort_unstable_by_key(self, f);
    }

+    /// Permutes the slice in-place such that `self[mid..]` move to the


Hmm, I guess I was mentally inserting "the elements in". Considering it as "the subslice" is probably better; will update.

sfackler · 2017-05-16T03:32:57Z

src/libcollections/slice.rs

@@ -1337,6 +1337,61 @@ impl<T> [T] {
        core_slice::SliceExt::sort_unstable_by_key(self, f);
    }

+    /// Permutes the slice in-place such that `self[mid..]` move to the
+    /// beginning of the slice while `self[..mid]` move to the end of the


sfackler · 2017-05-16T03:34:11Z

src/libcore/slice/mod.rs

+            rotate::ptr_rotate(mid, p.offset(mid as isize), k);
+        }
+
+        k


What's the purpose of returning k?

self[..mid] moves to self[k..] and self[mid..] moves to self[..k], so when, say, rotating around a partition point, it gives the new partition point.

I return it because Stepanov returns the equivalent iterator, but I've definitely found the index less useful that the iterator. Obviously it's easy to calculate yourself, but the more annoying problem is that I'm usually rotating a subslice, so the returned index needs to be fixed-up since it's not slice-independent. (With iterators, in contrast, the new split point is non-trivial to calculate for non-random-access iterators, and iterators are independently-meaningful, so the returned split point can directly be used to construct ranges in the original container.)

So I don't have particularly strong feelings; let me know what you'd prefer.

leonardo-m · 2017-05-17T23:49:25Z

Regarding the k return value, also take a look at the API of how the D standard library performs an analogous operation (and also the name of the function):
https://dlang.org/phobos/std_algorithm_mutation.html#bringToFront

A helpful primitive for moving chunks of data around inside a slice. In particular, adding elements to the end of a Vec then moving them somewhere else, as a way to do efficient multiple-insert. (There's drain for efficient block-remove, but no easy way to block-insert.) Talk with another example: <https://youtu.be/qH6sSOr-yk8?t=560>

Batch-insert is better done with Vec::splice

It can be revisted later after the mem::swap optimizations land.

scottmcm · 2017-05-21T10:21:21Z

Thanks for the link, @leonardo-m; good to see another example of returning usize with the same meaning. It's a more useful result there, though, as it working on forward iterators means it's O(k) to recalculate it, unlike here.

I thought some more about my earlier comment and decided to just remove the return value. It's easy to calculate and could plausibly be added later, so might as well just be left off for now.

carols10cents · 2017-05-29T14:12:32Z

ping @sfackler, sounds like this is ready for another look!

alexcrichton · 2017-06-01T16:03:29Z

@bors: r+

Thanks again @scottmcm!

bors · 2017-06-01T16:03:30Z

📌 Commit 094d61f has been approved by alexcrichton

bors · 2017-06-02T07:51:26Z

⌛ Testing commit 094d61f with merge 558cd1e...

Add an in-place rotate method for slices to libcore A helpful primitive for moving chunks of data around inside a slice. For example, if you have a range selected and are drag-and-dropping it somewhere else (Example from [Sean Parent's talk](https://youtu.be/qH6sSOr-yk8?t=560)). (If this should be an RFC instead of a PR, please let me know.) Edit: changed example

bors · 2017-06-02T10:17:27Z

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 558cd1e to master...

RalfJung · 2018-07-18T17:26:48Z

This adds a function which says "Safety: The type T must have non-zero size", and then proceeds to call it in SliceExt::rotate without checking the size. Am I missing something or is this grossly unsafe?

@scottmcm

fix unsafety: don't call ptr_rotate for ZST `rotate::ptr_rotate` has a comment saying ``` /// # Safety /// /// The specified range must be valid for reading and writing. /// The type `T` must have non-zero size. ``` So we better make sure we don't call it on ZST... Cc @scottmcm (author of rust-lang#41670)

rust-highfive assigned sfackler May 1, 2017

carols10cents added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 1, 2017

scottmcm mentioned this pull request May 5, 2017

speed up mem::swap #40454

Merged

ranma42 reviewed May 7, 2017

View reviewed changes

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 9, 2017

scottmcm force-pushed the slice-rotate branch from 1bbfc01 to 7f32bcd Compare May 9, 2017 10:01

scottmcm force-pushed the slice-rotate branch from 7f32bcd to 1cd592e Compare May 10, 2017 04:59

scottmcm mentioned this pull request May 10, 2017

Tracking issue for slice_rotate #41891

Closed

scottmcm force-pushed the slice-rotate branch from 1cd592e to ed2222f Compare May 10, 2017 17:26

sfackler reviewed May 16, 2017

View reviewed changes

Mark-Simulacrum added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 20, 2017

Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 20, 2017

scottmcm added 6 commits May 21, 2017 01:55

Change the doctest example to slide

ebc8489

Batch-insert is better done with Vec::splice

Remove the optimization in ptr_swap_n

59a6fe6

It can be revisted later after the mem::swap optimizations land.

Update slice_rotate to a real tracking number

a92ad5e

Tweak comment wording

95db271

Stop returning k from [T]::rotate

094d61f

scottmcm force-pushed the slice-rotate branch from d71f3e0 to 094d61f Compare May 21, 2017 10:05

aidanhs added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 25, 2017

alexcrichton added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 1, 2017

bors merged commit 094d61f into rust-lang:master Jun 2, 2017

bors mentioned this pull request Jun 2, 2017

Improve reallocation in alloc_system on Windows #42331

Merged

scottmcm deleted the slice-rotate branch June 3, 2017 06:58

brson mentioned this pull request Jun 6, 2017

Rust 1.19 regression, odds 0.2.25, splice method #42462

Closed

RalfJung mentioned this pull request Jul 18, 2018

fix ptr_rotate comments #52502

Merged

scottmcm mentioned this pull request Dec 17, 2018

Consider deprecating rotate_left bluss/odds#26

Open

Add an in-place rotate method for slices to libcore #41670

Add an in-place rotate method for slices to libcore #41670

Conversation

scottmcm commented May 1, 2017 • edited Loading

rust-highfive commented May 1, 2017

kennytm commented May 1, 2017

leonardo-m commented May 1, 2017

scottmcm commented May 1, 2017

Diggsey commented May 1, 2017

scottmcm commented May 2, 2017

leonardo-m commented May 2, 2017

scottmcm commented May 5, 2017

ranma42 May 7, 2017

Choose a reason for hiding this comment

scottmcm May 7, 2017

Choose a reason for hiding this comment

ranma42 May 7, 2017 • edited Loading

Choose a reason for hiding this comment

carols10cents commented May 8, 2017

sfackler commented May 9, 2017

scottmcm commented May 9, 2017 • edited Loading

BurntSushi commented May 9, 2017

aturon commented May 9, 2017

scottmcm commented May 10, 2017

kennytm commented May 10, 2017 • edited Loading

bors commented May 10, 2017

scottmcm commented May 10, 2017

carols10cents commented May 15, 2017

sfackler left a comment

Choose a reason for hiding this comment

sfackler May 16, 2017

Choose a reason for hiding this comment

scottmcm May 16, 2017

Choose a reason for hiding this comment

sfackler May 16, 2017

Choose a reason for hiding this comment

sfackler May 16, 2017

Choose a reason for hiding this comment

scottmcm May 16, 2017

Choose a reason for hiding this comment

leonardo-m commented May 17, 2017

scottmcm commented May 21, 2017

carols10cents commented May 29, 2017

alexcrichton commented Jun 1, 2017

bors commented Jun 1, 2017

bors commented Jun 2, 2017

bors commented Jun 2, 2017

RalfJung commented Jul 18, 2018

scottmcm commented May 1, 2017 •

edited

Loading

ranma42 May 7, 2017 •

edited

Loading

scottmcm commented May 9, 2017 •

edited

Loading

kennytm commented May 10, 2017 •

edited

Loading