
Implement sorting in place for OrderMap, OrderSet #57

Merged: 8 commits merged on Jan 4, 2018

@bluss (Member) commented Jan 3, 2018

Implement an in-place sorting method for OrderMap (and OrderSet). We use a very neat trick: temporarily save the hash of each entry somewhere else and store the entry's old index in the hash field. The benefit is that we can then use self.entries.sort_by directly, which is very fast.

I don't think the extra allocation is avoidable (though we could also offer an unstable sort, which would avoid an allocation).
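A minimal sketch of the trick described above, assuming a simplified entry type and modelling the hash table as a plain slice of indices into the entries. The Bucket struct and the sort_by_stashing_hashes function are hypothetical stand-ins, not the crate's code, and for clarity the sketch also builds an explicit old-to-new table, which the real implementation may avoid:

```rust
use std::cmp::Ordering;

// Hypothetical, simplified stand-in for the crate's entry type:
// each entry carries its cached hash next to the key and value.
#[derive(Debug)]
pub struct Bucket<K, V> {
    pub hash: usize,
    pub key: K,
    pub value: V,
}

/// Sorts `entries` in place with `compare`, then rewrites `slots`
/// (a stand-in for the hash table: each slot holds an index into
/// `entries`) so every slot points at its entry's new position.
pub fn sort_by_stashing_hashes<K, V, F>(
    entries: &mut [Bucket<K, V>],
    slots: &mut [usize],
    mut compare: F,
) where
    F: FnMut(&K, &V, &K, &V) -> Ordering,
{
    // 1. Save the real hashes elsewhere (the extra allocation), then
    //    reuse each hash field to record the entry's old index.
    let saved: Vec<usize> = entries.iter().map(|e| e.hash).collect();
    for (i, e) in entries.iter_mut().enumerate() {
        e.hash = i;
    }

    // 2. Sort the entries directly with the ordinary (stable) slice sort.
    entries.sort_by(|a, b| compare(&a.key, &a.value, &b.key, &b.value));

    // 3. Read back old -> new positions and restore the real hashes.
    let mut new_pos = vec![0; entries.len()];
    for (new_i, e) in entries.iter_mut().enumerate() {
        let old_i = e.hash;
        new_pos[old_i] = new_i;
        e.hash = saved[old_i];
    }

    // 4. Redirect every hash-table slot to its entry's new position.
    for slot in slots.iter_mut() {
        *slot = new_pos[*slot];
    }
}
```

Because the entries themselves go through the standard slice sort, the comparison never has to look entries up through an index vector; the price is the temporary Vec of saved hashes.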

Benchmarks from the current version; sort is the sorting algorithm in this PR, and simple_sort is the simple drain/sort/extend version.

test ordermap_simple_sort_s                ... bench:   3,169,240 ns/iter (+/- 180,707)
test ordermap_simple_sort_u32              ... bench:     845,610 ns/iter (+/- 54,359)
test ordermap_sort_s                       ... bench:   2,550,010 ns/iter (+/- 279,423)
test ordermap_sort_u32                     ... bench:     678,828 ns/iter (+/- 10,313)

The maps have 10k key-value pairs. s is for OrderMap<String, String> and u32 is for OrderMap<u32, u32>. Earlier, the simple implementation won when the key/value types were simple. (Now fixed!)

@bluss (Member Author) commented Jan 3, 2018

Please read the last commits in the PR, the ones that implement sort_by and sort_keys.

@bluss force-pushed the sort-in-place branch 2 times, most recently from 766c6aa to 434f3fc, January 3, 2018 22:25
@vitiral commented Jan 3, 2018

I wish I could review this but (after looking at the code) I don't know enough about the inner workings of the data structure to make a comment. Thanks for doing this!

@bluss (Member Author) commented Jan 3, 2018

The improvements we could make with unsafe code are (I think) mostly these:

  1. Removing bounds checks for the indexing used in sort_by
  2. Removing the redundant writes when applying the permutation. By that I mean the swaps: in safe Rust we move values with swap instead of pushing one hole around, and pushing a hole halves the number of writes needed.
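A sketch of the safe, swap-based approach from item 2, assuming the permutation is given as perm[new] = old (apply_permutation is a hypothetical name, not necessarily the crate's exact code). Each cycle of length k is placed with k - 1 swaps, and perm is overwritten as the cycles complete, serving as the bookkeeping for which positions are already final:

```rust
/// Reorders `v` so that afterwards `v[i]` holds the element that was at
/// `perm[i]` (i.e. `perm[new] = old`), using only safe swaps.
/// `perm` ends up as the identity permutation: each position is marked
/// as final by setting `perm[pos] = pos`.
pub fn apply_permutation<T>(v: &mut [T], perm: &mut [usize]) {
    assert_eq!(v.len(), perm.len());
    for start in 0..v.len() {
        let mut cur = start;
        loop {
            let src = perm[cur];
            perm[cur] = cur; // mark `cur` as final
            if src == start {
                // The element carried around the cycle is already at `cur`.
                break;
            }
            // Pull the element that belongs at `cur` into place; the
            // displaced element moves on to `src`.
            v.swap(cur, src);
            cur = src;
        }
    }
}
```

Each swap writes two slots; a hole-based version would move one element out, shift each cycle member into the vacated slot with a single write, and drop the saved element into the last slot, which is the unsafe-code improvement item 2 refers to.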

@bluss (Member Author) commented Jan 3, 2018

@vitiral Any review would be appreciated, of course! This implementation is actually rather simple -- it makes an identity permutation of the indices, then sorts that permutation with the user's comparison function (mapping each index to its key-value pair), and then we apply the permutation to the two parts of the hash table.
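The indirect approach described here can be sketched as follows, with the entries simplified to a plain slice of (key, value) pairs (sorted_permutation and invert are hypothetical names). Sorting the index vector yields perm[new] = old, which is what reorders the entries, while the hash-table slots, which store old indices, need the inverse mapping from old to new:

```rust
use std::cmp::Ordering;

/// Indirect sort: returns `perm` with `perm[new] = old`, i.e. the order in
/// which the entries should appear. `entries` itself is not moved.
pub fn sorted_permutation<K, V, F>(entries: &[(K, V)], mut compare: F) -> Vec<usize>
where
    F: FnMut(&K, &V, &K, &V) -> Ordering,
{
    // Start from the identity permutation 0, 1, ..., n - 1 ...
    let mut perm: Vec<usize> = (0..entries.len()).collect();
    // ... and sort the indices, comparing the entries they refer to.
    perm.sort_by(|&i, &j| {
        let (ki, vi) = &entries[i];
        let (kj, vj) = &entries[j];
        compare(ki, vi, kj, vj)
    });
    perm
}

/// Inverts `perm` into `inv[old] = new`: the form a hash-table slot needs,
/// since each slot stores an entry's old index and must learn its new one.
pub fn invert(perm: &[usize]) -> Vec<usize> {
    let mut inv = vec![0; perm.len()];
    for (new, &old) in perm.iter().enumerate() {
        inv[old] = new;
    }
    inv
}
```

Every comparison here goes through entries[i], which is the indirection (and the bounds-checked indexing) discussed later in the thread.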

@vitiral commented Jan 3, 2018

The docstrings look good to me, though. Some minor suggested tweaks:

+    /// Sort the map’s key-value pairs by the default ordering of the keys

Consider adding a period at the end.

Sort the map’s key-value pairs in place using the comparison
function `compare`; the comparison function receives two key and
value pairs to compare (so you can sort by keys or values).

Consider changing to:

    /// Sort the map's key-value pairs in place using the comparison
    /// function.
    ///
    /// The comparison function receives two key and value pairs to compare, so you can sort by
    /// keys or values or both.

I prefer docs to have a single-sentence "basic summary" and then a "further description" below. I also added the "or both" description and removed the parentheses.

(In addition, I aligned to 100 columns, which is the default in Rust; feel free to use whatever alignment you want.)

(Final edit: I realized that I had aligned the lines without the correct comment prefixes; fixed.)

@vitiral commented Jan 3, 2018

It's amazing to me that you can simply mutate the Pos objects to a different index value and, voilà, it is updated. I'm definitely not up to speed on how you can then quickly iterate in the correct order. That's pretty incredible!

@bluss (Member Author) commented Jan 3, 2018

Simplified for the common case where capacity fits in 32-bits:

  • self.indices: Box<[Pos]> is like the actual hash table. Each Pos there is one u32 with the index (into self.entries) and one u32 with the hash.
  • self.entries: Vec<Bucket<K, V>> is the Key-Value-(Hash) pairs (triplets) in their order.

To change the order in sort_by, we first go through the "actual hash table" and update each Pos there, switching the old index to the new index.

Then we permute self.entries so that it uses the new indexing.

When we iterate the ordermap in its order, we completely ignore self.indices and just iterate self.entries in order. That's why iteration is so fast: it's just a mapped version of the same contiguous slice iterator that Vec uses.
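To make the Pos update concrete, here is a sketch of a packed slot, assuming the index sits in the low 32 bits and the hash in the high 32 bits. The bit layout and the Pos, set_index and remap names are illustrative assumptions, not the crate's definitions. Changing the index leaves the hash bits untouched, so nothing is rehashed:

```rust
/// Hypothetical packed hash-table slot: low 32 bits = index into
/// `entries`, high 32 bits = (part of) the entry's hash.
/// The crate's actual layout may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pos(u64);

const LOW32: u64 = 0xFFFF_FFFF;

impl Pos {
    pub fn new(index: usize, hash: u32) -> Pos {
        Pos(((hash as u64) << 32) | (index as u64 & LOW32))
    }
    pub fn index(self) -> usize {
        (self.0 & LOW32) as usize
    }
    pub fn hash(self) -> u32 {
        (self.0 >> 32) as u32
    }
    /// Swap in a new index while keeping the stored hash bits.
    pub fn set_index(&mut self, index: usize) {
        self.0 = (self.0 & !LOW32) | (index as u64 & LOW32);
    }
}

/// After a sort, redirect every occupied slot from its old index to the
/// new one (`old_to_new[old] = new`). Empty slots are `None`.
pub fn remap(slots: &mut [Option<Pos>], old_to_new: &[usize]) {
    for pos in slots.iter_mut().flatten() {
        let new = old_to_new[pos.index()];
        pos.set_index(new);
    }
}
```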

@vitiral left a comment

Gotta love quickcheck -- doesn't take a lot of tests and it is easy to feel confident that the library does a pretty decent job :)

tests/quick.rs (outdated)
let mut answer = keyvals.0;
answer.sort_by_key(|t| t.0);

// reverse dedup: Because OrderMap::from_iter keeps the last value for

good comment!

@vitiral commented Jan 3, 2018

I just realized: isn't removal pretty slow in this library, since you have to remove an entry from self.entries, which is a Vec?

The README states:

Removal is fast since it moves memory areas only in the first vector, and uses a single swap in the second vector.

But for a large number of elements, removing a lower index should be pretty slow right?

Honestly, fast removals aren't that useful IMO. I rarely need to remove an item.

@vitiral commented Jan 3, 2018

I get it now! You can apply the new indexes in only O(N) time since you just need to properly swap things in a single pass. Cool!

@bluss (Member Author) commented Jan 3, 2018

The slow order-preserving removal is not implemented -- it's just a swap_replace removal.
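The trade-off can be seen with the standard library's own Vec methods, used here only as an analogy for the two removal strategies: Vec::swap_remove fills the hole with the last element in O(1) but changes the order, while Vec::remove preserves the order by shifting every later element, which is O(n). In an OrderMap-style structure, a swap-style removal additionally has to update the one hash-table slot that pointed at the moved last entry (removal_demo is a hypothetical helper):

```rust
/// Compares the two removal strategies on the same five-element Vec.
pub fn removal_demo() -> (Vec<char>, Vec<char>) {
    let mut swapped = vec!['a', 'b', 'c', 'd', 'e'];
    swapped.swap_remove(1); // O(1): 'e' moves into slot 1, order changes

    let mut shifted = vec!['a', 'b', 'c', 'd', 'e'];
    shifted.remove(1); // O(n): 'c', 'd', 'e' each shift left by one

    (swapped, shifted)
}
```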

OrderMap is not a perfect structure -- we can't have it all. Slow order-preserving removal is also not what we want. So my general thought is: we want another variant crate of ordermap that is not indexable, uses tombstones, and has O(1) order-preserving removals. The current crate will stay as it is in general, because usize-indexable hash maps are also very useful.

@bluss (Member Author) commented Jan 3, 2018

Yes the permutation algo is pretty wild. And it destroys the permutation vector in the process, for its own bookkeeping :-)

@vitiral commented Jan 3, 2018

ah, swap_replace -- that's right! I forgot that removal screws up the order.

I think you made a very reasonable choice in terms of the different technical options. This is an AMAZING library. In particular, it makes it SO much easier to work with Maps and Sets while testing, since I can control the order (and I don't pay the real-world performance penalty of a BTree). This feature will make that even easier!

src/lib.rs (outdated)
@@ -1110,6 +1110,65 @@ impl<K, V, S> OrderMap<K, V, S>
});
}

/// Sort the map’s key-value pairs by the default ordering of the keys.
pub fn sort_keys(&mut self)


Perhaps a sort_keys_by helper would be nice as well?

Additionally, would sort_values and sort_values_by be good for completeness?


personal preference: I don't want API explosion and I doubt that would be used very often. Even when it is used, it would not have significant savings.

Benefit: if keys and values are comparable, it would be easy to write an accidental footgun like:

h.sort_by(|k1, k2, _, _| k1.cmp(k2))  // oops: the second argument is the first entry's value, not a key

Needless to say, this would be annoying (IMO still not enough of a benefit to justify it though).

@bluss (Member Author):

I don't want API explosion either. And we need to keep some headroom for the (inevitable?) sort_unstably family. Given the crate in question here, preferring stable sort seems obvious.

@bluss (Member Author), Jan 3, 2018:

Be a good sport about it, but you said “would be nice”! We should not fall into the would-be-nice trap. It gets in the way of the "this is my actual use case" stuff 😄

src/lib.rs (outdated)
new_index.sort_by(|&i, &j| {
let ei = &self.entries[i];
let ej = &self.entries[j];
compare(&ei.key, &ei.value, &ej.key, &ej.value)
@bluss (Member Author), Jan 3, 2018:

I measured it: using unchecked indexing to get ei and ej here decreases the sort_by benchmark time by up to 10%. Just FYI.

@bluss (Member Author) commented Jan 3, 2018

Well this is boring. In the benchmark vs the simple version that @vitiral wrote (using drain, sort, extend); the simple version wins.

100K key-value pairs; key type: u32, value type: u32

test ordermap_simple_sort                  ... bench:  11,565,368 ns/iter (+/- 1,280,747)
test ordermap_sort                         ... bench:  15,780,304 ns/iter (+/- 1,037,502)

We can decide what to do about that tomorrow. Now you see, sorting Vecs in place is pretty nice. (I wonder what we need to make this faster in place?)

Can we change the rules to win? With key type String, value type String, and 10K key-value pairs, the comparison slightly favors this PR instead:

test ordermap_simple_sort                  ... bench: 167,297,667 ns/iter (+/- 17,713,737)
test ordermap_sort                         ... bench: 151,640,343 ns/iter (+/- 9,195,779)
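For comparison, the "simple" approach (drain, sort, extend) can be sketched on a toy insertion-ordered map. ToyOrderMap is hypothetical and much simpler than OrderMap (it even stores a clone of each key in its index), but it shows where the extra cost comes from: every pair is re-inserted, so every key is hashed again and the whole index is rebuilt:

```rust
use std::cmp::Ordering;
use std::collections::HashMap;
use std::hash::Hash;

/// Toy insertion-ordered map, for illustration only: `entries` keeps the
/// order and `index` maps each key to its position in `entries`.
pub struct ToyOrderMap<K, V> {
    entries: Vec<(K, V)>,
    index: HashMap<K, usize>,
}

impl<K: Hash + Eq + Clone, V> ToyOrderMap<K, V> {
    pub fn new() -> Self {
        ToyOrderMap { entries: Vec::new(), index: HashMap::new() }
    }

    pub fn insert(&mut self, key: K, value: V) {
        match self.index.get(&key).copied() {
            Some(i) => self.entries[i].1 = value,
            None => {
                self.index.insert(key.clone(), self.entries.len());
                self.entries.push((key, value));
            }
        }
    }

    pub fn keys(&self) -> Vec<&K> {
        self.entries.iter().map(|(k, _)| k).collect()
    }

    pub fn position(&self, key: &K) -> Option<usize> {
        self.index.get(key).copied()
    }

    /// "Simple" sort: drain every pair out, sort the plain Vec, then
    /// insert the pairs again, rehashing every key along the way.
    pub fn simple_sort_by<F>(&mut self, mut compare: F)
    where
        F: FnMut(&K, &V, &K, &V) -> Ordering,
    {
        let mut pairs: Vec<(K, V)> = self.entries.drain(..).collect();
        self.index.clear();
        pairs.sort_by(|a, b| compare(&a.0, &a.1, &b.0, &b.1));
        for (k, v) in pairs {
            self.insert(k, v);
        }
    }
}
```

For u32 keys, rehashing is cheap, which is consistent with the simple version winning in the u32 numbers above; for String keys the re-hashing costs more.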

@bluss (Member Author) commented Jan 4, 2018

Ping @stjepang! Maybe you can see & superficially understand the application of slice::sort_by in this PR and we can meditate a bit over this, how to make the in place sort of OrderMap efficient.

The indirection in the current state of this PR is not good for performance, I think that's my conclusion. I'd like to sort the self.entries in place, and still somehow also produce a record that shows which start index moved to which index in the result. (Which is then used to update the self.indices.)

My mind is still on something I think we have talked about before -- sorting two slices in lock step. Imagine taking Vec::from_iter(0..v.len()) and v itself and sorting them both in lock step with some comparison function over the elements of v. This wouldn't be hard to implement I guess, it would just be weird and some unfortunate code duplication over the existing slice::sort_by.
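To illustrate what sorting two slices in lock step means, here is the simplest possible version: an insertion sort that mirrors every swap in a second slice (sort_in_lockstep_by is a hypothetical name). It is O(n²), so it demonstrates only the semantics, not the performance of the standard library's sort. Passing the identity permutation as the second slice yields perm[new] = old:

```rust
use std::cmp::Ordering;

/// Stable insertion sort of `v` by `compare`; every swap made in `v` is
/// repeated in `p`, so `p` ends up permuted exactly like `v`.
pub fn sort_in_lockstep_by<T, P, F>(v: &mut [T], p: &mut [P], mut compare: F)
where
    F: FnMut(&T, &T) -> Ordering,
{
    assert_eq!(v.len(), p.len());
    for i in 1..v.len() {
        let mut j = i;
        // Move v[i] left only while it is strictly less than its
        // predecessor; stopping on equality keeps the sort stable.
        while j > 0 && compare(&v[j], &v[j - 1]) == Ordering::Less {
            v.swap(j, j - 1);
            p.swap(j, j - 1);
            j -= 1;
        }
    }
}
```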

@vitiral commented Jan 4, 2018 via email

@bluss (Member Author) commented Jan 4, 2018

I appreciate the review! There are up to date benchmarks in the first post. They show the same thing as before just a bit more cleanly; the simple key/value type of u32 makes the simple sort implementation win.

For complex types, it should matter that this PR's sort avoids any rehashing or hash comparisons.

The remaining performance challenge I can see here is not the lack of unchecked indexing or any simple fixes that we could make with unsafe code (but won't, for now), but the actual sort_by call itself. To be fast, we need a regular sort_by over self.entries, not one via this indirection.

We'll still merge this PR because this functionality, while not the fastest, is very useful, and we can improve upon it later.

The comparison with the simple sort as of this post (now superseded by a faster sort):

test ordermap_simple_sort_s                ... bench:   3,153,184 ns/iter (+/- 100,313)
test ordermap_simple_sort_u32              ... bench:     854,225 ns/iter (+/- 39,856)
test ordermap_sort_s                       ... bench:   2,955,574 ns/iter (+/- 187,388)
test ordermap_sort_u32                     ... bench:   1,024,853 ns/iter (+/- 45,365)

@bluss (Member Author) commented Jan 4, 2018

Whaaaat whaat wut wut there's a simple way to do the faster and in place-er sort_by.

The improvement from that new commit (a new version of the sort_by algorithm; I may squash it later):

 name               63 ns/iter  62 ns/iter  diff ns/iter   diff % 
 ordermap_sort_s    2,962,953   2,500,124       -462,829  -15.62% 
 ordermap_sort_u32  1,022,244   671,821         -350,423  -34.28%

Now it's faster than the "simple" version too. 😄

@bluss changed the title from "Implement sorting in place" to "Implement sorting in place for OrderMap, OrderSet", Jan 4, 2018
@bluss merged commit b43fa13 into master, Jan 4, 2018
@bluss (Member Author) commented Jan 4, 2018

Too awesome to sit around unreleased.

@bluss deleted the sort-in-place branch, January 4, 2018 20:55

@vitiral commented Jan 4, 2018

ya, additional performance improvements can be done later. Glad you finally beat the simple version though 😄

@ghost commented Jan 4, 2018

@bluss Your solution is very nice - I think it doesn't get better than that. :)

By the way, I was thinking... do we need a crate similar to itertools but focused on slices, perhaps named slicetools? Here are some quick ideas for the kind of methods it might provide:

// These are inspired by the STL in C++.
fn nth_element(&mut self, index: usize);
fn lower_bound(&self, x: &T) -> usize;
fn upper_bound(&self, x: &T) -> usize;
fn equal_range(&self, x: &T) -> (usize, usize);
fn next_permutation(&mut self) -> bool;
fn prev_permutation(&mut self) -> bool;
fn is_sorted(&self) -> bool;

// Similar to, but faster than `itertools::partition` (optimized for slices).
fn partition(&mut self, f: impl FnMut(&T) -> bool) -> usize;

// Merging.
fn merge(&mut self, mid: usize);
fn merge_in_place(&mut self, mid: usize);

// Lazy sorting.
fn sorted(self) -> impl Iterator<Item = T>;
fn sorted_unstable(self) -> impl Iterator<Item = T>;

// In-place version of stable sort (and compatible with `no_std`).
fn sort_in_place(&mut self);

// Sorting in lockstep.
fn sort_and_permute<P>(&mut self, other: &mut [P]);
fn sort_unstable_and_permute<P>(&mut self, other: &mut [P]);
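For a few of these, current Rust already has a building block: the C++-style lower_bound, upper_bound and equal_range can be expressed with the standard library's slice::partition_point (stable since Rust 1.52). A sketch as free functions over a sorted slice; the names mirror the list above and are not an existing crate's API:

```rust
/// Index of the first element that is not less than `x`
/// (like C++ `std::lower_bound`). `s` must be sorted.
pub fn lower_bound<T: Ord>(s: &[T], x: &T) -> usize {
    s.partition_point(|e| e < x)
}

/// Index of the first element that is greater than `x`
/// (like C++ `std::upper_bound`). `s` must be sorted.
pub fn upper_bound<T: Ord>(s: &[T], x: &T) -> usize {
    s.partition_point(|e| e <= x)
}

/// Half-open range `[lo, hi)` of elements equal to `x`
/// (like C++ `std::equal_range`).
pub fn equal_range<T: Ord>(s: &[T], x: &T) -> (usize, usize) {
    (lower_bound(s, x), upper_bound(s, x))
}
```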

@clarfonthey commented:

@stjepang I think that'd be lovely, and I'd love to help out if you made a repo for it. That said, I think a better name might be sword or knife, as they both help with slices.

@bluss (Member Author) commented Jan 5, 2018

Slice tools would be lovely. Crate odds contains some dusty gizmos for it as well, including BlockedIter.

@bluss (Member Author) commented Jan 5, 2018

Can we use sort_unstably transparently in OrderSet.sort() and OrderMap.sort_keys()? The keys are after all guaranteed to be unique by their default ordering.
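The reasoning can be checked directly: assuming the key type's Ord is consistent with its Eq, distinct keys never compare equal, so stability has nothing to preserve and a stable and an unstable sort by key must produce the same order. A small sketch (sort_both_ways is a hypothetical helper):

```rust
/// Sorts two copies of `pairs` by key: one with the stable sort and one
/// with the unstable sort. With unique keys the results are identical.
pub fn sort_both_ways<K: Ord + Clone, V: Clone>(
    pairs: &[(K, V)],
) -> (Vec<(K, V)>, Vec<(K, V)>) {
    let mut stable = pairs.to_vec();
    stable.sort_by(|a, b| a.0.cmp(&b.0));
    let mut unstable = pairs.to_vec();
    unstable.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    (stable, unstable)
}
```

With duplicate keys (impossible in a map, but possible in a plain Vec) only the stable result is guaranteed to keep equal keys in their original order.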

@vitiral commented Jan 5, 2018

I would certainly hope so. I hadn't considered that before.

@bluss (Member Author) commented Jan 5, 2018

Implementation is simple but it's not a slam dunk since it's not faster for both the implemented benchmarks. Not that we have very impressive benchmarks.

 name               63 ns/iter  62 ns/iter  diff ns/iter   diff % 
 ordermap_sort_s    2,523,155   2,638,570        115,415    4.57% 
 ordermap_sort_u32  659,370     356,677         -302,693  -45.91%

@bluss (Member Author) commented Jan 5, 2018

Benchmarks here: rust-lang/rust#40601

Sorting strings is basically the slice::sort_large_random_expensive case, because the benchmarks use a uniform random order.

@bluss (Member Author) commented Jan 9, 2018

This will seem pretty random, but here are benchmarks compensated for clone speed: current stable vs. unstable sort, for an element count of 24. I'm posting them so that I have the numbers on file somewhere.

 ordermap_sort_s              857         988                  131   15.29%
 ordermap_sort_u32            343         248                  -95  -27.70%
