Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let's sort tuples! #32710

Closed
wants to merge 3 commits into from
Closed

Conversation

willow-ahrens
Copy link
Contributor

@willow-ahrens willow-ahrens commented Jul 27, 2019

Use sorting networks for sorting tuples under length 32, and provide a fallback for large tuples which copies to a mutable intermediate. Provide an immutable sort function which returns a tuple for tuple inputs. This PR was pair programmed at the JuliaCon Hackathon with @lcw.

Some related packages:
ChipSort.jl by @nlw0
SortingNetworks.jl by @JeffreySarnoff

Here's a plot which compares the performance of the new sort on tuples (newsort(::Tuple{Vararg{Int}})), copying to array and copying back (copysort(::Tuple{Vararg{Int}})) and, for more comparison, a call to the old standby (sort!(::Vector{Int})) on a heap-allocated array of the same length. The average of calling @btime on each of 10 random integer permutations are displayed. We subtract the time it takes to allocate the result (a benchmark of the identity function with the same setup function). It is surprisingly hard to benchmark the sort function on tuples because the compiler is eager to optimize it away. The script which made the plot lives in TupleSort.jl

plot

I hope that this is an acceptable way to overload sort. It's unfortunate that there is no Algorithm trait to overload for non-allocating sort. I avoided putting this code in base/sort.jl to keep it out of the early code loading stages. It would be great if there were some way to avoid a heap allocation for the big tuple fallback.

@ericphanson
Copy link
Contributor

Related: #31818

@dkarrasch
Copy link
Member

There is tuple sorting functionality in https://github.com/Jutho/TupleTools.jl. Would be interesting to include in the comparison above. @Jutho

@ararslan ararslan added the collections Data structures holding multiple items, e.g. sets label Jul 28, 2019
@willow-ahrens
Copy link
Contributor Author

I'm not really sure what the fallback behavior should look like. It appears that the spirit of tuple functions is to return some recursive, "probably inlineable" implementation, but it is difficult to write a performant implementation in this style. Because I am procrastinating on something else, I ended up writing a bitonic merge network generator to recursively stitch together the smaller best-known networks. Note that the merge network generator creates about N*log(n)^2 swaps in the function body. I'm also unsure if we should branch once on rev at the top of the network, or use the comparison lt(by(a), by(b)) != rev for each swap like TupleTools does. I can't detect a difference between the two, but the second form generates less code. For comparison, I use the second form in the network function shown in this new plot. The performance of the completely network-based sorting routine is listed as networksort(::Tuple{Vararg{Int}}). The performance of the TupleTools merge sort is listed as mergesort(::Tuple{Vararg{Int}}). I also added an @inbounds to the creation of the result tuple in copysort.

plot

A hypothesis for the performance of mergesort on larger tuples: since the TupleTools sort function is a merge sort where the recursive call depends on the remaining length of the sorted subtuples (a property which depends on the runtime values of the arguments), the calls cannot be inlined.

@Jutho
Copy link
Contributor

Jutho commented Jul 29, 2019

While I am not doubting your timings, and the sort in TupleTools.jl was a very basic implementation as I am no expert in sorting algorithms, the benchmark may be more fair if you generate a list of permutations beforehand and use exactly the same permutations to be sorted for each of the algorithms.

@willow-ahrens
Copy link
Contributor Author

willow-ahrens commented Jul 29, 2019

I agree that would be more fair. If it helps, running the benchmarks multiple times gives me the same results. Benchmarking these sorts has been an adventure. In order to avoid timing the allocation of a result tuple and avoid dead code elimination on the sort itself, I've been using a benchmarking statement like:

t = time(@benchmark sort(perm) setup=(perm=(randperm($n)...,)) seconds=25) - time(@benchmark identity(perm) setup=(perm=(randperm($n)...,)) seconds=25)

What follows are the results if I seed the rng before each benchmark, and use the same number of samples (1000). To help with fairness, I also use the same offset (the subtracted benchmark I am using to avoid including the overhead for allocating the result tuple) for each immutable sort.

plot

Unless anyone objects, I think it makes sense to let the compiler optimize the rev parameter (llvm can introduce its own branches if it wants i suppose) and go with the lt(by(a), by(b)) != rev form. I'll also add in the @inbounds annotation to the creation of the result tuple in the copying fallback case. Since the merge network generator is complicated (and no faster than the copying case) I'll just leave that out of this PR.

@nlw0
Copy link
Contributor

nlw0 commented Jul 31, 2019

@peterahrens When you asked me about sorting tuples I thought of a sequence of tuples, and not tuples as a container, that's why I didn't understand you before. I am curious to see if this can be brought to the stdlib!

Benchmarking this kind of code is really tricky. In my experiments I created large arrays and sorted small chunks in-place. This makes it easier to benchmark, though it's a bit of an artificial thing. It might be great to come up with a benchmark closer to an application like you mentioned to me before, sorting values before a summation.

I made a PR for Combsort (#32696) where I also had to deal with the Ordering, and I'm not very sure of what I did there. Maybe I can take some inspiration from this patch? My code there isn't handling floats, btw, does this function work fine?

@vtjnash
Copy link
Member

vtjnash commented Apr 13, 2021

With the sorting implementation in SVectors, I'm leaning towards closing this, and recommending using one of the algorithms described in JuliaArrays/StaticArrays.jl#754 (comment), and suggest possibly also contributing this (https://github.com/peterahrens/TupleSort.jl) as another trait to that package (and/or to https://github.com/JuliaCollections/SortingAlgorithms.jl)

@vtjnash vtjnash closed this Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants