Let's sort tuples! #32710

willow-ahrens · 2019-07-27T19:21:06Z

Use sorting networks for sorting tuples under length 32, and provide a fallback for large tuples which copies to a mutable intermediate. Provide an immutable sort function which returns a tuple for tuple inputs. This PR was pair programmed at the JuliaCon Hackathon with @lcw.

Some related packages:
ChipSort.jl by @nlw0
SortingNetworks.jl by @JeffreySarnoff

Here's a plot which compares the performance of the new sort on tuples (newsort(::Tuple{Vararg{Int}})), copying to array and copying back (copysort(::Tuple{Vararg{Int}})) and, for more comparison, a call to the old standby (sort!(::Vector{Int})) on a heap-allocated array of the same length. The average of calling @btime on each of 10 random integer permutations are displayed. We subtract the time it takes to allocate the result (a benchmark of the identity function with the same setup function). It is surprisingly hard to benchmark the sort function on tuples because the compiler is eager to optimize it away. The script which made the plot lives in TupleSort.jl

I hope that this is an acceptable way to overload sort. It's unfortunate that there is no Algorithm trait to overload for non-allocating sort. I avoided putting this code in base/sort.jl to keep it out of the early code loading stages. It would be great if there were some way to avoid a heap allocation for the big tuple fallback.

…rger tuples.

ericphanson · 2019-07-27T20:02:30Z

Related: #31818

dkarrasch · 2019-07-28T11:04:46Z

There is tuple sorting functionality in https://github.com/Jutho/TupleTools.jl. Would be interesting to include in the comparison above. @Jutho

willow-ahrens · 2019-07-28T23:56:15Z

I'm not really sure what the fallback behavior should look like. It appears that the spirit of tuple functions is to return some recursive, "probably inlineable" implementation, but it is difficult to write a performant implementation in this style. Because I am procrastinating on something else, I ended up writing a bitonic merge network generator to recursively stitch together the smaller best-known networks. Note that the merge network generator creates about N*log(n)^2 swaps in the function body. I'm also unsure if we should branch once on rev at the top of the network, or use the comparison lt(by(a), by(b)) != rev for each swap like TupleTools does. I can't detect a difference between the two, but the second form generates less code. For comparison, I use the second form in the network function shown in this new plot. The performance of the completely network-based sorting routine is listed as networksort(::Tuple{Vararg{Int}}). The performance of the TupleTools merge sort is listed as mergesort(::Tuple{Vararg{Int}}). I also added an @inbounds to the creation of the result tuple in copysort.

A hypothesis for the performance of mergesort on larger tuples: since the TupleTools sort function is a merge sort where the recursive call depends on the remaining length of the sorted subtuples (a property which depends on the runtime values of the arguments), the calls cannot be inlined.

Jutho · 2019-07-29T00:03:27Z

While I am not doubting your timings, and the sort in TupleTools.jl was a very basic implementation as I am no expert in sorting algorithms, the benchmark may be more fair if you generate a list of permutations beforehand and use exactly the same permutations to be sorted for each of the algorithms.

willow-ahrens · 2019-07-29T00:46:31Z

I agree that would be more fair. If it helps, running the benchmarks multiple times gives me the same results. Benchmarking these sorts has been an adventure. In order to avoid timing the allocation of a result tuple and avoid dead code elimination on the sort itself, I've been using a benchmarking statement like:

t = time(@benchmark sort(perm) setup=(perm=(randperm($n)...,)) seconds=25) - time(@benchmark identity(perm) setup=(perm=(randperm($n)...,)) seconds=25)

What follows are the results if I seed the rng before each benchmark, and use the same number of samples (1000). To help with fairness, I also use the same offset (the subtracted benchmark I am using to avoid including the overhead for allocating the result tuple) for each immutable sort.

Unless anyone objects, I think it makes sense to let the compiler optimize the rev parameter (llvm can introduce its own branches if it wants i suppose) and go with the lt(by(a), by(b)) != rev form. I'll also add in the @inbounds annotation to the creation of the result tuple in the copying fallback case. Since the merge network generator is complicated (and no faster than the copying case) I'll just leave that out of this PR.

nlw0 · 2019-07-31T15:44:33Z

@peterahrens When you asked me about sorting tuples I thought of a sequence of tuples, and not tuples as a container, that's why I didn't understand you before. I am curious to see if this can be brought to the stdlib!

Benchmarking this kind of code is really tricky. In my experiments I created large arrays and sorted small chunks in-place. This makes it easier to benchmark, though it's a bit of an artificial thing. It might be great to come up with a benchmark closer to an application like you mentioned to me before, sorting values before a summation.

I made a PR for Combsort (#32696) where I also had to deal with the Ordering, and I'm not very sure of what I did there. Maybe I can take some inspiration from this patch? My code there isn't handling floats, btw, does this function work fine?

vtjnash · 2021-04-13T17:54:48Z

With the sorting implementation in SVectors, I'm leaning towards closing this, and recommending using one of the algorithms described in JuliaArrays/StaticArrays.jl#754 (comment), and suggest possibly also contributing this (https://github.com/peterahrens/TupleSort.jl) as another trait to that package (and/or to https://github.com/JuliaCollections/SortingAlgorithms.jl)

willow-ahrens added 2 commits July 26, 2019 16:32

sorting networks for tuples under length 32, default algorithm for la…

95b1cc1

…rger tuples.

Let's sort tuples!

26fce37

ararslan requested a review from JeffBezanson July 28, 2019 22:19

ararslan added the collections Data structures holding multiple items, e.g. sets label Jul 28, 2019

tweaks

9bf3185

tkf mentioned this pull request Mar 21, 2020

RFC: sorting for SVectors JuliaArrays/StaticArrays.jl#754

Merged

vtjnash closed this Apr 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let's sort tuples! #32710

Let's sort tuples! #32710

willow-ahrens commented Jul 27, 2019 •

edited

Loading

ericphanson commented Jul 27, 2019

dkarrasch commented Jul 28, 2019

willow-ahrens commented Jul 28, 2019

Jutho commented Jul 29, 2019

willow-ahrens commented Jul 29, 2019 •

edited

Loading

nlw0 commented Jul 31, 2019 •

edited

Loading

vtjnash commented Apr 13, 2021

Let's sort tuples! #32710

Let's sort tuples! #32710

Conversation

willow-ahrens commented Jul 27, 2019 • edited Loading

ericphanson commented Jul 27, 2019

dkarrasch commented Jul 28, 2019

willow-ahrens commented Jul 28, 2019

Jutho commented Jul 29, 2019

willow-ahrens commented Jul 29, 2019 • edited Loading

nlw0 commented Jul 31, 2019 • edited Loading

vtjnash commented Apr 13, 2021

willow-ahrens commented Jul 27, 2019 •

edited

Loading

willow-ahrens commented Jul 29, 2019 •

edited

Loading

nlw0 commented Jul 31, 2019 •

edited

Loading