-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let's sort tuples! #32710
Let's sort tuples! #32710
Conversation
Related: #31818 |
There is tuple sorting functionality in https://github.com/Jutho/TupleTools.jl. Would be interesting to include in the comparison above. @Jutho |
I'm not really sure what the fallback behavior should look like. It appears that the spirit of tuple functions is to return some recursive, "probably inlineable" implementation, but it is difficult to write a performant implementation in this style. Because I am procrastinating on something else, I ended up writing a bitonic merge network generator to recursively stitch together the smaller best-known networks. Note that the merge network generator creates about A hypothesis for the performance of |
While I am not doubting your timings, and the sort in TupleTools.jl was a very basic implementation as I am no expert in sorting algorithms, the benchmark may be more fair if you generate a list of permutations beforehand and use exactly the same permutations to be sorted for each of the algorithms. |
I agree that would be more fair. If it helps, running the benchmarks multiple times gives me the same results. Benchmarking these sorts has been an adventure. In order to avoid timing the allocation of a result tuple and avoid dead code elimination on the sort itself, I've been using a benchmarking statement like:
What follows are the results if I seed the rng before each benchmark, and use the same number of samples (1000). To help with fairness, I also use the same offset (the subtracted benchmark I am using to avoid including the overhead for allocating the result tuple) for each immutable sort. Unless anyone objects, I think it makes sense to let the compiler optimize the |
@peterahrens When you asked me about sorting tuples I thought of a sequence of tuples, and not tuples as a container, that's why I didn't understand you before. I am curious to see if this can be brought to the stdlib! Benchmarking this kind of code is really tricky. In my experiments I created large arrays and sorted small chunks in-place. This makes it easier to benchmark, though it's a bit of an artificial thing. It might be great to come up with a benchmark closer to an application like you mentioned to me before, sorting values before a summation. I made a PR for Combsort (#32696) where I also had to deal with the |
With the sorting implementation in SVectors, I'm leaning towards closing this, and recommending using one of the algorithms described in JuliaArrays/StaticArrays.jl#754 (comment), and suggest possibly also contributing this (https://github.com/peterahrens/TupleSort.jl) as another trait to that package (and/or to https://github.com/JuliaCollections/SortingAlgorithms.jl) |
Use sorting networks for sorting tuples under length 32, and provide a fallback for large tuples which copies to a mutable intermediate. Provide an immutable
sort
function which returns a tuple for tuple inputs. This PR was pair programmed at the JuliaCon Hackathon with @lcw.Some related packages:
ChipSort.jl by @nlw0
SortingNetworks.jl by @JeffreySarnoff
Here's a plot which compares the performance of the new sort on tuples (
newsort(::Tuple{Vararg{Int}})
), copying to array and copying back (copysort(::Tuple{Vararg{Int}})
) and, for more comparison, a call to the old standby (sort!(::Vector{Int})
) on a heap-allocated array of the same length. The average of calling@btime
on each of 10 random integer permutations are displayed. We subtract the time it takes to allocate the result (a benchmark of theidentity
function with the same setup function). It is surprisingly hard to benchmark the sort function on tuples because the compiler is eager to optimize it away. The script which made the plot lives in TupleSort.jlI hope that this is an acceptable way to overload
sort
. It's unfortunate that there is noAlgorithm
trait to overload for non-allocating sort. I avoided putting this code inbase/sort.jl
to keep it out of the early code loading stages. It would be great if there were some way to avoid a heap allocation for the big tuple fallback.