Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented CombSort as an additional sorting algorithm in base #32696

Closed
wants to merge 3 commits into from

Conversation

nlw0
Copy link
Contributor

@nlw0 nlw0 commented Jul 26, 2019

Comb sort is a classic and much overlooked algorithm that is very suitable for vectorization. This PR implements it as an alternative to the other algorithms in Base (Insertion, Quick, Merge). It can achieve speed-ups of ~90% over Quicksort on an AVX2 machine.

One important detail of this implementation is that we switch to Insertion sort when the interval becomes 1.

This algorithm implementation was investigated before on ChipSort.jl, presented at JuliaCon2019. More information can be found on its references.

@JeffBezanson
Copy link
Member

Cool. Does LLVM manage to vectorize this? Any example performance numbers?

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 30, 2019

Cool. Does LLVM manage to vectorize this? Any example performance numbers?

Yes, it gets nicely vectorized. The method implemented here is the same as the blue curve on the experiments graph from my article. It is roughly 85% faster than the default quicksort, at least for Int32 and up to 1M elements.

Check out my presentation, more specifically slides 8 and 9 https://nlw0.github.io/chipsort-juliacon2019/index.html#/8 https://nlw0.github.io/chipsort-juliacon2019/index.html#/9

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 30, 2019

A simple test with this patch:

julia> using BenchmarkTools

julia> @btime sort!(a, alg=QuickSort) setup=(a = rand(UInt32, 2^20));
  70.771 ms (0 allocations: 0 bytes)

julia> @btime sort!(a, alg=CombSort) setup=(a = rand(UInt32, 2^20));
  35.446 ms (0 allocations: 0 bytes)

@nlw0 nlw0 mentioned this pull request Jul 31, 2019
@nlw0
Copy link
Contributor Author

nlw0 commented Jul 31, 2019

I noticed later this implementation is failing with floats. It seems to have something to do with the approach for handling Ordering, that I'm actually not very happy about. It would be nice to have some feedback about the solution before fixing it.

@StefanKarpinski
Copy link
Member

Would it be possible to post this code somewhere in case someone wants to pick it up?

@giordano
Copy link
Contributor

giordano commented Oct 5, 2019

Should be here: 91fc5a3

@nlw0
Copy link
Contributor Author

nlw0 commented Oct 6, 2019

One last comment: it turns out RadixSort from SortingAlgorithms.jl can outperform this. I haven't figured out yet whether it also exploits SIMD somehow. Thus radix might be a better possible candidate for the default library, the saving grace from Combsort being its tiny code footprint.

This was closed by accident, by the way, I never had two PRs open before, sorry!

@oscardssmith
Copy link
Member

The biggest downside of radix sort are that it didn't work for all types.

@nlw0
Copy link
Contributor Author

nlw0 commented Jan 11, 2022

This was closed by accident, but I never reopened it. Updated with latest master branch.

@nlw0 nlw0 reopened this Jan 11, 2022
@nlw0
Copy link
Contributor Author

nlw0 commented Jan 11, 2022

Or should it be closed? The issue was just something weird happening with floats. It still might be of interest, it's a simple algorithm that performs well in some cases.

@oscardssmith
Copy link
Member

oscardssmith commented Jan 11, 2022

I'm in favor of merging if we can demonstrate performance improvements (preferably for types where radix sort doesn't work) and turn it on by default.

@oscardssmith oscardssmith added the triage This should be discussed on a triage call label Jan 11, 2022
@nlw0
Copy link
Contributor Author

nlw0 commented Jan 12, 2022

The biggest claim to fame here is greater speed than the standard library on Int32 arrays. Although, as you said, RadixSort tends to be the winner in this scenario. On the other hand, RadixSort seems to be complex and specific enough not to have been implemented here already.

What's compelling about this CombSort approach is that it is a comparison sort and has a simple implementation, while leading itself to automatic parallelization better than QuickSort, according to my experiments. This seems to make CombSort Pareto-optimal in a way, beating the standard QuickSort for Int32, while also beating RadixSort for generality.

I wasn't thinking in replacing the default, just offering an interesting alternative. Anyways, I'd gladly make more tests with other datatypes. Can I have some suggestions? I can think of random strings, and structures with increasing sizes and Int32 keys, or perhaps tuple keys. Anything else?

@JeffBezanson
Copy link
Member

Triage thinks this should go in SortingAlgorithms.jl, and we will consider moving radix sort to Base for performance.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Jan 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants