Implements the CombSort algorithm #54

nlw0 · 2022-02-19T12:33:59Z

This implements the comb sort algorithm. The patch was first submitted to Julia core, but it was decided that SortingAlgorithms.jl would be a better place. JuliaLang/julia#32696

Please check previous threads for details and motivation. This algorithm was discussed in a 2019 JuliaCon presentation https://youtu.be/_bvb8X4DT90?t=402 . The main motivation to use comb sort is that the algorithm happens to lead itself very well to compiler optimizations, especially vectorization. This can be checked by running e.g. @code_llvm sort!(rand(Int32, 2^12), 1, 2^12, CombSort, Base.Order.Forward) and looking for an instruction such as icmp slt <8 x i32>.

Comb sort is a general, non-stable comparison sort that outperforms the standard quick/intro sort for 32-bit integers. It doesn't seem to outperform radix sort for that kind of element type, though. So it's not clear whether it only outperforms quick sort in the cases where radix sort is actually optimal. The motivation is that comb sort is a simple general-purpose algorithm that seems to be easily optimized by the compiler to exploit modern parallel architectures.

I'd gladly perform more benchmarks if this is desired, although it would be nice to hear specific ideas of the kind of input types and sizes we are interested in. As far as I know, none of the currently implemented algorithms had to be validated with such experiments before being merged. It would be great to hear some advice about moving forward with this contribution, if at all, since this peculiar algorithm seems to attract a high level of scrutiny, probably deserved.

All the tests right now seem to be heavily based on floating-point numbers, and here there's actually some challenges in the implementation. The core of the implementation is the function ltminmax which compares two values and returns an ordered pair using the min and max functions. This is perfect for integers and strings, but with floating-point things get weird, as usual. The results with NaNs right now are actually not even correct, although the test is passing (!). It would be great to have some advice about how to fix that, as well as how we might extend the tests.

I'm very glad to have studied this algorithm using Julia, I feel it's a great showcase for the language, and it seems to epitomize modern, parallel-focused computing. I'd love to hear suggestions about how we might highlight these ideas in this patch.

codecov-commenter · 2022-02-19T12:35:26Z

Codecov Report

Merging #54 (3da6412) into master (a17c80c) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
+ Coverage   96.40%   96.51%   +0.10%     
==========================================
  Files           1        1              
  Lines         334      344      +10     
==========================================
+ Hits          322      332      +10     
  Misses         12       12

Impacted Files	Coverage Δ
src/SortingAlgorithms.jl	`96.51% <100.00%> (+0.10%)`	⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

nalimilan · 2022-02-19T18:52:43Z

Thanks. Contrary to Base, I don't think systematic benchmarks are needed to include a new algorithm in this package. The point of SortingAlgorithms.jl is to provide a variety of algorithms. However, it should give correct results or throw an error for unsupported values or types. It's surprising that tests pass with NaN if you said they currently don't work. Do you have an example of a case that fails? Would throwing an error if you detect an NaN be OK?

It would be good to add tests for other types BTW (strings, custom type, missing...). This is a real lack for current algorithms but it's never too late to improve.

nlw0 · 2022-02-20T17:09:33Z

Thank you, looking forward to contributing this and perhaps other algorithms later.

I was perhaps a bit fast to judge, the implementation actually does seem to work:

julia> sort(randn_with_nans(11, 0.1), alg=CombSort)'
1×11 adjoint(::Vector{Float64}) with eltype Float64:
 -1.1997  -0.266208  -0.0248003  0.165739  0.731837  0.779865  0.801758  0.900917  1.8231  NaN  NaN

My confusion is because I was testing with direct calls to the 6-parameters method, and I guess I picked the wrong choice for the ordering parameter. I suppose I just don't understand exactly how that works. What happens is that if I call it with sort!(v, 1, length(v), CombSort, Base.Sort.Float.Right()) we end up with all NaNs. I believe I can fix it with a special function that treats NaN as the max, but I'm not sure if this is necessary or not. In fact, a direct call to other algorithms also cause problems. Maybe nan-handling should even be implemented through by?

My uncertainty was later reinforced because it seems the test checks for "issorted" instead of comparing the output to a vector sorted by another method, and if the output was all [NaN, NaN, NaN, ...], then it would pass. So I recommend the test should always be checking for the specific values in the output.

It all seems to be fine, though, I just don't understand the mechanics of the ordering etc, and using the algorithm with the higher-level sort function seems fine.

nalimilan · 2022-04-21T20:23:50Z

Sorry for the delay. I don't think you should be concerned with Base.Sort.Float.Right() handling NaN. AFAICT it's an internal ordering used only by fpsort! after it has moved all NaNs to the end of the vector so that they are not passed to the sorting algorithm at all. But indeed tests should ideally be stricter than just calling issorted.

nlw0 · 2022-08-25T13:39:57Z

Can somebody help me here? What are we missing to go ahead?

src/SortingAlgorithms.jl

nlw0 · 2022-08-27T12:13:38Z

Thanks for the review, I believe I have covered everything.

LilithHafner

I've been doing a bit of benchmarking, and this looks really good! It seems to be faster than any algorithm I know of for unstable sorting of primitives in default order of length about 10–1500. That's a very particular domain, but also a fairly common use case and a case where Julia currently struggles.

LilithHafner · 2022-08-27T13:20:12Z

src/SortingAlgorithms.jl

+ - H. Inoue, T. Moriyama, H. Komatsu and T. Nakatani, "AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors," 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007, pp. 189-198, doi: 10.1109/PACT.2007.4336211.
+ - Werneck, N. L., (2020). ChipSort: a SIMD and cache-aware sorting module. JuliaCon Proceedings, 1(1), 12, https://doi.org/10.21105/jcon.00012


Both of these works describe different and much larger sorting algorithms. CombSort (Albeit with a bubble finish) was introduced in a one-pager from 1980: Dobosiewicz, Wlodzimierz, "An efficient variation of bubble sort", Information Processing Letters, 11(1), 1980, pp. 5-6, https://doi.org/10.1016/0020-0190(80)90022-8.

We can add further references if you like, but please add them as suggestions in this case. It feels enough to me.

This implementation is the "default" method presented in ChipSort.jl, and the inspiration came mostly from AA-Sort. I'm not sure earlier papers discuss vectorization, and the early history of all these algorithms gets a little murky. I recommend following the citations in Section 2.4 of the ChipSort paper, which includes Knuth, and also this wiki which has a good recap. They cite Dobosiewicz there (and Knuth himself also did in a later edition of his book). https://code.google.com/archive/p/combsortcs2p-and-other-sorting-algorithms/wikis/CombSort.wiki

The algorithm in AA-Sort Figure 2 actually finishes with bubble sort as well. That's why I point out finishing with insertion might be the small contribution from the ChipSort paper. Although it's a pretty conventional idea, as explained in sec 2.4. What's still a mystery is whether this comb+insertion approach might be guaranteed to have n.log(n) complexity.

While I view Wlodzimierz's piece as a concise exposition of CombSort, I view the AA-Sort paper as a longer analysis of a set of modifications to CombSort. Does this PR implement the modifications the AA-Sort paper describes?

The second reference seems more appropriate, but IIUC the only part of that paper that describes this algorithm is the first half of section 2.4.

If you would like a more specific reference, it may not exist. This PR is a result of finding out, amidst the other investigations in ChipSort, that this simple algorithm can be well vectorized and offers a great performance.

Indeed, very little from the AA-sort paper is here. It was just the original inspiration, and I feel I can't just cite myself, although I wouldn't object to.

I think it would be reasonable to just cite yourself or to cite yourself first followed by AA-Sort

src/SortingAlgorithms.jl

nlw0 · 2022-08-27T14:56:55Z

I've been doing a bit of benchmarking, and this looks really good! It seems to be faster than any algorithm I know of for unstable sorting of primitives in default order of length about 10–1500. That's a very particular domain, but also a fairly common use case and a case where Julia currently struggles.

Cool! Are you talking about any cases where Radix cannot be applied? I'm not sure I've ever seen a case where Radix does not win, not even very specific examples. That's one unfortunate omission in the ChipSort paper, I didn't get to benchmark against Radix. Which I also understand is now finally going to be offered in Base as well.

One interesting technique for small inputs is sorting networks. ChipSort.jl has it, and I believe there are other packages offering it as well.

LilithHafner · 2022-08-27T15:58:25Z

Cool! Are you talking about any cases where Radix cannot be applied? I'm not sure I've ever seen a case where Radix does not win, not even very specific examples.

What radix sort were you comparing to? The one in Base is slower than CombSort for 700 Ints on this computer:

julia> @btime sort!(x; alg=CombSort) setup=(x=rand(Int, 700)) evals=1;
  10.110 μs (0 allocations: 0 bytes)

julia> @btime sort!(x) setup=(x=rand(Int, 700)) evals=1; # Adaptive sort dispatching to radix sort
  12.163 μs (3 allocations: 7.84 KiB)

julia> versioninfo()
Julia Version 1.9.0-DEV.1035
Commit 52f5dfe3e1* (2022-07-20 20:15 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.3.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.5 (ORCJIT, skylake)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/lib
  JULIA_PKG_PRECOMPILE_AUTO = 0

nlw0 · 2022-08-27T17:19:32Z

Interesting, I'm not familiar yet with what's been added to base. I think there's a package that specifically implements radix sort, and that's what I've tried in the past. Maybe it uses a few extra tricks. One of my arguments for having comb sort in base was that implementing radix is not as easy, so I'm curious to find out if the implementation there is outperformed.

I've never actually understood how radix can make use of any ILP, what's simple to understand with comb. It might be that some small detail is missing in base to enable vectorization. Or otherwise it might be just a case of tuning eg when to switch to insertion sort. I'll definitely make some experiments later now that you showed me that!

src/SortingAlgorithms.jl

LilithHafner · 2022-09-12T12:43:14Z

We should mention asymptotic runtime (c.f. JuliaLang/julia#46679 (comment))

LilithHafner · 2022-10-02T03:14:52Z

I think this is very close to ready. All it needs are a couple of documentation changes and a rebase/merge onto the latest master to make sure tests pass on nightly.

It's a neat algorithm that I'd like to see merged!

nlw0 · 2022-10-02T14:45:00Z

Sure thing, I hope to do it later today or tomorrow

…

On Sun, 2 Oct 2022, 05:15 Lilith Orion Hafner, ***@***.***> wrote: I think this is very close to ready. All it needs are a couple of documentation changes and a rebase/merge onto the latest master to make sure tests pass on nightly. — Reply to this email directly, view it on GitHub <#54 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABHKEQWFAMVFF7WOP2EFVTWBD43NANCNFSM5O2OTAQA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Co-authored-by: Lilith Orion Hafner <[email protected]>

nlw0 · 2022-10-04T06:39:50Z

@LilithHafner I've changed the docstring and rebased, hope it's all fine now

LilithHafner · 2022-10-05T11:27:41Z

src/SortingAlgorithms.jl

+ - *in-place* in memory.
+ - *parallelizable* suitable for vectorization with SIMD instructions
+   because it performs many independent comparisons.
+ - *complexity* worst-case only proven to be better than quadratic, but not `n*log(n)`.


This algorithm has quadratic worst-case runtime.

julia> @time sort!(4^7*repeat(1:30, 4^7)); 0.027213 seconds (8 allocations: 11.258 MiB) julia> @time sort!(4^7*repeat(1:30, 4^7); alg=CombSort); 4.866824 seconds (4 allocations: 7.500 MiB)

Proof

Take an arbitrary k, let m = 4k, and let n = m*4^7. Consider the first 7 intervals for an input of length n: [n*(3/4)^i for i in 1:7] == [m*4^7*(3/4)^i for i in 1:7] == [m*4^(7-i)*3^i for i in 1:7]. Notice that each interval is divisible by m.

Now, construct a pathological input v = repeat(1:m, 4^7). This input has the property v[i] == v[i+*jm] for any intergers i and j which yield inbounds indices. Consequently, the first 7 passes cannot alter v at all.

Informal interlude: There are still a lot of low numbers near the end of the list, and the remaining passes will have a hard time moving them to the beginning because their intervals are fairly small.

Consider the elements 1:k that fall in the final quarter of v. There are k*4^7/4 = n/16 such elements. Each of them must end up in the first quarter of the list once sorted, so they must each travel a total of at least n/2 slots (in reality they must each travel more than this, but all I claim is a lower bound).

To recap, we have established n/16 elements that must travel at least n/2 slots, and that they do not travel at all in the first 7 passes. The remaining comb passes have intervals no greater than [n*(3/4)^i for i in 8:inf]. The furthest an elemental can move toward the start of the vector in a single pass is the interval size of that pass, so the furthest an element can move toward the start of the vector in all remaining passes combined is sum([n*(3/4)^i for i in 8:inf]) = n*(3/4)^8 / (1 - 3/4) = 4n*(3/4)^8 < 0.401n. Thus, after all the comb passes are compete, we will still have n/16 elements that have to move at least 0.099n slots toward the start of the vector. Insertion sort, which can only move one swap at a time will require 0.099n*n/4 > .024n^2 swaps to accomplish this. Therefore, the worst case runtime of this algorithm is Ω(n^2).

It is structurally impossible for this algorithm to take more than O(n^2) time, so we can conclude Θ(n^2) is a tight asymptotic bound on the worst case runtime of this implementation of combsort. (A similar analysis holds for any geometric interval distribution).

We can verify the math in this proof empirically:

code

function comb!(v) lo, hi = extrema(eachindex(v)) interval = (3 * (hi-lo+1)) >> 2 while interval > 1 for j in lo:hi-interval a, b = v[j], v[j+interval] v[j], v[j+interval] = b < a ? (b, a) : (a, b) end interval = (3 * interval) >> 2 end v end function count_insertion_sort!(v) count = 0 lo, hi = extrema(eachindex(v)) for i = lo+1:hi j = i x = v[i] while j > lo && x < v[j-1] count += 1 v[j] = v[j-1] j -= 1 end v[j] = x end count end K = 1:6 M = 4 .* K N = M .* 4^7 swaps = [count_insertion_sort!(comb!(repeat(1:m, 4^7))) for m in M] using Plots plot(N, swaps, label="actual swaps", xlabel="n", ylabel="swaps", legend=:topleft) plot!(N, .024N.^2, label="theoretical minimum")

Results

The proof conveniently provides us with a pathological input to test. So, even more empirically, we can simply measure runtime.

Code

# multiply by a large number ot avoid dispatch to counting sort make_vector(m) = 4^7*repeat(1:m, 4^7) x = 1:20 n = 4^7*x comb = [(x = make_vector(m); @elapsed(sort!(x; alg=CombSort))) for m in x] default = [(x = make_vector(m); @elapsed(sort!(x ))) for m in x] theory = .024n.^2 / 1.6e9 # 1.6 ghz clock speed plot(n, comb, label="comb sort", xlabel="n", ylabel="time (s)", legend=:topleft) plot!(n, default, label="default sort") plot!(n, theory, label="theoretical minimum")

Results

nlw0 · 2022-10-05T13:14:31Z

I was just quoting the result from the third reference, my (shallow) understanding is that they've proven a slightly better worst case than n2, but it probably relies on some special way of reducing the intervals or something like that...

…

On Wed, 5 Oct 2022, 13:27 Lilith Orion Hafner, ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In src/SortingAlgorithms.jl <#54 (comment)> : > + CombSort + +Indicates that a sorting function should use the comb sort +algorithm. Comb sort traverses the collection multiple times +ordering pairs of elements with a given interval between them. +The interval decreases exponentially until it becomes 1, then +it switches to insertion sort on the whole input. + +Characteristics: + - *not stable* does not preserve the ordering of elements which + compare equal (e.g. "a" and "A" in a sort of letters which + ignores case). + - *in-place* in memory. + - *parallelizable* suitable for vectorization with SIMD instructions + because it performs many independent comparisons. + - *complexity* worst-case only proven to be better than quadratic, but not `n*log(n)`. This algorithm has quadratic worst-case runtime. julia> @time sort!(4^7*repeat(1:30, 4^7)); 0.027213 seconds (8 allocations: 11.258 MiB) julia> @time sort!(4^7*repeat(1:30, 4^7); alg=CombSort); 4.866824 seconds (4 allocations: 7.500 MiB) ------------------------------ Proof Take an arbitrary k, let m = 4k, and let n = m*4^7. Consider the first 7 intervals for an input of length n: [n*(3/4)^i for i in 1:7] == [m*4^7*(3/4)^i for i in 1:7] == [m*4^(7-i)*3^i for i in 1:7]. Notice that each interval is divisible by m. Now, construct a pathological input v = repeat(1:m, 4^7). This input has the property v[i] == v[i+*jm] for any intergers i and j which yield inbounds indices. Consequently, the first 7 passes cannot alter v at all. *Informal interlude: There are still a lot of low numbers near the end of the list, and the remaining passes will have a hard time moving them to the beginning because their intervals are fairly small.* Consider the elements 1:k that fall in the final quarter of v. There are k*4^7/4 = n/16 such elements. Each of them must end up in the first quarter of the list once sorted, so they must each travel a total of at least n/2 slots (in reality they must each travel more than this, but all I claim is a lower bound). To recap, we have established n/16 elements that must travel at least n/2 slots, and that they do not travel at all in the first 7 passes. The remaining comb passes have intervals no greater than [n*(3/4)^i for i in 8:inf]. The furthest an elemental can move toward the start of the vector in a single pass is the interval size of that pass, so the furthest an element can move toward the start of the vector in all remaining passes combined is sum([n*(3/4)^i for i in 8:inf]) = n*(3/4)^8 / (1 - 3/4) = 4n*(3/4)^8 < 0.401n. Thus, after all the comb passes are compete, we will still have n/16 elements that have to move at least 0.099n slots toward the start of the vector. Insertion sort, which can only move one swap at a time will require 0.099n*n/4 > .024n^2 swaps to accomplish this. Therefore, the worst case runtime of this algorithm is Ω(n^2). It is structurally impossible for this algorithm to take more than O(n^2) time, so we can conclude Θ(n^2) is a tight asymptotic bound on the worst case runtime of this implementation of combsort. (A similar analysis holds for any geometric interval distribution). ------------------------------ We can verify the math in this proof empirically: code function comb!(v) lo, hi = extrema(eachindex(v)) interval = (3 * (hi-lo+1)) >> 2 while interval > 1 for j in lo:hi-interval a, b = v[j], v[j+interval] v[j], v[j+interval] = b < a ? (b, a) : (a, b) end interval = (3 * interval) >> 2 end v end function count_insertion_sort!(v) count = 0 lo, hi = extrema(eachindex(v)) for i = lo+1:hi j = i x = v[i] while j > lo && x < v[j-1] count += 1 v[j] = v[j-1] j -= 1 end v[j] = x end count end K = 1:6 M = 4 .* K N = M .* 4^7 swaps = [count_insertion_sort!(comb!(repeat(1:m, 4^7))) for m in M] using Plots plot(N, swaps, label="actual swaps", xlabel="n", ylabel="swaps", legend=:topleft) plot!(N, .024N.^2, label="theoretical minimum") Results [image: Screen Shot 2022-10-05 at 5 02 03 PM] <https://user-images.githubusercontent.com/60898866/194047014-8a96fddb-60da-4f81-accc-26f05375221d.png> ------------------------------ The proof conveniently provides us with a pathological input to test. So, even more empirically, we can simply measure runtime. Code # multiply by a large number ot avoid dispatch to counting sort make_vector(m) = 4^7*repeat(1:m, 4^7) x = 1:20 n = 4^7*x comb = [(x = make_vector(m); @Elapsed(sort!(x; alg=CombSort))) for m in x] default = [(x = make_vector(m); @Elapsed(sort!(x ))) for m in x] theory = .024n.^2 / 1.6e9 # 1.6 ghz clock speed plot(n, comb, label="comb sort", xlabel="n", ylabel="time (s)", legend=:topleft) plot!(n, default, label="default sort") plot!(n, theory, label="theoretical minimum") Results [image: Screen Shot 2022-10-05 at 5 18 43 PM] <https://user-images.githubusercontent.com/60898866/194048696-d4f0adc4-7ec7-4cbc-8238-2197618731c9.png> — Reply to this email directly, view it on GitHub <#54 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABHKEW56FQQ7SBVG2RIR7DWBVQ3TANCNFSM5O2OTAQA> . You are receiving this because you authored the thread.Message ID: ***@***.*** com>

LilithHafner · 2022-10-06T04:53:24Z

I agree that reference must have been talking about nongeometric gap sequences if it found subquadratic runtimes (i.e. some special way of reducing the intervals). I suspect that, like shell sort, the ideal gap sequence is hard to compute.

The algorithm as written (with a geometric gap sequence) also has Θ(n^2) average case runtime.

The proof is similar to the worst case proof, but gives a much lower constant factor.

Take arbitrary integer m ≥ 5. Our input is v = rand(m*4^7). Now, consider the views [@view(v[i:m:end]) for i in 1:m]. Because the first 7 passes have intervals with multiples of m, they cannot swap elements from one view to another. At best, these first 7 passes sort each view independently.

Now consider which elements fall in the first quartile. Obviously, one quarter do. Less obviously, consider how many elements of a given view fall above the median. Specifically, what are the odds that more than three quarters of a view falls in the first quartile? This is not an easy question to answer precisely. Note that the answer depends on m because if a view consists of very large elements, that will push up the median and as m increases, this effect is lessened. When m is 1, 2, or 3, the odds are 0, and as m increases, the odds increase monotonically. Let k be the probability when m = 5. We now know that the probability for all m ≥ 5 is at least k.

Consider the on average mk views which have more than 3/4 of their elements in the first quadrant. In passes 8:end, those elements can move at most .401*m*4^7 slots toward the beginning, but there are .049*4^7 elements that must be more than (.401+.049)*m*4^7 slots out of place in each view, even if the view is fully sorted by the first 7 passes. After the entire comb process is compete, this leaves .049*4^7*m*k elements that are .049*m*4^7 slots out of place for a minimum runtime of the insertion sort pass of .049^2*(4^7*m)^2*k ≈ .0024k*n^2. Thus comb sort with this (or any) geometric gap sequence has average case runtime of Ω(n^2), and structurally cannot be worse, so is Θ(n^2). This completes the answer to the question Dobosiewicz was unable to answer and posed in his initial publication (pdf attached) of the algorithm Best case: Θ(n log n) worst case: Θ(n^2) average runtime: Θ(n^2).

But before we write off this algorithm as quadratic and suitable only for small vectors, we should compute k. Statistical formulas I don't know off the top of my head would tell us that the odds approach something like error_function(4^7/sqrt(4^7)), but we can also compute this exactly for m=5 which will give us a precise lower bound for all m ≥ 5 which is what we seek. First, we compute a denominator: how many ways are there to choose 5*4^7*3/4 elements above the first quartile and 5*4^7*1/4 elements below the first quartile? binomial(5*4^7, 5*4^6). Then, a numerator: how many ways are there to choose those elements such that at least 4^7*3/4of the elements in the first view are in the first quantile? If i elements from the first quantile are in the first view, then that leaves 5*4^7*1/4-i elements in the first quantile for the remaining 4 views. The number of ways to choose all these elements is binomial(4^7, i)*binomial(4*4^7, 5*4^7*1/4-i). We can add these up for i > 4^7*3/4 with sum(binomial(big(4^7 [combsort.pdf](https://github.com/JuliaCollections/SortingAlgorithms.jl/files/9721279/combsort.pdf) ), i)*binomial(big(4*4^7), 5*4^6-i) for i in 4^6*3:4^7) ≈ 2.75e+14720. Dividing this by our denominator yields k ≈ 3.2e-5284, a very low constant factor.

This is a proof that this algorithm has quadratic asymptotic runtime, but due to a very low constant factor, the proof is empirically vacuous.

src/SortingAlgorithms.jl

Co-authored-by: Lilith Orion Hafner <[email protected]>

nlw0 · 2022-10-08T07:46:29Z

@LilithHafner that's awesome, this algorithm really seems to require some different ways of thinking in order to analyze it, not just figuring out the "mechanics"...

I can't see any attachments, is the article you referred to somewhere on-line?

Regarding the impact of the interval choices, I imagine it would be nice to find a way to ensure we don't stick to a partition of views like you describe. I also imagine the main issue is whether we can guarantee that once we reach interval 1, all the values are at most a certain distance d to their correct position, what would make the final insertion sort step linear. Is this correct? Or perhaps there could be an estimate about the probability distribution of the distances at that stage.

I've also been thinking about this algorithm compared to sorting networks. The geometric interval decay would serve as a kind of heuristic in the design of the complete sorting network for the full input. Knowing that there must be "optimal", n log n sorting networks for any input size, the remaining question would be what modifications would we need to perform to the original network in order to implement an optimal network? Could there really be a systematic limitation in this approach that puts it decidedly outside the set of optimal sorting networks? Of course insertion sort is not a sorting network, this is just how I've been thinking about this lately.

nlw0 · 2022-10-08T08:48:51Z

I made an experiment here trying to understand what the "comb" passes do to the data, and what's the effect of the partitioning. I ran the algorithm without the final insertion sort on random permutations of 1:Nlen, with Nlen=10007 (a prime number) and Nlen=2^13, on a sample of 1001 inputs.

I'm plotting here statistics of what I'm calling the "error", which is the vector minus 1:Nlen, or the distance from the value to where it should be in the sorted array.

The general impression I have is that with the partition, it seems we actually get a hard limit on the error, although the distribution is broader. With the prime input, the distribution is more concentrated, but there can be a few strong peaks. So without the partition we can remain with a few values way far from where they should be, what I believe are called "turtles" in the context of bubble sort. Other than that, values tend to be closer to where they should in general. Anyways, there seems to be some interesting compromise between the two cases. Partitioning leaves us further from the desired position, but seems to guarantee a maximum error.

nlw0 · 2022-10-08T08:55:10Z

OK there seems to be no guarantee for N^13, actually!...

Here are just the general error ecdf from both cases

nlw0 · 2022-10-08T09:50:51Z

With the Mersenne prime 2^13-1 the differences to 2^13 are less pronounced, so the length might actually have been a larger factor here than the partitioning of the input.

This plot here highlights what the distribution of the errors looks like. 40% of the numbers are in the correct place after the "combing", and 85% within 1 step.

A histogram with logarithmic scale, perhaps the tails are exponential?... If that was the case, what would it imply to the complexity of insertion sort?

LilithHafner · 2022-10-08T13:53:24Z

I can't see any attachments, is the article you referred to somewhere on-line?

Sorry about that:

combsort.pdf

https://pdfslide.net/download/link/an-efficient-variation-of-bubble-sort

LilithHafner

Your exploration of the effect of the comb pass is interesting. It seems that the theoretical problems don't really come up in random input of reasonable sizes. That's good!

If the tails were exponential (i.e. odds of an element being x places out of place is p^x for some p < 1) then that would imply that the insertion pass will run in linear time. Empirically, that seems to hold on the data you tested.

I think working on better gap sequences and/or the theory or empirical benchmarks to back them is a worthwhile pursuit if you are interested, but I also think it is a long pursuit and would prefer to merge this first, and then improve the gap sequence later, if that is okay with you.

src/SortingAlgorithms.jl

Co-authored-by: Lilith Orion Hafner <[email protected]>

nlw0 · 2022-10-09T08:50:07Z

Sure, let's merge this.

If the standand deviation of that exponential distribution is linear with the input size, wouldn't that make the final insertion sort quadratic?

I think a variation of this algorithm that offers n log n worst-case will probably require some big insight, there's some structural detail missing. And like I said, maybe sorting networks will offer the inspiration. In fact, maybe the best step forward trying to leverage the good parallelism we get from this code might actually be to implement a generic sorting network method such as https://en.wikipedia.org/wiki/Bitonic_sorter

LilithHafner · 2022-10-09T10:18:40Z

If the standand deviation of that exponential distribution is linear with the input size, wouldn't that make the final insertion sort quadratic?

I was assuming that the coefficient for the geometric distribution was constant with input size, if it scales linearly, then that would indeed be quadratic.

nlw0 mentioned this pull request Mar 2, 2022

Implement SmoothSort #55

Open

LilithHafner reviewed Aug 25, 2022

View reviewed changes

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved

LilithHafner mentioned this pull request Aug 25, 2022

Better testing #58

Open

nalimilan reviewed Aug 25, 2022

View reviewed changes

LilithHafner reviewed Aug 27, 2022

View reviewed changes

LilithHafner reviewed Sep 4, 2022

View reviewed changes

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved

nlw0 and others added 12 commits October 4, 2022 08:37

Implements the CombSort algorithm

b7d344b

Update src/SortingAlgorithms.jl

ab91a3c

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update src/SortingAlgorithms.jl

4afd232

Co-authored-by: Milan Bouchet-Valat <[email protected]>

added reference to AA-Sort

1114aff

Update src/SortingAlgorithms.jl

5a44bf3

Co-authored-by: Milan Bouchet-Valat <[email protected]>

reference to SIMD in the docs

6f2b897

LilithHafner suggestion for the comparison used in combsort

201f6d3

juliacon reference

af75499

expanding the auxiliary method ltminmax since it's too simple now

c743043

Update src/SortingAlgorithms.jl

3d05664

Co-authored-by: Lilith Orion Hafner <[email protected]>

Update src/SortingAlgorithms.jl

b4d21d1

Co-authored-by: Lilith Orion Hafner <[email protected]>

references changes, plus mentioning complexity

8023013

nlw0 force-pushed the nic/combsort branch from d464d55 to 8023013 Compare October 4, 2022 06:38

LilithHafner reviewed Oct 5, 2022

View reviewed changes

LilithHafner reviewed Oct 6, 2022

View reviewed changes

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved

src/SortingAlgorithms.jl Outdated Show resolved Hide resolved

nlw0 and others added 2 commits October 8, 2022 09:21

Update src/SortingAlgorithms.jl

135a653

Co-authored-by: Lilith Orion Hafner <[email protected]>

Update src/SortingAlgorithms.jl

dbc0e03

Co-authored-by: Lilith Orion Hafner <[email protected]>

LilithHafner reviewed Oct 8, 2022

View reviewed changes

src/SortingAlgorithms.jl Show resolved Hide resolved

Update src/SortingAlgorithms.jl

3da6412

Co-authored-by: Lilith Orion Hafner <[email protected]>

LilithHafner merged commit 80c14f5 into JuliaCollections:master Oct 9, 2022

nlw0 mentioned this pull request Oct 11, 2022

Bitonic mergesort #62

Open

LilithHafner mentioned this pull request Nov 11, 2022

Tag 1.1.0 #64

Merged

LilithHafner added the new algorithm label Dec 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implements the CombSort algorithm #54

Implements the CombSort algorithm #54

nlw0 commented Feb 19, 2022

codecov-commenter commented Feb 19, 2022 •

edited

Loading

nalimilan commented Feb 19, 2022

nlw0 commented Feb 20, 2022

nalimilan commented Apr 21, 2022

nlw0 commented Aug 25, 2022

nlw0 commented Aug 27, 2022

LilithHafner left a comment

LilithHafner Aug 27, 2022

nlw0 Aug 27, 2022

LilithHafner Aug 27, 2022

nlw0 Aug 27, 2022 •

edited

Loading

LilithHafner Oct 2, 2022

nlw0 commented Aug 27, 2022

LilithHafner commented Aug 27, 2022

nlw0 commented Aug 27, 2022

LilithHafner commented Sep 12, 2022

LilithHafner commented Oct 2, 2022 •

edited

Loading

nlw0 commented Oct 2, 2022 via email

nlw0 commented Oct 4, 2022

LilithHafner Oct 5, 2022

nlw0 commented Oct 5, 2022 via email

LilithHafner commented Oct 6, 2022

nlw0 commented Oct 8, 2022

nlw0 commented Oct 8, 2022

nlw0 commented Oct 8, 2022

nlw0 commented Oct 8, 2022

LilithHafner commented Oct 8, 2022

LilithHafner left a comment

nlw0 commented Oct 9, 2022

LilithHafner commented Oct 9, 2022

		- H. Inoue, T. Moriyama, H. Komatsu and T. Nakatani, "AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors," 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007, pp. 189-198, doi: 10.1109/PACT.2007.4336211.
		- Werneck, N. L., (2020). ChipSort: a SIMD and cache-aware sorting module. JuliaCon Proceedings, 1(1), 12, https://doi.org/10.21105/jcon.00012

Implements the CombSort algorithm #54

Implements the CombSort algorithm #54

Conversation

nlw0 commented Feb 19, 2022

codecov-commenter commented Feb 19, 2022 • edited Loading

Codecov Report

nalimilan commented Feb 19, 2022

nlw0 commented Feb 20, 2022

nalimilan commented Apr 21, 2022

nlw0 commented Aug 25, 2022

nlw0 commented Aug 27, 2022

LilithHafner left a comment

Choose a reason for hiding this comment

LilithHafner Aug 27, 2022

Choose a reason for hiding this comment

nlw0 Aug 27, 2022

Choose a reason for hiding this comment

LilithHafner Aug 27, 2022

Choose a reason for hiding this comment

nlw0 Aug 27, 2022 • edited Loading

Choose a reason for hiding this comment

LilithHafner Oct 2, 2022

Choose a reason for hiding this comment

nlw0 commented Aug 27, 2022

LilithHafner commented Aug 27, 2022

nlw0 commented Aug 27, 2022

LilithHafner commented Sep 12, 2022

LilithHafner commented Oct 2, 2022 • edited Loading

nlw0 commented Oct 2, 2022 via email

nlw0 commented Oct 4, 2022

LilithHafner Oct 5, 2022

Choose a reason for hiding this comment

nlw0 commented Oct 5, 2022 via email

LilithHafner commented Oct 6, 2022

nlw0 commented Oct 8, 2022

nlw0 commented Oct 8, 2022

nlw0 commented Oct 8, 2022

nlw0 commented Oct 8, 2022

LilithHafner commented Oct 8, 2022

LilithHafner left a comment

Choose a reason for hiding this comment

nlw0 commented Oct 9, 2022

LilithHafner commented Oct 9, 2022

codecov-commenter commented Feb 19, 2022 •

edited

Loading

nlw0 Aug 27, 2022 •

edited

Loading

LilithHafner commented Oct 2, 2022 •

edited

Loading