In Nov 2018, I needed to figure out what is the best sort algorithm on small datasets (almost sorted for graphics rendering) so I started my quest gathering good Java Sorting implementations ... reading papers & started experimenting my own benchmarks.
As I found this gem (https://github.com/sebawild/nearly-optimal-mergesort-code) the source code of Sebastian Wild 2018 paper, I started hacking the code to make its benchmark for fair (more reproductible) tests and I fixed several Sorter implementations to use pre-allocation (no GC overhead).
Thanks to Vladimir Yaroslavskiy, I added its DualPivotQuickSort 2011, 2018 & his BentleyBasher implementation, from Tageer Valev his adaptive RadixSort ...
I made lots of improvements on the BentleyBasher
All code is free & open source, under MIT or GPL2 license.
TODO: complete history & motivations
- Use a robust test methodology to test Java Sorting Implementations (working, tested) with all known distributions (random, sawtooth, dithered, ...) as recent Sorting implementations use polymorphism / adaptive strategy to better sort some types of data sets (sorted & reversed runs, random ...)
- Include best Sort implementations (based on int[] arrays only, sorry) and also using 2 int[] arrays (data + indices)
- Provide a complete Test suite (basher, stats, & analysis) to reproduce reliable experiments and allow optimisation (tuning every details of a particular instance) in a faire manner (reduce OS & JVM biases) to obtain a fair comparison
See in the results folder to last data & comparison stats
Nice Plots will be coming next...
I derived my work from this fabulous repository on github, that provides source code (MIT license) of several sorting algorithms (merge sorts, like TD/BU, TimSort & variants, Peek & Power Sort).
Here is the original 'README' of the master repository:
Code for experiments with nearly optimally adaptive mergesort variants peeksort and powersort.
To reproduce the running time study from the paper, execute
ant package
./paper-experiments.sh
The build requires a recent JDK 8, Oracle's version is recommended.
Make sure to use the paper release:
This produces several files in the current directory.
- The *.out files show the progress made in the individual runs and contain debug output from JVM's just-in-time compiler. It can be used to check that no massive deoptimization steps happened during the timed experiments. (Endless output during the warmup phase and occasional printed lines during timed runs are normal.)
- The *.csv files contain one line per executed sort and report the individual running time. These files were used in the paper to compute average and standard deviations of running times.
To run harness tests for correctness of the sorting methods, run
ant test