In most application natural ordering is more and more common. But there is not one common definition of what natural order is. If you look at wikipedia you'll find a simple definition.
Natural sort order is an ordering of strings in alphabetical order, except that multi-digit numbers are ordered as a single character
It means k2 < k10. But a lot of possibilities aren't defined. So what is happening with negative numbers e.g. k-2 ? k-10 will it be represented as minus or as a dash.
Another definition from rosettacode.
- Ignore leading, trailing and multiple adjacent spaces
- Make all whitespace characters equivalent.
- Sorting without regard to case.
- Sorting numeric portions of strings in numeric order.
- ...
The questions are
- what java implementations are available
- what feature do they support
- how fast are they.
- berry - AlphaNumericStringComperator
- dacus - NaturalSorter
- devexed - NaturalOrderComparator
- friedrich - Strings had been original been posted here but this side isn't available anymore. But it is still archived
- koelle - AlphanumComparator
- padler - NaturalOrderComparator
- panther - java-nat-sort
To test what features are supported and if they are working well, I created a set of unit tests. The starting point of the tests are the work of Nathan Woltman he has done for his string-natural-compare. Also most of the implementations have a test sets. So I copied them and checked if this set do also work for the other implementations.
features | padler | dacus | devexed | keolle | friedrich | panther | berry |
---|---|---|---|---|---|---|---|
natural order of numbers | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
ignore decimal point | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |
local specific decimal point | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
handle leading zeros | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
large numbers | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |
order of negativ numbers | smallest negative number first | smallest negative number first | largest negative number first | smallest negative number first | smallest negative number first | smallest negative number first | smallest negative number first |
case sensitive ASCII | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
case insensitive ASCII | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ |
ignore leading and trailing whitespaces | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
support java.text.Collator | |||||||
Unicode support |
The problem with the implemenations are that they are mostly hardcoded and don't provide much configuration option. So if you need some specific natural order you may have to stick with one implementation. Or you even need to implement it by your self.
Dacus has some problems with the order in both cases case sensitive
and case insenstive
that is the reason why both are red.
The test is based on JMH. The test string have been created via EasyRandom with string length between 1 and 100.
The benchmark is using the comperator within Collection.sort
. The sort itself has O(n*log(n))
. This should be the major performance difference.
To execute the benchmark you first need to build the jar with mvn clean install -DskipTests
and then run java -jar target/naturalsort-1.0.0-SNAPSHOT
test size | padler | dacus | devexed | keolle | friedrich | panther | berry |
---|---|---|---|---|---|---|---|
50 | 322233,394 ± 23164,234 ops/s | 1669,788 ± 2144,115 ops/s | 65,350 ± 28,982 ops/s | 12639,356 ± 340,300 ops/s | 351708,520 ± 17678,003 ops/s | 379094,168 ± 28208,360 ops/s | 1539,858 ± 194,941 ops/s |
500 | 13363,359 ± 1086,943 ops/s | 92,728 ± 30,003 ops/s | 3,497 ± 0,694 ops/s | 720,988 ± 67,330 ops/s | 13814,737 ± 852,784 ops/s | 14819,162 ± 1340,082 ops/s | 84,747 ± 0,707 ops/s |
5000 | 785,413 ± 40,206 ops/s | 6,320 ± 3,476 ops/s | 0,246 ± 0,220 ops/s | 48,067 ± 3,274 ops/s | 861,733 ± 18,264 ops/s | 884,419 ± 29,256 ops/s | 5,951 ± 0,240 ops/s |
50000 | 50,611 ± 22,134 ops/s | 0,469 ± 0,026 ops/s | 0,018 ± 0,013 ops/s | 3,595 ± 0,135 ops/s | 54,059 ± 42,570 ops/s | 58,565 ± 4,214 ops/s | 0,443 ± 0,120 ops/s |
500000 | 2,543 ± 0,259 ops/s | 0,037 ± 0,007 ops/s | 0,002 ± 0,001 ops/s | 0,275 ± 0,004 ops/s | 2,930 ± 0,235 ops/s | 3,204 ± 0,229 ops/s | 0,035 ± 0,016 ops/s |
JMH Benchmark configuration
# JMH version: 1.21
# VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home/bin/java
# VM options: <none>
# Warmup: 2 iterations, 10 s each
# Measurement: 3 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time