-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
uutils' factor is much slower than GNU's coreutils factor #1456
Comments
Yikes, that is pretty bad. I'll look into it. |
On Mon, 30 Mar 2020 05:02:04 -0700 Alex Lyon ***@***.***> wrote:
Yikes, that is pretty bad. I'll look into it.
Thanks!
…--
Shlomi Fish https://www.shlomifish.org/
Free (Creative Commons) Music Downloads, Reviews and more - http://jamendo.com/
The only reason some jokes never die is because Chuck Norris is not interested
in killing them.
— http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/
Please reply to list if it's a mailing list post - http://shlom.in/reply .
|
You might want to try again now. Performance is ~16-17 times worse than GNU right now (at least for me). |
When #1547 is merged, performance will be ~6-7 times worse (as measured on my desktop). |
@Arcterus : thanks! The performance looks much better on uutils' git master HEAD:
|
@shlomif You are welcome :) I left the bug open through my performance-related PRs to |
Another issue with the current factor implementation is that it only supports numbers up to 264 - 1. GNU factor supports numbers up to 2127 - 1 if compiled without the GNU Multiple Precision (GMP) library or arbitrary-precision numbers if compiled with it. This can be tested by running this command: factor 9223372036854775807 18446744073709551615 170141183460469231731687303715884105727 340282366920938463463374607431768211455 which is GNU factor without GMP produces:
GNU factor with GMP produces:
and uutils' factor produces:
This was first documented in #201. |
@nbraud Thanks for your work improving the performance! I am getting incorrect results for some numbers, for example:
You can see a bunch by running this in Bash: diff <(seq 2 $(( 10 ** 8 )) | factor) <(seq 2 $(( 10 ** 8 )) | uu-factor) |
If you have time, would you mind tracking down which commit introduced the problem? If not, I’ll find it later today or tomorrow.
… 2020/06/22 午前7:13、Teal Dulcet ***@***.***>のメール:
@nbraud Thanks for your work improving the performance! I am getting incorrect results for some numbers, for example:
$ factor 10425511
10425511: 2441 4271
$ uu-factor 10425511
10425511: 10425511
You can see a bunch by running this in Bash:
diff <(seq 2 $(( 10 ** 8 )) | factor) <(seq 2 $(( 10 ** 8 )) | uu-factor)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I'm aware, and it's on my TODO-list, but I do not believe it makes much sense to go implement support for larger numbers, when the current implementation will be far too slow to meaningfully use there. Please open an issue about it, though, as it's definitely worth tracking :) |
FYI, @Arcterus, I found the bug and opened a new PR. (Update: fixed and merged) |
The merged commit, effb94b, (thanks primarily to @nbraud) moves the needle on this code's performance from about 1/200th to about 1/3rd the speed of GNU There are obviously more performance increases that can be obtained by optimizing the algorithms, but the effort seems exponentially harder for very small possible improvements. If someone want to champion massaging this into a world record breaker, I'm all for it. But, otherwise, I'm happy with this code state and performance and think this is a good place to stop. @shlomif , your comments/experience/thoughts? @uutils/members ? |
Hi @rivy ! Thanks for the heads up. The performance ratio is somewhat worse here:
Anyway, I am fine with closing this ticket because my interest in uutils is primarily academic. A 4.5 ratio is much more acceptable than a 200x one, so good job - and thanks. |
I can confirm @shlomif results. For numbers less than 104, it is about 2.4× slower, for 105 about 3.5× slower, for 106 about 4.5× slower, for 107 about 4.3× slower and for 108 about 3.2× slower. For example, here is a benchmark of the system GNU factor (
(This is similar to @rivy's benchmarks, but using my Bash port of hyperfine, which outputs more info.) However, if you go backwards 106 from 264, it is 5.8× slower:
Since it seems to be slower for larger numbers, it follows that the ratio will be even worse once #1559 is fixed. While the performance has definitely improved a lot, I think this issue should at least be left open until it is no more than around two times slower in all cases, particularly after arbitrary-precision numbers are supported. |
I'm happy to leave this an an open issue to encourage more refinement, with a plan to revisit next year. PR's are welcome! |
Yes, that's a large part of why I didn't bother implementing support for arbitrarily-large numbers yet, since it would be unusably slow anyhow.
I don't have strong opinions about what the performance threshold should be before closing this, but I have some branches implementing some better algorithms etc. so there's a clear direction forward for closing this. Unfortunately, I was pretty broken for the last half-year and didn't have a workstation to run builds and benchmarks on (so far it's all been on my laptop, but it's very slow and has thermal issues that make precise benchmarking mostly impossible) but things should hopefully be looking up this year. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This C++ library allegedly has better performance than GNU factor and it does have good documentation on the algorithms the author used to achieve that (both on the README and in the code), so it may be a good reference for potential improvements here: https://github.com/hurchalla/factoring. It also supports up to 128-bit numbers. |
FYI, if you want to make up a 4x performance improvement, doing so is pretty easy here as https://en.algorithmica.org/hpc/algorithms/factorization/ has a good overview for fast factoring of relatively small numbers in Rust. The key things missing here is Pollard-Brent Algorithm which lets you compute the GCD much less often and should yield a roughly 5x speedup. (you could also improve the For performance on >64 bits, you probably would need ecm if you want top of the line performance, but that gets complicated fairly quickly. |
Yes, that looks like a good resource, although note that his example code is in C++ not Rust. While uutils should definitely implement the Pollard-Brent rho algorithm as GNU has, I am not sure it would lead to a 5× speed up. I ran his factor code and it was orders of magnitude slower than GNU factor, although he does not have a complete implementation of all his methods combined, so it is hard to directly compare the performance. It looks like uutils already uses the faster binary GCD algorithm. There is more information about all the algorithms used by GNU factor here: https://www.maizure.org/projects/decoded-gnu-coreutils/factor.html. |
Since it has now been almost 3 years, I thought I should update the benchmarks from #1456 (comment) and include more programs for comparison. For integers less than 104, it is now about 14.8× slower, for 105 about 7.6× slower, for 106 about 6.0× slower, for 107 about 5.7× slower and for 108 about 4.2× slower:
Overall, uutils' factor seems to have gotten significantly slower over the past 3 years. Considering that GNU factor has largely not changed in over 10 years (since the rewrite in 2012) and uutils' factor has not changed in two years (since @nbraud left), my best guess is that this is due to compiler improvements that have benefited GNU factor more than uutils' factor. If you go backwards 105 from 263, it is now 5.2× slower than GNU factor, but 20.9× slower than that EPR Factoring Library I mentioned above:
Due to #1559, uutils factor of course cannot currently test numbers over 64-bits. However, if I continue to compare GNU factor to the EPR Factoring Library, it is 3.5× faster for 64-bit numbers:
28.8× faster for 96-bit numbers:
and a whopping 334× faster than GNU factor for 127-bit numbers:
Considering that GNU factor seems to be around 5 times faster than uutils' factor, if this library is over 300 times faster than GNU factor, that means it would be at least 1,500 times faster than uutils' factor for 128-bit numbers. I also found that for some numbers with large 64-bit factors it is an astonishing almost 2,000 times faster than GNU factor, which mean it would be at least 10,000 times faster than uutils' factor! 🤯 Note that this library is slower than both GNU factor and uutils' factor for smaller numbers under 32-bits as can be seen from the first benchmark above, but I filed an issue about this: hurchalla/factoring#1. For reference, the above benchmarks include the system GNU factor ( I tested several other programs and libraries, but nothing else seemed to have competitive performance in this range. The algorithms used by the yafu program (Yet Another Factorization Utility) may be useful for numbers above 64-bits (up to 160 digits or around 512-bits), but it was hard to benchmark, as it has extremely verbose output which is informative (similar to the undocumented |
Nice investigation. How did you compile the rust coreutils ? |
Thanks. This is the main command I used: |
ok, so, it should be ok. I am surprised that we see such a big difference. Compilers haven't improved that much (and clang and rust are both based on LLVM). thanks |
I complied GNU coreutils with the default GCC and uutils of course with the default rustc/LLVM. I suppose I could compile GNU coreutils with Clang/LLVM as well, but I figured it would be more optimized for their own compiler/toolchain. For reference, I used GCC 11.4, Clang 14.0 and cargo/rustc 1.72. I no longer has access to the Intel Xeon E3 server I used in 2020 for the benchmarking, so I used an Intel Core i7-9750 system this time. It is probably faster than the old system (for single threaded programs at least), but I would not think this would make any difference in the GNU vs uutils performance ratio. Anyway, maybe someone else should build GNU and uutils coreutils themselves and then run hyperfine or my Benchmarking Tool to see if they can reproduce the results. |
You can try to incorporate machine-factor to speed up factorisation for 128-bit integers. It seems to be about 3 times faster than GNU factor for 128-bit semiprimes and a full order of magnitude faster than num-prime. It would take some work to integrate it however. |
After building uutils using
cargo build --release
on Fedora 32 x86-64:A ~200 times performance loss is very bad.
The text was updated successfully, but these errors were encountered: