-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmarks in readme #4
Comments
Thanks for the interest @RagnarGrootKoerkamp. We're in the process right now of building a benchmarking system that will evaluate sufr, libdivsufsort, libsais, and others. We don't want to muddy this repository with that evaluation code, so we'll create a 2nd repo (sufr-benchmark) and point to it when it's complete. We'll also get to the other suggestion (describing alternative methods) in an soon-to-appear manuscript. The simplest description of how this method differs is that it follows the core ideas of CaPS-SA (merge sort with help from on-the-fly LCP array construction), then introduces ideas to dramatically reduce RAM requirement during construction, enables creation of a spaced seed SA, and provides search functions. More details to come. (Fixed the 1st link in the README - thanks) Finally: in a bit of synchronicity, we've just been discussing your recent static search tree post - it's a really lovely exploration of the idea. I'm toying with how to bring that approach into SA search. If you're interested in discussing, let me know. |
Yeah that's reasonable
Sounds great
👍 |
That's fair. You can expect some useful text to appear in the next week or two. Things slowed down in the last half of December, just after we wrapped up the final planned feature. We want to kick off a robust benchmarking run asap ... then will get back to expanding documentation to (a) discuss relative merits/performance and (b) better describe the API. |
Could you add some benchmarks to the readme comparing the performance of this crate to eg
libdivsufsort
andlibsais
(and any other SACA's that appear high on crates.io), for example for constructing the suffix array of a human genome.It's very hard to know how useful this crate (and most other suffix array construction crates) are without having at least some easily available numbers.
Ideally you'd also include a quick sentence on all of the methods and how they differ / why you are better. Some are linear, some log-linear, and it's unclear how your method differs.
PS: the first link in the readme is broken ;)
The text was updated successfully, but these errors were encountered: