Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks in readme #4

Open
RagnarGrootKoerkamp opened this issue Jan 2, 2025 · 3 comments
Open

Benchmarks in readme #4

RagnarGrootKoerkamp opened this issue Jan 2, 2025 · 3 comments

Comments

@RagnarGrootKoerkamp
Copy link

Could you add some benchmarks to the readme comparing the performance of this crate to eg libdivsufsort and libsais (and any other SACA's that appear high on crates.io), for example for constructing the suffix array of a human genome.

It's very hard to know how useful this crate (and most other suffix array construction crates) are without having at least some easily available numbers.

Ideally you'd also include a quick sentence on all of the methods and how they differ / why you are better. Some are linear, some log-linear, and it's unclear how your method differs.

PS: the first link in the readme is broken ;)

@traviswheeler
Copy link
Member

Thanks for the interest @RagnarGrootKoerkamp. We're in the process right now of building a benchmarking system that will evaluate sufr, libdivsufsort, libsais, and others. We don't want to muddy this repository with that evaluation code, so we'll create a 2nd repo (sufr-benchmark) and point to it when it's complete.

We'll also get to the other suggestion (describing alternative methods) in an soon-to-appear manuscript. The simplest description of how this method differs is that it follows the core ideas of CaPS-SA (merge sort with help from on-the-fly LCP array construction), then introduces ideas to dramatically reduce RAM requirement during construction, enables creation of a spaced seed SA, and provides search functions. More details to come.

(Fixed the 1st link in the README - thanks)

Finally: in a bit of synchronicity, we've just been discussing your recent static search tree post - it's a really lovely exploration of the idea. I'm toying with how to bring that approach into SA search. If you're interested in discussing, let me know.

@RagnarGrootKoerkamp
Copy link
Author

We're in the process right now of building a benchmarking system that will evaluate sufr, libdivsufsort, libsais, and others. We don't want to muddy this repository with that evaluation code

Yeah that's reasonable

so we'll create a 2nd repo (sufr-benchmark) and point to it when it's complete.

Sounds great

We'll also get to the other suggestion (describing alternative methods) in an soon-to-appear manuscript. The simplest description of how this method differs is that it follows the core ideas of CaPS-SA (merge sort with help from on-the-fly LCP array construction), then introduces ideas to dramatically reduce RAM requirement during construction, enables creation of a spaced seed SA, and provides search functions. More details to come.

👍
I think I had the framing of expecting this to be completed work since I found it via crates.io. There's the programmer perspective where some quick info in the readme is very helpful, and then there's the academic perspective of writing a nice paper, and they slightly conflict with each other.

@traviswheeler
Copy link
Member

I think I had the framing of expecting this to be completed work since I found it via crates.io. There's the programmer perspective where some quick info in the readme is very helpful, and then there's the academic perspective of writing a nice paper, and they slightly conflict with each other.

That's fair. You can expect some useful text to appear in the next week or two. Things slowed down in the last half of December, just after we wrapped up the final planned feature. We want to kick off a robust benchmarking run asap ... then will get back to expanding documentation to (a) discuss relative merits/performance and (b) better describe the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants