Benchmarks in readme #4

RagnarGrootKoerkamp · 2025-01-02T20:34:36Z

Could you add some benchmarks to the readme comparing the performance of this crate to eg libdivsufsort and libsais (and any other SACA's that appear high on crates.io), for example for constructing the suffix array of a human genome.

It's very hard to know how useful this crate (and most other suffix array construction crates) are without having at least some easily available numbers.

Ideally you'd also include a quick sentence on all of the methods and how they differ / why you are better. Some are linear, some log-linear, and it's unclear how your method differs.

PS: the first link in the readme is broken ;)

The text was updated successfully, but these errors were encountered:

traviswheeler · 2025-01-03T19:38:30Z

Thanks for the interest @RagnarGrootKoerkamp. We're in the process right now of building a benchmarking system that will evaluate sufr, libdivsufsort, libsais, and others. We don't want to muddy this repository with that evaluation code, so we'll create a 2nd repo (sufr-benchmark) and point to it when it's complete.

We'll also get to the other suggestion (describing alternative methods) in an soon-to-appear manuscript. The simplest description of how this method differs is that it follows the core ideas of CaPS-SA (merge sort with help from on-the-fly LCP array construction), then introduces ideas to dramatically reduce RAM requirement during construction, enables creation of a spaced seed SA, and provides search functions. More details to come.

(Fixed the 1st link in the README - thanks)

Finally: in a bit of synchronicity, we've just been discussing your recent static search tree post - it's a really lovely exploration of the idea. I'm toying with how to bring that approach into SA search. If you're interested in discussing, let me know.

RagnarGrootKoerkamp · 2025-01-03T20:21:06Z

We're in the process right now of building a benchmarking system that will evaluate sufr, libdivsufsort, libsais, and others. We don't want to muddy this repository with that evaluation code

Yeah that's reasonable

so we'll create a 2nd repo (sufr-benchmark) and point to it when it's complete.

Sounds great

We'll also get to the other suggestion (describing alternative methods) in an soon-to-appear manuscript. The simplest description of how this method differs is that it follows the core ideas of CaPS-SA (merge sort with help from on-the-fly LCP array construction), then introduces ideas to dramatically reduce RAM requirement during construction, enables creation of a spaced seed SA, and provides search functions. More details to come.

👍
I think I had the framing of expecting this to be completed work since I found it via crates.io. There's the programmer perspective where some quick info in the readme is very helpful, and then there's the academic perspective of writing a nice paper, and they slightly conflict with each other.

traviswheeler · 2025-01-03T20:32:50Z

I think I had the framing of expecting this to be completed work since I found it via crates.io. There's the programmer perspective where some quick info in the readme is very helpful, and then there's the academic perspective of writing a nice paper, and they slightly conflict with each other.

That's fair. You can expect some useful text to appear in the next week or two. Things slowed down in the last half of December, just after we wrapped up the final planned feature. We want to kick off a robust benchmarking run asap ... then will get back to expanding documentation to (a) discuss relative merits/performance and (b) better describe the API.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmarks in readme #4

Benchmarks in readme #4

RagnarGrootKoerkamp commented Jan 2, 2025

traviswheeler commented Jan 3, 2025

RagnarGrootKoerkamp commented Jan 3, 2025

traviswheeler commented Jan 3, 2025

Benchmarks in readme #4

Benchmarks in readme #4

Comments

RagnarGrootKoerkamp commented Jan 2, 2025

traviswheeler commented Jan 3, 2025

RagnarGrootKoerkamp commented Jan 3, 2025

traviswheeler commented Jan 3, 2025