[Alphabet] seqan3::concatenated_sequences benchmark #2951

Open
7 tasks
smehringer opened this issue Mar 21, 2022 · 7 comments

smehringer commented Mar 21, 2022

Description

seqan3::concatenated_sequences is a container that is supposed to be more efficient for "static" input data (many read operations, few to no write operations).
In contrast to a nested std::vector of std::vectors, where each inner vector performs its own allocation, seqan3::concatenated_sequences stores all data in contiguous memory. In theory, this leads to fewer cache misses when accessing the data structure, and access should hence be faster.
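The layout idea can be illustrated with a minimal stand-in (a hedged sketch, not the real seqan3 API; `concat_demo` and its members are hypothetical names):

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Minimal stand-in for the idea behind seqan3::concatenated_sequences:
// all characters live in ONE contiguous buffer; per-sequence boundaries
// are tracked via an offset table.
struct concat_demo
{
    std::string data{};                  // one allocation for all sequences
    std::vector<std::size_t> offsets{0}; // sequence i spans offsets[i]..offsets[i+1]

    void push_back(std::string_view seq)
    {
        data += seq;
        offsets.push_back(data.size());
    }

    std::string_view operator[](std::size_t i) const
    {
        return std::string_view{data}.substr(offsets[i], offsets[i + 1] - offsets[i]);
    }

    std::size_t size() const { return offsets.size() - 1; }
};
```

Iterating such a structure touches one contiguous allocation instead of chasing a pointer per inner vector, which is where the cache-miss advantage is expected to come from.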

However, we have never tested this advantage.

#2947 introduced member functions for efficient push_back. The efficiency of push_back should also be tested.

Tasks

General Considerations

Choose some appropriate size(s) (inspiration). If the whole data structure fits into the CPU cache, we cannot see how fast "actual memory" access is.
Since the "big" benchmarks might take quite a while in Debug (Nightlies), you can set different sizes for Release (used for actual benchmarking) and Debug (Nightlies).
For an example, see how we did it in the format_sam_benchmark.
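The Release/Debug size split could be sketched like this (a hedged sketch; the 64 MiB Release value is an assumption intended to exceed typical L3 caches, and `benchmark_size` is a hypothetical name):

```cpp
#include <cstddef>

// Pick a small input for Debug builds (Nightlies) and a large one for
// Release builds, where NDEBUG is defined by CMake's Release configuration.
#ifdef NDEBUG
inline constexpr std::size_t benchmark_size = 64 * 1024 * 1024; // Release: exceed the cache
#else
inline constexpr std::size_t benchmark_size = 1024;             // Debug: keep Nightlies fast
#endif
```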

Benchmarking Access

For both std::vector<seqan3::dna4> and std::string (=underlying_container_t), write a benchmark that measures many (random access) read operations

  • on a seqan3::concatenated_sequences<underlying_container_t>
  • on a std::vector<underlying_container_t>
  • on the SeqAn2 counterpart¹
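The random-access read benchmark could look roughly like this (a sketch under assumptions: `std::vector<std::string>` stands in for the containers under test, and plain `std::size_t` counting replaces the Google Benchmark loop the real benchmark would use; `random_access_reads` is a hypothetical helper):

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Perform `accesses` random-access reads on `sequences` and accumulate a
// checksum so the compiler cannot optimise the reads away. In the real
// benchmark this would be templated over seqan3::concatenated_sequences<...>
// and std::vector<...> alike.
template <typename container_t>
std::size_t random_access_reads(container_t const & sequences, std::size_t accesses)
{
    std::mt19937_64 rng{42}; // fixed seed for reproducible access patterns
    std::uniform_int_distribution<std::size_t> pick{0, sequences.size() - 1};

    std::size_t checksum = 0;
    for (std::size_t i = 0; i < accesses; ++i)
        checksum += sequences[pick(rng)].size();
    return checksum;
}
```

Because both container types offer `operator[]` and `size()`, the same templated body can drive all three variants listed above.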

Benchmarking Appending

For both std::vector<seqan3::dna4> and std::string (=underlying_container_t), write a benchmark that measures many push_back operations

  • on a seqan3::concatenated_sequences<underlying_container_t>² - the old way before #2947
  • on a seqan3::concatenated_sequences<underlying_container_t>³ - the new way after #2947
  • on a std::vector<underlying_container_t> - should accomplish same task as shown in ² and ³
  • if available, on the SeqAn2 counterpart¹ #2977

Footnotes

¹ Notes regarding SeqAn2
  • Known equivalent in SeqAn2: seqan::StringSet<TString, seqan::Owner<seqan::ConcatDirect>>
  • Check whether there are other SeqAn2 specialisations that are equivalent
  • std::string equivalent in SeqAn2 is seqan::CharString
  • std::vector<seqan3::dna4> equivalent in SeqAn2 is seqan::DnaString
² Example code for old way
std::string const input{"This is some (possibly large) input."};
seqan3::concatenated_sequences<std::string> output{};

for (/*benchmark loop*/)
{
    std::string buffer{};

    std::ranges::copy(input, std::back_inserter(buffer));
    buffer += '|';
    std::ranges::copy(input, std::back_inserter(buffer));
    buffer += '|';
    std::ranges::copy(input, std::back_inserter(buffer));

    output.push_back(buffer);
}
³ Example code for new way
std::string const input{"This is some (possibly large) input."};
seqan3::concatenated_sequences<std::string> output{};

for (/*benchmark loop*/)
{
    output.push_back();
    output.last_append(input);
    output.last_push_back('|');
    output.last_append(input);
    output.last_push_back('|');
    output.last_append(input);
}
smehringer transferred this issue from seqan/product_backlog Mar 21, 2022
smehringer added the "ready to tackle" label Mar 21, 2022
MitraDarja self-assigned this Mar 21, 2022
h-2 commented Mar 21, 2022

Note that you need to get out of the realm of CPU caches for the benchmark to be meaningful!

MitraDarja commented

@h-2 Is that something we have to do in every single benchmark (like this one), or is it some overall adaptation?

SGSSGene changed the title from "[PERF] seqan3::concetanated_sequences benchmark" to "[PERF] seqan3::concatenated_sequences benchmark" Mar 21, 2022

eseiler commented Mar 21, 2022

While you were commenting, I was editing the issue – I thought the same and already added a few sentences.

However, I'm not so knowledgeable as to what cache level I need to escape — surely L1 and L2; what about L3?

And what would be good sizes to reliably achieve this? We should be good with 8 MiB for L2, right?

@h-2 do you know off the top of your head?

Otherwise, I would just consider it a research task for this issue 😁


h-2 commented Mar 21, 2022

Is that something we have to do in every single benchmark (like this one) or some over all adaptation?

I am not sure. Maybe try with sizes similar to the other benchmarks first and see if that produces interesting results or not. I think there were plans for "macro-benchmarks" at some point and this could be part of one, but that maybe does not have to happen now.


h-2 commented Mar 21, 2022

Otherwise, I would just consider it a research task for this issue

The cache sizes vary a lot. I just wanted to point out that I have seen very significant speed improvements with the change. In the range of 50% speed-up for overall runtime of my program. And the concat_sequences can be 100s of MBs in my program, so I don't know if similar speed-ups are visible when everything fits in the cache.

edit: the reported 50% speed-gain is for switching from vector-of-vector to concat_seqs, so definitely not just because of the change.


eseiler commented Mar 21, 2022

Otherwise, I would just consider it a research task for this issue

The cache sizes vary a lot. I just wanted to point out that I have seen very significant speed improvements with the change. In the range of 50% speed-up for overall runtime of my program. And the concat_sequences can be 100s of MBs in my program, so I don't know if similar speed-ups are visible when everything fits in the cache.

I'm sure it's faster, but we need a benchmark at the very least for regression testing :D

I would just test a tiny size (let's say 1000) and a big size (8*1024*1024) and see if there are significant differences in the timing for a fixed number of random accesses. If yes, we probably avoided staying in cache constantly.
After having verified this, we can use 1000 or so for Debug mode, and 8*1024*1024 or so for Release mode.

eseiler changed the title from "[PERF] seqan3::concatenated_sequences benchmark" to "[TEST] seqan3::concatenated_sequences benchmark" Mar 21, 2022

h-2 commented Mar 21, 2022

I would just test a tiny size (let's say 1000) and a big size (8*1024*1024) and see if there are significant differences in the timing for a fixed number of random accesses. If yes, we probably avoided staying in cache constantly.

If you are benchmarking random-access reads (unrelated to the changes from my PR), I would actually recommend creating a large sequence (say 100 MB total of 1000 subseqs of 100 KB each) and reading from that. Creating the sequence can be performed before the benchmark.
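The suggested setup could be sketched as follows (a hedged sketch: `make_input` is a hypothetical helper, `std::vector<std::string>` stands in for the actual container, and the sizes are the ones proposed above, to be scaled down for Debug):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build the input data BEFORE the timed benchmark section, so that only the
// reads are measured. Intended call: make_input(1000, 100 * 1024)
// for ~100 MB total (1000 subsequences of 100 KB each).
inline std::vector<std::string> make_input(std::size_t count, std::size_t length)
{
    std::vector<std::string> sequences{};
    sequences.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        sequences.emplace_back(length, 'A'); // filler content; real data could be random
    return sequences;
}
```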

eseiler changed the title from "[TEST] seqan3::concatenated_sequences benchmark" to "[Alphabet] seqan3::concatenated_sequences benchmark" Oct 20, 2022