[Alphabet] seqan3::concatenated_sequences benchmark #2951

Open
7 tasks
smehringer opened this issue Mar 21, 2022 · 7 comments

smehringer commented Mar 21, 2022

Description

seqan3::concatenated_sequences is a container that is supposed to be more efficient for "static" input data (many read operations, few to no write operations).
In contrast to a nested std::vector of std::vectors, where each inner vector performs its own allocation, seqan3::concatenated_sequences stores all data in contiguous memory. In theory, this leads to fewer cache misses when accessing the data structure, and access should hence be faster.
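The layout idea can be illustrated with a minimal stand-in (a hedged sketch, not the real seqan3 API; `concat_demo` and its members are hypothetical names):

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Minimal stand-in for the idea behind seqan3::concatenated_sequences:
// all characters live in ONE contiguous buffer; per-sequence boundaries
// are tracked via an offset table.
struct concat_demo
{
    std::string data{};                  // one allocation for all sequences
    std::vector<std::size_t> offsets{0}; // sequence i spans offsets[i]..offsets[i+1]

    void push_back(std::string_view seq)
    {
        data += seq;
        offsets.push_back(data.size());
    }

    std::string_view operator[](std::size_t i) const
    {
        return std::string_view{data}.substr(offsets[i], offsets[i + 1] - offsets[i]);
    }

    std::size_t size() const { return offsets.size() - 1; }
};
```

Iterating such a structure touches one contiguous allocation instead of chasing a pointer per inner vector, which is where the cache-miss advantage is expected to come from.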

However, we have never tested this advantage.

#2947 introduced member functions for efficient push_back. The efficiency of push_back should also be tested.

Tasks

General Considerations

Choose some appropriate size(s) (inspiration). If the whole data structure fits into the CPU cache, we cannot see how fast "actual memory" access is.
Since the "big" benchmarks might take quite a while in Debug (Nightlies), you can set different sizes for Release (used for actual benchmarking) and Debug (Nightlies).
For an example, see how we did it in the format_sam_benchmark.
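The Release/Debug size split could be sketched like this (a hedged sketch; the 64 MiB Release value is an assumption intended to exceed typical L3 caches, and `benchmark_size` is a hypothetical name):

```cpp
#include <cstddef>

// Pick a small input for Debug builds (Nightlies) and a large one for
// Release builds, where NDEBUG is defined by CMake's Release configuration.
#ifdef NDEBUG
inline constexpr std::size_t benchmark_size = 64 * 1024 * 1024; // Release: exceed the cache
#else
inline constexpr std::size_t benchmark_size = 1024;             // Debug: keep Nightlies fast
#endif
```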

Benchmarking Access

For both std::vector<seqan3::dna4> and std::string (=underlying_container_t), write a benchmark that measures many (random access) read operations

  • on a seqan3::concatenated_sequences<underlying_container_t>
  • on a std::vector<underlying_container_t>
  • on the SeqAn2 counterpart¹
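The random-access read benchmark could look roughly like this (a sketch under assumptions: `std::vector<std::string>` stands in for the containers under test, and plain `std::size_t` counting replaces the Google Benchmark loop the real benchmark would use; `random_access_reads` is a hypothetical helper):

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Perform `accesses` random-access reads on `sequences` and accumulate a
// checksum so the compiler cannot optimise the reads away. In the real
// benchmark this would be templated over seqan3::concatenated_sequences<...>
// and std::vector<...> alike.
template <typename container_t>
std::size_t random_access_reads(container_t const & sequences, std::size_t accesses)
{
    std::mt19937_64 rng{42}; // fixed seed for reproducible access patterns
    std::uniform_int_distribution<std::size_t> pick{0, sequences.size() - 1};

    std::size_t checksum = 0;
    for (std::size_t i = 0; i < accesses; ++i)
        checksum += sequences[pick(rng)].size();
    return checksum;
}
```

Because both container types offer `operator[]` and `size()`, the same templated body can drive all three variants listed above.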

Benchmarking Appending

For both std::vector<seqan3::dna4> and std::string (=underlying_container_t), write a benchmark that measures many push_back operations

  • on a seqan3::concatenated_sequences<underlying_container_t>² - the old way before #2947
  • on a seqan3::concatenated_sequences<underlying_container_t>³ - the new way after #2947
  • on a std::vector<underlying_container_t> - should accomplish same task as shown in ² and ³
  • if available, on the SeqAn2 counterpart¹ #2977

Footnotes

¹ Notes regarding SeqAn2
  • Known equivalent in SeqAn2: seqan::StringSet<TString, seqan::Owner<seqan::ConcatDirect>>
  • Check whether there are other SeqAn2 specialisations that are equivalent
  • std::string equivalent in SeqAn2 is seqan::CharString
  • std::vector<seqan3::dna4> equivalent in SeqAn2 is seqan::DnaString
² Example code for old way
std::string const input{"This is some (possibly large) input."};
seqan3::concatenated_sequences<std::string> output{};

for (/*benchmark loop*/)
{
    std::string buffer{};

    std::ranges::copy(input, std::back_inserter(buffer));
    buffer += '|';
    std::ranges::copy(input, std::back_inserter(buffer));
    buffer += '|';
    std::ranges::copy(input, std::back_inserter(buffer));

    output.push_back(buffer);
}
³ Example code for new way
std::string const input{"This is some (possibly large) input."};
seqan3::concatenated_sequences<std::string> output{};

for (/*benchmark loop*/)
{
    output.push_back();
    output.last_append(input);
    output.last_push_back('|');
    output.last_append(input);
    output.last_push_back('|');
    output.last_append(input);
}
smehringer transferred this issue from seqan/product_backlog Mar 21, 2022
smehringer added the "ready to tackle" label Mar 21, 2022
MitraDarja self-assigned this Mar 21, 2022
h-2 commented Mar 21, 2022

Note that you need to get out of the realm of CPU caches for the benchmark to be meaningful!

MitraDarja commented

@h-2 Is that something we have to do in every single benchmark (like this one), or is it some overall adaptation?

SGSSGene changed the title from "[PERF] seqan3::concetanated_sequences benchmark" to "[PERF] seqan3::concatenated_sequences benchmark" Mar 21, 2022

eseiler commented Mar 21, 2022

While you were commenting, I was editing the issue – I thought the same and already added a few sentences.

However, I'm not so knowledgeable as to what cache level I need to escape — surely L1 and L2; what about L3?

And what would be good sizes to reliably achieve this? We should be good with 8 MiB for L2, right?

@h-2 do you know off the top of your head?

Otherwise, I would just consider it a research task for this issue 😁


h-2 commented Mar 21, 2022

Is that something we have to do in every single benchmark (like this one) or some over all adaptation?

I am not sure. Maybe try with sizes similar to the other benchmarks first and see if that produces interesting results or not. I think there were plans for "macro-benchmarks" at some point and this could be part of one, but that maybe does not have to happen now.


h-2 commented Mar 21, 2022

Otherwise, I would just consider it a research task for this issue

The cache sizes vary a lot. I just wanted to point out that I have seen very significant speed improvements with the change. In the range of 50% speed-up for overall runtime of my program. And the concat_sequences can be 100s of MBs in my program, so I don't know if similar speed-ups are visible when everything fits in the cache.

edit: the reported 50% speed-gain is for switching from vector-of-vector to concat_seqs, so definitely not just because of the change.


eseiler commented Mar 21, 2022

Otherwise, I would just consider it a research task for this issue

The cache sizes vary a lot. I just wanted to point out that I have seen very significant speed improvements with the change. In the range of 50% speed-up for overall runtime of my program. And the concat_sequences can be 100s of MBs in my program, so I don't know if similar speed-ups are visible when everything fits in the cache.

I'm sure it's faster, but we need a benchmark at the very least for regression testing :D

I would just test a tiny size (let's say 1000) and a big size (8*1024*1024) and see if there are significant differences in the timing for a fixed number of random accesses. If yes, we probably avoided staying in cache constantly.
After having verified this, we can use 1000 or so for Debug mode, and 8*1024*1024 or so for Release mode.

eseiler changed the title from "[PERF] seqan3::concatenated_sequences benchmark" to "[TEST] seqan3::concatenated_sequences benchmark" Mar 21, 2022

h-2 commented Mar 21, 2022

I would just test a tiny size (let's say 1000) and a big size (8*1024*1024) and see if there are significant differences in the timing for a fixed number of random accesses. If yes, we probably avoided staying in cache constantly.

If you are benchmarking random-access reads (unrelated to the changes from my PR), I would actually recommend creating a large sequence (say 100 MB total of 1000 subseqs of 100 KB each) and reading from that. Creating the sequence can be performed before the benchmark.
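The suggested setup could be sketched as follows (a hedged sketch: `make_input` is a hypothetical helper, `std::vector<std::string>` stands in for the actual container, and the sizes are the ones proposed above, to be scaled down for Debug):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build the input data BEFORE the timed benchmark section, so that only the
// reads are measured. Intended call: make_input(1000, 100 * 1024)
// for ~100 MB total (1000 subsequences of 100 KB each).
inline std::vector<std::string> make_input(std::size_t count, std::size_t length)
{
    std::vector<std::string> sequences{};
    sequences.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        sequences.emplace_back(length, 'A'); // filler content; real data could be random
    return sequences;
}
```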

eseiler changed the title from "[TEST] seqan3::concatenated_sequences benchmark" to "[Alphabet] seqan3::concatenated_sequences benchmark" Oct 20, 2022