
Commit

[FEATURE] Make search modi into individual pipeable config elements.
smehringer committed May 19, 2020
1 parent c574e0c commit 5264f6c
Showing 26 changed files with 316 additions and 230 deletions.
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -72,13 +72,23 @@ Note that 3.1.0 will be the first API stable release and interfaces in this rele
#### Search

* Moved `seqan3::search` from `search/algorithm/` to `search/` ([\#1696](https://github.com/seqan/seqan3/pull/1696)).
* Configuration refactoring:
  * The names for the search mode configuration have changed and are now individual config elements
    that are pipeable ([\#1639](https://github.com/seqan/seqan3/pull/1639)):
    * `seqan3::search_cfg::all` to `seqan3::search_cfg::hit_all`
    * `seqan3::search_cfg::best` to `seqan3::search_cfg::hit_single_best`
    * `seqan3::search_cfg::all_best` to `seqan3::search_cfg::hit_all_best`
    * `seqan3::search_cfg::strata{5}` to `seqan3::search_cfg::hit_strata{5}`
  * The configuration element `seqan3::search_cfg::mode` has been removed.
    Replace it by directly using one of the "hit strategy" configuration elements listed above
    ([\#1639](https://github.com/seqan/seqan3/pull/1639)).
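The move from a `mode{all_best}` wrapper to a stand-alone `hit_all_best` element can be illustrated with a minimal, self-contained sketch of pipeable configuration elements. Everything in namespace `sketch` (`config_element`, `configuration`, `contains`, the `*_tag` types) is hypothetical and only mirrors the idea; it is not how SeqAn3 implements `seqan3::configuration`, and the compile-time compatibility checks of the real library are omitted.

```cpp
#include <tuple>
#include <type_traits>

namespace sketch // hypothetical stand-ins, not the SeqAn3 implementation
{
// Marker base class: only types derived from it can be piped together.
struct config_element {};

struct max_error : config_element
{
    int total = 0;
    constexpr explicit max_error(int total_errors) : total{total_errors} {}
};

struct hit_all_tag : config_element {};
struct hit_all_best_tag : config_element {};

// Ready-to-use objects: no `mode{...}` wrapper is needed to select a hit strategy.
inline constexpr hit_all_tag hit_all{};
inline constexpr hit_all_best_tag hit_all_best{};

template <typename... elements_t>
struct configuration
{
    std::tuple<elements_t...> elements;

    // True if an element of type element_t is part of this configuration.
    template <typename element_t>
    static constexpr bool contains = (std::is_same_v<element_t, elements_t> || ...);
};

template <typename t>
inline constexpr bool is_element_v = std::is_base_of_v<config_element, t>;

// element | element -> configuration holding both elements
template <typename lhs_t, typename rhs_t,
          typename = std::enable_if_t<is_element_v<lhs_t> && is_element_v<rhs_t>>>
constexpr configuration<lhs_t, rhs_t> operator|(lhs_t const & lhs, rhs_t const & rhs)
{
    return {std::tuple<lhs_t, rhs_t>{lhs, rhs}};
}

// configuration | element -> configuration extended by one element
template <typename... elements_t, typename rhs_t,
          typename = std::enable_if_t<is_element_v<rhs_t>>>
constexpr configuration<elements_t..., rhs_t> operator|(configuration<elements_t...> const & cfg,
                                                         rhs_t const & rhs)
{
    return {std::tuple_cat(cfg.elements, std::tuple<rhs_t>{rhs})};
}
} // namespace sketch
```

With this shape, `max_error{1} | hit_all_best` yields a configuration holding both elements, and code consuming it can ask for the hit strategy by type instead of inspecting the value stored inside a `mode` element.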

## Notable Bug-fixes

### Argument Parser

* Long option identifiers and their values must be separated by a space or an equal sign `=`.
Enforcing this restriction resolves the ambiguity that arises when one long option identifier is a prefix of
another ([\#1792](https://github.com/seqan/seqan3/pull/1792)).

Valid short id value pairs: `-iValue`, `-i=Value`, `-i Value`
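To see why the stricter rule removes the prefix ambiguity, consider a parser that knows both `--foo` and `--foobar`. The sketch below assumes a hypothetical helper `match_long_option` (not part of the SeqAn3 argument parser API) that classifies a single command-line token against one long identifier.

```cpp
#include <string_view>

namespace sketch // hypothetical helper, not part of the SeqAn3 argument parser
{
// How a single command-line token relates to the long option `--<id>`.
enum class long_option_match
{
    none,          // the token is not this option
    value_follows, // the token is exactly "--<id>"; the value is the next token
    inline_value   // the token is "--<id>=<value>"
};

constexpr long_option_match match_long_option(std::string_view token, std::string_view id)
{
    // The token must start with "--" followed by the full identifier.
    if (token.size() < 2 + id.size() || token.substr(0, 2) != "--" || token.substr(2, id.size()) != id)
        return long_option_match::none;

    std::string_view const rest = token.substr(2 + id.size());

    if (rest.empty())
        return long_option_match::value_follows; // "--foo value"
    if (rest.front() == '=')
        return long_option_match::inline_value;  // "--foo=value"

    // Anything glued directly to the identifier (e.g. "--foobar" when checking "foo")
    // is treated as a different option, never as "--foo" with the value "bar".
    return long_option_match::none;
}
} // namespace sketch
```

Without the separation rule, `--foobar` could also be read as `--foo` with the glued value `bar`; requiring a space or `=` leaves exactly one interpretation per token.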
2 changes: 1 addition & 1 deletion doc/tutorial/read_mapper/read_mapper_step2.cpp
@@ -48,7 +48,7 @@ void map_reads(std::filesystem::path const & query_path,
seqan3::sequence_file_input query_in{query_path};

seqan3::configuration const search_config = seqan3::search_cfg::max_error{seqan3::search_cfg::total{errors}} |
seqan3::search_cfg::mode{seqan3::search_cfg::all_best};
seqan3::search_cfg::hit_all_best;

for (auto & [query, id, qual] : query_in | seqan3::views::take(20))
{
2 changes: 1 addition & 1 deletion doc/tutorial/read_mapper/read_mapper_step3.cpp
@@ -49,7 +49,7 @@ void map_reads(std::filesystem::path const & query_path,
seqan3::sequence_file_input query_in{query_path};

seqan3::configuration const search_config = seqan3::search_cfg::max_error{seqan3::search_cfg::total{errors}} |
seqan3::search_cfg::mode{seqan3::search_cfg::all_best};
seqan3::search_cfg::hit_all_best;

//! [alignment_config]
seqan3::configuration const align_config = seqan3::align_cfg::edit |
2 changes: 1 addition & 1 deletion doc/tutorial/read_mapper/read_mapper_step4.cpp
@@ -59,7 +59,7 @@ void map_reads(std::filesystem::path const & query_path,
//! [alignment_file_output]

seqan3::configuration const search_config = seqan3::search_cfg::max_error{seqan3::search_cfg::total{errors}} |
seqan3::search_cfg::mode{seqan3::search_cfg::all_best};
seqan3::search_cfg::hit_all_best;

seqan3::configuration const align_config = seqan3::align_cfg::edit |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
27 changes: 14 additions & 13 deletions doc/tutorial/search/index.md
@@ -212,27 +212,28 @@ At position 85: ACT
```
\endsolution

## Search modes
## Which hits are reported?

Besides the error configuration, you can define what kind of hits should be reported:
Besides the error configuration, you can define which hits should be reported:

- seqan3::search_cfg::all: Report all hits that satisfy the (approximate) search.
- seqan3::search_cfg::best: Report the best hit, i.e. the *first* hit with the lowest edit distance.
- seqan3::search_cfg::all_best: Report all hits with the lowest edit distance.
- seqan3::search_cfg::strata: best+x mode. Report all hits within the x-neighbourhood of the best hit.
- seqan3::search_cfg::hit_all: Report all hits that satisfy the (approximate) search.
- seqan3::search_cfg::hit_single_best: Report the best hit, i.e. the *first* hit with the lowest edit distance.
- seqan3::search_cfg::hit_all_best: Report all hits with the lowest edit distance.
- seqan3::search_cfg::hit_strata: best+x strategy. Report all hits within the x-neighbourhood of the best hit.
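The four strategies can be compared on a fixed list of hits. The sketch below is a simplified model, assuming each hit is a hypothetical `hit{position, errors}` pair that already lies within the configured error bounds and that "first" means first in the given order; it counts or selects hits the way the list above describes and does not reflect how SeqAn3 computes them during the index search.

```cpp
#include <array>
#include <cstddef>

namespace sketch // hypothetical model, not the SeqAn3 search result type
{
struct hit
{
    std::size_t position; // where the query was found in the text
    unsigned errors;      // edit distance of this occurrence
};

// Lowest error count among the hits (assumes at least one hit).
template <std::size_t n>
constexpr unsigned best_errors(std::array<hit, n> const & hits)
{
    unsigned best = hits[0].errors;
    for (hit const & h : hits)
        if (h.errors < best)
            best = h.errors;
    return best;
}

// hit_all: every hit within the error bounds is reported.
template <std::size_t n>
constexpr std::size_t count_hit_all(std::array<hit, n> const &)
{
    return n;
}

// hit_strata{stratum}: all hits with at most best + stratum errors.
template <std::size_t n>
constexpr std::size_t count_hit_strata(std::array<hit, n> const & hits, unsigned stratum)
{
    unsigned const bound = best_errors(hits) + stratum;
    std::size_t count = 0;
    for (hit const & h : hits)
        if (h.errors <= bound)
            ++count;
    return count;
}

// hit_all_best: all hits with the lowest error count (a stratum of 0).
template <std::size_t n>
constexpr std::size_t count_hit_all_best(std::array<hit, n> const & hits)
{
    return count_hit_strata(hits, 0);
}

// hit_single_best: the first hit (in the given order) with the lowest error count.
template <std::size_t n>
constexpr hit select_hit_single_best(std::array<hit, n> const & hits)
{
    unsigned const best = best_errors(hits);
    for (hit const & h : hits)
        if (h.errors == best)
            return h;
    return hits[0]; // not reached for a non-empty input
}
} // namespace sketch
```

Modeling `hit_all_best` as a stratum of 0 is a simplification chosen for brevity: since no hit has fewer errors than the best one, both describe the same set of hits.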

The mode is appended to the error configuration by using the `|`-operator:
\snippet doc/tutorial/search/search_small_snippets.cpp mode_best
Any hit configuration element is appended to the error configuration by using the `|`-operator:
\snippet doc/tutorial/search/search_small_snippets.cpp hit_best

The strata mode needs an additional parameter:
\snippet doc/tutorial/search/search_small_snippets.cpp mode_strata
The `seqan3::search_cfg::hit_strata` configuration element needs an additional parameter, the stratum:
\snippet doc/tutorial/search/search_small_snippets.cpp hit_strata

If the best hit had an edit distance of 1, the strata mode would report all hits with up to an edit distance of 3.
If the best hit had an edit distance of 1, the strata strategy would report all hits with an edit distance of up to 3 (best + stratum = 1 + 2).
Since in this example the total error number is set to 2, all hits with 1 or 2 errors would be reported.
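The worked example above boils down to taking the smaller of two bounds. A minimal sketch, using the hypothetical helper name `strata_error_bound`:

```cpp
#include <algorithm>

// Hypothetical helper: the largest error count the strata strategy can report.
constexpr unsigned strata_error_bound(unsigned best_errors, unsigned stratum, unsigned max_total_errors)
{
    // best + stratum, but never more than the error configuration itself allows
    return std::min(best_errors + stratum, max_total_errors);
}
```

With a best hit of 1 error and a stratum of 2, the strategy alone would allow 3 errors, but the total error limit of 2 caps it, so `strata_error_bound(1, 2, 2)` is 2.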

\assignment{Assignment 4}
Search for all occurrences of `GCT` in the text from [assignment 1](#assignment_create_index).<br>
Allow up to 1 error of any type and print the number of hits for each search mode (use `mode::strata{1}`).
Allow up to 1 error of any type and print the number of hits for each hit strategy (use
`seqan3::search_cfg::hit_strata{1}`).
\endassignment

\solution
@@ -254,7 +255,7 @@ There are 25 hits.

\assignment{Assignment 5}
Search for all occurrences of `GCT` in the text from [assignment 1](#assignment_create_index).<br>
Allow up to 1 error of any type search for all occurrences in the all_best mode.<br>
Allow up to 1 error of any type and search for all occurrences with the strategy `seqan3::search_cfg::hit_all_best`.<br>
Align the query to each of the found positions in the genome and print the score and alignment.<br>
**BONUS**<br>
Do the same for the text collection from [assignment 2](#assignment_exact_search).
12 changes: 6 additions & 6 deletions doc/tutorial/search/search_small_snippets.cpp
@@ -89,23 +89,23 @@ seqan3::configuration const cfg = seqan3::search_cfg::max_error{seqan3::search_c
}

{
//![mode_best]
//![hit_best]
seqan3::configuration const cfg = seqan3::search_cfg::max_error{seqan3::search_cfg::total{1},
seqan3::search_cfg::substitution{0},
seqan3::search_cfg::insertion{1},
seqan3::search_cfg::deletion{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::best};
//![mode_best]
seqan3::search_cfg::hit_single_best;
//![hit_best]
}

{
//![mode_strata]
//![hit_strata]
seqan3::configuration const cfg = seqan3::search_cfg::max_error{seqan3::search_cfg::total{2},
seqan3::search_cfg::substitution{0},
seqan3::search_cfg::insertion{1},
seqan3::search_cfg::deletion{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::strata{2}};
//![mode_strata]
seqan3::search_cfg::hit_strata{2};
//![hit_strata]
}

}
8 changes: 4 additions & 4 deletions doc/tutorial/search/search_solution4.cpp
@@ -15,25 +15,25 @@ int main()

seqan3::debug_stream << "Searching all hits\n";
seqan3::configuration const cfg_all = seqan3::search_cfg::max_error{seqan3::search_cfg::total{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::all};
seqan3::search_cfg::hit_all;
auto results_all = search(query, index, cfg_all);
seqan3::debug_stream << "Hits: " << results_all << "\n";

seqan3::debug_stream << "Searching all best hits\n";
seqan3::configuration const cfg_all_best = seqan3::search_cfg::max_error{seqan3::search_cfg::total{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::all_best};
seqan3::search_cfg::hit_all_best;
auto results_all_best = search(query, index, cfg_all_best);
seqan3::debug_stream << "Hits: " << results_all_best << "\n";

seqan3::debug_stream << "Searching best hit\n";
seqan3::configuration const cfg_best = seqan3::search_cfg::max_error{seqan3::search_cfg::total{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::best};
seqan3::search_cfg::hit_single_best;
auto results_best = search(query, index, cfg_best);
seqan3::debug_stream << "Hits: " << results_best << "\n";

seqan3::debug_stream << "Searching all hits in the 1-stratum\n";
seqan3::configuration const cfg_strata = seqan3::search_cfg::max_error{seqan3::search_cfg::total{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::strata{1}};
seqan3::search_cfg::hit_strata{1};
auto results_strata = search(query, index, cfg_strata);
seqan3::debug_stream << "Hits: " << results_strata << "\n";
}
4 changes: 2 additions & 2 deletions doc/tutorial/search/search_solution5.cpp
@@ -18,7 +18,7 @@ void run_text_single()
seqan3::debug_stream << "Searching all best hits allowing for 1 error in a single text\n";

seqan3::configuration const search_config = seqan3::search_cfg::max_error{seqan3::search_cfg::total{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::all_best};
seqan3::search_cfg::hit_all_best;
seqan3::configuration const align_config = seqan3::align_cfg::edit |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::result{seqan3::with_alignment};
@@ -54,7 +54,7 @@ void run_text_collection()
seqan3::debug_stream << "Searching all best hits allowing for 1 error in a text collection\n";

seqan3::configuration const search_config = seqan3::search_cfg::max_error{seqan3::search_cfg::total{1}} |
seqan3::search_cfg::mode{seqan3::search_cfg::all_best};
seqan3::search_cfg::hit_all_best;
seqan3::configuration const align_config = seqan3::align_cfg::edit |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::result{seqan3::with_alignment};
41 changes: 29 additions & 12 deletions include/seqan3/search/configuration/all.hpp
@@ -17,7 +17,7 @@
#include <seqan3/search/configuration/detail.hpp>
#include <seqan3/search/configuration/max_error.hpp>
#include <seqan3/search/configuration/max_error_rate.hpp>
#include <seqan3/search/configuration/mode.hpp>
#include <seqan3/search/configuration/hit.hpp>
#include <seqan3/search/configuration/output.hpp>
#include <seqan3/search/configuration/parallel.hpp>

@@ -32,31 +32,48 @@
*
* \details
*
* ### Introduction
* \section search_configuration_section_introduction Introduction
*
* In SeqAn the search algorithm uses a configuration object to determine the desired
* \ref seqan3::search_cfg::max_error "number"/\ref seqan3::search_cfg::max_error_rate "rate" of errors,
* what hits are considered as \ref seqan3::search_cfg::mode "results", and how to
* \ref seqan3::search_cfg::output "output" the result.
* what hits are reported based on a \ref search_configuration_subsection_hit_strategy "strategy", and how to
* \ref seqan3::search_cfg::output "output" the results.
* These configurations exist in their own namespace, namely seqan3::search_cfg, to disambiguate them from the
* configuration of other algorithms.
*
* If no configuration is provided upon invoking the seqan3::search algorithm, a default configuration is provided:
* \include test/snippet/search/configuration_default.cpp
*
* ### Combining configuration elements
* \section search_configuration_section_overview Overview on search configurations
*
* Configurations can be combined using the `|`-operator. If a combination is invalid, a static assertion is triggered
* during compilation and informs the user that the last config cannot be combined with any of the configs from
* the left-hand side of the configuration specification. Unfortunately, the names of the invalid
* types cannot be printed within the static assert, but the following table shows which combinations are possible.
* In general, the same configuration element cannot occur more than once in a configuration specification.
*
* | **Config** | **0** | **1** | **2** | **3** | **4** |
* | ------------------------------------------------------------|-------|-------|-------|-------|-------|
* | \ref seqan3::search_cfg::max_error "0: Max error" | ❌ | ❌ | ✅ | ✅ | ✅ |
* | \ref seqan3::search_cfg::max_error_rate "1: Max error rate" | ❌ | ❌ | ✅ | ✅ | ✅ |
* | \ref seqan3::search_cfg::output "2: Output" | ✅ | ✅ | ❌ | ✅ | ✅ |
* | \ref seqan3::search_cfg::mode "3: Mode" | ✅ | ✅ | ✅ | ❌ | ✅ |
* | \ref seqan3::search_cfg::parallel "4: Parallel" | ✅ | ✅ | ✅ | ✅ | ❌ |
* | **Configuration group** | **0** | **1** | **2** | **3** | **4** |
* | --------------------------------------------------------------------|-------|-------|-------|-------|-------|
* | \ref seqan3::search_cfg::max_error "0: Max error" | ❌ | ❌ | ✅ | ✅ | ✅ |
* | \ref seqan3::search_cfg::max_error_rate "1: Max error rate" | ❌ | ❌ | ✅ | ✅ | ✅ |
* | \ref seqan3::search_cfg::output "2: Output" | ✅ | ✅ | ❌ | ✅ | ✅ |
* | \ref search_configuration_subsection_hit_strategy "3: Hit"          | ✅ | ✅ | ✅ | ❌ | ✅ |
* | \ref seqan3::search_cfg::parallel "4: Parallel" | ✅ | ✅ | ✅ | ✅ | ❌ |
*
* \subsection search_configuration_subsection_hit_strategy 3. Hit Configuration
*
* This configuration can be used to determine which hits are reported.
* Currently these strategies are available:
*
* | Hit Configurations | Behaviour |
* |-------------------------------------|---------------------------------------------------------------------|
* | seqan3::search_cfg::hit_all | Report all hits within error bounds. |
* | seqan3::search_cfg::hit_all_best | Report all hits with the lowest number of errors within the bounds. |
* | seqan3::search_cfg::hit_single_best | Report one best hit (a hit with the lowest number of errors) within bounds. |
* | seqan3::search_cfg::hit_strata | Report all hits within best + `stratum` errors. |
*
* The hit configuration elements are mutually exclusive: a configuration can contain at most one of them.
*
* \include test/snippet/search/hit_configuration_examples.cpp
*/
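The combination rules in the table above can be read as a lookup table indexed by configuration group, which is what `compatibility_table<search_config_id>` in `detail.hpp` stores. The sketch below re-creates that idea with hypothetical names (`sketch::can_append`, `id_count`) and checks a new element against every group already present; per the documentation above, the real library performs this check through a static assertion while piping. In this model, the `false` on the diagonal of the `hit` row is what makes `hit_all`, `hit_all_best`, `hit_single_best` and `hit_strata` mutually exclusive, since all four share the single `hit` id.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

namespace sketch // hypothetical names; mirrors the ids and values shown in detail.hpp
{
enum struct search_config_id : std::uint8_t
{
    max_error,      // max_error configuration
    max_error_rate, // max_error_rate configuration
    output,         // output configuration
    hit,            // shared by hit_all, hit_all_best, hit_single_best and hit_strata
    parallel,       // parallel execution configuration
    SIZE            // must stay last: number of ids
};

inline constexpr std::size_t id_count = static_cast<std::size_t>(search_config_id::SIZE);

// 1 = the two groups can be combined, 0 = they cannot (rows and columns in id order).
inline constexpr std::array<std::array<bool, id_count>, id_count> compatibility_table{{
    // max_error, max_error_rate, output, hit, parallel
    {{0, 0, 1, 1, 1}},
    {{0, 0, 1, 1, 1}},
    {{1, 1, 0, 1, 1}},
    {{1, 1, 1, 0, 1}},
    {{1, 1, 1, 1, 0}}
}};

// Can an element of group `new_id` be appended to a configuration holding `existing_ids`?
template <std::size_t n>
constexpr bool can_append(std::array<search_config_id, n> const & existing_ids, search_config_id new_id)
{
    for (search_config_id id : existing_ids)
        if (!compatibility_table[static_cast<std::size_t>(id)][static_cast<std::size_t>(new_id)])
            return false;
    return true;
}
} // namespace sketch
```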
4 changes: 2 additions & 2 deletions include/seqan3/search/configuration/default_configuration.hpp
@@ -16,7 +16,7 @@
#include <seqan3/search/configuration/detail.hpp>
#include <seqan3/search/configuration/max_error.hpp>
#include <seqan3/search/configuration/max_error_rate.hpp>
#include <seqan3/search/configuration/mode.hpp>
#include <seqan3/search/configuration/hit.hpp>
#include <seqan3/search/configuration/output.hpp>

namespace seqan3::search_cfg
@@ -27,6 +27,6 @@ namespace seqan3::search_cfg
*/
inline constexpr configuration default_configuration = max_error{total{0}, substitution{0}, insertion{0}, deletion{0}} |
output{text_position} |
mode{all};
hit_all;

} // namespace seqan3::search_cfg
4 changes: 2 additions & 2 deletions include/seqan3/search/configuration/detail.hpp
@@ -42,7 +42,7 @@ enum struct search_config_id : uint8_t
max_error, //!< Identifier for the max_errors configuration.
max_error_rate, //!< Identifier for the max_error_rate configuration.
output, //!< Identifier for the output configuration.
mode, //!< Identifier for the search mode configuration.
hit, //!< Identifier for the hit configuration (all, all_best, single_best, strata).
parallel, //!< Identifier for the parallel execution configuration.
//!\cond
// ATTENTION: Must always be the last item; will be used to determine the number of ids.
@@ -69,7 +69,7 @@ inline constexpr std::array<std::array<bool, static_cast<uint8_t>(search_config_
static_cast<uint8_t>(search_config_id::SIZE)> compatibility_table<search_config_id> =
{
{
// max_error, max_error_rate, output, mode, parallel
// max_error, max_error_rate, output, hit, parallel
{ 0, 0, 1, 1, 1},
{ 0, 0, 1, 1, 1},
{ 1, 1, 0, 1, 1},