Commit
feat(docs): transfer ambiguous symbols explanation #551
1 parent 03079d7, commit 33d3a87
Showing 8 changed files with 158 additions and 136 deletions.
lapis2-docs/src/content/docs/concepts/ambiguous-symbols.mdx
68 changes: 68 additions & 0 deletions
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
--- | ||
title: Ambiguous symbols | ||
description: Explanation of how ambiguous reads are handled in the data | ||
--- | ||
|
||
The underlying sequence files in `.FASTA` format can contain any of the following symbols: | ||
|
||
| Symbol | Meaning | | ||
| ------ | ----------------- | | ||
| A | Adenine | | ||
| C | Cytosine | | ||
| G | Guanine | | ||
| T | Thymine | | ||
| - | Deletion | | ||
| N | failed read / any | | ||
| R | A or G | | ||
| Y | C or T | | ||
| S | C or G | | ||
| W | A or T | | ||
| K | G or T | | ||
| M | A or C | | ||
| B | not A | | ||
| D | not C | | ||
| H | not G | | ||
| V | not T | | ||
|
||
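As a minimal sketch, the symbol table can be written as a lookup from each symbol to the set of concrete symbols it may stand for. The names `IUPAC` and `is_ambiguous` are hypothetical and not part of LAPIS, and the sketch assumes that `N` covers the four nucleotides but not the deletion symbol `-`.

```python
# Hypothetical lookup table (not a LAPIS API): each FASTA symbol is mapped
# to the set of concrete symbols it can stand for, following the table.
# Assumption: N stands for any nucleotide, but not for the deletion "-".
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "-": {"-"},
    "N": {"A", "C", "G", "T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"},  # not A
    "D": {"A", "G", "T"},  # not C
    "H": {"A", "C", "T"},  # not G
    "V": {"A", "C", "G"},  # not T
}


def is_ambiguous(symbol: str) -> bool:
    """Return True if the symbol can stand for more than one concrete symbol."""
    return len(IUPAC[symbol]) > 1
```

With this table, `is_ambiguous("R")` is true while `is_ambiguous("-")` is false.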
Most queries target the symbols `A`, `C`, `G`, `T` and `-` to look for specific features and mutations of a sequence, | ||
or `N` for quality control of the underlying data. | ||
The ambiguous symbols `R` through `V` are often too cumbersome to consider in analyses. | ||
|
||
LAPIS supports the flexible consideration of these ambiguous symbols | ||
through an extension of the boolean logic syntax in the variant queries. | ||
|
||
Here we introduce two new expressions: | ||
* `Maybe` (or UpperBound), which also matches sequences that have an ambiguous code that **may** match the queried value. | ||
* `Exact` (or LowerBound), the complementary expression, which keeps only sequences that match however their ambiguous codes are resolved. | ||
|
||
#### Example | ||
|
||
Consider the following sequences: | ||
|
||
``` | ||
12345 | ||
AAACG | ||
AARCG | ||
AANCG | ||
AAGCG | ||
AAACG | ||
``` | ||
|
||
A filter for the mutation `3G` returns only the sequence `AAGCG`, as it is the only sequence with the symbol `G` at position 3. | ||
The filter `Maybe(3G)` additionally returns the sequences `AARCG` and `AANCG`, which **may** have the symbol `G` at position 3, because the symbols `R` and `N` can represent guanine. | ||
|
||
Conversely, the filter `Not(3A)` returns the sequences | ||
|
||
``` | ||
AARCG | ||
AANCG | ||
AAGCG | ||
``` | ||
|
||
If you want to restrict the set of sequences to those that also do not have an ambiguous code containing `A` at position 3, you can get the lower bound using the filter `Exact(Not(3A))` or, equivalently, `Not(Maybe(3A))`: | ||
|
||
``` | ||
AAGCG | ||
``` |
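The example can be reproduced with a small evaluator. This is only a sketch of the upper/lower-bound semantics as described in this document, not the LAPIS implementation: the tuple encoding of expressions, the `matches` and `run` helpers, and the three evaluation modes are assumptions made for illustration, as is the choice that `N` does not cover the deletion symbol `-`.

```python
# Symbol -> set of concrete symbols it may stand for (from the symbol table).
# Assumption: N covers the four nucleotides but not the deletion "-".
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "-": {"-"},
    "N": {"A", "C", "G", "T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
}

# The five sequences from the example.
SEQUENCES = ["AAACG", "AARCG", "AANCG", "AAGCG", "AAACG"]

# Hypothetical expression encoding (not LAPIS syntax):
#   ("has", position, symbol), ("not", expr), ("maybe", expr), ("exact", expr)


def matches(expr, seq, mode="plain"):
    """Evaluate expr against one sequence.

    mode is "plain" (no bound requested), "upper" (inside Maybe)
    or "lower" (inside Exact).
    """
    kind = expr[0]
    if kind == "has":
        _, position, symbol = expr
        observed = seq[position - 1]  # positions are 1-based
        if mode == "upper":
            # An ambiguous code counts if it could stand for the symbol.
            return symbol in IUPAC[observed]
        # Plain and lower-bound matching both require the exact symbol.
        return observed == symbol
    if kind == "not":
        # Negation swaps the bounds: the lower bound of Not(x) is the
        # complement of the upper bound of x, and vice versa.
        flipped = {"plain": "plain", "upper": "lower", "lower": "upper"}[mode]
        return not matches(expr[1], seq, flipped)
    if kind == "maybe":
        return matches(expr[1], seq, "upper")
    if kind == "exact":
        return matches(expr[1], seq, "lower")
    raise ValueError(f"unknown expression kind: {kind!r}")


def run(expr):
    """Return the example sequences that satisfy expr."""
    return [seq for seq in SEQUENCES if matches(expr, seq)]


print(run(("has", 3, "G")))                    # 3G
print(run(("maybe", ("has", 3, "G"))))         # Maybe(3G)
print(run(("not", ("has", 3, "A"))))           # Not(3A)
print(run(("exact", ("not", ("has", 3, "A")))))  # Exact(Not(3A))
print(run(("not", ("maybe", ("has", 3, "A")))))  # Not(Maybe(3A))
```

Swapping the bound under negation is what makes `Exact(Not(3A))` and `Not(Maybe(3A))` agree in this sketch: both exclude every sequence whose symbol at position 3 could be `A`.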