[FEAT] Option to ignore blanks before ids when reading FastA-Files #2770
fabca48 to fb2ae2a
Codecov Report
@@            Coverage Diff             @@
##           master    #2770      +/-   ##
==========================================
- Coverage   98.22%   98.19%   -0.04%
==========================================
  Files         267      267
  Lines       11511    11521      +10
==========================================
+ Hits        11307    11313       +6
- Misses        204      208       +4
LGTM, just one minor thing. :)
Core Meeting 25.10.2021 - The default should still be "blanks are stripped", because blanks are usually ignored. The option is still useful to allow for a "perfect roundtrip".
I will change the default back to 'true'.
fb2ae2a to fd3d739
e9a0f99 to c42ce88
c42ce88 to fe9d3c0
{}
if (options.fasta_ignore_blanks_before_id)
{
    for (; (it != e) && (is_id || is_blank)(*it); ++it)
Couldn't you do something like
auto const is_id = options.fasta_ignore_blanks_before_id ? (is_char<'>'> || is_char<';'> || is_blank) : (is_char<'>'> || is_char<';'>);
in line 178?
I think that is not possible. Line 180 checks if the character is a > or ;
if (!is_id(*begin(stream_view)))
That line would be wrong if it included a whitespace.
I'm just noticing the parsing of the ID is not correct either way.
Currently >>>>>>>> TEST would be parsed as TEST,
and with my changes > > >>> > > > TEST would also be parsed as TEST.
I am assuming that is also not what we want?
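To make the observation above concrete, here is a minimal standard-library sketch, not the SeqAn3 parser. The names `is_id_marker`, `is_blank_char`, `parse_id_greedy`, and `parse_id_strict` are hypothetical. The greedy variant mimics a loop that skips every character matching the marker (or blank) predicate; the strict variant is just one possible alternative that consumes a single marker, and it is not claimed to be what #2869 implements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-ins for the predicates discussed in this thread.
constexpr bool is_id_marker(char c) { return c == '>' || c == ';'; }
constexpr bool is_blank_char(char c) { return c == ' ' || c == '\t'; }

// Greedy variant: skips every leading marker and, if requested, every leading blank.
std::string parse_id_greedy(std::string_view line, bool ignore_blanks)
{
    std::size_t i = 0;
    while (i < line.size() && (is_id_marker(line[i]) || (ignore_blanks && is_blank_char(line[i]))))
        ++i;
    return std::string{line.substr(i)};
}

// Strict variant (an assumption, for comparison): consumes exactly one marker,
// then, if requested, the blanks that directly follow it.
std::string parse_id_strict(std::string_view line, bool ignore_blanks)
{
    std::size_t i = 0;
    if (i < line.size() && is_id_marker(line[i]))
        ++i;
    while (ignore_blanks && i < line.size() && is_blank_char(line[i]))
        ++i;
    return std::string{line.substr(i)};
}

// Self-check using the two header lines quoted in the comment above.
void check_marker_examples()
{
    assert(parse_id_greedy(">>>>>>>> TEST", true) == "TEST");
    assert(parse_id_greedy("> > >>> > > > TEST", true) == "TEST");
    assert(parse_id_strict(">>>>>>>> TEST", true) == ">>>>>>> TEST");
    assert(parse_id_strict("> TEST", false) == " TEST");
}
```

With the greedy loop both header lines collapse to `TEST`, which is the behavior questioned here; the strict variant keeps the extra markers as part of the ID.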
I have created a new PR to fix this: #2869
fe9d3c0 to f6207ef
20f708d to a8fbeaf
a8fbeaf to 1983b08
I think we should merge #2869 first, I want to have another look at the predicates afterwards :)
1983b08 to f886ce0
Didn't find an elegant way to avoid code duplication
f886ce0 to 1983b08
1983b08 to f886ce0
f886ce0 to 9a317f9
9a317f9 to 6bed967
9a317f9 to ea2de7b
Although there is no change in behavior, I would add a short changelog entry for this new feature.
">TEST 1\n"
"ACGT\n"
"> Test2\n"
"AGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGN\n"
"> Test3\n"
"GGAGTATAATATATATATATATAT\n"
Wouldn't it be even more understandable if you used this as the input AND output string? Then one does not have to compare input and output_comp when looking at it, but directly notices the "perfect roundtrip".
Or do you explicitly want to show that spaces in the sequence are still removed?
Good point, will adjust. 👍
ea2de7b to 98c7471
There is a conflict. Please rebase.
98c7471 to 048469c
A simple roundtrip through seqan3 (reading and writing a fasta file) should not introduce any changes with the default options
The PR #2769 fixes the forced introduction of spaces before the sequence ids.
This PR fixes the removal of whitespaces when reading a fasta file.
This changes the default behavior. To achieve the old behavior a new flag is introduced: fin.options.fasta_ignore_blanks_before_id, which is true by default.
Example:
File
seqan3 now: id="seq1"
some others: id=" seq1"
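The roundtrip idea from the description can be sketched with the standard library alone. This is an illustration under stated assumptions, not SeqAn3 code: `record`, `read_fasta`, `write_fasta`, and `check_roundtrip` are hypothetical names, only `>` is treated as the ID marker, and each sequence is assumed to fit on one line, so no line wrapping is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using record = std::pair<std::string, std::string>; // (id, sequence)

// Reads records from FASTA text; the flag mirrors the option discussed in this PR.
std::vector<record> read_fasta(std::string const & text, bool ignore_blanks_before_id)
{
    std::vector<record> records;
    std::istringstream in{text};
    std::string line;
    while (std::getline(in, line))
    {
        if (line.empty())
            continue;
        if (line.front() == '>')
        {
            std::size_t start = 1; // skip the '>' marker
            while (ignore_blanks_before_id && start < line.size() &&
                   (line[start] == ' ' || line[start] == '\t'))
                ++start;
            records.emplace_back(line.substr(start), std::string{});
        }
        else if (!records.empty())
        {
            records.back().second += line; // append sequence data to the current record
        }
    }
    return records;
}

// Writes records back without inserting anything between '>' and the ID.
std::string write_fasta(std::vector<record> const & records)
{
    std::string out;
    for (auto const & [id, seq] : records)
        out += ">" + id + "\n" + seq + "\n";
    return out;
}

// Self-check mirroring the id="seq1" versus id=" seq1" example above.
void check_roundtrip()
{
    std::string const input = "> seq1\nACGT\n";
    assert(read_fasta(input, true)[0].first == "seq1");
    assert(read_fasta(input, false)[0].first == " seq1");
    assert(write_fasta(read_fasta(input, false)) == input);
}
```

In this sketch the text only survives a read and write cycle unchanged when the blanks before the ID are kept; with the flag set to true the leading blank is dropped and the output differs from the input.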