[FEAT] Option to ignore blanks before ids when reading FastA-Files #2770

SGSSGene · 2021-08-23T11:14:37Z

A simple round trip through seqan3 (reading and writing a FASTA file) should not introduce any changes with the default options:

auto fout = seqan3::sequence_file_input{std::istringstream{input}, seqan3::format_fasta{}} |
                  seqan3::sequence_file_output{std::ostringstream{}, seqan3::format_fasta{}};

PR #2769 fixes the forced introduction of spaces before the sequence ids.
This PR fixes the removal of whitespace before the ids when reading a FASTA file.
This changes the default behavior. To achieve the old behavior, a new flag is introduced:
fin.options.fasta_ignore_blanks_before_id, which is true by default.

Example:
File

> seq1
ACTG

seqan3 now: id="seq1"
some others: id=" seq1"
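
To make the difference between the two readings concrete, here is a minimal standalone sketch. It does not use seqan3; `parse_fasta_id` is a hypothetical helper invented for illustration, and its boolean parameter only mirrors the meaning of the option described above. It is not the implementation in format_fasta.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of seqan3: extracts the id from one FASTA header line.
// `ignore_blanks_before_id` mirrors the meaning of the option discussed in this PR.
// Returns an empty string if the line does not start with a header marker.
inline std::string parse_fasta_id(std::string_view header, bool ignore_blanks_before_id)
{
    // A header line starts with '>' (or ';' in some FASTA variants).
    if (header.empty() || (header.front() != '>' && header.front() != ';'))
        return {};

    header.remove_prefix(1); // drop the marker itself

    if (ignore_blanks_before_id)
    {
        // Strip spaces and tabs between the marker and the id.
        std::size_t const first = header.find_first_not_of(" \t");
        header.remove_prefix(first == std::string_view::npos ? header.size() : first);
    }

    return std::string{header};
}
```

With this sketch, `parse_fasta_id("> seq1", true)` yields `"seq1"` and `parse_fasta_id("> seq1", false)` yields `" seq1"`, matching the two ids in the example above.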

vercel · 2021-08-23T11:14:42Z

This pull request is being automatically deployed with Vercel.

🔍 Inspect: https://vercel.com/seqan/seqan3/Fugjpfm7cJd6WKCsQNp6ZVmKY2H4
✅ Preview: https://seqan3-git-fork-sgssgene-feat-removeblankbeforeid-seqan.vercel.app

codecov · 2021-08-23T11:27:51Z

Codecov Report

Merging #2770 (048469c) into master (b4984bc) will decrease coverage by 0.03%.
The diff coverage is 72.22%.

@@            Coverage Diff             @@
##           master    #2770      +/-   ##
==========================================
- Coverage   98.22%   98.19%   -0.04%     
==========================================
  Files         267      267              
  Lines       11511    11521      +10     
==========================================
+ Hits        11307    11313       +6     
- Misses        204      208       +4

Impacted Files	Coverage Δ
include/seqan3/io/sequence_file/format_fasta.hpp	`89.38% <72.22%> (-2.86%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e5c63f9...048469c.

MitraDarja

LGTM, just one minor thing. :)

test/unit/io/sequence_file/sequence_file_integration_test.cpp

smehringer · 2021-10-25T10:37:32Z

Core Meeting 25.10.2021 - The default should still be "blanks are stripped" because blanks are usually ignored. The option is still useful to allow for a "perfect roundtrip".

SGSSGene · 2021-10-25T10:58:59Z

I will change the default back to 'true'.

eseiler · 2021-10-27T08:05:36Z

include/seqan3/io/sequence_file/format_fasta.hpp

-                {}
+                if (options.fasta_ignore_blanks_before_id)
+                {
+                    for (; (it != e) && (is_id || is_blank)(*it); ++it)


Couldn't you do something like

auto const is_id = options.fasta_ignore_blanks_before_id ? (is_char<'>'> || is_char<';'> || is_blank) : (is_char<'>'> || is_char<';'>);

in line 178?

I think that is not possible. Line 180 checks whether the character is a '>' or a ';':

if (!is_id(*begin(stream_view)))

That line would be wrong if it included a whitespace.

I'm just noticing that the parsing of the ID is not correct either way.
Currently, >>>>>>>> TEST would be parsed as TEST,
and with my changes, > > >>> > > > TEST would also be parsed as TEST.

I am assuming that is also not what we want?

I have created a new PR to fix this: #2869
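
The parsing issue raised in this thread can be sketched in isolation. The code below is only loosely modelled on the loop shown in the diff; `is_marker`, `is_blank_char`, `id_greedy` and `id_strict` are hypothetical names, not seqan3 predicates. The strict variant is one possible stricter reading and is an assumption; the thread does not say what #2869 actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical predicates, named after the ones mentioned in the review thread.
inline bool is_marker(char c) { return c == '>' || c == ';'; }
inline bool is_blank_char(char c) { return c == ' ' || c == '\t'; }

// Greedy variant: skips every leading marker and, if requested, every blank,
// in any order. Repeated or interleaved markers are silently swallowed.
inline std::string id_greedy(std::string_view line, bool ignore_blanks)
{
    std::size_t i = 0;
    while (i < line.size() && (is_marker(line[i]) || (ignore_blanks && is_blank_char(line[i]))))
        ++i;
    return std::string{line.substr(i)};
}

// Strict variant (an assumption): consumes exactly one marker,
// then optionally the blanks that directly follow it.
inline std::string id_strict(std::string_view line, bool ignore_blanks)
{
    std::size_t i = (!line.empty() && is_marker(line[0])) ? 1 : 0;
    while (ignore_blanks && i < line.size() && is_blank_char(line[i]))
        ++i;
    return std::string{line.substr(i)};
}
```

With blanks ignored, `id_greedy` returns `"TEST"` for both `">>>>>>>> TEST"` and `"> > >>> > > > TEST"`, which is the behavior questioned above. `id_strict` keeps everything after the first marker, so `">>>>>>>> TEST"` yields `">>>>>>> TEST"`.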

SGSSGene · 2022-01-12T12:31:11Z

#2769 is merged, so this can be rebased? (This PR was marked blocked by #2769)

It's blocked by #2869.

SGSSGene · 2022-01-12T12:48:32Z

#2769 is merged, so this can be rebased? (This PR was marked blocked by #2769)

It's blocked by #2869.

No, this is not blocked by #2869; this can be merged independently :-)

eseiler

I think we should merge #2869 first, I want to have another look at the predicates afterwards :)

eseiler

Didn't find an elegant way to avoid code duplication.

smehringer

Although there is no change in behavior, I would add a short changelog entry for this new feature.

smehringer · 2022-03-30T19:17:00Z

test/unit/io/sequence_file/sequence_file_integration_test.cpp

+        ">TEST 1\n"
+        "ACGT\n"
+        "> Test2\n"
+        "AGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGNAGGCTGN\n"
+        ">  Test3\n"
+        "GGAGTATAATATATATATATATAT\n"


Wouldn't it be even more understandable if you used this as both the input AND the output string? Then one does not have to compare input and output_comp when looking at it, but directly notices the "perfect roundtrip".

Or do you explicitly want to show that spaces in the sequence are still removed?

Good point, will adjust. 👍
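
The "perfect roundtrip" idea from this review comment can be sketched without seqan3. `read_fasta` and `write_fasta` below are hypothetical helpers that assume one header line followed by exactly one sequence line per record; they are not the code in sequence_file_integration_test.cpp. The writer emits `>` directly followed by the id, with no extra space inserted.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using record = std::pair<std::string, std::string>; // {id, sequence}

// Hypothetical minimal reader: one header line, then one sequence line per record.
inline std::vector<record> read_fasta(std::string const & text, bool ignore_blanks_before_id)
{
    std::vector<record> records;
    std::istringstream in{text};
    std::string header, sequence;
    while (std::getline(in, header) && std::getline(in, sequence))
    {
        if (header.empty() || header.front() != '>')
            break; // not a header line; stop parsing in this simplified sketch

        std::string id = header.substr(1); // drop the leading '>'
        if (ignore_blanks_before_id)
            id.erase(0, id.find_first_not_of(" \t")); // strip blanks before the id
        records.emplace_back(std::move(id), std::move(sequence));
    }
    return records;
}

// Hypothetical minimal writer: '>' directly followed by the id, then the sequence.
inline std::string write_fasta(std::vector<record> const & records)
{
    std::string out;
    for (auto const & [id, sequence] : records)
        out += ">" + id + "\n" + sequence + "\n";
    return out;
}
```

If the blanks are kept, `write_fasta(read_fasta(input, false)) == input` holds for an input such as `">TEST 1\nACGT\n> Test2\nAGGC\n"`, so a single string can serve as both input and expected output. With the flag set to `true`, the second header comes back as `>Test2`.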

smehringer · 2022-03-31T19:57:12Z

There is a conflict. Please rebase.

vercel bot temporarily deployed to Preview August 23, 2021 11:14 Inactive

SGSSGene force-pushed the feat/remove_blank_before_id branch from fabca48 to fb2ae2a Compare August 23, 2021 11:15

SGSSGene requested review from a team and MitraDarja and removed request for a team August 23, 2021 11:15

SGSSGene mentioned this pull request Aug 23, 2021

Fasta output doesn't meet specification #2767

Closed

vercel bot temporarily deployed to Preview August 23, 2021 11:18 Inactive

MitraDarja approved these changes Aug 24, 2021

eseiler requested a review from marehr August 25, 2021 15:37

SGSSGene force-pushed the feat/remove_blank_before_id branch from fb2ae2a to fd3d739 Compare October 26, 2021 08:37

vercel bot temporarily deployed to Preview October 26, 2021 08:37 Inactive

SGSSGene changed the base branch from master to release-3.1.0 October 26, 2021 08:39

SGSSGene force-pushed the feat/remove_blank_before_id branch 3 times, most recently from e9a0f99 to c42ce88 Compare October 26, 2021 08:43

vercel bot temporarily deployed to Preview October 26, 2021 08:48 Inactive

SGSSGene force-pushed the feat/remove_blank_before_id branch from c42ce88 to fe9d3c0 Compare October 27, 2021 07:57

vercel bot temporarily deployed to Preview October 27, 2021 07:57 Inactive

eseiler reviewed Oct 27, 2021

eseiler changed the base branch from release-3.1.0 to master November 10, 2021 10:58

SGSSGene force-pushed the feat/remove_blank_before_id branch from fe9d3c0 to f6207ef Compare November 30, 2021 15:54

vercel bot temporarily deployed to Preview November 30, 2021 15:54 Inactive

smehringer requested review from a team and removed request for marehr and a team December 6, 2021 10:27

SGSSGene requested a review from eseiler January 12, 2022 12:48

SGSSGene force-pushed the feat/remove_blank_before_id branch from 20f708d to a8fbeaf Compare February 9, 2022 10:14

vercel bot temporarily deployed to Preview February 9, 2022 10:14 Inactive

SGSSGene force-pushed the feat/remove_blank_before_id branch from a8fbeaf to 1983b08 Compare February 9, 2022 11:06

vercel bot temporarily deployed to Preview February 9, 2022 11:06 Inactive

eseiler requested changes Mar 4, 2022

eseiler force-pushed the feat/remove_blank_before_id branch from 1983b08 to f886ce0 Compare March 23, 2022 15:13

eseiler approved these changes Mar 23, 2022

vercel bot temporarily deployed to Preview March 23, 2022 16:15 Inactive

SGSSGene force-pushed the feat/remove_blank_before_id branch from f886ce0 to 1983b08 Compare March 28, 2022 08:32

eseiler force-pushed the feat/remove_blank_before_id branch from 1983b08 to f886ce0 Compare March 28, 2022 08:36

SGSSGene force-pushed the feat/remove_blank_before_id branch from f886ce0 to 9a317f9 Compare March 29, 2022 05:44

vercel bot temporarily deployed to Preview March 29, 2022 05:44 Inactive

SGSSGene force-pushed the feat/remove_blank_before_id branch from 9a317f9 to 6bed967 Compare March 29, 2022 05:55

vercel bot temporarily deployed to Preview March 29, 2022 05:55 Inactive

SGSSGene force-pushed the feat/remove_blank_before_id branch from 9a317f9 to ea2de7b Compare March 29, 2022 08:41

vercel bot temporarily deployed to Preview March 29, 2022 08:41 Inactive

smehringer requested changes Mar 30, 2022

SGSSGene force-pushed the feat/remove_blank_before_id branch from ea2de7b to 98c7471 Compare March 31, 2022 09:48

vercel bot temporarily deployed to Preview March 31, 2022 09:48 Inactive

SGSSGene requested a review from smehringer March 31, 2022 09:50

SGSSGene added 2 commits April 12, 2022 15:06

[FEAT] option to not ignore blanks before the id in FASTA files

114e8a4

[TEST] i/o - checking .fasta_ignore_blanks_before_id option works

048469c

SGSSGene force-pushed the feat/remove_blank_before_id branch from 98c7471 to 048469c Compare April 12, 2022 13:07

vercel bot temporarily deployed to Preview April 12, 2022 13:07 Inactive

smehringer approved these changes Apr 27, 2022

smehringer merged commit 79b6467 into seqan:master Apr 27, 2022
