

Split fasta improvement #239

Merged
merged 2 commits into from
Oct 19, 2022


DataSpott (Collaborator) commented Oct 14, 2022

Changes to split_fasta.py for fasta input:

  1. Fixed the issue where leading empty lines in the fasta file caused an error that failed the whole pipeline -> empty lines are now skipped completely
  2. Changed the behaviour regarding the fasta header:
  • Previously, the header was split at the first whitespace and only the first part was kept as the new fasta header -> this could cause problems with fastas from e.g. GISAID that contain whitespace in their names (like "hcov19/Hong Kong/..."), which were then detected as duplicates of the same sequence even when they were different sequences.
  • Now whitespace is replaced with "_" -> this can lead to longer file names and fasta headers, but the full header information is preserved
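The two changes above can be illustrated with a minimal Python sketch, assuming a simple line-based fasta parser. The function names `read_fasta`, `old_header` and `new_header` are hypothetical and not taken from split_fasta.py, and the GISAID-style headers in the example are made up for illustration.

```python
import io
import re
from typing import Iterator, TextIO, Tuple


def old_header(line: str) -> str:
    # Hypothetical helper: previous behaviour, keep only the text before the first whitespace.
    return line[1:].split()[0]


def new_header(line: str) -> str:
    # Hypothetical helper: new behaviour, replace each whitespace character with "_".
    return re.sub(r"\s", "_", line[1:].strip())


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    # Hypothetical parser: yields (header, sequence) pairs and skips empty lines anywhere.
    header = None
    seq_parts = []
    for raw in handle:
        line = raw.strip()
        if not line:  # empty lines (including leading ones) are ignored
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header = new_header(line)
            seq_parts = []
        elif header is None:
            raise ValueError("sequence data found before the first fasta header")
        else:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


if __name__ == "__main__":
    sample = io.StringIO(
        "\n\n>hcov19/Hong Kong/EXAMPLE-1/2020\nACGT\n\nACGT\n"
        ">hcov19/Hong Kong/EXAMPLE-2/2020\nTTTT\n"
    )
    for name, seq in read_fasta(sample):
        print(name, seq)
    # The old behaviour collapses both (different) headers to the same name.
    print(old_header(">hcov19/Hong Kong/EXAMPLE-1/2020"))
```

With the old behaviour, both example headers would become `hcov19/Hong` and be treated as duplicates; replacing whitespace keeps them distinct at the cost of longer names.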

Solves issue #229

… empty line. Now skips over all empty lines in the file
…he first whitespace and taking only the first part to replacing whitespaces with "_". Should avoid problems with fastas containing whitespaces in the header (like GISAID-fastas from Hong Kong & Sri Lanka) being labelled as duplicates of the same sequence.
@DataSpott DataSpott added bug Something isn't working enhancement New feature or request labels Oct 14, 2022
@DataSpott DataSpott self-assigned this Oct 14, 2022
replikation (Owner)

Please do some "worst case" scenario testing, and also test with the default (from fastq) to check that nothing goes wrong in the final HTML. If everything is fine, we can merge. @DataSpott

DataSpott (Collaborator, Author)

Tested with the "test_fastq" and "test_fasta" profiles, as well as with a fasta file containing two hard-cut fasta sequences.
It worked properly in all three cases.

@DataSpott DataSpott linked an issue Oct 19, 2022 that may be closed by this pull request
@replikation replikation merged commit eab1c4f into master Oct 19, 2022
Development

Successfully merging this pull request may close these issues.

"split-fasta"-process fails due to leading empty line
2 participants