Incorporate vibrio characterisation with srst2 into TheiaProk workflows #216
… and terra_tools table
… Merge remote-tracking branch 'origin/main' into im-vibrio-srst2
…th PE and SE workflows
…d some helpful comments. runs successfully with miniwdl
…_magic workflows. Also added to export_taxon_table task and input call block
FYI let's wait to fix the CI until we've made all the changes we discussed to the SRST2 task & workflows
…an-readable format
…as present or absent; tcpA classical and tcpA ElTor are reported in the biotype column; O1 and O139 are reported in the serotype column
…paces from column values
Int srst2_min_depth = 5
Int srst2_min_edge_depth = 2
Int srst2_gene_max_mismatch = 10
Int srst2_gene_max_mismatch = 200
Maybe it should be increased further
Resolved?
It seems that the validation runs linked above don't have these values set by merlin magic, which leads me to assume that they're just using the default srst2 values (i.e. 10).
result = "O1" + ' ' + value_O1
serotype_fh.write(result.strip())
CODE
>>>
So good! For future iterations, I'm thinking that it may be helpful to package all of this gene-type parsing into a single executable that can be made available in the docker image itself to promote interoperability of this specific functionality.
File changes look good! I ran things in a sandbox with the updated defaults as well. Functionally everything is looking great. Well done, all!
Motivation
An Abricate database of target genes for Vibrio characterization was constructed, with the corresponding PR being open at StaPH-B/docker-builds#618.
This docker image includes a Vibrio cholerae-specific database of gene targets (traditionally used in PCR methods) for detecting O1 & O139 serotypes, toxin-production markers, and Biotype markers within the O1 serogroup ("El Tor" or "Classical" biotypes). These sequences were shared via personal communication with Dr. Christine Lee, of the National Listeria, Yersinia, Vibrio and Enterobacterales Reference Laboratory within the Enteric Diseases Laboratory Branch at CDC.
The genes included in the database (and their purposes) are as follows:
- ctxA - Cholera toxin, an indication of toxigenic V. cholerae
- ompW - Outer membrane protein, a V. cholerae species marker (allele distinguishes V. cholerae from V. parahaemolyticus and V. vulnificus)
- tcpA - Toxin co-regulated pilus A, used to infer Biotype, either "El Tor" or "Classical" (tcpA_classical and tcpA_ElTor)
- toxR - Transcriptional activator (controls cholera toxin, pilus, and outer-membrane protein expression); species marker (allele distinguishes V. cholerae from V. parahaemolyticus and V. vulnificus)
- wbeN - O antigen encoding region, used to identify the O1 serogroup
- wbfR - O antigen encoding region, used to identify the O139 serogroup

Until further testing, the current container included in the workflow is quay.io/kapsakcj/srst2:0.2.0-vcholerae
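As a rough illustration of how hits against the markers listed above could be turned into the reported columns (presence/absence for ctxA, ompW and toxR; O1/O139 in a serotype column; Classical/ElTor in a biotype column), here is a minimal Python sketch. It is not the code in task_srst2_vibrio.wdl: the function name `summarize_vibrio_hits`, the column names, and the assumption that hits arrive as plain gene names matching the list above are all hypothetical.

```python
from typing import Dict, Iterable

# Marker names follow the PR description; the exact labels that srst2
# reports for this database are an assumption made for illustration.
PRESENCE_MARKERS = ("ctxA", "ompW", "toxR")
SEROTYPE_MARKERS = {"wbeN": "O1", "wbfR": "O139"}
BIOTYPE_MARKERS = {"tcpA_classical": "Classical", "tcpA_ElTor": "ElTor"}


def summarize_vibrio_hits(hits: Iterable[str]) -> Dict[str, str]:
    """Turn a collection of detected gene names into a flat summary row."""
    found = set(hits)

    # Toxin and species markers are reported simply as present or absent.
    summary = {
        gene: ("present" if gene in found else "absent")
        for gene in PRESENCE_MARKERS
    }

    # Serogroup and biotype markers are collapsed into one column each.
    # An empty string means that none of the relevant markers was detected.
    serotypes = [label for gene, label in SEROTYPE_MARKERS.items() if gene in found]
    biotypes = [label for gene, label in BIOTYPE_MARKERS.items() if gene in found]
    summary["serotype"] = ", ".join(serotypes)
    summary["biotype"] = ", ".join(biotypes)
    return summary


if __name__ == "__main__":
    row = summarize_vibrio_hits(["ctxA", "ompW", "toxR", "wbeN", "tcpA_ElTor"])
    for column, value in row.items():
        print(f"{column}\t{value}")
```

Keeping the marker-to-label mapping in module-level dictionaries means a database update would only require editing those tables, which may also make the logic easier to ship as a standalone script later.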
A new task, task_srst2_vibrio.wdl, was included that runs srst2 with the custom vibrio database, and the resulting hits on the gene sequences are reported. The task was included in merlin_magic_workflow.wdl for any sample identified as belonging to the genus vibrio. This has been implemented in both TheiaProk_Illumina_PE and TheiaProk_Illumina_SE.
The following outputs are retrieved:
Testing
The workflow has been tested on 152 V. cholerae sequence runs on Terra using TheiaProk_Illumina_PE.
TheiaProk_Illumina_SE has been tested locally with sample SRR7062492, as importing the workflow with the correct branch was not possible in Terra (@kapsakcj have you seen this issue before?)