Incorporate vibrio characterisation with srst2 into TheiaProk workflows #216

Merged
merged 31 commits into from
Apr 13, 2023

cimendes
Member

Motivation

An Abricate database of target genes for Vibrio characterization was constructed; the corresponding PR is open at StaPH-B/docker-builds#618.

This docker image includes a Vibrio cholerae-specific database of gene targets (traditionally used in PCR methods) for detecting O1 & O139 serotypes, toxin-production markers, and Biotype markers within the O1 serogroup ("El Tor" or "Classical" biotypes). These sequences were shared via personal communication with Dr. Christine Lee, of the National Listeria, Yersinia, Vibrio and Enterobacterales Reference Laboratory within the Enteric Diseases Laboratory Branch at CDC.

The genes included in the database, and their purpose, are as follows:

  • ctxA - Cholera toxin, an indication of toxigenic V. cholerae
  • ompW - outer membrane protein, a V. cholerae species marker (alleles distinguish V. cholerae from V. parahaemolyticus and V. vulnificus)
  • tcpA - toxin-coregulated pilus subunit A, used to infer Biotype, either "El Tor" or "Classical"
    • the database includes an allele for each Biotype: tcpA_classical and tcpA_ElTor
  • toxR - transcriptional activator (controls cholera toxin, pilus, and outer membrane protein expression), a species marker (allele distinguishes V. cholerae from V. parahaemolyticus and V. vulnificus)
  • wbeN - O antigen encoding region - used to identify the O1 serogroup
  • wbfR - O antigen encoding region - used to identify the O139 serogroup
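The gene roles above imply a simple decision scheme for summarizing hits. As a minimal sketch (not the task's actual logic), assuming the detected targets are available as a set of names spelled exactly as in the list, the calls could be derived as below. The function name characterise_vibrio and the output labels are hypothetical, and the real database sequence names may differ (the WDL outputs use names such as wbeN_O1).

```python
# Hypothetical sketch: turn the set of detected gene targets from the Vibrio
# srst2 database into summary calls, following the gene roles listed above.
# Function name, keys and labels are illustrative only.

def characterise_vibrio(detected: set[str]) -> dict[str, str]:
    """Map detected gene/allele names to species, toxin, serogroup and Biotype calls."""
    calls: dict[str, str] = {}

    # ompW and toxR act as V. cholerae species markers.
    species_markers = {"ompW", "toxR"} & detected
    calls["species_marker"] = ",".join(sorted(species_markers)) or "none"

    # ctxA (cholera toxin) indicates a toxigenic strain.
    calls["toxigenic"] = "yes" if "ctxA" in detected else "no"

    # wbeN identifies the O1 serogroup, wbfR the O139 serogroup.
    serogroups = []
    if "wbeN" in detected:
        serogroups.append("O1")
    if "wbfR" in detected:
        serogroups.append("O139")
    calls["serogroup"] = ",".join(serogroups) or "not determined"

    # The tcpA alleles separate the El Tor and Classical Biotypes within O1 only.
    if "O1" in serogroups:
        if "tcpA_ElTor" in detected:
            calls["biotype"] = "El Tor"
        elif "tcpA_classical" in detected:
            calls["biotype"] = "Classical"
        else:
            calls["biotype"] = "not determined"
    else:
        calls["biotype"] = "not applicable"
    return calls
```

For example, a sample with hits on ompW, toxR, ctxA, wbeN and tcpA_ElTor would be summarized as a toxigenic O1 El Tor isolate under these assumptions.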

Pending further testing, the workflow currently uses the container quay.io/kapsakcj/srst2:0.2.0-vcholerae.

A new task, task_srst2_vibrio.wdl, runs srst2 with the custom Vibrio database and reports the resulting hits against the gene sequences. The task is called in merlin_magic_workflow.wdl for any sample identified as belonging to the genus Vibrio, and has been implemented in both TheiaProk_Illumina_PE and TheiaProk_Illumina_SE.
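The genus check that gates the new task can be illustrated outside WDL. This Python sketch only mirrors the idea of the conditional call in merlin_magic_workflow.wdl; the helper names and the GENUS_TASKS table are hypothetical, and the way the real workflow matches the genus may differ.

```python
# Hypothetical sketch of genus-based dispatch: reduce a predicted taxon name to
# its genus and select the Vibrio srst2 task only when the genus is Vibrio.

def genus_of(taxon: str) -> str:
    """Return the first word of a taxon name, e.g. 'Vibrio cholerae' -> 'Vibrio'."""
    parts = taxon.strip().split()
    return parts[0].capitalize() if parts else ""

# Genus -> tasks to run. Only the Vibrio entry reflects this PR; the task label
# is illustrative.
GENUS_TASKS: dict[str, list[str]] = {
    "Vibrio": ["srst2_vibrio"],
}

def tasks_for(taxon: str) -> list[str]:
    """Return the genus-specific tasks for a predicted taxon (empty if none)."""
    return GENUS_TASKS.get(genus_of(taxon), [])
```

Keying on the genus rather than the species means V. parahaemolyticus and V. vulnificus samples also reach the task, which matches the intent of including species-marker alleles (ompW, toxR) in the database.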

The following outputs are retrieved:

  File srst2_tsv = "~{samplename}.tsv"
  String srst2_version = read_string("VERSION")
  String srst2_vibrio_ctxA = read_string("ctxA")
  String srst2_vibrio_ompW = read_string("ompW")
  String srst2_vibrio_tcpA_ElTor = read_string("tcpA_ElTor")
  String srst2_vibrio_toxR = read_string("toxR")
  String srst2_vibrio_wbeN_O1 = read_string("wbeN_O1")
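Each String output above is populated with read_string, so the task's command section has to leave behind one small file per target, named ctxA, ompW and so on. A minimal sketch of that step, assuming a tab-separated results table whose gene column holds the target names exactly as used in the outputs and whose allele column holds the reported allele. The column names, the write_target_files helper and the "not detected" placeholder are all assumptions, not the task's actual behavior.

```python
import csv
from pathlib import Path

# File names matching the read_string() calls in the WDL outputs above.
TARGETS = ["ctxA", "ompW", "tcpA_ElTor", "toxR", "wbeN_O1"]

def write_target_files(results_tsv: str, outdir: str) -> dict[str, str]:
    """Write one single-line file per target so WDL read_string() can load it.

    Assumes 'gene' and 'allele' columns in a tab-separated table; absent
    targets get the placeholder 'not detected' (also an assumption).
    """
    found: dict[str, str] = {}
    with open(results_tsv, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            found[row["gene"]] = row["allele"]

    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    values: dict[str, str] = {}
    for target in TARGETS:
        value = found.get(target, "not detected")
        (out / target).write_text(value)  # one file per WDL String output
        values[target] = value
    return values
```

Writing a placeholder for every target, rather than only for hits, keeps read_string from failing on a missing file when a gene is absent.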

Testing

The workflow has been tested on 152 V. cholerae sequencing runs on Terra using TheiaProk_Illumina_PE.

TheiaProk_Illumina_SE has been tested locally with sample SRR7062492, because importing the workflow from the correct branch was not possible in Terra (@kapsakcj, have you seen this issue before?).

@cimendes cimendes requested a review from emmadoughty March 8, 2023 16:56
@kapsakcj kapsakcj marked this pull request as draft March 8, 2023 16:57
… Merge remote-tracking branch 'origin/main' into im-vibrio-srst2
…d some helpful comments. runs successfully with miniwdl
…_magic workflows. Also added to export_taxon_table task and input call block
@kapsakcj
Contributor

FYI, let's wait to fix the CI until we've made all the changes we discussed to the SRST2 task and workflows.

@cimendes cimendes marked this pull request as ready for review April 6, 2023 12:24
  Int srst2_min_depth = 5
  Int srst2_min_edge_depth = 2
- Int srst2_gene_max_mismatch = 10
+ Int srst2_gene_max_mismatch = 200
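The review change sets srst2_min_depth = 5 and srst2_min_edge_depth = 2 and raises srst2_gene_max_mismatch from 10 to 200. The sketch below shows how such inputs could be forwarded to srst2 through its --min_depth, --min_edge_depth and --gene_max_mismatch options. build_srst2_cmd is a hypothetical helper, not the task's command section, and any file paths passed to it are placeholders.

```python
from typing import Optional

# Hypothetical sketch: assemble an srst2 command line from task-style inputs.
# The flags are standard srst2 options; defaults mirror the reviewed values.

def build_srst2_cmd(
    sample: str,
    gene_db: str,
    read1: str,
    read2: Optional[str] = None,
    min_depth: int = 5,
    min_edge_depth: int = 2,
    gene_max_mismatch: int = 200,
) -> list[str]:
    """Return the srst2 argument list for paired-end or single-end reads."""
    cmd = ["srst2"]
    if read2 is not None:
        cmd += ["--input_pe", read1, read2]  # paired-end (TheiaProk_Illumina_PE)
    else:
        cmd += ["--input_se", read1]         # single-end (TheiaProk_Illumina_SE)
    cmd += [
        "--gene_db", gene_db,
        "--output", sample,
        "--min_depth", str(min_depth),
        "--min_edge_depth", str(min_edge_depth),
        "--gene_max_mismatch", str(gene_max_mismatch),
    ]
    return cmd
```

Building the command as a list rather than one string avoids shell-quoting problems if it is passed to subprocess.run. If a caller never overrides gene_max_mismatch at the workflow level, srst2 falls back to its own default of 10.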
Member Author

Maybe it should be increased further.

Contributor

Resolved?

Contributor

It seems that the validation runs linked above don't have these values set by merlin magic, which leads me to assume that they're just using the default srst2 values (i.e., 10).

result = "O1" + ' ' + value_O1
serotype_fh.write(result.strip())
CODE
>>>
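The excerpt above appears to be the tail of a Python heredoc inside the WDL command block (CODE closes the heredoc and >>> closes the command). Its .strip() call matters when the value is empty: "O1" + ' ' + "" would otherwise leave a trailing space in the output file. A self-contained sketch of that pattern follows; serogroup_call, write_serotype and the meaning of the value strings are assumptions for illustration only.

```python
from typing import Optional

def serogroup_call(value_o1: Optional[str], value_o139: Optional[str]) -> str:
    """Return a serogroup string; None means the marker was not detected.

    Each value is whatever qualifier the task attaches (an assumption here);
    .strip() drops the trailing space when the qualifier is empty, as in the
    reviewed snippet.
    """
    calls = []
    if value_o1 is not None:
        calls.append(("O1" + " " + value_o1).strip())
    if value_o139 is not None:
        calls.append(("O139" + " " + value_o139).strip())
    return ", ".join(calls) if calls else "not detected"

def write_serotype(path: str, value_o1: Optional[str], value_o139: Optional[str]) -> None:
    """Write the serogroup call as a single line, ready for WDL read_string()."""
    with open(path, "w") as serotype_fh:
        serotype_fh.write(serogroup_call(value_o1, value_o139))
```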
Contributor

So good! For future iterations, I'm thinking that it may be helpful to package all of this gene-type parsing into a single executable that can be made available in the docker image itself to promote interoperability of this specific functionality.
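Packaging the parsing as one executable in the docker image could look roughly like the entry point below. This is a hypothetical sketch: the script's interface, the assumed gene column in the results table and the present/absent output format are illustrative choices, not part of the current image.

```python
import argparse
import csv
import sys
from typing import Optional

# Target names as listed in the PR description (assumed to match the table).
TARGETS = ["ctxA", "ompW", "tcpA_classical", "tcpA_ElTor", "toxR", "wbeN", "wbfR"]

def parse_hits(tsv_path: str) -> set[str]:
    """Collect gene names from a tab-separated results table (assumed 'gene' column)."""
    with open(tsv_path, newline="") as fh:
        return {row["gene"] for row in csv.DictReader(fh, delimiter="\t")}

def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Hypothetical Vibrio gene-target parser (sketch)."
    )
    parser.add_argument("results_tsv", help="srst2 results table (TSV)")
    args = parser.parse_args(argv)

    hits = parse_hits(args.results_tsv)
    # One tab-separated line per target, so downstream steps can parse it simply.
    for target in TARGETS:
        status = "present" if target in hits else "absent"
        print(f"{target}\t{status}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Accepting argv as a parameter keeps main() testable without spawning a process, while the __main__ guard lets the same file run as a command inside the container.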

@kapsakcj
Contributor

Emma's most recent test: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Doughty_Sandbox/job_history/ff536588-c8bb-4bbf-aab5-51158024114c

@kevinlibuit
Contributor

File changes look good! I ran things in a sandbox with the updated defaults as well. Functionally everything is looking great. Well done, all!

@kevinlibuit kevinlibuit merged commit d284e48 into main Apr 13, 2023