Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate vibrio characterisation with srst2 into TheiaProk workflows #216

Merged
merged 31 commits into from
Apr 13, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
7c12fee
add working version with local docker image - To be updated
cimendes Feb 24, 2023
40ae0db
parse output tsv
cimendes Feb 27, 2023
904ba4e
remove comment
cimendes Feb 27, 2023
8b93c5f
improvements to tsv parsing
cimendes Feb 27, 2023
c629fa5
update docker
cimendes Feb 27, 2023
eb77bf9
update gene_db path
cimendes Feb 27, 2023
e6ae8a2
allow for single end reads
cimendes Feb 27, 2023
4cbb1ee
add vibrio to merlin magic
cimendes Feb 27, 2023
ed44e6e
add vibrio typing to theiaprok_illumina_pe and theiaprok_illumina_se…
cimendes Feb 27, 2023
74fadcf
unsatisfactory solution to read name issue with clean reads on srst2 …
cimendes Feb 27, 2023
8980632
update md5sum for gambit
cimendes Feb 28, 2023
bebfaa8
Merge branch 'main' into im-vibrio-srst2
cimendes Feb 28, 2023
9d9b75d
Bringing in recent changes from main into this im-vibrio-srs2 branch.…
kapsakcj Mar 16, 2023
1ffd357
added srst2 outputs to input call block for export_taxon_tables in bo…
kapsakcj Mar 17, 2023
e07446d
fixed syntax in export_taxon_tables python dictionary
kapsakcj Mar 17, 2023
9b762a6
add additional output file: detailed TSV output from SRST2. Also adde…
kapsakcj Mar 17, 2023
5cea6c5
added new output file srst2_vibrio_detailed_tsv to PE, SE, and merlin…
kapsakcj Mar 17, 2023
8b33b5a
update parsing block to account for when output columns do not exist
cimendes Mar 20, 2023
c1a0b7b
remove duplicated block
cimendes Mar 20, 2023
99d01e0
expose srst2 parameters in merlin_magic
cimendes Mar 20, 2023
6db886c
fix CI with updated checksum
cimendes Mar 20, 2023
13abff2
add function to translate srst2 mismatch/uncertanty indicators to hum…
cimendes Mar 27, 2023
0e29cea
replace not found hits with "(not detected)" so it's more human readable
cimendes Mar 28, 2023
33d5f66
altered srst2_vibrio parsing: ctxA, ompW and toxR are being reported …
cimendes Mar 29, 2023
4e4116a
deprecate srst2_results_tsv file from output; remove trailing empty s…
cimendes Mar 29, 2023
ccad72a
fix merlin magic export mistake
cimendes Mar 29, 2023
c6cc642
fix more oops mistakes
cimendes Mar 29, 2023
3029eea
update srsrt2_vibrio default paramenters to be less stringent
cimendes Mar 31, 2023
32068c3
rename output column from srst2_vibrio_serotype to srst2_vibrio_serog…
cimendes Apr 5, 2023
4c1e70c
update container for staphb/srst2:0.2.0-vcholerae
cimendes Apr 6, 2023
a7ad2ff
Update task_srst2_vibrio.wdl
kevinlibuit Apr 13, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
159 changes: 159 additions & 0 deletions tasks/species_typing/task_srst2_vibrio.wdl
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
version 1.0

task srst2_vibrio {
meta {
description: "Computational method for finding spa types in Staphylococcus aureus"
}
input {
File reads1
File? reads2
String samplename
Int srst2_min_cov
Int srst2_max_divergence
Int srst2_min_depth
Int srst2_min_edge_depth
Int srst2_gene_max_mismatch
String docker = "quay.io/staphb/srst2:0.2.0-vcholerae"
Int disk_size = 100
Int cpu = 4
}
command <<<
if [ -z "~{reads2}" ] ; then
INPUT_READS="--input_se ~{reads1}"
else
# This task expects/requires that input FASTQ files end in "_1.clean.fastq.gz" and "_2.clean.fastq.gz"
# which is the syntax from TheiaProk read cleaning tasks
INPUT_READS="--input_pe ~{reads1} ~{reads2} --forward _1.clean --reverse _2.clean"
fi

srst2 --version 2>&1 | tee VERSION
srst2 \
${INPUT_READS} \
--gene_db /vibrio-cholerae-db/vibrio_230224.fasta \
--output ~{samplename} \
--min_coverage ~{srst2_min_cov} \
--max_divergence ~{srst2_max_divergence} \
--min_depth ~{srst2_min_depth} \
--min_edge_depth ~{srst2_min_edge_depth} \
--gene_max_mismatch ~{srst2_gene_max_mismatch}

# capture output TSV
mv ~{samplename}__genes__*__results.txt ~{samplename}.tsv

# capture detailed output TSV - not available if no results are outputed
mv ~{samplename}__fullgenes__*__results.txt ~{samplename}.detailed.tsv || echo "No results" > ~{samplename}.detailed.tsv

# parsing block to account for when output columns do not exist
python <<CODE
import csv
import re

# Converting TSV file into list of dictionaries
def tsv_to_dict(filename):
result_list=[]
with open(filename) as file_obj:
reader = csv.DictReader(file_obj, delimiter='\t')
for row in reader:
result_list.append(dict(row))
# only one sample is run, so there's only one row, flattening list
return result_list[0]

# Converting None to empty string
conv = lambda i : i or '-'

# Make characters human-readable
def translate_chars(string):
cimendes marked this conversation as resolved.
Show resolved Hide resolved
translation = []
if '?' in string:
translation.append("low depth/uncertain")
if '-' in string:
translation.append("not detected")

# in case we want to retrieve the allele information
string = re.sub("\*|\?|-", "", string)

if len(translation) > 0:
return '(' + ';'.join(translation) + ')'
return ""

# load output TSV as dict
row = tsv_to_dict('~{samplename}.tsv')

# presence or absence genes - ctxA, ompW and toxR
with open("ctxA", "wb") as ctxA_fh:
value = row.get("ctxA")
presence = translate_chars(conv(value))
if presence == "(not detected)":
ctxA_fh.write(presence)
else:
result = "present" + ' ' + presence
ctxA_fh.write(result.strip())

with open("ompW", "wb") as ompW_fh:
value = row.get("ompW")
presence = translate_chars(conv(value))
if presence == "(not detected)":
ompW_fh.write(presence)
else:
result = "present" + ' ' + presence
ompW_fh.write(result.strip())

with open("toxR", "wb") as toxR_fh:
value = row.get("toxR")
presence = translate_chars(conv(value))
if presence == "(not detected)":
toxR_fh.write(presence)
else:
result = "present" + ' ' + presence
toxR_fh.write(result.strip())

# biotype - tcpA classical or tcpA ElTor
with open("BIOTYPE", "wb") as biotype_fh:
value_ElTor = translate_chars(conv(row.get("tcpA_ElTor")))
value_classical = translate_chars(conv(row.get("tcpA_classical")))

if value_ElTor == "(not detected)" and value_classical == "(not detected)":
biotype_fh.write("(not detected)")
else:
if value_ElTor == "(not detected)":
result = "tcpA_Classical" + ' ' + value_classical
biotype_fh.write(result.strip())
else:
result = "tcpA_ElTor" + ' ' + value_ElTor
biotype_fh.write(result.strip())

# serogroup - O1 or O139
with open("SEROGROUP", "wb") as serotype_fh:
value_O1 = translate_chars(conv(row.get("wbeN_O1")))
value_O139 = translate_chars(conv(row.get("wbfR_O139")))

if value_O1 == "(not detected)" and value_O139 == "(not detected)":
serotype_fh.write("(not detected)")
else:
if value_O1 == "(not detected)":
result = "O139" + ' ' + value_O139
serotype_fh.write(result.strip())
else:
result = "O1" + ' ' + value_O1
serotype_fh.write(result.strip())
CODE
>>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So good! For future iterations, I'm thinking that it may be helpful to package all of this gene-type parsing into a single executable that can be made available in the docker image itself to promote interoperability of this specific functionality.

output {
File srst2_detailed_tsv = "~{samplename}.detailed.tsv"
String srst2_version = read_string("VERSION")
String srst2_vibrio_ctxA = read_string("ctxA")
String srst2_vibrio_ompW = read_string("ompW")
String srst2_vibrio_toxR = read_string("toxR")
String srst2_vibrio_biotype = read_string("BIOTYPE")
String srst2_vibrio_serogroup = read_string("SEROGROUP")
}
runtime {
docker: "~{docker}"
memory: "8 GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
maxRetries: 3
preemptible: 0
}
}
2 changes: 2 additions & 0 deletions tasks/taxon_id/task_gambit.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -175,6 +175,8 @@ task gambit {
if [[ ${predicted_taxon} == *"Streptococcus pneumoniae"* ]]; then
merlin_tag="Streptococcus pneumoniae"
fi
elif [[ ${predicted_taxon} == *"Vibrio"* ]]; then
merlin_tag="Vibrio"
else
merlin_tag="None"
fi
Expand Down
18 changes: 16 additions & 2 deletions tasks/utilities/task_broad_terra_tools.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -292,6 +292,13 @@ task export_taxon_tables {
String? agrvate_agr_num_frameshifts
String? agrvate_version
String? agrvate_docker
File? srst2_vibrio_detailed_tsv
String? srst2_vibrio_version
String? srst2_vibrio_ctxA
String? srst2_vibrio_ompW
String? srst2_vibrio_toxR
String? srst2_vibrio_serogroup
String? srst2_vibrio_biotype
}
command <<<

Expand Down Expand Up @@ -511,7 +518,7 @@ task export_taxon_tables {
"tbprofiler_dr_type": "~{tbprofiler_dr_type}",
"tbprofiler_resistance_genes": "~{tbprofiler_resistance_genes}",
"tbprofiler_additional_outputs_csv": "~{tbprofiler_additional_outputs_csv}",
"tbprofiler_laboratorian_report_csv": "~{tbprofiler_laboratorian_report_csv}"
"tbprofiler_laboratorian_report_csv": "~{tbprofiler_laboratorian_report_csv}",
"tbprofiler_gene_name": "~{tbprofiler_gene_name}",
"tbprofiler_locus_tag": "~{tbprofiler_locus_tag}",
"tbprofiler_variant_substitutions": "~{tbprofiler_variant_substitutions}",
Expand Down Expand Up @@ -609,7 +616,14 @@ task export_taxon_tables {
"agrvate_agr_multiple": "~{agrvate_agr_multiple}",
"agrvate_agr_num_frameshifts": "~{agrvate_agr_num_frameshifts}",
"agrvate_version": "~{agrvate_version}",
"agrvate_docker": "~{agrvate_docker}"
"agrvate_docker": "~{agrvate_docker}",
"srst2_vibrio_detailed_tsv": "~{srst2_vibrio_detailed_tsv}",
"srst2_vibrio_version": "~{srst2_vibrio_version}",
"srst2_vibrio_ctxA": "~{srst2_vibrio_ctxA}",
"srst2_vibrio_ompW": "~{srst2_vibrio_ompW}",
"srst2_vibrio_toxR": "~{srst2_vibrio_toxR}",
"srst2_vibrio_serogroup": "~{srst2_vibrio_serogroup}",
"srst2_vibrio_biotype": "~{srst2_vibrio_biotype}",
}

with open("~{samplename}_terra_table.tsv", "w") as outfile:
Expand Down
3 changes: 2 additions & 1 deletion tests/workflows/test_wf_theiaprok_illumina_pe.yml
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,8 @@
- path: miniwdl_run/call-clean_check_reads/work/_miniwdl_inputs/0/test_1.clean.fastq.gz
- path: miniwdl_run/call-clean_check_reads/work/_miniwdl_inputs/0/test_2.clean.fastq.gz
- path: miniwdl_run/call-gambit/command
md5sum: ffda45de2bad7a2206f507bf4485c930
md5sum: 7998373971cbc16b4654736280d8d4bd
md5sum: 1931093118b504186aa3776dbca5adb4
- path: miniwdl_run/call-gambit/inputs.json
contains: ["assembly", "fasta", "samplename", "test"]
- path: miniwdl_run/call-gambit/outputs.json
Expand Down
31 changes: 29 additions & 2 deletions workflows/wf_merlin_magic.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ import "../tasks/species_typing/task_pbptyper.wdl" as pbptyper
import "../tasks/species_typing/task_poppunk_streppneumo.wdl" as poppunk_spneumo
import "../tasks/species_typing/task_pasty.wdl" as pasty
import "../tasks/gene_typing/task_abricate.wdl" as abricate_task
import "../tasks/species_typing/task_srst2_vibrio.wdl" as srst2_vibrio_task

workflow merlin_magic {
meta {
Expand All @@ -46,6 +47,11 @@ workflow merlin_magic {
Boolean call_shigeifinder_reads_input = false
Boolean tbprofiler_additional_outputs = false
String output_seq_method_type = "WGS"
Int srst2_min_cov = 80
Int srst2_max_divergence = 20
Int srst2_min_depth = 5
Int srst2_min_edge_depth = 2
Int srst2_gene_max_mismatch = 200
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be increased further

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resloved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that the validation runs linked above don't have these values set by merlin magic--which leads me to assume that they're just using default srst2 values (i.e. 10).

}
if (merlin_tag == "Acinetobacter baumannii") {
call kaptive.kaptive {
Expand Down Expand Up @@ -226,6 +232,19 @@ workflow merlin_magic {
}
}
}
if (merlin_tag == "Vibrio") {
call srst2_vibrio_task.srst2_vibrio {
input:
reads1 = read1,
reads2 = read2,
samplename = samplename,
srst2_min_cov = srst2_min_cov,
srst2_max_divergence = srst2_max_divergence,
srst2_min_depth = srst2_min_depth,
srst2_min_edge_depth = srst2_min_edge_depth,
srst2_gene_max_mismatch = srst2_gene_max_mismatch
}
}

output {
# Ecoli Typing
Expand Down Expand Up @@ -419,5 +438,13 @@ workflow merlin_magic {
String? seroba_ariba_serotype = seroba_task.seroba_ariba_serotype
String? seroba_ariba_identity = seroba_task.seroba_ariba_identity
File? seroba_details = seroba_task.seroba_details
}
}
# Vibrio
File? srst2_vibrio_detailed_tsv = srst2_vibrio.srst2_detailed_tsv
String? srst2_vibrio_version = srst2_vibrio.srst2_version
String? srst2_vibrio_ctxA = srst2_vibrio.srst2_vibrio_ctxA
String? srst2_vibrio_ompW = srst2_vibrio.srst2_vibrio_ompW
String? srst2_vibrio_toxR = srst2_vibrio.srst2_vibrio_toxR
String? srst2_vibrio_serogroup = srst2_vibrio.srst2_vibrio_serogroup
String? srst2_vibrio_biotype = srst2_vibrio.srst2_vibrio_biotype
}
}
17 changes: 16 additions & 1 deletion workflows/wf_theiaprok_illumina_pe.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -484,7 +484,14 @@ workflow theiaprok_illumina_pe {
pasty_docker = merlin_magic.pasty_docker,
pasty_comment = merlin_magic.pasty_comment,
qc_check = qc_check_task.qc_check,
qc_standard = qc_check_task.qc_standard
qc_standard = qc_check_task.qc_standard,
srst2_vibrio_detailed_tsv = merlin_magic.srst2_vibrio_detailed_tsv,
srst2_vibrio_version = merlin_magic.srst2_vibrio_version,
srst2_vibrio_ctxA = merlin_magic.srst2_vibrio_ctxA,
srst2_vibrio_ompW = merlin_magic.srst2_vibrio_ompW,
srst2_vibrio_toxR = merlin_magic.srst2_vibrio_toxR,
srst2_vibrio_serogroup = merlin_magic.srst2_vibrio_serogroup,
srst2_vibrio_biotype = merlin_magic.srst2_vibrio_biotype
}
}
}
Expand Down Expand Up @@ -792,5 +799,13 @@ workflow theiaprok_illumina_pe {
String? seroba_ariba_serotype = merlin_magic.seroba_ariba_serotype
String? seroba_ariba_identity = merlin_magic.seroba_ariba_identity
File? seroba_details = merlin_magic.seroba_details
# Vibrio Typing
File? srst2_vibrio_detailed_tsv = merlin_magic.srst2_vibrio_detailed_tsv
String? srst2_vibrio_version = merlin_magic.srst2_vibrio_version
String? srst2_vibrio_ctxA = merlin_magic.srst2_vibrio_ctxA
String? srst2_vibrio_ompW = merlin_magic.srst2_vibrio_ompW
String? srst2_vibrio_toxR = merlin_magic.srst2_vibrio_toxR
String? srst2_vibrio_biotype = merlin_magic.srst2_vibrio_biotype
String? srst2_vibrio_serogroup = merlin_magic.srst2_vibrio_serogroup
}
}
17 changes: 16 additions & 1 deletion workflows/wf_theiaprok_illumina_se.wdl
Original file line number Diff line number Diff line change
Expand Up @@ -424,7 +424,14 @@ workflow theiaprok_illumina_se {
agrvate_agr_multiple = merlin_magic.agrvate_agr_multiple,
agrvate_agr_num_frameshifts = merlin_magic.agrvate_agr_num_frameshifts,
agrvate_version = merlin_magic.agrvate_version,
agrvate_docker = merlin_magic.agrvate_docker
agrvate_docker = merlin_magic.agrvate_docker,
srst2_vibrio_detailed_tsv = merlin_magic.srst2_vibrio_detailed_tsv,
srst2_vibrio_version = merlin_magic.srst2_vibrio_version,
srst2_vibrio_ctxA = merlin_magic.srst2_vibrio_ctxA,
srst2_vibrio_ompW = merlin_magic.srst2_vibrio_ompW,
srst2_vibrio_toxR = merlin_magic.srst2_vibrio_toxR,
srst2_vibrio_serogroup = merlin_magic.srst2_vibrio_serogroup,
srst2_vibrio_biotype = merlin_magic.srst2_vibrio_biotype
}
}
}
Expand Down Expand Up @@ -712,5 +719,13 @@ workflow theiaprok_illumina_se {
String? poppunk_GPS_db_version = merlin_magic.poppunk_GPS_db_version
String? poppunk_version = merlin_magic.poppunk_version
String? poppunk_docker = merlin_magic.poppunk_docker
# Vibrio Typing
File? srst2_vibrio_detailed_tsv = merlin_magic.srst2_vibrio_detailed_tsv
String? srst2_vibrio_version = merlin_magic.srst2_vibrio_version
String? srst2_vibrio_ctxA = merlin_magic.srst2_vibrio_ctxA
String? srst2_vibrio_ompW = merlin_magic.srst2_vibrio_ompW
String? srst2_vibrio_toxR = merlin_magic.srst2_vibrio_toxR
String? srst2_vibrio_biotype = merlin_magic.srst2_vibrio_biotype
String? srst2_vibrio_serogroup = merlin_magic.srst2_vibrio_serogroup
}
}