azurize optimus #1228

Merged 53 commits on Mar 6, 2024
Changes from 45 commits
Commits
3057662
add logic to choose which docker
nikellepetrillo Sep 28, 2023
09bd338
fix param_meta and import
nikellepetrillo Sep 28, 2023
219e99d
add cloud provider to checkinput
nikellepetrillo Sep 28, 2023
e2e70c9
handle hard coded white list paths in CheckInputs.wdl
nikellepetrillo Sep 28, 2023
970eb98
last few dockers
nikellepetrillo Sep 29, 2023
4f40a11
last few dockers
nikellepetrillo Sep 29, 2023
3a9784a
last few dockers
nikellepetrillo Sep 29, 2023
adc8938
change error msg
nikellepetrillo Sep 29, 2023
46150d7
use ubuntu image
nikellepetrillo Oct 2, 2023
e63c32d
use ubuntu image
nikellepetrillo Oct 2, 2023
8c4785c
change whitelists
nikellepetrillo Oct 2, 2023
eec78b0
point to azure public whitelists
nikellepetrillo Oct 4, 2023
3044afe
add sas token
nikellepetrillo Oct 4, 2023
2f5aea1
echo whitelist
nikellepetrillo Oct 4, 2023
efdcda6
echo whitelist
nikellepetrillo Oct 4, 2023
3df2c28
testing for coa
nikellepetrillo Oct 5, 2023
103cc48
testing for coa
nikellepetrillo Oct 5, 2023
c5fecb2
change back to terra buckets for whitelists
nikellepetrillo Oct 6, 2023
bacefe7
change whitelists to point at public azure bucket
nikellepetrillo Oct 6, 2023
305485f
files to strings
nikellepetrillo Oct 10, 2023
09c69e5
print statemtns to checkinputs
nikellepetrillo Oct 10, 2023
9c39ff6
string to files
nikellepetrillo Oct 10, 2023
bba5226
change to terra bucket paths
nikellepetrillo Oct 11, 2023
8eaf292
strings not files
nikellepetrillo Oct 12, 2023
4aef7d6
append sas token
nikellepetrillo Oct 13, 2023
fa1713d
append sas token
nikellepetrillo Oct 13, 2023
6e0536d
append sas and use strings
nikellepetrillo Oct 16, 2023
ab042b9
back to bucket urls
nikellepetrillo Oct 16, 2023
4493699
back to bucket urls
nikellepetrillo Oct 16, 2023
26231e7
use google cloud urls
nikellepetrillo Oct 16, 2023
5129132
using public urls
nikellepetrillo Oct 18, 2023
1ad1160
trying to export sas_token
nikellepetrillo Oct 18, 2023
d19036a
trying to export sas_token
nikellepetrillo Oct 18, 2023
b8a826c
trying to export sas_token
nikellepetrillo Oct 18, 2023
755cadd
terra on gcp
nikellepetrillo Oct 19, 2023
07ca768
update azure whitelist files
phendriksen100 Feb 22, 2024
d8d988f
Merge branch 'azurized_wdls' into np_pd-2368_optimus_runs_on_ToA
nikellepetrillo Mar 1, 2024
f1e178f
changelogs
nikellepetrillo Mar 1, 2024
efee8cb
changelogs
nikellepetrillo Mar 1, 2024
c453e88
changelogs
nikellepetrillo Mar 1, 2024
6770f91
changelogs
nikellepetrillo Mar 1, 2024
ac3bbf2
fix some inputs
nikellepetrillo Mar 4, 2024
cb19404
fix some inputs
nikellepetrillo Mar 4, 2024
55fc7ff
fix some inputs
nikellepetrillo Mar 4, 2024
78e4513
fix some inputs
nikellepetrillo Mar 4, 2024
c602c3f
update optimus dockers
nikellepetrillo Mar 4, 2024
fc5d40a
warp_tools_docker_path for staralign
nikellepetrillo Mar 5, 2024
94f8721
stop using ice lake as default
nikellepetrillo Mar 5, 2024
b817d80
update pipeline docs
kayleemathews Mar 5, 2024
12ed929
2 threads
nikellepetrillo Mar 5, 2024
1d11175
Merge remote-tracking branch 'origin/np_pd-2368_optimus_runs_on_ToA' …
nikellepetrillo Mar 5, 2024
e9b5a36
counting mode
nikellepetrillo Mar 6, 2024
708b545
changelogs
nikellepetrillo Mar 6, 2024
5 changes: 5 additions & 0 deletions pipelines/skylab/multiome/Multiome.changelog.md
@@ -1,3 +1,8 @@
# 3.2.2
2024-03-01 (Date of Last Commit)

* Updated the Optimus.wdl to run on Azure. This change does not affect the Multiome pipeline.

# 3.2.1
2024-02-29 (Date of Last Commit)

6 changes: 4 additions & 2 deletions pipelines/skylab/multiome/Multiome.wdl
@@ -6,10 +6,11 @@ import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils
import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/cellbender_remove_background.wdl" as CellBender

workflow Multiome {
String pipeline_version = "3.2.1"
String pipeline_version = "3.2.2"

input {
String input_id
String cloud_provider

# Optimus Inputs
String counting_mode = "sn_rna"
@@ -68,7 +69,8 @@ workflow Multiome {
ignore_r1_read_length = ignore_r1_read_length,
star_strand_mode = star_strand_mode,
count_exons = count_exons,
soloMultiMappers = soloMultiMappers
soloMultiMappers = soloMultiMappers,
cloud_provider = cloud_provider
}

# Call the ATAC workflow
7 changes: 6 additions & 1 deletion pipelines/skylab/multiome/atac.changelog.md
@@ -1,4 +1,9 @@
# 1.1.8
# 1.1.9
2024-03-01 (Date of Last Commit)

* Updated the Optimus.wdl to run on Azure. This change does not affect the ATAC pipeline.

# 1.1.8
2024-02-07 (Date of Last Commit)

* Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple, reads_mapped_exonic, reads_mapped_exonic_as, and reads_mapped_intergenic
2 changes: 1 addition & 1 deletion pipelines/skylab/multiome/atac.wdl
@@ -41,7 +41,7 @@ workflow ATAC {
String adapter_seq_read3 = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
}

String pipeline_version = "1.1.8"
String pipeline_version = "1.1.9"

parameter_meta {
read1_fastq_gzipped: "read 1 FASTQ file as input for the pipeline, contains read 1 of paired reads"
@@ -23,5 +23,6 @@
"Multiome.Atac.cpu_platform_bwa":"Intel Cascade Lake",
"Multiome.Atac.num_threads_bwa":"16",
"Multiome.Atac.mem_size_bwa":"64",
"Multiome.soloMultiMappers":"Uniform"
"Multiome.soloMultiMappers":"Uniform",
"Multiome.cloud_provider":"gcp"
}
5 changes: 5 additions & 0 deletions pipelines/skylab/optimus/Optimus.changelog.md
@@ -1,3 +1,8 @@
# 6.4.2
2024-03-01 (Date of Last Commit)
* Updated the Optimus.wdl to run on Azure.


# 6.4.1
2024-02-29 (Date of Last Commit)
* Added mem and disk to inputs of Join Barcodes task of Multiome workflow; does not impact the Optimus workflow
98 changes: 76 additions & 22 deletions pipelines/skylab/optimus/Optimus.wdl
@@ -7,13 +7,16 @@ import "../../../tasks/skylab/RunEmptyDrops.wdl" as RunEmptyDrops
import "../../../tasks/skylab/CheckInputs.wdl" as OptimusInputChecks
import "../../../tasks/skylab/MergeSortBam.wdl" as Merge
import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils
import "../../../tasks/broad/Utilities.wdl" as utils

workflow Optimus {
meta {
description: "The optimus 3' pipeline processes 10x genomics sequencing data based on the v2 chemistry. It corrects cell barcodes and UMIs, aligns reads, marks duplicates, and returns data as alignments in BAM format and as counts in sparse matrix exchange format."
}

input {
String cloud_provider

# Mode for counting either "sc_rna" or "sn_rna"
String counting_mode = "sc_rna"

@@ -45,36 +48,72 @@ workflow Optimus {

# Set to true to override input checks and allow pipeline to proceed with invalid input
Boolean force_no_check = false

# Check that tenx_chemistry_version matches the length of the read 1 fastq;
# Set to true if you expect that r1_read_length does not match length of UMIs/barcodes for 10x chemistry v2 (26 bp) or v3 (28 bp).
Boolean ignore_r1_read_length = false

# Set to Forward, Reverse, or Unstranded to account for stranded library preparations (per STARsolo documentation)
String star_strand_mode = "Forward"
# Set to true to count reads aligned to exonic regions in sn_rna mode

# Set to true to count reads aligned to exonic regions in sn_rna mode
Boolean count_exons = false

# this pipeline does not set any preemptible variables and only relies on the task-level preemptible settings
# you could override the task-level preemptible settings by passing it as one of the workflow's inputs
# for example: `"Optimus.StarAlign.preemptible": 3` will let the StarAlign task, which by default disables the
# usage of preemptible machines, attempt to request for preemptible instance up to 3 times.
# usage of preemptible machines, attempt to request for preemptible instance up to 3 times.
}

# version of this pipeline

String pipeline_version = "6.4.1"
String pipeline_version = "6.4.2"

# this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
Array[Int] indices = range(length(r1_fastq))

# 10x parameters
File whitelist_v2 = "gs://gcp-public-data--broad-references/RNA/resources/737k-august-2016.txt"
File whitelist_v3 = "gs://gcp-public-data--broad-references/RNA/resources/3M-febrary-2018.txt"
File gcp_whitelist_v2 = "gs://gcp-public-data--broad-references/RNA/resources/737k-august-2016.txt"
File gcp_whitelist_v3 = "gs://gcp-public-data--broad-references/RNA/resources/3M-febrary-2018.txt"
File azure_whitelist_v2 = "https://datasetpublicbroadref.blob.core.windows.net/dataset/RNA/resources/737k-august-2016.txt"
File azure_whitelist_v3 = "https://datasetpublicbroadref.blob.core.windows.net/dataset/RNA/resources/3M-febrary-2018.txt"

# Takes the first read1 FASTQ from the inputs to check for chemistry match
File r1_single_fastq = r1_fastq[0]

# docker images
String picard_cloud_docker = "picard-cloud:2.26.10"
String pytools_docker = "pytools:1.0.0-1661263730"
String empty_drops_docker = "empty-drops:1.0.1-4.2"
String star_docker = "star:1.0.1-2.7.11a-1692706072"
String warp_tools_docker_1_0_1 = "warp-tools:1.0.1-1686932671"
String warp_tools_docker_1_0_5 = "warp-tools:1.0.5-1692706846"
String warp_tools_docker_1_0_6 = "warp-tools:1.0.6-1692962087"
#TODO how do we handle these?
String alpine_docker = "alpine-bash:latest"
String gcp_alpine_docker_prefix = "bashell/"
String acr_alpine_docker_prefix = "dsppipelinedev.azurecr.io/"
String alpine_docker_prefix = if cloud_provider == "gcp" then gcp_alpine_docker_prefix else acr_alpine_docker_prefix

String ubuntu_docker = "ubuntu_16_0_4:latest"
String gcp_ubuntu_docker_prefix = "gcr.io/gcp-runtimes/"
String acr_ubuntu_docker_prefix = "dsppipelinedev.azurecr.io/"
String ubuntu_docker_prefix = if cloud_provider == "gcp" then gcp_ubuntu_docker_prefix else acr_ubuntu_docker_prefix

String gcr_docker_prefix = "us.gcr.io/broad-gotc-prod/"
String acr_docker_prefix = "dsppipelinedev.azurecr.io/"

# choose docker prefix based on cloud provider
String docker_prefix = if cloud_provider == "gcp" then gcr_docker_prefix else acr_docker_prefix

# make sure either 'gcp' or 'azure' is supplied as the cloud_provider input
if ((cloud_provider != "gcp") && (cloud_provider != "azure")) {
call utils.ErrorWithMessage as ErrorMessageIncorrectInput {
input:
message = "cloud_provider must be supplied with either 'gcp' or 'azure'."
}
}

parameter_meta {
r1_fastq: "forward read, contains cell barcodes and molecule barcodes"
r2_fastq: "reverse read, contains cDNA fragment generated from captured mRNA"
@@ -96,16 +135,21 @@ workflow Optimus {
force_no_check = force_no_check,
counting_mode = counting_mode,
count_exons = count_exons,
whitelist_v2 = whitelist_v2,
whitelist_v3 = whitelist_v3,
gcp_whitelist_v2 = gcp_whitelist_v2,
gcp_whitelist_v3 = gcp_whitelist_v3,
azure_whitelist_v2 = azure_whitelist_v2,
azure_whitelist_v3 = azure_whitelist_v3,
tenx_chemistry_version = tenx_chemistry_version,
r1_fastq = r1_single_fastq,
ignore_r1_read_length = ignore_r1_read_length
ignore_r1_read_length = ignore_r1_read_length,
cloud_provider = cloud_provider,
alpine_docker_path = alpine_docker_prefix + alpine_docker
}

call StarAlign.STARGenomeRefVersion as ReferenceCheck {
input:
tar_star_reference = tar_star_reference
tar_star_reference = tar_star_reference,
ubuntu_docker_path = ubuntu_docker_prefix + ubuntu_docker
}

call FastqProcessing.FastqProcessing as SplitFastq {
@@ -116,7 +160,8 @@ workflow Optimus {
whitelist = whitelist,
chemistry = tenx_chemistry_version,
sample_id = input_id,
read_struct = read_struct
read_struct = read_struct,
warp_tools_docker_path = docker_prefix + warp_tools_docker_1_0_1
}

scatter(idx in range(length(SplitFastq.fastq_R1_output_array))) {
@@ -131,29 +176,33 @@ workflow Optimus {
counting_mode = counting_mode,
count_exons = count_exons,
output_bam_basename = output_bam_basename + "_" + idx,
soloMultiMappers = soloMultiMappers
soloMultiMappers = soloMultiMappers,
star_docker_path = docker_prefix + star_docker
}
}
call Merge.MergeSortBamFiles as MergeBam {
input:
bam_inputs = STARsoloFastq.bam_output,
output_bam_filename = output_bam_basename + ".bam",
sort_order = "coordinate"
sort_order = "coordinate",
picard_cloud_docker_path = docker_prefix + picard_cloud_docker
}
call Metrics.CalculateGeneMetrics as GeneMetrics {
input:
bam_input = MergeBam.output_bam,
mt_genes = mt_genes,
input_id = input_id,
original_gtf = annotations_gtf,
input_id = input_id
warp_tools_docker_path = docker_prefix + warp_tools_docker_1_0_5
}

call Metrics.CalculateCellMetrics as CellMetrics {
input:
bam_input = MergeBam.output_bam,
mt_genes = mt_genes,
original_gtf = annotations_gtf,
input_id = input_id
input_id = input_id,
warp_tools_docker_path = docker_prefix + warp_tools_docker_1_0_6
}

call StarAlign.MergeStarOutput as MergeStarOutputs {
@@ -165,15 +214,17 @@ workflow Optimus {
summary = STARsoloFastq.summary,
align_features = STARsoloFastq.align_features,
umipercell = STARsoloFastq.umipercell,
input_id = input_id
input_id = input_id,
pytools_docker_path = docker_prefix + pytools_docker
}
if (counting_mode == "sc_rna"){
call RunEmptyDrops.RunEmptyDrops {
input:
sparse_count_matrix = MergeStarOutputs.sparse_counts,
row_index = MergeStarOutputs.row_index,
col_index = MergeStarOutputs.col_index,
emptydrops_lower = emptydrops_lower
emptydrops_lower = emptydrops_lower,
empty_drops_docker_path = docker_prefix + empty_drops_docker
}
}

@@ -192,7 +243,8 @@ workflow Optimus {
gene_id = MergeStarOutputs.col_index,
empty_drops_result = RunEmptyDrops.empty_drops_result,
counting_mode = counting_mode,
pipeline_version = "Optimus_v~{pipeline_version}"
pipeline_version = "Optimus_v~{pipeline_version}",
warp_tools_docker_path = docker_prefix + warp_tools_docker_1_0_6
}
}
if (count_exons && counting_mode=="sn_rna") {
@@ -202,7 +254,8 @@ workflow Optimus {
features = STARsoloFastq.features_sn_rna,
matrix = STARsoloFastq.matrix_sn_rna,
cell_reads = STARsoloFastq.cell_reads_sn_rna,
input_id = input_id
input_id = input_id,
pytools_docker_path = docker_prefix + pytools_docker
}
call H5adUtils.SingleNucleusOptimusH5adOutput as OptimusH5adGenerationWithExons{
input:
@@ -219,7 +272,8 @@ workflow Optimus {
sparse_count_matrix_exon = MergeStarOutputsExons.sparse_counts,
cell_id_exon = MergeStarOutputsExons.row_index,
gene_id_exon = MergeStarOutputsExons.col_index,
pipeline_version = "Optimus_v~{pipeline_version}"
pipeline_version = "Optimus_v~{pipeline_version}",
warp_tools_docker_path = docker_prefix + warp_tools_docker_1_0_6
}
}

@@ -246,4 +300,4 @@ workflow Optimus {
# h5ad
File h5ad_output_file = final_h5ad_output
}
}
}
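
The registry selection in the Optimus.wdl changes can be modelled outside WDL. The following is a minimal Python sketch, not code from this PR: the function name `docker_path` is hypothetical, the two prefixes and the error message are copied from the diff, and raising before a prefix is chosen is an assumption made for illustration (in the WDL, the `ErrorWithMessage` call and the `docker_prefix` expression are separate declarations).

```python
# Registry prefixes as declared in the Optimus.wdl diff.
GCR_DOCKER_PREFIX = "us.gcr.io/broad-gotc-prod/"
ACR_DOCKER_PREFIX = "dsppipelinedev.azurecr.io/"


def docker_path(cloud_provider: str, image: str) -> str:
    """Hypothetical helper: validate cloud_provider, then prepend the registry prefix."""
    # Assumption: fail fast on unsupported providers, echoing the WDL error message.
    if cloud_provider not in ("gcp", "azure"):
        raise ValueError("cloud_provider must be supplied with either 'gcp' or 'azure'.")
    # Same rule as the WDL expression: "gcp" selects GCR, anything else selects ACR.
    prefix = GCR_DOCKER_PREFIX if cloud_provider == "gcp" else ACR_DOCKER_PREFIX
    return prefix + image


if __name__ == "__main__":
    star_docker = "star:1.0.1-2.7.11a-1692706072"
    print(docker_path("gcp", star_docker))
    print(docker_path("azure", star_docker))
```

The alpine and ubuntu images in the diff follow the same pattern, with their own GCP prefixes (`bashell/` and `gcr.io/gcp-runtimes/`) and the same ACR prefix.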
@@ -15,5 +15,6 @@
"Optimus.input_id": "pbmc_human_v3",
"Optimus.tenx_chemistry_version": "3",
"Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf",
"Optimus.star_strand_mode": "Forward"
"Optimus.star_strand_mode": "Forward",
"Optimus.cloud_provider": "gcp"
}
@@ -27,5 +27,6 @@
"Optimus.input_id": "neurons2k_mouse",
"Optimus.tenx_chemistry_version": "2",
"Optimus.star_strand_mode": "Unstranded",
"Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/GRCm39/star/v2_7_10a/modified_vM32.annotation.gtf"
"Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/GRCm39/star/v2_7_10a/modified_vM32.annotation.gtf",
"Optimus.cloud_provider": "gcp"
}
@@ -25,5 +25,6 @@
"Optimus.star_strand_mode": "Unstranded",
"Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/GRCm39/star/v2_7_10a/modified_vM32.annotation.gtf",
"Optimus.counting_mode": "sn_rna",
"Optimus.count_exons": true
"Optimus.count_exons": true,
"Optimus.cloud_provider": "gcp"
}
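
Because `cloud_provider` is declared without a default, existing input JSON files need the new key, as the test inputs in this PR show. A minimal Python sketch of that update follows; the helper name `with_cloud_provider` and the sample dictionary are hypothetical, and only the `Optimus.cloud_provider` key and the `"gcp"` value come from the diff.

```python
import json


def with_cloud_provider(inputs: dict, workflow: str = "Optimus", provider: str = "gcp") -> dict:
    """Hypothetical helper: return a copy of the inputs with <workflow>.cloud_provider set."""
    updated = dict(inputs)  # copy so the caller's dictionary is left unchanged
    updated[f"{workflow}.cloud_provider"] = provider
    return updated


if __name__ == "__main__":
    # Sample inputs for illustration only; real files carry many more keys.
    example = {"Optimus.counting_mode": "sn_rna", "Optimus.count_exons": True}
    print(json.dumps(with_cloud_provider(example), indent=2))
```

Passing `workflow="Multiome"` would produce the `Multiome.cloud_provider` key used by the Multiome test inputs.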
4 changes: 4 additions & 0 deletions pipelines/skylab/paired_tag/PairedTag.changelog.md
@@ -1,3 +1,7 @@
# 0.2.1
2024-03-01 (Date of Last Commit)
* Updated the Optimus.wdl to run on Azure. This change does not affect the PairedTag pipeline.

# 0.2.0
2024-02-29 (Date of Last Commit)
* Added mem and disk to inputs of Join Barcodes task of Multiome workflow; does not impact the Paired-tag workflow
2 changes: 1 addition & 1 deletion pipelines/skylab/paired_tag/PairedTag.wdl
@@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus
import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils
import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing
workflow PairedTag {
String pipeline_version = "0.2.0"
String pipeline_version = "0.2.1"

input {
String input_id
5 changes: 5 additions & 0 deletions pipelines/skylab/slideseq/SlideSeq.changelog.md
@@ -1,3 +1,8 @@
# 3.1.2
2024-03-01 (Date of Last Commit)
* Updated the Optimus.wdl to run on Azure. This change does not affect the SlideSeq pipeline.


# 3.1.1
2024-02-29 (Date of Last Commit)
* Added mem and disk to inputs of Join Barcodes task of Multiome workflow; does not impact the Slideseq workflow