Here's an idea for how to parallelize the building of crux dbs. All worker types, plus the scheduler and combiner, are separate Docker images deployed with Kubernetes.
**scheduler**: has a Redis database, issues jobs to workers, and keeps track of job state.
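A minimal sketch of what the scheduler's queue bookkeeping could look like, assuming the redis-py client; the key layout (`queue:<worker-type>`, `job:<id>`) and the `shard` field in the payload are illustrative, not a fixed design.

```python
# Minimal sketch of the scheduler's Redis job queue, using redis-py.
# Key names and the job payload shape are illustrative assumptions.
import json
import redis

r = redis.Redis(host="scheduler", port=6379)

def enqueue(worker_type: str, payload: dict) -> str:
    """Record a job as pending and push it onto that worker type's queue."""
    job_id = f"{worker_type}:{payload['shard']}"
    r.hset(f"job:{job_id}", mapping={"state": "pending",
                                     "payload": json.dumps(payload)})
    r.lpush(f"queue:{worker_type}", job_id)
    return job_id

def claim(worker_type: str) -> tuple[str, dict]:
    """Worker side: block until a job arrives, then mark it running."""
    _, raw = r.brpop(f"queue:{worker_type}")
    job_id = raw.decode()
    r.hset(f"job:{job_id}", "state", "running")
    return job_id, json.loads(r.hget(f"job:{job_id}", "payload"))
```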
**obi-downloader**: workers are assigned a list of SRA accessions, download them in parallel, convert each to fasta with fasterq-dump, build an OBITools database from the fastas, and store both the fastas and the OBITools database in a Ceph folder.
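Roughly what one obi-downloader job could look like, assuming sra-tools (`prefetch`, and `fasterq-dump` with `--fasta`, which needs a recent release) and OBITools' `obiconvert` on PATH; the Ceph mount point and taxdump path are hypothetical.

```python
# Sketch of one obi-downloader job. Paths are hypothetical; fasterq-dump's
# --fasta flag assumes a recent sra-tools release.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CEPH = Path("/mnt/ceph/crux")          # hypothetical Ceph mount
TAXDUMP = CEPH / "taxdump"             # obiconvert needs an NCBI taxonomy

def fetch_one(acc: str) -> Path:
    """Download one accession and convert it to fasta."""
    outdir = CEPH / "fasta"
    subprocess.run(["prefetch", acc], check=True)
    # Assumes one output file per accession; paired-end runs would
    # produce <acc>_1.fasta / <acc>_2.fasta instead.
    subprocess.run(["fasterq-dump", acc, "--fasta",
                    "--outdir", str(outdir)], check=True)
    return outdir / f"{acc}.fasta"

def run_job(accessions: list[str]) -> None:
    with ThreadPoolExecutor(max_workers=8) as pool:
        fastas = list(pool.map(fetch_one, accessions))
    # Build one ecoPCR-compatible OBITools database for the whole batch.
    db_prefix = CEPH / "obidb" / f"batch_{accessions[0]}"
    subprocess.run(["obiconvert", f"--ecopcrdb-output={db_prefix}",
                    "-t", str(TAXDUMP), *map(str, fastas)], check=True)
```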
**ecopcr**: workers are assigned a set of OBITools databases and a set of primers, run ecoPCR against each database with those primers, and store the results in a Ceph folder.
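The ecopcr worker is mostly a loop over database × primer pair, something like the following; the error tolerance (`-e`) and amplicon length bounds (`-l`/`-L`) are placeholder values, not recommendations.

```python
# Sketch of an ecopcr job; -e/-l/-L values are placeholders, and CEPH
# is the same hypothetical mount as above.
import subprocess
from pathlib import Path

CEPH = Path("/mnt/ceph/crux")

def run_job(db_prefixes: list[str], primers: list[tuple[str, str]]) -> None:
    outdir = CEPH / "ecopcr"
    for db in db_prefixes:
        for fwd, rev in primers:
            out = outdir / f"{Path(db).name}__{fwd}_{rev}.ecopcr"
            with open(out, "w") as fh:
                # ecoPCR writes its hits to stdout; capture one file
                # per database/primer-pair combination.
                subprocess.run(["ecoPCR", "-d", db, "-e", "3",
                                "-l", "50", "-L", "500", fwd, rev],
                               stdout=fh, check=True)
```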
**blastn**: workers are given a set of ecoPCR output queries and a set of BLAST databases to query against, run blastn in parallel, and store the results in a Ceph folder.
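One blastn job might fan query files out across processes against a single NT chunk, along these lines. It assumes the chunk has already been unpacked from its tarball, and the custom `-outfmt` (subject accession plus aligned subject sequence) is a choice made here so the combiner sketch below has sequences to work with.

```python
# Sketch of a blastn job against one extracted NT chunk. The outfmt
# columns (saccver, sseq) are an assumption to feed the combiner step.
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

CEPH = Path("/mnt/ceph/crux")

def blast_one(query: str, nt_chunk: str) -> Path:
    out = CEPH / "blast" / f"{Path(query).stem}__{nt_chunk}.tsv"
    subprocess.run(["blastn", "-query", query,
                    "-db", str(CEPH / "nt" / nt_chunk),
                    "-outfmt", "6 saccver sseq",
                    "-out", str(out), "-num_threads", "4"], check=True)
    return out

def run_job(queries: list[str], nt_chunk: str) -> None:
    # Each blastn call is multithreaded via -num_threads; the pool
    # just keeps several query files in flight at once.
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(blast_one, queries, [nt_chunk] * len(queries)))
```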
**combiner**: takes all of the BLAST results from Ceph, combines them (including deduplication), and builds a bowtie2 database, which is the final output stored in Ceph.
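A sketch of the combiner, under the assumption that the blastn step wrote two-column TSVs (subject accession, aligned subject sequence) as above; deduplicating on accession alone is a simplification.

```python
# Sketch of the combiner: merge + dedupe blast hits, then build the
# final bowtie2 index. Deduping on accession alone is a simplification.
import subprocess
from pathlib import Path

CEPH = Path("/mnt/ceph/crux")

def run() -> None:
    seen: set[str] = set()
    combined = CEPH / "final" / "combined.fasta"
    with open(combined, "w") as out:
        for tsv in sorted((CEPH / "blast").glob("*.tsv")):
            with open(tsv) as fh:
                for line in fh:
                    acc, seq = line.rstrip("\n").split("\t")[:2]
                    if acc in seen:        # skip duplicates across shards
                        continue
                    seen.add(acc)
                    out.write(f">{acc}\n{seq}\n")
    subprocess.run(["bowtie2-build", str(combined),
                    str(CEPH / "final" / "crux_db")], check=True)
```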
There are approximately 1.2 million SRA accessions for WGS projects, and ~64 NT chunks (nt.00.tar.gz, etc.). So a blastn worker, for example, will receive some subset of the 1.2 million accessions plus an assignment to BLAST against one of the 64 NT chunks.
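For a feel of the job counts, a sharding scheme like the one below, with an illustrative batch size of 500 accessions, would produce 2,400 batches × 64 chunks = 153,600 blastn jobs.

```python
# Illustrative sharding: cross accession batches with NT chunks.
# batch_size=500 is arbitrary; 1.2M / 500 = 2,400 batches, so
# 2,400 * 64 = 153,600 blastn jobs at this granularity.
def shard(accessions: list[str], batch_size: int = 500):
    for i in range(0, len(accessions), batch_size):
        batch = accessions[i:i + batch_size]
        for chunk in range(64):
            yield {"accessions": batch, "nt_chunk": f"nt.{chunk:02d}"}
```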