We aligned the Omni-C data to both assemblies following the [Arima Genomics Mapping Pipeline](https://github.com/ArimaGenomics/mapping_pipeline) and then scaffolded both assemblies with SALSA.
- You have been able to run the Arima Genomics Mapping Pipeline on each of the assemblies and you have generated the final deduplicated and sorted `BAM` file (the suffix of this file is `*.d.s.bam`).
- There is a directory hierarchy starting at `WD`, where:
  - `WD`
    - `asm`: Assemblies
    - `aln`: Alignments

- `BAM` file (the suffix of this file is `*.d.s.bam`)
- Genome assembly file (`FASTA`)

- `SALSA`
- `samtools`
- `bamToBed` (from `bedtools`)
- Conda environment for SALSA dependencies here
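The assumptions, inputs and dependencies listed above can be checked before any scaffolding work starts. The following is a minimal sketch, not part of SALSA or the Arima pipeline: `check_inputs` and `check_tools` are hypothetical helper names, and the file names (`assembly.fasta`, `assembly.omnic.d.s.bam`) are assumed to match the ones used in the commands that follow.

```shell
# Hypothetical pre-flight helpers (illustrative only).

# check_inputs WD [NAME]: succeed only if the assembly FASTA and the
# deduplicated, sorted BAM exist and are non-empty under WD/asm and WD/aln.
check_inputs() (
    wd=$1
    name=${2:-assembly}
    missing=0
    for f in "$wd/asm/$name.fasta" "$wd/aln/$name.omnic.d.s.bam"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# check_tools TOOL...: succeed only if every named tool is found on PATH.
check_tools() (
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "tool not found on PATH: $tool" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)
```

With the conda environment active, one possible call is `check_inputs "$WD" && check_tools samtools bamToBed python`, which reports every missing item on standard error rather than stopping at the first one.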
REF=assembly.fasta
conda activate salsa
mkdir -p $WD/scaffolding
# Prepping alignments for SALSA: convert BAM to BED, then sort by read name (column 4)
REF="$WD/asm/assembly.fasta"
bamToBed -i ${WD}/aln/assembly.omnic.d.s.bam \
> ${WD}/scaffolding/assembly.omnic.bed &&
sort -k 4 ${WD}/scaffolding/assembly.omnic.bed \
> ${WD}/scaffolding/assembly.omnic_tmp.bed &&
mv ${WD}/scaffolding/assembly.omnic_tmp.bed ${WD}/scaffolding/assembly.omnic.bed &&
# Index the assembly to produce the contig lengths file ($REF.fai)
samtools faidx $REF &&
# Scaffold with SALSA, reading the BED file written above
python /usr/local/src/SALSA/run_pipeline.py -a $REF \
-l $REF.fai \
-b $WD/scaffolding/assembly.omnic.bed -e DNASE \
-o $WD/scaffolding/salsa_assembly \
-i 20 -p yes \
&> $WD/scaffolding/salsa_assembly.log &&
# Copy the final scaffolds next to the input assembly (the trailing & backgrounds the whole chain)
cp $WD/scaffolding/salsa_assembly/scaffolds_FINAL.fasta \
$WD/asm/new.assembly.fasta &
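Once scaffolding has finished, a quick look at the scaffold count and N50 helps confirm that the run did something sensible. The sketch below is an illustration, not part of the original protocol: `fai_summary` is a hypothetical helper that reads a `.fai` index as written by `samtools faidx`, where the second column holds each sequence length.

```shell
# Hypothetical helper: summarise a samtools .fai index.
# Prints the sequence count, total length and N50 on a single line.
fai_summary() {
    # Column 2 of a .fai file is the sequence length; sort longest first.
    cut -f 2 "$1" | sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            # N50: length of the sequence at which the running sum
            # first reaches half of the total assembly length.
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run >= half) { n50 = len[i]; break }
            }
            # %.0f avoids integer overflow in some awk builds on large genomes.
            printf "sequences=%d total_bp=%.0f n50=%.0f\n", NR, total, n50
        }'
}
```

For example, after the background job has completed, running `samtools faidx $WD/asm/new.assembly.fasta` followed by `fai_summary $WD/asm/new.assembly.fasta.fai` gives the scaffold-level numbers, and `fai_summary $REF.fai` gives the same numbers for the input contigs for comparison.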
- For `SALSA`:
  - Ghurye J, Pop M, Koren S, Bickhart D, Chin C-S. Scaffolding of long read assemblies using long range contact information. BMC Genomics. 2017;18(1):527.
  - Ghurye J, Rhie A, Walenz BP, Schmitt A, Selvaraj S, Pop M, Koren S. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol. 2019;15(8):e1007273.
- For `samtools`:
- For `bedtools`: