UC Davis 2021 Exercise 1

GenomeRIK edited this page Nov 21, 2021 · 9 revisions

Setting up and running TAMA Collapse on aligned cluster/polish reads

  cp -r /home/genomerik/exercises_tama .

  git clone https://github.com/GenomeRIK/tama.git

Download the hg38 reference and decompress it to hg38.fa (for example with gunzip): https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz

Load modules

  samtools/1.10
  biopython/1.71
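
After loading the modules, it can help to confirm that the tools are actually on your PATH before running anything. The following is a minimal sketch, not part of TAMA: `require_cmds` is a hypothetical helper name, and the `module load` line in the comment assumes your cluster uses the Environment Modules or Lmod `module` command.

```shell
# Report any required command that is missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use after loading the modules listed above
# (assumes a `module` command is available on your system):
#   module load samtools/1.10 biopython/1.71
#   require_cmds samtools python && python -c "import Bio; print(Bio.__version__)"
```

The Biopython check is done through Python because Biopython is a Python package rather than a command of its own.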

TAMA Collapse for mapped FLNC sequences

Go to the exercise folder

  exercises_tama

Look at the TAMA Collapse bash script, which runs TAMA Collapse:

  run_tama_collapse.sh

The script should contain:

  # Location of the TAMA scripts and the script to run
  spath='/home/genomerik/tama/'
  pscript='tama_collapse.py'
  # Folder and file name of the sorted BAM input
  fpath='/home/genomerik/test/'
  sam='mm2_alz_flnc_hg38_sort.bam'
  # Reference genome FASTA
  fasta='/home/genomerik/ref_files/hg38.fa'
  # Output prefix: BAM name without .bam, prefixed with tc_nc_lde220_
  prefix=`echo ${sam} | sed 's/\.bam//' | awk '{print "tc_nc_lde220_"$1}' `
  capflag='no_cap'
  # Print the full command, then run it
  echo "python ${spath}${pscript} -s ${fpath}${sam} -f ${fasta} -p ${prefix} -d merge_dup -x ${capflag} -a 100 -z 100 -sj sj_priority -lde 2 -sjt 20 -log log_off -b BAM"
  python ${spath}${pscript} -s ${fpath}${sam} -f ${fasta} -p ${prefix} -d merge_dup -x ${capflag} -a 100 -z 100 -sj sj_priority -lde 2 -sjt 20 -log log_off -b BAM

The script uses paths from GenomeRIK's folder; change them to match the locations in your own folder structure.
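
One possible way to adapt the paths without editing the script each time is to read them from environment variables and fall back to the original values. This is only a sketch: `TAMA_DIR`, `DATA_DIR` and `REF_FASTA` are hypothetical variable names invented for this example. It also shows the prefix computed with shell parameter expansion, which gives the same result as the `sed`/`awk` pipeline for this file name.

```shell
# Defaults match the script above; set the environment variables to override.
# TAMA_DIR, DATA_DIR and REF_FASTA are hypothetical names used only in this sketch.
spath="${TAMA_DIR:-/home/genomerik/tama/}"
fpath="${DATA_DIR:-/home/genomerik/test/}"
fasta="${REF_FASTA:-/home/genomerik/ref_files/hg38.fa}"
sam='mm2_alz_flnc_hg38_sort.bam'

# Drop the trailing .bam and prepend the run label.
prefix="tc_nc_lde220_${sam%.bam}"
echo "${prefix}"
```

The prefix printed here matches the name of the bed file used in the later steps (`tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed`).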

Run the script

  sh run_tama_collapse.sh

Summary bash script: this provides a summary of the resulting bed12 annotation file.

The script should contain:

  file=$1
  echo "Genes"
  cat ${file} | awk -F "\t" '{print $4}' | awk -F ";" '{print $1}' | sort | uniq | wc -l
  echo "Transcripts"
  cat ${file} | awk -F "\t" '{print $4}' | awk -F ";" '{print $2}' | sort | uniq | wc -l
  echo "Multi-exon Transcripts"
  cat ${file} | awk -F "\t" '{if($10>1)print $4}' | awk -F ";" '{print $2}' | sort | uniq | wc -l
  echo "Multi-exon Genes"
  cat ${file} | awk -F "\t" '{if($10>1)print $4}' | awk -F ";" '{print $1}' | sort | uniq | wc -l

Run the script, passing the bed12 file as its argument:

  sh run_summary_bed.sh tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed
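
The summary script relies on two bed12 columns: column 4 holds the name in the form `gene_id;transcript_id`, and column 10 holds the block (exon) count. As an illustration of the same counting logic, here is a sketch that produces the four numbers in a single `awk` pass. `summarize_bed` is a hypothetical function name, not part of TAMA.

```shell
# Count genes, transcripts, and their multi-exon subsets in one pass.
# Prints the same labels, in the same order, as the summary script above.
summarize_bed() {
  awk -F '\t' '
    {
      split($4, id, ";")            # id[1] = gene ID, id[2] = transcript ID
      if (!(id[1] in genes)) { genes[id[1]] = 1; n_gene++ }
      if (!(id[2] in trans)) { trans[id[2]] = 1; n_trans++ }
      if ($10 > 1) {                # column 10 = number of blocks (exons)
        if (!(id[1] in me_genes)) { me_genes[id[1]] = 1; n_me_gene++ }
        if (!(id[2] in me_trans)) { me_trans[id[2]] = 1; n_me_trans++ }
      }
    }
    END {
      printf "Genes\n%d\n", n_gene
      printf "Transcripts\n%d\n", n_trans
      printf "Multi-exon Transcripts\n%d\n", n_me_trans
      printf "Multi-exon Genes\n%d\n", n_me_gene
    }' "$1"
}

# Example:
#   summarize_bed tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed
```

Reading the file once instead of four times makes little difference for a small exercise file, but it keeps the column logic in one place.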

Filter annotation file for only chromosome level scaffolds.

Note that this is not a part of TAMA but we are doing this to make some results easier to understand.

Bash script should contain

  file='tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed'
  outfile='tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup.bed'
  cat ${file} | grep -v "_" > ${outfile}

Run bash script

  sh run_chrom_cleanup.sh
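
Note that `grep -v "_"` drops any line containing an underscore anywhere, not only in the chromosome name. A slightly stricter variant tests column 1 only. The sketch below assumes UCSC-style hg38 names, where unplaced, random and alt scaffolds contain an underscore (for example `chr1_KI270706v1_random`) and the main chromosomes do not. `filter_chrom_level` is a hypothetical function name.

```shell
# Keep only records whose chromosome name (column 1) has no underscore.
# Assumes UCSC-style hg38 sequence names.
filter_chrom_level() {
  awk -F '\t' '$1 !~ /_/' "$1"
}

# Example with the file names used on this page:
#   filter_chrom_level tc_nc_lde220_mm2_alz_flnc_hg38_sort.bed > tc_nc_lde220_mm2_alz_flnc_hg38_sort_chrom_cleanup.bed
```

If no other column of the bed file contains an underscore, both approaches should give the same output.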