Please see Colonial One's help page. For working through the terminal, please complete the Codecademy's Command Line Tutorial. For help related to GW's HPC (Colonial One), please see our help page.
See Colonial One's wiki page. You will run all FONtANA related stuff on lustre
(NOT your home directory). You can access lustre
by such: cd /lustre/podinigrp
.
You will then want to create a directory for each person individually. Do so by doing mkdir -p $name
, so for example I would do mkdir -p kgibson
and now I would have a directory named kgibson
in this directory.
See this help page. Where it shows $name
insert your colonialone id. Mine is kmgibson
.
# here we are logging into Colonial one
ssh [email protected]
$enter_password
# here we are moving into the correct directory
cd /lustre/groups/podinigrp
# here we are creating a directory for you and creating a samples directory
mkdir -p $name/fontana
cd $name
We need to download all the snakemake files and other files from github. See here for help on cloning an existing repository.
# make sure you are in the correct folder. Your folder
pwd
# should be in /$name directory. If not, move there.
git clone [email protected]:kmgibson/FONTANA.git fontana
cd fontana
ls
- Download the
BAM
files for the IonTorrent data from the server. - Upload them to Colonial One like so: This is done on your own computer. Not within Colonial One.
# Example:
rsync -P -av "MP" [email protected]:/lustre/groups/podinigrp/kgibson/fontana/samples
This is saying upload directory "MP" to Colonial One at this location. You will need to change the directory name to whatever yours is and change kgibson to whatever your Colonial One ID is. So fill in $name and $directory in the one below:
rsync -P -av "$directory" [email protected]:/lustre/groups/podinigrp/$name/fontana/samples
If you have done this successfully, your files on your computer in $directory
will be uploaded to Colonial One under the directory samples/$directory
.
- Getting a list of all samples, renaming the files and creating a directory for each sample.
pwd # you should be in fontana directory
cd samples/$directory
module load samtools
for f in *.bam; do
name=$(echo $f |awk -F[_.] '{print $2}')
samp=$(samtools view -H $f | grep "SM:" | cut -f10 | sort | uniq | awk -F[:.] '{print $2}')
echo -e "${name}\t${samp}" >> dirsamp.txt # this gets a list of all samples
echo $f
done
# view this list
cat dirsamp.txt
# this loop creates directories and renames the file
for f in *.bam; do
name=$(echo $f |awk -F[_.] '{print $2}')
samp=$(samtools view -H $f | grep "SM:" | cut -f10 | sort | uniq | awk -F[:.] '{print $2}' | cut -d " " -f1)
echo -e "$f\t$name\t$samp"
mkdir -p Ion_${name}_${samp}
mv $f Ion_${name}_${samp}
mv Ion_${name}_${samp}/$f Ion_${name}_${samp}/original.bam
done
# creates a list of just the directory names. This file is used in FONtANA.
ls -d Ion_* > ../list.txt
# this moves the samples to the correct directory
mv Ion_* ../
rm dirsamp.txt
Your directory structure should now look like such:
/lustre/podinigrp/$name/fontana/
├── samples
│ ├── Ion_$samp1
| | ├── original.bam
│ └── Ion_$samp2
| └── original.bam
├── refs
│ ├── NexteraPE-PE.fa
│ └── adapters_SE.fa
├── envs
│ ├── haplotype.yaml
│ ├── processIllumina.yaml
│ ├── processIonTorrent.yaml
│ └── references.yaml
├── scripts
│ ├── checkbamoutput.sh
│ └── filter_spanning_reads.py
├── Snakefile
├── Snakefile.haplotype
├── Snakefile.processIllumina
├── Snakefile.processIonTorrent
├── Snakefile.references
├── environment.yaml
├── config.yaml
└── README.md
We need to create a new conda environment for FONtANA. See here for information regarding bioconda. See here for information about conda channels.
# load modules you will use.
module use /groups/podinigrp/shared/modulefiles
module unload python
module load miniconda3/4.3.27.1
# set conda channels.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# create conda enironment.
conda env create --name fontana --file environment.yaml
- Get an interactive node. You need to allocate a node for yourself, because I have not made this automatic (into a job) yet. See this help page for getting an interactive node.
Basic commands are below. You can replace the defq
option with either short
or debug
.
# look for idle nodes.
sinfo
# request a node
salloc -p defq -N 1 -t 500
# find the node you got
squeue
# insert number where $num is, the number is on the allocated node you obtained.
ssh node$num
# You need to move back to the correct directory
cd /lustre/podinigrp/$name/
- Loading an environment for FONtANA.
module use /groups/podinigrp/shared/modulefiles
module unload python
module load miniconda3/4.3.27.1
conda activate fontana
# if that ^^ didn't work. Try:
# source activate fontana
- Editing the
config.yaml
file for your preferences and files.
# view the config.yaml file
cat config.yaml
You should see something like this:
You will need to replace the bedfile listed here with whatever bedfile you would like to use. I have put two that I have been using in the refs
directory if you would like to use them.
A bedfile should look like this. For more information regarding bedfiles, see here, here, here or here:
chr# start_pos end_pos MH_name score strand_direction . track_lines
Example:
chr1 1486826 1487087 mh01KK-172 0 + . GENE_ID=SP_1.27585
chr1 216634369 216634540 mh01kk-002 0 + . GENE_ID=SP_5.6236
chr2 228092294 228092491 mh02KK-136 0 + . GENE_ID=SP_29.8907
chr9 135862470 135862637 mh09KK-157 0 + . GENE_ID=SP_45.18475
- Testing to see what FONtANA will run
A dry run of FONtANA, to see what the program is going to do, and if your samples are set up correctly. Make sure this dryrun finishes correctly (all green text and no errors thrown).
snakemake --use-conda --dryrun
Should look something like this to begin with:
Running FONtANA:
snakemake --use-conda
Output should look like this:
Other -
/groups/podinigrp/shared/apps/miniconda3/4.3.27.1
Create documentation with https://readthedocs.org/