First thing: To do the hands-on work directly, you need a running Linux system (or terminal on a Mac, or "Windows Subsystem for Linux"), and decent disk space (~20 GB) and RAM (~8 GB).
It might also be a good idea to work on the USV virtual machines! (remember the slides that were circulated earlier!)
# you can use this slightly modified command to connect w/ X11 display forwarding
# depending on your local machine, this should work directly...
ssh -X fmsb@<ip-address-you-choose>
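Before you start, it can help to check whether the machine roughly meets the disk and RAM requirements mentioned at the top. The following is only a minimal sketch: it assumes a Linux system where `/proc/meminfo` exists (elsewhere the RAM check is skipped), the variable names are purely illustrative, and the values are rounded down to whole GB.

```shell
# Free disk space (in GB) on the file system that holds your home directory.
# 'df -Pk' prints one line per file system in 1 KB blocks; column 4 is "available".
avail_kb=$(df -Pk "$HOME" | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))
echo "Free disk space in $HOME: ${avail_gb} GB (about 20 GB recommended)"

# Total RAM (in GB), read from /proc/meminfo if that file exists (Linux only)
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    mem_gb=$((mem_kb / 1024 / 1024))
    echo "Total RAM: ${mem_gb} GB (about 8 GB recommended)"
else
    echo "Cannot read /proc/meminfo, skipping the RAM check"
fi
```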
Here are some example commands and things you can try on your own. Always remember:
- Use auto-completion as often as possible: press the Tab key to get suggestions for the command you are typing and to complete folder and file names and paths. It is much faster and less error-prone because it prevents typos!
- Avoid whitespace in all folder and file names! Whitespace is allowed in general, but it will complicate your work on a Linux system. Use `-`, `_`, etc. instead, e.g. `new-file.txt`.
- Always be careful when you delete a folder or file! It is not as easy as on Windows to get your data back!
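To illustrate why whitespace in file names causes trouble, here is a small sketch that counts how many words the shell sees. It assumes `bash` (or another POSIX shell; `zsh` splits words differently by default), and the file names are made up for the example.

```shell
# Hypothetical file name that contains a space
name="my genome.fasta"

# Unquoted: the shell splits the value at the space into two separate words,
# so a command would look for the two files 'my' and 'genome.fasta'
set -- $name
unquoted_words=$#

# Quoted: the value stays a single word
set -- "$name"
quoted_words=$#

echo "unquoted: $unquoted_words words, quoted: $quoted_words word"

# A name without whitespace is one word even when you forget the quotes
safe_name="my-genome.fasta"
set -- $safe_name
safe_words=$#
echo "safe name: $safe_words word"
```

With whitespace in a name you have to remember the quotes every single time; with `-` or `_` you do not.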
Now, open a terminal and try the following commands.
# when opening a new terminal, you always start in your home directory
# the following command shows the path you are currently located in (remember the tree-like structure of folders on a Linux system!)
pwd
# create a new directory
mkdir testdir
# change into that new directory
cd testdir
# check where you are located now
pwd
# generate a new empty file
touch genome.fasta
# list content of the current directory
ls
# list more details, in a human readable format
ls -lah
# write some content into that file
printf ">Sequence\nATCGTACGTACGTAC\n" > genome.fasta
# check content of the file
cat genome.fasta
# change to your home directory
# ~ is a short version of /home/$USER
cd ~
# check again the content of the file you created
# now you have to type the path to the file, not just its name! Use auto-complete! Here we use the so-called relative path
cat testdir/genome.fasta
# you can also use the absolute path
cat /home/$USER/testdir/genome.fasta
# Hint: $USER is a so-called variable. To see the content of a variable you can also use echo:
echo $USER
# $USER stores the name of the user running the current session. You can also define your own variables, for example you could store the absolute path to your file in a variable for easier reuse:
GENOME=/home/$USER/testdir/genome.fasta
cat $GENOME
# please notice that we always use a leading $ sign when we want to access the content of a variable! See the difference:
echo GENOME
echo $GENOME
# generate another file
touch genome2.fasta
# copy the file to the test folder
cp genome2.fasta testdir/
# list the content of the test folder
ls -lah testdir/
# remove the original file we just generated in your home dir
rm genome2.fasta
# is it gone?
ls -lah
# however, remember we copied the file so a copy of the file we just deleted is still in the test folder
ls -lah testdir/
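The commands above can be combined into a small script. The sketch below stores the paths in variables and removes the original file only after the copy has been verified, since deleted files are hard to get back. It works in a throwaway directory created with `mktemp`; the variable names and the sequence content are made up for the example.

```shell
# Work in a throwaway directory so that no real data is touched
workdir=$(mktemp -d)
mkdir "$workdir/testdir"

# Store the paths in variables: no spaces around '=', and quote them when used
ORIGINAL="$workdir/genome2.fasta"
COPY="$workdir/testdir/genome2.fasta"

# Create a small example file and copy it into the test folder
printf ">Sequence2\nGGCCTTAA\n" > "$ORIGINAL"
cp "$ORIGINAL" "$COPY"

# 'cmp -s' is silent and succeeds only if both files are byte-identical,
# so the original is removed only after the copy has been verified
if cmp -s "$ORIGINAL" "$COPY"; then
    rm "$ORIGINAL"
    echo "copy verified, original removed"
else
    echo "copy differs from the original, nothing removed"
fi

# Only the copy in the test folder is left
ls -lah "$workdir/testdir/"
```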
- Mamba is a package manager that will help us to install bioinformatics tools and to handle their dependencies automatically
- Mamba works together with the conda package manager, and makes installing packages faster
- You will use the mamba command to create environments and install packages, and the conda command for some other package management tasks like configuration and activating environments (yes, it can be a bit confusing)
- Hint: You can create as many environments as you want! It is often convenient to have separate environments for separate tasks, pipelines, or even tools
- In the terminal enter:
# Switch to a directory with enough space, this can be /scratch on a HPC or
# your ~ (remember that's short for /home/$USER) on your laptop
cd /home/$USER
# make a new folder called 'nanopore-workshop'
mkdir nanopore-workshop
# switch to this folder
cd nanopore-workshop
# Download the Miniforge installer (it ships both conda and mamba)
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
# ATTENTION: the space in your home directory might be limited (e.g. 10 GB) and by default conda installs tools into ~/.conda/envs
# Thus, keep an eye on your disk space!
# On the HPC you can take care of this by moving ~/.conda to /scratch and making a symlink from your home directory:
# mv ~/.conda /scratch/dot-conda
# ln -s /scratch/dot-conda ~/.conda
# Run installer
bash Miniforge3-Linux-x86_64.sh
# Use space to scroll down the license agreement
# then type 'yes'
# accept the default install location with ENTER
# when asked whether to initialize mamba type 'yes'
# Now start a new shell or simply reload your current shell via
bash
# You should now be able to create environments, install tools and run them
- Set up mamba
# add repository channels for bioconda
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
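Note that `conda config --add channels` puts each new channel at the top of the list, so the channel added last (here conda-forge) gets the highest priority. A small sketch to inspect the resulting order; it only does something if `conda` is already on your `PATH`, and the variable `channels_checked` is purely illustrative.

```shell
# Show the configured channels, highest priority first
if command -v conda >/dev/null 2>&1; then
    conda config --show channels
    channels_checked=yes
else
    echo "conda not found on PATH, run the installer and reload your shell first"
    channels_checked=no
fi
```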
- Create and activate a new conda/mamba environment
# -p specifies the path (prefix) where the environment is created; use -n instead to give it a name
mkdir -p envs
mamba create -p envs/qc
# activate this environment
mamba activate envs/qc
# or if that does not work, fall-back to
conda activate envs/qc
# You should now see the environment (qc, or the path to envs/qc) in brackets at the start of each line.
# You switched from the default 'base' environment to the 'qc' environment,
# which is located in the folder envs/qc
- Note: Bioinformatics tools are regularly updated and input parameters might change (use `--help` or `-h` to see the manual for a tool!)
- Install most of them into our environment
- we will already install many tools that we will use over the next days!
mkdir -p envs
mamba create -y -p envs/qc nanoplot filtlong minimap2 samtools igv
conda activate envs/qc
# test
NanoPlot --help
minimap2 --version
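Instead of testing each tool by hand, you can loop over all of them. This is a minimal sketch; it assumes that the executables are called `NanoPlot`, `filtlong`, `minimap2`, `samtools` and `igv` (the last three names are taken from the package names and might differ), and it simply reports what is missing when the environment is not activated.

```shell
# Count how many of the expected tools cannot be found on the PATH
missing=0
for tool in NanoPlot filtlong minimap2 samtools igv; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing tool(s) missing"
```

If tools are reported as missing, check that the environment is activated (`conda activate envs/qc`).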
Reminder: You can also install specific versions of a tool!
- important for full reproducibility
- e.g. `mamba install minimap2==2.26`
- by default, `mamba` will try to install the newest tool version based on your configured channels, your system architecture, and dependencies on other tools
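For reproducibility it can also be useful to record which versions actually ended up in an environment. A sketch using `conda env export`, assuming the environment from above lives in `envs/qc`; the file name `qc-environment.yml` and the variable `exported` are made up for this example.

```shell
# Write all packages of the environment, with their versions, to a YAML file
if command -v conda >/dev/null 2>&1 && [ -d envs/qc ]; then
    # --no-builds drops the build strings, which makes the file more portable
    conda env export -p envs/qc --no-builds > qc-environment.yml
    exported=yes
    echo "environment written to qc-environment.yml"
else
    exported=no
    echo "conda or envs/qc not found, nothing exported"
fi

# The environment could later be recreated from that file, e.g. with
# mamba env create -p envs/qc-copy -f qc-environment.yml
```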
Below are just example paths, you can also adjust them and use other folder names! Assuming you are on a Linux system on a local machine (laptop, workstation):
# Switch to a path on your system where you want to store your data and results, for example
cd /home/$USER
# Create new folder (if not already done)
mkdir nanopore-workshop
cd nanopore-workshop
Attention: the Docker part will not work on a VM or HPC. But you can try it on your own machine with Docker, or you can use Singularity instead.
Check the small example at https://github.com/hoelzer/nf_example. Clone the repository using `git`. If the command is not available, try `sudo apt install git`.
git clone https://github.com/hoelzer/nf_example.git
cd nf_example
Then investigate the `Dockerfile` and try to build the container image locally using `docker build .`. Remember that you can also give your container image a specific name using the `-t` parameter.
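A build with the `-t` parameter could look like the sketch below. The image name `nf_example:test` is a made-up example, and the commands only run if `docker` is installed and a `Dockerfile` exists in the current directory.

```shell
# Hypothetical image name and tag, choose whatever you like
IMAGE_NAME="nf_example:test"

if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
    # Build the image from the Dockerfile in the current directory (.)
    docker build -t "$IMAGE_NAME" .
    # List the image to check that it is available under that name
    docker images "$IMAGE_NAME"
else
    echo "docker or Dockerfile not found, skipping the build"
fi
```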
Install `nextflow`, for example directly from https://nextflow.io/ or using `conda` or `mamba`.
curl -s https://get.nextflow.io | bash
# (it creates a file called nextflow in the current dir, which you can also move somewhere else)
# check if it worked
./nextflow -version
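If you want to call `nextflow` from any directory, one option is to move the file into a folder that is on your `PATH`. This is only a sketch: `$HOME/bin` is an assumed location, any other folder works just as well.

```shell
# Hypothetical target directory for personal executables, adjust as needed
BIN_DIR="$HOME/bin"

if [ -f ./nextflow ]; then
    mkdir -p "$BIN_DIR"
    chmod +x ./nextflow          # make sure the launcher is executable
    mv ./nextflow "$BIN_DIR/"
else
    echo "No ./nextflow file in the current directory, nothing moved"
fi

# Make the directory visible to the current shell session
# (add this line to ~/.bashrc to make it permanent)
export PATH="$BIN_DIR:$PATH"
```

If the file was moved, `nextflow -version` should afterwards work without the leading `./`.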
Try to get the little `nextflow` example workflow running. The workflow uses `sourmash`, so you either need to install the dependency or provide an available container image, see these code lines.
If you want to use Singularity instead of Docker, change the appropriate code lines in the `nextflow.config`. Also, you need to install Singularity, which you can do either via `mamba install singularity` (make sure you are in an activated environment where you want to install it) or using `sudo apt-get`.