Awesome Bioinfo-tools

A curated list of awesome Bioinformatics tools.

Awesome existing topics related to bioinformatics

awesome-bioinformatics: general information on bioinformatics.
awesome-alternative-splicing: programs for alternative splicing analysis.
awesome-pipeline: find the best pipeline workflow for your application.
awesome-deep-learning: information on deep learning.

[top↑]

Suite tools

BBtools: BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data.
samtools: The original samtools package has been split into three separate but tightly coordinated projects:
- htslib: C-library for handling high-throughput sequencing data
- samtools: mpileup and other tools for handling SAM, BAM, CRAM
- bcftools: calling and other tools for handling VCF, BCF
GATK: A genomic analysis toolkit focused on variant discovery.
EA-Utils: Command-line tools for processing biological sequencing data: barcode demultiplexing, adapter trimming, etc. Primarily written to support an Illumina-based pipeline, but should work with any FASTQ files.

[top↑]

Quality analysis & trimming tools

quality analysis checking

FastQC: A quality control tool for high throughput sequence data.
FastQ Screen: FastQ Screen is a simple application which allows you to search a large sequence dataset against a panel of different genomes to determine from where the sequences in your data originate.

trimming

Sickle: A windowed adaptive trimming tool for FASTQ files using quality.
Cutadapt: Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
bbduk: “Duk” stands for Decontamination Using Kmers. BBDuk was developed to combine most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool.
trimgalore: A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries.
trimmomatic: A flexible read trimming tool for Illumina NGS data.
Sortmerna: Fast filtering, mapping and OTU picking.
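
To illustrate what quality-based window trimming means in practice, here is a minimal Python sketch. It is not the algorithm of Sickle or of any other tool listed above: the function names, the window size of 4, the mean-quality threshold of 20, and the Phred+33 encoding are all assumptions chosen for the example.

```python
def phred33(quality_string):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


def window_trim(seq, qual, window=4, min_mean=20):
    """Cut the read at the first window whose mean quality is below min_mean.

    Illustrative only: real trimmers also handle the 5' end, paired reads,
    and minimum length filters.
    """
    scores = phred33(qual)
    if not scores:
        return seq, qual
    window = min(window, len(scores))
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean < min_mean:
            # Keep everything before the first low-quality window.
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    trimmed_seq, trimmed_qual = window_trim("ACGTACGTAC", "IIIIII####")
    print(trimmed_seq, trimmed_qual)
```

Cutting at the start of the first failing window is the simplest policy; it can discard a few good bases next to the low-quality tail, which is one reason production tools use more refined rules.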

read merger

PEAR: PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.
Fastq-join: Joins two paired-end reads on the overlapping ends.
Seq-prep: SeqPrep is a program to merge overlapping paired-end Illumina reads into a single longer read.
FLASH: Fast Length Adjustment of SHort reads
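
The idea shared by these mergers, joining two paired-end reads on their overlapping ends, can be sketched in a few lines of Python. This is a simplified illustration, not the method of any tool above: it requires an exact overlap and ignores base qualities and mismatches, and the names `merge_pair` and `min_overlap` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge read1 with the reverse complement of read2 on their longest exact overlap.

    Returns the merged sequence, or None when no overlap of at least
    min_overlap bases is found.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for size in range(longest, min_overlap - 1, -1):
        # Compare the end of read1 with the start of the reoriented mate.
        if read1[-size:] == mate[:size]:
            return read1 + mate[size:]
    return None


if __name__ == "__main__":
    forward = "AACCGGTTACGTACGATTGC"
    reverse = reverse_complement("GTACGATTGCAAGCTTAGGC")
    print(merge_pair(forward, reverse))
```

Real mergers score candidate overlaps statistically and use quality values to resolve disagreeing bases, so an exact-match test like this one would reject many pairs that they merge correctly.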

demultiplexing

fastq-multx: The goal of this program is to make it easier to demultiplex possibly paired-end sequences, and also to allow the "guessing" of barcode sets based on master lists of barcoding protocols (fluidigm, truseq, etc.)
UMI-tools: This repository contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
NextGenSeqUtils: Notebook for demultiplexing with custom barcoded primers.
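
As a rough picture of what demultiplexing does, the Python sketch below assigns reads to samples by an exact barcode match at the start of each sequence. The function name, the `unassigned` bin, and the barcode table are hypothetical; the tools listed above additionally handle mismatches, paired-end data, and barcode guessing.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the barcode found at the start of each sequence.

    reads: iterable of (name, sequence) tuples.
    barcodes: dict mapping barcode sequence to sample name.
    The barcode is removed from each assigned read.
    """
    bins = defaultdict(list)
    for name, seq in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                bins[sample].append((name, seq[len(barcode):]))
                break
        else:
            # No barcode matched: keep the read untouched in a separate bin.
            bins["unassigned"].append((name, seq))
    return dict(bins)


if __name__ == "__main__":
    example_reads = [("r1", "ACGTTTGA"), ("r2", "TGCAGGCC"), ("r3", "NNNNAAAA")]
    example_barcodes = {"ACGT": "sampleA", "TGCA": "sampleB"}
    for sample, members in demultiplex(example_reads, example_barcodes).items():
        print(sample, members)
```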

[top↑]

Multiviewer

MultiQC: Aggregate results from bioinformatics analyses across many samples into a single report.

[top↑]

Mapping tools

aligner

BWA: BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Bowtie2: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
PANDASEQ: PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
MPscan: Index-free mapping of multiple short reads on a genome.
DIAMOND: DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.

splice-aligner

Tophat2: TopHat is a fast splice junction mapper for RNA-Seq reads.
STAR: Spliced Transcripts Alignment to a Reference.
CRAC: RNA-Seq mapping software that includes the discovery of transcriptomic and genomic variants (splice junctions, chimeric junctions, SNVs, indels) in a single analysis step, using a built-in error detection method that enables high precision and sensitivity.

[top↑]

Assembly tools

Genome & Transcriptome de novo assembly

Velvet: Sequence assembler for very short reads
SPAdes: SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.
Minia: Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day.
Trinity: Trinity assembles transcript sequences from Illumina RNA-Seq data.

Metagenome & Metatranscriptome assembly

Megahit: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
MetaVelvetSL: An extension of Velvet assembler to de novo metagenomic assembly
MetaSPADES: Assemble metagenomic reads using the SPAdes assembler.
Minia for metagenome: GATB-Minia-Pipeline is a de novo assembly pipeline for Illumina data. It can assemble genomes and metagenomes.

Viewers

Bandage: Bandage is a program for visualising de novo assembly graphs.
IGV: visualization tool for interactive exploration of large, integrated genomic datasets.

Correction tools

rnaQUAST: rnaQUAST is a tool for quality evaluation and assessment of de novo transcriptome assemblies.

[top↑]

Variant calling & alternative splicing tools

variant calling

VarScan: variant detection in massively parallel sequencing data.
KisSplice: A local transcriptome assembler for SNPs, indels and AS events
Farline: FaRLine is a pipeline to analyse alternative splicing.
SplAdder: SplAdder, short for Splicing Adder, is a toolbox for alternative splicing analysis based on RNA-Seq alignment data.
Whippet: Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop.
freebayes: freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

Motif discovery

MaxENTScan: MaxEntScan is based on the approach for modeling the sequences of short sequence motifs such as those involved in RNA splicing which simultaneously accounts for non-adjacent as well as adjacent dependencies between positions.

Peak calling

MACS2: computational method used to identify areas in the genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing experiment.
m6a Viewer: m6a Viewer is a cross-platform Java application for detecting and visualising peaks in MeRIP/m6A-seq data.

Learning tools

DeepVariant: DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
LaBranchoR: LaBranchoR uses an LSTM network built with Keras to predict the position of RNA splicing branchpoints relative to a three prime splice site.
SpliceAI: A deep learning-based tool to identify splice variants.
SpliceAI-wrapper: SpliceAI Wrapper is an attempt to reduce the number of required predictions through caching. Please note that the authors of SpliceAI Wrapper are unrelated to the authors of SpliceAI.

Correction tools

Portcullis: Portcullis stands for PORTable CULLing of Invalid Splice junctions from pre-aligned RNA-seq data.

[top↑]

Counting tools

FeatureCounts: counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
Kallisto: kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
HTSeqCount: Analysing high-throughput sequencing data with Python
StringTie: Transcript assembly and quantification for RNA-Seq
RSEM: RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.

[top↑]

Statistical analysis tools

RNA-seq

DESeq2: Differential gene expression analysis based on the negative binomial distribution.
EdgeR: Empirical Analysis of Digital Gene Expression Data in R.
NBAMSeq: NBAMSeq is a Bioconductor package for differential expression analysis based on negative binomial additive model.
NOISeq: Exploratory analysis and differential expression for RNA-seq data
Sleuth: sleuth is a program for analysis of RNA-Seq experiments for which transcript abundances have been quantified with kallisto.

Metagenomics

Metagenassist: A comprehensive web server for comparative metagenomics
MG-RAST: A Metagenomics Service for Analysis of Microbial Community Structure and Function.
MEGAN: Metagenome Analyzer - MEGAN6 is a comprehensive toolbox for interactively analyzing microbiome data.

Metabarcoding | Community Ecology

vegan: Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Alternative-splicing

DEX-seq: Inference of differential exon usage in RNA-Seq.
KissDE: Retrieves Condition-Specific Variants in RNA-Seq Data.

RIBO-seq

Xtail: Genome-wide assessment of differential translations with ribosome profiling data.
Anota2Seq: Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq.

[top↑]

Phylogenomics

Aligner

RAPPAS: Rapid alignment-free phylogenetic identification of metagenomic sequences.
Clustalw: Multiple Sequence Alignment.
MEGA: Molecular Evolutionary Genetics Analysis.
MAFFT: Multiple alignment program for amino acid or nucleotide sequences.
MUSCLE: MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation.

Phylogenetic inference

PhyML: PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework.
RAxML: RAxML - Randomized Axelerated Maximum Likelihood.
FastTree: FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.
FastME: FastME provides distance algorithms to infer phylogenies.
MrBayes: MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models.

Model test

jModelTest2: jModelTest is a tool to carry out statistical selection of best-fit models of nucleotide substitution.
ModelTest-NG: ModelTest-NG is a tool for selecting the best-fit model of evolution for DNA and protein alignments.
SMS: Smart Model Selection using likelihood-based criteria (e.g., AIC).
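
Model selection with a likelihood-based criterion such as AIC comes down to a small calculation: AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximised log-likelihood, and the model with the lowest score is preferred. The Python sketch below shows only that arithmetic. The log-likelihood values are invented for illustration, the parameter counts cover substitution-model parameters only (branch lengths are left out), and none of the function names come from the tools above.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Pick the model with the lowest AIC.

    fits: dict mapping model name to (log_likelihood, n_params).
    Returns (best_name, scores) where scores maps each model to its AIC.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical fits for three nucleotide substitution models.
    example_fits = {
        "JC69": (-1250.0, 0),
        "HKY85": (-1200.0, 4),
        "GTR": (-1198.5, 8),
    }
    name, scores = best_model(example_fits)
    print(name, scores)
```

In this made-up example GTR has the highest likelihood, but its four extra parameters over HKY85 are not justified by the small gain, so the penalty term decides the ranking.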

Visualization

Aquapony: Visualization and interpretation of phylogeographic information on phylogenetic trees
iTOL: Interactive Tree Of Life is an online tool for the display, annotation and management of phylogenetic trees.
ETE: A Python framework for the analysis and visualization of trees.
Krona: Krona allows hierarchical data to be explored with zooming, multi-layered pie charts.

Tree comparison

CompPhy: A web-based collaborative platform for comparing phylogenies
Phylo.io: A web app and library for visualising and comparing phylogenetic trees.

Platform

CIPRES: The CIPRES Science Gateway V. 3.3 is a public resource for inference of large phylogenetic trees.

[top↑]

Others

Exploration tools

RNA-Ribo Explorer (RRE): RRE is an interactive, stand-alone graphical tool for analysing, viewing and mining both transcriptome (typically RNA-seq) and translatome (typically Ribosome profiling or Ribo-seq) datasets.
IGET: The Integrated Genomics Exploration Tools (IGET) website provides online access to a suite of tools for exploring biological pathways and DNA/RNA/protein regulatory elements associated with large-scale gene expression and protein behavior dynamics.

Network & Interaction visualisation

Gephi: visualization and exploration software for all kinds of graphs and networks.
Cytoscape: visualization of complex networks and integrating these with any type of attribute data.
String: Protein-Protein Interaction Networks Functional Enrichment Analysis

Clustering & homology

CD-HIT: CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
HMMER: HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments.
STRUCTURE: The program structure is a free software package for using multi-locus genotype data to investigate population structure.

Annotation tools

Trinotate: Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.
gProfiler: g:Profiler is a public web server for characterising and manipulating gene lists.
TransDecoder: TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Ontology & Pathway databases

Gene Ontology: The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes.
KEGG: KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
DAVID: Database for Annotation, Visualization and Integrated Discovery (DAVID).
PANTHER: Protein ANalysis THrough Evolutionary Relationships.
RNAcentral: The non-coding RNA sequence database

Metabarcoding databases

Silva: SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
ITS2: Internal transcribed spacer 2 (ITS2) ribosomal RNA Database
FunGuild: Over 13,000 fungal taxa now included in the database & functional annotation tools.

[top↑]

Specific workflow

Alternative splicing

KisSplice: Training on alternative splicing analysis with KisSplice and its companion tools.

Community analysis

QIIME2: QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed.
Mothur: This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
Vsearch: Open source tool for metagenomics.

[top↑]

Bioinformatics analysis information



Metagenomic

Metatranscriptomic

Metabarcoding

Alternative-splicing

Ribo-seq

Merip-seq

mi-CLIP

Proteomics

MASS-SPEC
