A curated list of awesome Bioinformatics tools.
Table of contents
- Awesome Bioinfo-tools
- awesome-bioinformatics: general information on bioinformatics.
- awesome-alternative-splicing: some programs for alternative splicing analysis.
- awesome-pipeline: find the best pipeline workflow for your application.
- awesome-deep-learning: information on deep learning.
- BBtools: BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data.
- samtools: The original samtools package has been split into three separate but tightly coordinated projects:
  - htslib: C library for handling high-throughput sequencing data
  - samtools: mpileup and other tools for handling SAM, BAM, CRAM
  - bcftools: calling and other tools for handling VCF, BCF
- GATK: A genomic analysis toolkit focused on variant discovery.
- EA-Utils: Command-line tools for processing biological sequencing data: barcode demultiplexing, adapter trimming, etc. Primarily written to support an Illumina-based pipeline, but should work with any FASTQ files.
- FastQC: A quality control tool for high throughput sequence data.
- FastQ Screen: FastQ Screen is a simple application which allows you to search a large sequence dataset against a panel of different genomes to determine from where the sequences in your data originate.
- Sickle: A windowed adaptive trimming tool for FASTQ files using quality.
- Cutadapt: Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
- bbduk: “Duk” stands for Decontamination Using Kmers. BBDuk was developed to combine most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool.
- trimgalore: A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries.
- trimmomatic: A flexible read trimming tool for Illumina NGS data.
- Sortmerna: Fast filtering, mapping and OTU picking.
- PEAR: PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.
- Fastq-join: Joins two paired-end reads on the overlapping ends.
- Seq-prep: SeqPrep is a program to merge paired end Illumina reads that are overlapping into a single longer read.
- FLASH: Fast Length Adjustment of SHort reads
- fastq-multx: The goal of this program is to make it easier to demultiplex possibly paired-end sequences, and also to allow the "guessing" of barcode sets based on master lists of barcoding protocols (fluidigm, truseq, etc.)
- UMI-tools: This repository contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
- NextGenSeqUtils: Notebook for demultiplexing with custom barcoded primers.
- MultiQC: Aggregate results from bioinformatics analyses across many samples into a single report.
- BWA: BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
- Bowtie2: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
- PANDASEQ: PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
- MPscan: Index-free mapping of multiple short reads on a genome
- DIAMOND: DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.
- Tophat2: TopHat is a fast splice junction mapper for RNA-Seq reads.
- STAR: Spliced Transcripts Alignment to a Reference.
- CRAC: RNA-Seq mapping software that includes the discovery of transcriptomic and genomic variants such as splice junctions, chimeric junctions, SNVs and indels in a single analysis step, using a built-in error detection method that enables high precision and sensitivity.
- Velvet: Sequence assembler for very short reads
- SPAdes: SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.
- Minia: Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day.
- Trinity: Trinity assembles transcript sequences from Illumina RNA-Seq data.
- Megahit: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
- MetaVelvetSL: An extension of Velvet assembler to de novo metagenomic assembly
- MetaSPADES: Assemble metagenomic reads using the SPAdes assembler.
- Minia for metagenome: GATB-Minia-Pipeline is a de novo assembly pipeline for Illumina data. It can assemble genomes and metagenomes.
- Bandage: Bandage is a program for visualising de novo assembly graphs.
- IGV: visualization tool for interactive exploration of large, integrated genomic datasets.
- rnaQUAST: rnaQUAST is a tool designed for quality evaluation and assessment of de novo transcriptome assemblies.
- VarScan: variant detection in massively parallel sequencing data.
- KisSplice: A local transcriptome assembler for SNPs, indels and AS events
- Farline: FaRLine is a pipeline for analysing alternative splicing.
- SplAdder: SplAdder, short for Splicing Adder, a toolbox for alternative splicing analysis based on RNA-Seq alignment data.
- Whippet: Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop.
- freebayes: freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
- MaxENTScan: MaxEntScan is based on the approach for modeling the sequences of short sequence motifs such as those involved in RNA splicing which simultaneously accounts for non-adjacent as well as adjacent dependencies between positions.
- MACS2: computational method used to identify areas in the genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing experiment.
- m6a Viewer: m6a Viewer is a cross-platform Java application for detecting and visualising peaks in ME-RIP/m6a-seq data.
- DeepVariant: DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
- LaBranchoR: LaBranchoR uses an LSTM network built with Keras to predict the position of RNA splicing branchpoints relative to a three prime splice site.
- SpliceAI: A deep learning-based tool to identify splice variants
- SpliceAI-wrapper: SpliceAI Wrapper is an attempt to use caching to reduce the number of required predictions. Please note that the authors of SpliceAI Wrapper are unrelated to the authors of SpliceAI.
- Portcullis: Portcullis stands for PORTable CULLing of Invalid Splice junctions from pre-aligned RNA-seq data.
- FeatureCounts: counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
- Kallisto: kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
- HTSeqCount: Analysing high-throughput sequencing data with Python
- StringTie: Transcript assembly and quantification for RNA-Seq
- RSEM: RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.
- DESeq2: Differential gene expression analysis based on the negative binomial distribution.
- EdgeR: Empirical Analysis of Digital Gene Expression Data in R.
- NBAMSeq: NBAMSeq is a Bioconductor package for differential expression analysis based on negative binomial additive model.
- NOISeq: Exploratory analysis and differential expression for RNA-seq data
- Sleuth: sleuth is a program for analysis of RNA-Seq experiments for which transcript abundances have been quantified with kallisto.
- Metagenassist: A comprehensive web server for comparative metagenomics
- MG-RAST: A Metagenomics Service for Analysis of Microbial Community Structure and Function.
- MEGAN: Metagenome Analyzer - MEGAN6 is a comprehensive toolbox for interactively analyzing microbiome data.
- vegan: Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
- DEX-seq: Inference of differential exon usage in RNA-Seq.
- KissDE: Retrieves Condition-Specific Variants in RNA-Seq Data.
- Xtail: Genome-wide assessment of differential translations with ribosome profiling data.
- Anota2Seq: Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq.
- RAPPAS: Rapid alignment-free phylogenetic identification of metagenomic sequences.
- Clustalw: Multiple Sequence Alignment.
- MEGA: Molecular Evolutionary Genetics Analysis.
- MAFFT: Multiple alignment program for amino acid or nucleotide sequences.
- MUSCLE: MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation.
- PhyML: PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework.
- RAxML: Randomized Axelerated Maximum Likelihood.
- FastTree: FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.
- FastME: FastME provides distance algorithms to infer phylogenies.
- MrBayes: MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models.
- jModelTest2: jModelTest is a tool to carry out statistical selection of best-fit models of nucleotide substitution.
- ModelTest-NG: ModelTest-NG is a tool for selecting the best-fit model of evolution for DNA and protein alignments.
- SMS: Smart Model Selection using likelihood-based criteria (e.g., AIC).
- Aquapony: Visualization and interpretation of phylogeographic information on phylogenetic trees
- iTOL: Interactive Tree Of Life is an online tool for the display, annotation and management of phylogenetic trees.
- ETE: A Python framework for the analysis and visualization of trees.
- Krona: Krona allows hierarchical data to be explored with zooming, multi-layered pie charts.
- CompPhy: A web-based collaborative platform for comparing phylogenies
- Phylo.io: A web app and library for visualising and comparing phylogenetic trees.
- CIPRES: The CIPRES Science Gateway V. 3.3 is a public resource for inference of large phylogenetic trees.
- RNA-Ribo Explorer (RRE): RRE is interactive, stand-alone graphical software for analysing, viewing and mining both transcriptome (typically RNA-seq) and translatome (typically ribosome profiling or Ribo-seq) datasets.
- IGET: The Integrated Genomics Exploration Tools (IGET) website provides online access to a suite of tools for exploring biological pathways and DNA/RNA/protein regulatory elements associated with large-scale gene expression and protein behavior dynamics.
- Gephi: visualization and exploration software for all kinds of graphs and networks.
- Cytoscape: visualization of complex networks and integrating these with any type of attribute data.
- String: Protein-Protein Interaction Networks Functional Enrichment Analysis
- CD-HIT: CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
- HMMER: HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments.
- STRUCTURE: The program structure is a free software package for using multi-locus genotype data to investigate population structure.
- Trinotate: Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.
- gProfiler: g:Profiler is a public web server for characterising and manipulating gene lists.
- TransDecoder: TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
- Gene Ontology: The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes.
- KEGG: KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
- DAVID: Database for Annotation, Visualization and Integrated Discovery (DAVID).
- PANTHER: Protein ANalysis THrough Evolutionary Relationships.
- RNAcentral: The non-coding RNA sequence database
- Silva: SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
- ITS2: Internal transcribed spacer 2 (ITS2) ribosomal RNA Database
- FunGuild: Over 13,000 fungal taxa now included in the database & functional annotation tools.
- KisSplice: Training on alternative splicing analysis with KisSplice and its suite of tools.
- QIIME2: QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed.
- Mothur: This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
- Vsearch: Open source tool for metagenomics.