A transposon sequencing protocol that selects insertions in frame with expressed genes.
Once the Tn-Seq data have been produced with the modified transposon, they can be processed as usual with your favorite insertion caller to retrieve the genome positions contiguous to the Inverted Repeat (IR) used.
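The core idea behind retrieving those positions is to locate the IR in each read and keep the sequence that follows it, which is then mapped to the genome. The sketch below illustrates this under simplifying assumptions: exact IR matching, genomic sequence read immediately downstream of the IR, and a hypothetical `min_len` cutoff. It is not how FASTQINS implements this step.

```python
from typing import Optional


def genome_flank(read: str, ir: str, min_len: int = 15) -> Optional[str]:
    """Return the read segment immediately downstream of the IR, or None.

    Hypothetical helper: exact-match search only, no mismatches allowed,
    and the genome is ASSUMED to start right after the IR in the read.
    """
    idx = read.find(ir)
    if idx == -1:
        return None  # IR not present: the read does not report an insertion
    flank = read[idx + len(ir):]
    # Very short flanks cannot be mapped uniquely to the genome
    return flank if len(flank) >= min_len else None


ir = "TACGGACTTTATC"  # IR used in the repository's test example
read = "NNAC" + ir + "ATGGCTAAAGTTCGTGACAT"
print(genome_flank(read, ir))  # ATGGCTAAAGTTCGTGACAT
```

The first base of the returned flank, once aligned, marks the insertion coordinate reported by the caller.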
We recommend using FASTQINS. Please refer to the FASTQINS documentation for installation details. Keep in mind that this tool requires several standard programs commonly used in high-throughput sequencing analysis:
- Fastuniq
- Bowtie2
- Samtools
- Bedtools
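Before a first run it can help to confirm that these dependencies are reachable. The following is a minimal sketch using Python's standard `shutil.which`; the executable names in `REQUIRED_TOOLS` are assumed to be the default binary names of each tool and may differ in your installation.

```python
import shutil

# ASSUMED executable names; adjust them if your installation differs
REQUIRED_TOOLS = ["fastuniq", "bowtie2", "samtools", "bedtools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All FASTQINS dependencies found on PATH.")
```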
The arguments required to run an experiment are:
-i [FASTQ file with the transposon-containing reads; if no -i2 is passed, single-end mapping is used by default]
-t [IR transposon sequence, expected to be found contiguous to the genome sequence]
-g [genome sequence, in FASTA or GenBank format]
-o [output directory where the results are written]
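When FASTQINS is driven from a script, these arguments can be assembled into a command list. This is a hypothetical wrapper (`build_fastqins_cmd` and `run_fastqins` are illustrative names) that uses only the `-i`, `-i2`, `-t`, `-g` and `-o` flags listed here; other options should be taken from `fastqins --help`.

```python
import subprocess
from typing import List, Optional


def build_fastqins_cmd(reads: str, ir: str, genome: str, outdir: str,
                       reads2: Optional[str] = None) -> List[str]:
    """Assemble a FASTQINS call from the required arguments (hypothetical wrapper)."""
    cmd = ["fastqins", "-i", reads]
    if reads2 is not None:
        cmd += ["-i2", reads2]  # second file: paired-end instead of single-end mapping
    cmd += ["-t", ir, "-g", genome, "-o", outdir]
    return cmd


def run_fastqins(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if FASTQINS exits with an error
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_fastqins_cmd("./test/test_read2.fastq.gz", "TACGGACTTTATC",
                             "./test/NC_000912.fna", "test",
                             reads2="./test/test_read1.fastq.gz")
    print(" ".join(cmd))  # inspect the command before calling run_fastqins(cmd)
```

Passing a list rather than a single string avoids shell quoting issues with file paths.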
As an example, the repository includes a pair of files that you can use to test the pipeline:
fastqins -i ./test/test_read2.fastq.gz -i2 ./test/test_read1.fastq.gz -t TACGGACTTTATC -g ./test/NC_000912.fna -o test -v -r 0
To see additional arguments:
fastqins --help
The following files are generated as default output:
- *_fw.qins - read counts of insertions mapping to forward strand [example]
- *_rv.qins - read counts of insertions mapping to reverse strand [example]
- *.qins - read counts of insertions mapping to both strands [example]
- *.bam - file generated with the aligned reads
- *.log - log file with general features of the process run [example]
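Downstream scripts usually start by loading these count files. The sketch below assumes a layout of one insertion per line as `<position><TAB><read count>` and assumes that the combined file corresponds to the per-position sum of both strands; both are assumptions, so check the example files for the actual columns before relying on it. `read_qins` and `combine_strands` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable


def read_qins(lines: Iterable[str]) -> Dict[int, int]:
    """Parse insertion counts from .qins-like lines.

    ASSUMED layout: "<position>\t<read count>" per line, no header.
    """
    counts: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[int(fields[0])] = int(fields[1])
    return counts


def combine_strands(fw: Dict[int, int], rv: Dict[int, int]) -> Dict[int, int]:
    """Sum forward and reverse read counts per genome position."""
    total = Counter(fw)
    total.update(rv)  # Counter.update adds counts instead of replacing them
    return dict(total)


fw = read_qins(["10\t5", "42\t1"])
rv = read_qins(["42\t3", "99\t2"])
print(combine_strands(fw, rv))  # {10: 5, 42: 4, 99: 2}
```

With real files, pass an open file handle, e.g. `read_qins(open(path))`.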
For control libraries, no difference is expected. For best practices in analyzing these data, please refer to our previous publication, "FASTQINS and ANUBIS: two bioinformatic tools to explore facts and artifacts in transposon sequencing and essentiality studies".
We recommend ANUBIS for the analysis of this type of data. However, a simple analysis can also be performed to relate genomic annotations to relevant metrics by frame. A set of useful functions for this purpose is included in the file protinseq.py.
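The central quantity in a by-frame analysis is the frame of each insertion relative to the annotated gene. The sketch below is independent of protinseq.py and assumes 1-based inclusive coordinates, with frame defined as the offset from the first base of the start codon modulo 3 (the start codon lies at `end` for reverse-strand genes). Which frame corresponds to a productive in-frame insertion depends on the transposon design and is not encoded here. `Gene`, `insertion_frame` and `reads_by_frame` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def insertion_frame(pos: int, gene: Gene) -> int:
    """Offset (0, 1 or 2) of an insertion from the first base of the start codon."""
    if gene.strand == "+":
        return (pos - gene.start) % 3
    # Reverse strand: the start codon sits at the gene's highest coordinate
    return (gene.end - pos) % 3


def reads_by_frame(counts: Dict[int, int], genes: List[Gene]) -> Dict[str, List[int]]:
    """Sum read counts per gene and frame: {name: [frame 0, frame 1, frame 2]}."""
    table = {gene.name: [0, 0, 0] for gene in genes}
    for gene in genes:
        for pos in range(gene.start, gene.end + 1):
            reads = counts.get(pos, 0)
            if reads:
                table[gene.name][insertion_frame(pos, gene)] += reads
    return table


genes = [Gene("geneA", 100, 399, "+"), Gene("geneB", 500, 799, "-")]
counts = {100: 10, 101: 2, 103: 5, 797: 1, 799: 7}
print(reads_by_frame(counts, genes))
# {'geneA': [15, 2, 0], 'geneB': [7, 0, 1]}
```

A strong excess of reads in a single frame, compared with a control library or intergenic regions, is the kind of signal such a table is meant to expose.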
A short demonstration of how to apply them is included in the demonstration notebook.
Please make sure you have the most recent versions of the packages listed in the requirements installed:
pip install -r requirements.txt
By running the demonstration notebook you can obtain:
- Comprehensive omics information for M. pneumoniae.
- Sample exploration.
- Table of metrics by genomic annotation, including genes, smORFs and intergenic regions (used as control).
- Basic plots of loci of interest, with insertions colored by frame.
- Metagene exploration.
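A metagene profile aggregates insertions over many genes after rescaling each gene to a common length. The sketch below shows one common way to compute it and is not necessarily what the notebook does: coordinates are assumed 1-based and inclusive, each gene is split into `n_bins` bins from 5' to 3', and raw read counts are summed without per-gene normalization (a design choice you may want to change). `metagene_profile` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (start, end, strand), 1-based inclusive coordinates; hypothetical input layout
GeneSpan = Tuple[int, int, str]


def metagene_profile(counts: Dict[int, int], genes: List[GeneSpan],
                     n_bins: int = 10) -> List[int]:
    """Aggregate read counts over genes rescaled to n_bins (5' to 3')."""
    profile = [0] * n_bins
    for start, end, strand in genes:
        length = end - start + 1
        for pos in range(start, end + 1):
            reads = counts.get(pos, 0)
            if not reads:
                continue
            # Offset from the start codon, so bin 0 is always the 5' end
            offset = pos - start if strand == "+" else end - pos
            profile[offset * n_bins // length] += reads
    return profile


genes = [(1, 100, "+"), (201, 300, "-")]
counts = {1: 4, 100: 2, 250: 1, 300: 3}
print(metagene_profile(counts, genes))  # [7, 0, 0, 0, 0, 1, 0, 0, 0, 2]
```

Splitting `counts` by frame before calling the function gives one profile per frame for comparison.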
These pipelines have no special system requirements. The presented analysis was run on a Linux operating system and tested on macOS and on Windows running WSL. Python 3.6 or higher is required.
This project was developed entirely at the Centre for Genomic Regulation, in the Design of Biological Systems group.
If you experience a problem at any step of the workflow, you can use the 'Issues' page of this repository or contact:
Miravet-Verde, Samuel
Serrano, Luis
If you use the tools and workflow presented in this repository, please cite:
- FASTQINS and ANUBIS: two bioinformatic tools to explore facts and artifacts in transposon sequencing and essentiality studies
- ProTInSeq: transposon insertion tracking by ultra-deep DNA sequencing to identify translated large and small ORFs
ProTInSeq is distributed under the GNU General Public License. Please check LICENSE for further information.