This pipeline analyses sequencing data from DBS-Pro experiments for protein and PrEST quantification. The DBS-Pro method uses barcoded antibodies for surface protein quantification in droplets, for example to study single exosomes.
Overview of DBS-Pro pipeline run on three samples.
The pipeline takes single-end FASTQs as input, with a construct such as those specified in Standard constructs. For each sample the DBS is extracted (`extract_dbs`) and clustered (`dbs_cluster`) to enable error correction of the DBS sequences (`correct_dbs`). At the same time the ABC and UMI are extracted from the same read (`extract_abc_umi`), and the UMIs are then demultiplexed based on their ABC (`demultiplex_abc`). For each ABC the UMIs are grouped by DBS and then clustered to correct errors (`umi_cluster`). Finally, the corrected sequences are combined into a read-specific DBS, ABC and UMI combination, and these combinations are tallied to create the final output in the form of a TSV (`integrate`). If there are multiple samples, these are also merged to generate a combined TSV (`merge_data`). A final report is also generated to enable some basic QC of the data. Also see the demo for a step-by-step walkthrough of a typical workflow.
DBS: Droplet Barcode Sequence. Reads sharing this sequence originate from the same droplet.
ABC: Antibody Barcode Sequence. Identifies which antibody was present in the droplet.
UMI: Unique Molecular Identifier. Used to count how many antibodies with a particular ABC were present in the droplet.
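The tallying done at the end of the workflow can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function name `tally_reads` and the example sequences are hypothetical, and the input is assumed to be one already-corrected (DBS, ABC, UMI) tuple per read.

```python
from collections import Counter


def tally_reads(reads):
    """Count reads per (DBS, ABC, UMI) combination.

    `reads` is an iterable of (dbs, abc, umi) tuples, one per read,
    holding sequences that have already been error corrected.
    Returns a sorted list of (dbs, abc, umi, read_count) rows.
    """
    counts = Counter(reads)
    return [key + (count,) for key, count in sorted(counts.items())]


# Hypothetical example: three reads from one droplet, two of them duplicates.
reads = [
    ("CAAATGCT", "ABC01", "GATTAC"),
    ("CAAATGCT", "ABC01", "GATTAC"),
    ("CAAATGCT", "ABC02", "TTGACA"),
]
rows = tally_reads(reads)
```

Each row corresponds to one line of the final TSV: reads sharing all three sequences collapse into a single entry with a read count.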
First, make sure conda is installed on your system.
1. Clone the git repository.
    git clone https://github.com/FrickTobias/DBS-Pro
2. Move into the git folder and install all dependencies in a conda environment.
    cd DBS-Pro
   For reproducibility, the `*.lock` files are used.
   2.1. For OSX use:
    conda create --name dbspro --file environment.osx-64.lock
   2.2. For Linux use:
    conda create --name dbspro --file environment.linux-64.lock
   2.3. Using flexible dependencies (not recommended):
    conda env create --name dbspro --file environment.yml
   This option will likely introduce newer versions of the software and dependencies which have not yet been tested.
3. Activate the conda environment.
    conda activate dbspro
4. Install the dbspro package.
    pip install .
   For development, please use `pip install -e .[dev]`.
Prepare a FASTA with each of the antibody barcodes used in your experiment. The entry names will be used to define the targets. Also make sure that each sequence is prefixed with `^`; this is used for demultiplexing. See the example FASTA below:
>ABC01
^ATGCTG
>ABC02
^GTAGAT
>ABC03
^CTAGCA
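If the barcodes are kept in a script or spreadsheet, a FASTA in this layout could be generated with a few lines of Python. The sketch below is only an illustration; the helper name `format_abc_fasta` is hypothetical and not part of dbspro.

```python
def format_abc_fasta(barcodes):
    """Return FASTA text with every sequence prefixed by '^'.

    `barcodes` maps entry names (used as target names) to sequences.
    A '^' that is already present is not duplicated.
    """
    lines = []
    for name, seq in barcodes.items():
        seq = seq.upper().lstrip("^")
        lines.append(f">{name}")
        lines.append(f"^{seq}")
    return "\n".join(lines) + "\n"


# The three barcodes from the example FASTA above.
abcs = {"ABC01": "ATGCTG", "ABC02": "GTAGAT", "ABC03": "CTAGCA"}
fasta_text = format_abc_fasta(abcs)
```

Writing `fasta_text` to a file such as `ABCs.fasta` gives the same content as the example above.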
Use `dbspro init` to create an analysis folder. Provide the FASTA with the antibody barcodes (here named `ABCs.fasta`), a directory name and one or more FASTQs for the samples.
dbspro init --abc ABCs.fasta <output-folder> <sample1.fastq>
If you have several samples you can instead provide a CSV file with one line per sample in the format `</path/to/sample.fastq>,<sample_name>`. This enables you to name your samples as you wish. With a CSV the initialization is as follows:
dbspro init --abc ABCs.fasta --sample-csv samples.csv <output-folder>
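For many samples the CSV can be generated rather than typed by hand. The following is a minimal sketch, assuming the file has no header row and that naming each sample after its FASTQ file name is acceptable. Both are assumptions, and the helper `sample_csv_text` and the example paths are hypothetical.

```python
import csv
import io
from pathlib import Path


def sample_csv_text(fastq_paths):
    """Build CSV text with one '<path>,<sample_name>' line per FASTQ."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for path in fastq_paths:
        name = Path(path).name
        # Strip common FASTQ extensions to get a sample name.
        for suffix in (".gz", ".fastq", ".fq"):
            if name.endswith(suffix):
                name = name[: -len(suffix)]
        writer.writerow([path, name])
    return buffer.getvalue()


# Hypothetical input paths.
csv_text = sample_csv_text(["/data/run1/sampleA.fastq.gz", "/data/run1/sampleB.fq"])
```

Saving `csv_text` as `samples.csv` produces a file in the line format described above.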
Once the directory has been successfully initialized, move into the directory
cd <output-folder>
and check the current (default) configs using
dbspro config
Any changes to the configs should primarily be made through the `dbspro config` command so that the parameters are validated. You can check the construct layout by running `dbspro config --print-construct`. Some standard constructs are also defined, see Standard constructs. Once the configs are updated you are ready to run the full analysis using this command:
dbspro run
For more information on how to run the pipeline, use `dbspro run -h`.
The main output is a TSV file, `data.tsv.gz`, with the following columns:
| Column name | Description |
|---|---|
| Barcode | The DBS sequence |
| Target | Target name (acquired from ABC FASTA headers) |
| UMI | The UMI sequence |
| ReadCount | Number of reads with this DBS, Target and UMI combination |
| Sample | Sample name |
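As a rough illustration of how this table can be used, the sketch below counts the distinct UMIs per sample, barcode and target using only the Python standard library. It assumes the file starts with a header row carrying the column names listed above; the function names and the example rows are hypothetical.

```python
import csv
import gzip
import io
from collections import defaultdict


def count_umis(tsv_handle):
    """Count distinct UMIs per (Sample, Barcode, Target) from an open TSV."""
    umis = defaultdict(set)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        umis[(row["Sample"], row["Barcode"], row["Target"])].add(row["UMI"])
    return {key: len(values) for key, values in umis.items()}


def count_umis_in_file(path="data.tsv.gz"):
    """Same as count_umis, but reads a gzip-compressed TSV from disk."""
    with gzip.open(path, "rt", newline="") as handle:
        return count_umis(handle)


# Hypothetical rows in the documented column layout.
example = io.StringIO(
    "Barcode\tTarget\tUMI\tReadCount\tSample\n"
    "CAAATGCT\tABC01\tGATTAC\t2\tsample1\n"
    "CAAATGCT\tABC01\tTTGACA\t1\tsample1\n"
    "CAAATGCT\tABC02\tGATTAC\t5\tsample1\n"
)
umi_counts = count_umis(example)
```

Here the UMI count, not the read count, is the estimate of how many antibodies with a given ABC were present in a droplet.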
For convenience, anndata `h5ad` files with count matrices are also generated for each sample. These can be used for downstream analysis using Scanpy. To import the data use the following code:
import scanpy as sc
adata = sc.read_h5ad("mysample.h5ad")
adata
The pipeline also generates a report, `report.html`, with some basic QC metrics.
The most common constructs are included as presets which can be initialized using the `-c/--construct` parameter in `dbspro config`. Currently available constructs include:
Sequence: 5'-CGATGCTAATCAGATCA BDVHBDVHBDVHBDVHBDVH AAGAGTCAATAGACCATCTAACAGGATTCAGGTA XXXXX NNNNNN TTATATCACGACAAGAG-3'
Name: | H1 | | DBS | | H2 | |ABC| |UMI | | H3 |
Size (bp): | 17 | | 20 | | 34 | | 5 | | 6 | | 17 |
This is the DBS-Pro construct used in the publication Stiller et al. 2019.
Sequence: 5'-CAGTCTGAGCGGTTCAACAGG BDVHBDVHBDVHBDVHBDVH GCGGTCGTGCTGTATTGTCTCCCACCATGACTAACGCGCTTG XXXXX NNNNNN CACCTGACGCACTGAATACGC-3'
Name: | H1 | | DBS | | H2 | |ABC| |UMI | | H3 |
Size (bp): | 21 | | 20 | | 42 | | 5 | | 6 | | 21 |
This is the DBS-Pro construct used in the publication Banijamali et al. 2022.
Sequence: 5'-NNNNNNNNNNNNNNN ACCTGAGACATCATAATAGCA XXXXX NNNNNN CATTACTAGGAATCACACGCAGAT-3'
Name: | DBS | | H2 | |ABC| |UMI | | H3 |
Size (bp): | 15 | | 21 | | 5 | | 6 | | 24 |
This is the construct used in the article Wu et al. 2019 which introduces the Proximity Barcoding Assay (PBA).
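To make the construct layouts concrete, here is a minimal Python sketch that pulls the DBS, ABC and UMI out of a read following the Stiller et al. 2019 layout listed first. It is an illustration only and not how the pipeline itself extracts sequences: it requires exact handle matches with no error tolerance, and the names `parse_read` and `iupac_to_regex` are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the BDVH-patterned DBS.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "[CGT]", "D": "[AGT]", "V": "[ACG]", "H": "[ACT]", "N": "[ACGT]",
}

# Handle sequences and segment sizes taken from the first construct above.
H1 = "CGATGCTAATCAGATCA"
DBS_PATTERN = "BDVH" * 5
H2 = "AAGAGTCAATAGACCATCTAACAGGATTCAGGTA"
H3 = "TTATATCACGACAAGAG"
ABC_LEN = 5
UMI_LEN = 6


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern into a regular expression fragment."""
    return "".join(IUPAC[base] for base in pattern)


CONSTRUCT_RE = re.compile(
    H1
    + "(?P<dbs>" + iupac_to_regex(DBS_PATTERN) + ")"
    + H2
    + "(?P<abc>[ACGT]{%d})" % ABC_LEN
    + "(?P<umi>[ACGT]{%d})" % UMI_LEN
    + H3
)


def parse_read(seq):
    """Return (dbs, abc, umi) for an exactly matching read, else None."""
    match = CONSTRUCT_RE.search(seq)
    if match is None:
        return None
    return match.group("dbs"), match.group("abc"), match.group("umi")


# Hypothetical read assembled from the handles and made-up barcodes.
dbs, abc, umi = "CAAATGCTGTGCCAAATGCA", "ATGCT", "GATTAC"
parsed = parse_read(H1 + dbs + H2 + abc + umi + H3)
```

Real reads contain sequencing errors, so an exact-match regular expression like this would discard many valid reads; it is meant only to show how the segments line up.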
A short demonstration of the pipeline and some downstream analysis is available in the following Jupyter Notebook. This can also be used to test that the conda environment is properly set up.
For notes on development see doc/development.
Checkout version v0.1 for the pipeline used in:
Version v0.3 was used in: