FrickTobias/DBS-Pro

DBS-Pro Analysis

About

This pipeline analyses sequencing data from DBS-Pro experiments for protein and PrEST quantification. The DBS-Pro method uses barcoded antibodies for surface protein quantification in droplets, for example to study single exosomes.

Figure: Overview of the DBS-Pro pipeline run on three samples.

The pipeline takes single-end FASTQ files as input, with reads following a construct such as those listed under Standard constructs. For each sample, the DBS is extracted (extract_dbs) and clustered (dbs_cluster) to enable error correction of the DBS sequences (correct_dbs). In parallel, the ABC and UMI are extracted from the same read (extract_abc_umi), and the UMIs are demultiplexed based on their ABC (demultiplex_abc). For each ABC, the UMIs are grouped by DBS and then clustered to correct errors (umi_cluster). Finally, the corrected sequences are combined into a read-specific DBS, ABC and UMI combination, and these combinations are tallied to create the final output in the form of a TSV (integrate). If there are multiple samples, these are also merged to generate a combined TSV (merge_data). A final report is also generated to enable some basic QC of the data. Also see the demo for a step-by-step walkthrough of a typical workflow.

DBS: Droplet Barcode Sequence. Reads sharing this sequence originate from the same droplet.
ABC: Antibody Barcode Sequence. Identifies which antibody was present in the droplet.
UMI: Unique Molecular Identifier. Used to count how many antibodies with a particular ABC were present in the droplet.
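
As a rough illustration of the final tallying step (integrate), the sketch below counts reads per corrected DBS, ABC and UMI combination and then counts distinct UMIs per DBS and ABC pair. The function names `tally` and `umi_counts` and the toy input are hypothetical and are not taken from the dbspro code base; the pipeline's own implementation may well differ.

```python
from collections import Counter


def tally(reads):
    """Count reads per (DBS, ABC, UMI) combination.

    `reads` is an iterable of (dbs, abc, umi) tuples that are assumed
    to be error-corrected already.
    """
    return Counter(reads)


def umi_counts(read_counts):
    """Count distinct UMIs per (DBS, ABC) pair.

    `read_counts` maps (dbs, abc, umi) to a read count, as returned by tally().
    """
    counts = Counter()
    for dbs, abc, _umi in read_counts:
        counts[(dbs, abc)] += 1
    return counts


# Toy input: placeholder strings instead of real sequences.
reads = [
    ("DBS1", "ABC01", "UMI1"),
    ("DBS1", "ABC01", "UMI1"),
    ("DBS1", "ABC01", "UMI2"),
    ("DBS2", "ABC02", "UMI1"),
]
read_counts = tally(reads)
molecules = umi_counts(read_counts)
```

Here `read_counts` corresponds to the ReadCount value per combination, while `molecules` gives the number of distinct UMIs seen for each antibody barcode in each droplet.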

Setup

First, make sure conda is installed on your system.

  1. Clone the git repository.

    git clone https://github.com/FrickTobias/DBS-Pro
    
  2. Move into the git folder and install all dependencies in a conda environment.

    cd DBS-Pro
    

    For reproducibility, use the *.lock files.

    2.1. For OSX use:

    conda create --name dbspro --file environment.osx-64.lock
    

    2.2. For LINUX use:

    conda create --name dbspro --file environment.linux-64.lock
    

    2.3. Using flexible dependencies (not recommended):

    conda env create --name dbspro --file environment.yml
    

    This option will likely install newer versions of the software and dependencies, which have not yet been tested.

  3. Activate the conda environment.

    conda activate dbspro
    
  4. Install the dbspro package.

    pip install .
    

    For development, please use pip install -e .[dev].

Usage

Prepare a FASTA file containing each of the antibody barcodes used in your experiment. The entry names will be used to define the targets. Also make sure that each sequence is prepended with ^, which is used for demultiplexing. See the example FASTA below:

>ABC01
^ATGCTG
>ABC02
^GTAGAT
>ABC03
^CTAGCA
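
Before initializing an analysis, it can be useful to check that the FASTA follows this layout. The following is a minimal sketch, assuming one sequence line per entry; `read_abc_fasta` is a hypothetical helper and not part of dbspro.

```python
def read_abc_fasta(text):
    """Parse ABC FASTA text into {target_name: sequence}.

    Raises ValueError if a sequence is not prepended with '^'.
    Assumes each entry has exactly one sequence line.
    """
    targets = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is None:
            raise ValueError("sequence line found without a preceding header")
        elif not line.startswith("^"):
            raise ValueError(f"sequence for {name} is not prepended with '^'")
        else:
            targets[name] = line[1:]  # store the barcode without the '^'
            name = None
    return targets


example = ">ABC01\n^ATGCTG\n>ABC02\n^GTAGAT\n>ABC03\n^CTAGCA\n"
targets = read_abc_fasta(example)
```

For the example FASTA above, `targets` maps ABC01, ABC02 and ABC03 to their barcode sequences.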

Use dbspro init to create an analysis folder. Provide the FASTA with the antibody barcodes (here named ABCs.fasta), a directory name and one or more FASTQ files for the samples.

dbspro init --abc ABCs.fasta <output-folder> <sample1.fastq>

If you have several samples, you can instead provide a CSV file with one line per sample in the format </path/to/sample.fastq>,<sample_name>. This lets you name your samples as you wish. With a CSV, the initialization is as follows:

dbspro init --abc ABCs.fasta --sample-csv samples.csv <output-folder>
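
With many samples, the CSV can be generated rather than written by hand. The sketch below assumes that the file has no header row, that all FASTQs sit in one directory with a .fastq extension, and that the file name without its extension is a suitable sample name. `write_sample_csv` is a hypothetical helper, not a dbspro command.

```python
import csv
from pathlib import Path


def write_sample_csv(fastq_dir, csv_path):
    """Write one '<path>,<sample_name>' line per *.fastq file in fastq_dir."""
    rows = []
    for fastq in sorted(Path(fastq_dir).glob("*.fastq")):
        # Use the absolute path and take the sample name from the file stem.
        rows.append((str(fastq.resolve()), fastq.stem))
    with open(csv_path, "w", newline="") as handle:
        # Plain "\n" line endings instead of the csv module default "\r\n".
        csv.writer(handle, lineterminator="\n").writerows(rows)
    return rows
```

Calling `write_sample_csv("fastqs", "samples.csv")` would then produce a file that can be passed to --sample-csv.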

Once the directory has been successfully initialized, move into the directory

cd <output-folder>

and check the current (default) configs using

dbspro config

Any changes to the configs should primarily be made through the dbspro config command so that the parameters are validated. You can check the construct layout by running dbspro config --print-construct. Some standard constructs are also defined, see Standard constructs. Once the configs are updated, you are ready to run the full analysis using the following command:

dbspro run

For more information on how to run use dbspro run -h.

Output files

The main output is a TSV file data.tsv.gz with the following columns:

Column name | Description
------------|------------
Barcode     | The DBS sequence
Target      | Target name (acquired from the ABC FASTA headers)
UMI         | The UMI sequence
ReadCount   | Number of reads with this DBS, Target and UMI combination
Sample      | Sample name
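
The TSV can be summarised with the Python standard library alone. This is a minimal sketch, assuming the file starts with a header row using the column names listed above and holds one row per unique DBS, Target and UMI combination; `umis_per_target` is a hypothetical helper name.

```python
import csv
import gzip
from collections import Counter


def umis_per_target(tsv_path):
    """Count rows (one per UMI) for each (Sample, Barcode, Target)."""
    counts = Counter()
    with gzip.open(tsv_path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[(row["Sample"], row["Barcode"], row["Target"])] += 1
    return counts
```

For example, `umis_per_target("data.tsv.gz")` returns a counter keyed by sample, droplet barcode and target, which can serve as a starting point for building a count matrix by hand.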

For convenience, anndata h5ad files with count matrices are also generated for each sample. These can be used for downstream analysis using Scanpy. To import the data use the following code:

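import scanpy as sc

# Load the count matrix for one sample ("mysample.h5ad" is a placeholder name)
adata = sc.read_h5ad("mysample.h5ad")
adata  # in a notebook, this displays a summary of the AnnData object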

The pipeline also generates a report report.html with some basic QC metrics.

Standard constructs

The most common constructs are included as presets, which can be selected using the -c/--construct parameter in dbspro config. Currently available constructs include:

dbspro_v1

Sequence: 5'-CGATGCTAATCAGATCA BDVHBDVHBDVHBDVHBDVH AAGAGTCAATAGACCATCTAACAGGATTCAGGTA XXXXX NNNNNN TTATATCACGACAAGAG-3'
Name:        |       H1      | |       DBS        | |               H2               | |ABC| |UMI | |       H3      |
Size (bp):   |       17      | |        20        | |               34               | | 5 | | 6  | |       17      |

This is the DBS-Pro construct used in the publication Stiller et al. 2019.

dbspro_v2

Sequence: 5'-CAGTCTGAGCGGTTCAACAGG BDVHBDVHBDVHBDVHBDVH GCGGTCGTGCTGTATTGTCTCCCACCATGACTAACGCGCTTG XXXXX NNNNNN CACCTGACGCACTGAATACGC-3'
Name:        |         H1        | |       DBS        | |                   H2                   | |ABC| |UMI | |         H3        |
Size (bp):   |         21        | |        20        | |                   42                   | | 5 | | 6  | |         21        |

This is the DBS-Pro construct used in the publication Banijamali et al. 2022.

pba

Sequence: 5'-NNNNNNNNNNNNNNN ACCTGAGACATCATAATAGCA XXXXX NNNNNN CATTACTAGGAATCACACGCAGAT-3'
Name:        |     DBS     | |         H2        | |ABC| |UMI | |          H3          |
Size (bp):   |      15     | |         21        | | 5 | | 6  | |          24          |

This is the construct used in the article Wu et al. 2019 which introduces the Proximity Barcoding Assay (PBA).
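
To make the layout diagrams concrete, the sketch below splits an error-free dbspro_v1 read into its DBS, ABC and UMI using a regular expression built from the handle sequences and segment sizes shown above. It assumes the read spans the complete construct from H1 to H3 with no sequencing errors, which real data will not always satisfy; `split_read` is a hypothetical helper and not how dbspro itself performs extraction.

```python
import re

# Handle sequences for dbspro_v1, copied from the diagram above.
H1 = "CGATGCTAATCAGATCA"
H2 = "AAGAGTCAATAGACCATCTAACAGGATTCAGGTA"
H3 = "TTATATCACGACAAGAG"

# IUPAC codes: B = C/G/T, D = A/G/T, V = A/C/G, H = A/C/T.
# The DBS is five repeats of BDVH (20 bp).
DBS_PATTERN = "(?:[CGT][AGT][ACG][ACT]){5}"

CONSTRUCT_RE = re.compile(
    f"{H1}(?P<dbs>{DBS_PATTERN}){H2}"
    f"(?P<abc>[ACGT]{{5}})(?P<umi>[ACGT]{{6}}){H3}"
)


def split_read(read):
    """Return (dbs, abc, umi) for an exact dbspro_v1 read, else None."""
    match = CONSTRUCT_RE.fullmatch(read)
    if match is None:
        return None
    return match["dbs"], match["abc"], match["umi"]


# A made-up read that follows the dbspro_v1 layout.
read = H1 + "CGAT" * 5 + H2 + "ATGCT" + "ACGTAC" + H3
parts = split_read(read)
```

The same approach carries over to dbspro_v2 and pba by swapping in their handle sequences and, for pba, a 15 bp NNN... DBS without the leading H1 handle.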

Demo

A short demonstration of the pipeline and some downstream analysis is available in the following Jupyter Notebook. This can also be used to test that the conda environment is properly set up.

Development

For notes on development see doc/development.

Publications

Check out version v0.1 for the pipeline used in:

Stiller, C., Aghelpasand, H., Frick, T., Westerlund, K., Ahmadian, A., & Eriksson Karlström, A. (2019). Fast and efficient Fc-specific photoaffinity labelling to produce antibody-DNA-conjugates. Bioconjugate Chemistry.

Version v0.3 was used in:

Banijamali, M., Höjer, P., Nagy, A., Hååg, P., Gomero, E. P., Stiller, C., Kaminskyy, V. O., Ekman, S., Lewensohn, R., Karlström, A. E., Viktorsson, K., & Ahmadian, A. (2022). Characterizing Single Extracellular Vesicles by Droplet Barcode Sequencing for Protein Analysis. Journal of Extracellular Vesicles, e12277.