Demultiplexing multiomic sequencing data #95

DHelix · 2024-04-01T07:00:52Z

First of all, huge thanks for developing Vireo!
I've been testing it on a synthetic pool (3 donors). When demultiplexing from scRNA-seq data alone, I noticed a high number of unassigned cells, particularly from one donor. A potential solution is to combine scRNA and scATAC data to increase coverage, as described in #39 (comment):
"... you can use bcftools concat if you have *cells.vcf.gz (by using --genotype in cellsnp-lite). Alternatively, you may try combining the sparse matrices directly."

So I tried:

1. Ran cellsnp-lite on scRNA and scATAC data separately, with --genotype
2. Sorted and indexed the two cellSNP.cells.vcf.gz files, generated in Step 1:

# scRNA
bcftools sort \
-m 2G \
-o ./scRNA/cellSNP.cells.vcf.sort.gz \
-O z9 \
-T TMP_DIR \
--write-index \
./scRNA/cellSNP.cells.vcf.gz

# scATAC
bcftools sort \
-m 2G \
-o ./scATAC/cellSNP.cells.vcf.sort.gz \
-O z9 \
-T TMP_DIR \
--write-index \
./scATAC/cellSNP.cells.vcf.gz

3. Concatenated the two cellSNP.cells.vcf.sort.gz files:

bcftools concat \
--allow-overlaps \
-o ./scRNA_scATAC/cellSNP.cells.vcf.gz \
-O z9 \
--threads 32 \
./scRNA/cellSNP.cells.vcf.sort.gz \
./scATAC/cellSNP.cells.vcf.sort.gz

4. Ran Vireo on the concatenated cellSNP.cells.vcf.gz file:

vireo \
-c ./scRNA_scATAC/cellSNP.cells.vcf.gz \
-N 3 \
-o ./scRNA_scATAC/sd1 \
--randSeed=1 \
-p 16
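
A quick sanity check on the concatenated file can help before the Vireo run: how many records it holds, and whether any site appears more than once. bcftools concat with --allow-overlaps keeps duplicate records unless --rm-dups is also given, so a SNP covered in both assays may end up as two separate variants. The sketch below uses only awk; `count_vcf_dups` is a hypothetical helper name, not part of bcftools or Vireo.

```shell
# Hypothetical helper: reads an uncompressed VCF on stdin and reports the
# total number of records and how many CHROM:POS:REF:ALT keys occur more than once.
count_vcf_dups() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        {
            total++
            # count a site once, on its second occurrence
            if (seen[$1 ":" $2 ":" $4 ":" $5]++ == 1) dup++
        }
        END { printf "records=%d duplicated_sites=%d\n", total, dup }
    '
}

# Usage (path taken from the commands above):
#   zcat ./scRNA_scATAC/cellSNP.cells.vcf.gz | count_vcf_dups
```

A large `duplicated_sites` value would suggest that the counts from the two assays were stacked as separate variants rather than summed per site.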

When I ran Vireo separately on the scRNA and scATAC data (providing the cellsnp-lite output folders rather than the cellSNP.cells.vcf.gz files), it worked well and usually finished in under 20 minutes. However, when I demultiplexed using the combined cellSNP.cells.vcf.gz file, it ran for several hours and eventually failed with the following error:

[vireo] Loading cell VCF file ...
[vireo] Demultiplex 18491 cells to 3 donors with 908898 variants.
Traceback (most recent call last):
  File "/projects/Installs/python_virtualenv/vireo/bin/vireo", line 8, in <module>
    sys.exit(main())
  File "/projects/Installs/python_virtualenv/vireo/lib/python3.7/site-packages/vireoSNP/vireo.py", line 209, in main
    nproc=options.nproc)
  File "/projects/Installs/python_virtualenv/vireo/lib/python3.7/site-packages/vireoSNP/utils/vireo_wrap.py", line 76, in vireo_wrap
    pool = multiprocessing.Pool(processes = nproc)
  File "/linux-x86_64-centos7/python-3.7.2/lib/python3.7/multiprocessing/context.py", line 117, in Pool
    from .pool import Pool
  File "/linux-x86_64-centos7/python-3.7.2/lib/python3.7/multiprocessing/pool.py", line 17, in <module>
    import queue
  File "/linux-x86_64-centos7/python-3.7.2/lib/python3.7/queue.py", line 16, in <module>
    from _queue import Empty
ImportError: /linux-x86_64-centos7/python-3.7.2/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so: failed to map segment from shared object: Cannot allocate memory

I'm hoping you could give me some suggestions:

1. Did I do it correctly?
2. Could you please provide more details on "Alternatively, you may try combining the sparse matrices directly"?
3. What's the best approach to combine scRNA and scATAC for demultiplexing?
4. Do you think combining scRNA and scATAC data can also improve doublet detection?

Thanks a lot for your time!
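
One possible reading of "combining the sparse matrices directly" is to take the union of sites from the two cellsnp-lite output folders and sum the AD and DP counts per site and cell. The sketch below illustrates that logic with awk; it is not a method confirmed in this thread. `combine_cellsnp` is a hypothetical helper, and it assumes that both folders list the same cells in the same order in cellSNP.samples.tsv and that matrix rows are variants and columns are cells. It holds all counts in memory, so it is meant to show the idea on small inputs rather than to scale.

```shell
# Hypothetical helper: combine_cellsnp DIR_A DIR_B OUT_DIR
# Assumption: DIR_A and DIR_B share identical cellSNP.samples.tsv (same cells, same order).
combine_cellsnp() (
    a=$1; b=$2; out=$3
    tmp=$(mktemp -d)
    mkdir -p "$out"

    # Site lists (header lines removed) for each input.
    gzip -cd "$a/cellSNP.base.vcf.gz" | grep -v '^#' > "$tmp/a.sites"
    gzip -cd "$b/cellSNP.base.vcf.gz" | grep -v '^#' > "$tmp/b.sites"

    # Union of sites keyed by CHROM:POS:REF:ALT: A's order first, then sites only in B.
    cat "$tmp/a.sites" "$tmp/b.sites" |
        awk '!seen[$1 ":" $2 ":" $4 ":" $5]++' > "$tmp/union.sites"

    # Combined base VCF: header from A followed by the union of sites.
    { gzip -cd "$a/cellSNP.base.vcf.gz" | grep '^#'; cat "$tmp/union.sites"; } |
        gzip > "$out/cellSNP.base.vcf.gz"
    cp "$a/cellSNP.samples.tsv" "$out/cellSNP.samples.tsv"

    for tag in AD DP; do
        awk -v dims="$tmp/dims" '
            function site() { return $1 ":" $2 ":" $4 ":" $5 }
            stage == "u" { row[site()] = ++nvar; next }              # union row index
            stage == "a" || stage == "b" { key[stage, ++n[stage]] = site(); next }
            /^%/ { next }                                            # mtx comment lines
            !seen_dims[stage]++ { if ($2 + 0 > ncell) ncell = $2 + 0; next }  # mtx size line
            {
                # map the input row to its union row, then sum counts per (site, cell)
                k = row[key[substr(stage, 2), $1]] SUBSEP $2
                if (!(k in sum)) nnz++
                sum[k] += $3
            }
            END {
                print nvar + 0, ncell + 0, nnz + 0 > dims
                for (k in sum) { split(k, ij, SUBSEP); print ij[1], ij[2], sum[k] }
            }
        ' stage=u "$tmp/union.sites" \
          stage=a "$tmp/a.sites" stage=ma "$a/cellSNP.tag.$tag.mtx" \
          stage=b "$tmp/b.sites" stage=mb "$b/cellSNP.tag.$tag.mtx" |
            sort -k1,1n -k2,2n > "$tmp/body"
        {
            printf '%%%%MatrixMarket matrix coordinate integer general\n%%\n'
            cat "$tmp/dims" "$tmp/body"
        } > "$out/cellSNP.tag.$tag.mtx"
    done
    rm -rf "$tmp"
)

# Usage (folder names taken from the commands above):
#   combine_cellsnp ./scRNA ./scATAC ./scRNA_scATAC
```

Summing per site, rather than stacking records, keeps the variant count at the size of the union and gives each shared SNP the coverage of both assays.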


DHelix · 2024-04-03T00:12:25Z

Hi,
It seems that the cellSNP.cells.vcf.gz file generated by concatenating the scRNA and scATAC cellSNP.cells.vcf.gz files with bcftools concat is too large (740M).
I wonder if it's possible to generate the cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx, cellSNP.base.vcf.gz, and cellSNP.samples.tsv files from the cellSNP.cells.vcf.gz file?
Thanks!
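
For illustration, a conversion of this kind could look like the sketch below. It is not a tool shipped with cellsnp-lite or Vireo; `cells_vcf_to_mtx` is a hypothetical helper. It assumes the FORMAT column of cellSNP.cells.vcf.gz contains single-valued AD (ALT allele count) and DP (REF plus ALT count) fields, and that missing values are written as ".".

```shell
# Hypothetical helper: reads a cellSNP.cells.vcf stream on stdin and writes
# cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx, cellSNP.base.vcf.gz and cellSNP.samples.tsv
# into OUT_DIR.
# Assumption: FORMAT carries single-valued AD and DP fields per cell.
cells_vcf_to_mtx() (
    out=$1
    mkdir -p "$out"
    tmp=$(mktemp -d)
    : > "$tmp/AD"; : > "$tmp/DP"; : > "$tmp/samples"; : > "$tmp/base"

    awk -F'\t' -v tmp="$tmp" '
        /^##/ { print > (tmp "/base"); next }             # keep meta header lines
        /^#CHROM/ {
            ncell = NF - 9
            for (i = 10; i <= NF; i++) print $i > (tmp "/samples")
            print "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO" > (tmp "/base")
            next
        }
        {
            nvar++
            printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $5, $6, $7, $8 > (tmp "/base")
            # locate AD and DP within the FORMAT column
            nf = split($9, fmt, ":"); ad = dp = 0
            for (k = 1; k <= nf; k++) {
                if (fmt[k] == "AD") ad = k
                if (fmt[k] == "DP") dp = k
            }
            for (i = 10; i <= NF; i++) {
                split($i, v, ":")
                # "." (missing) evaluates to 0 and is skipped, keeping the matrix sparse
                if (ad && v[ad] + 0 > 0) { print nvar, i - 9, v[ad] + 0 > (tmp "/AD"); nad++ }
                if (dp && v[dp] + 0 > 0) { print nvar, i - 9, v[dp] + 0 > (tmp "/DP"); ndp++ }
            }
        }
        END {
            print nvar + 0, ncell + 0, nad + 0 > (tmp "/AD.dims")
            print nvar + 0, ncell + 0, ndp + 0 > (tmp "/DP.dims")
        }
    '
    for tag in AD DP; do
        {
            printf '%%%%MatrixMarket matrix coordinate integer general\n%%\n'
            cat "$tmp/$tag.dims" "$tmp/$tag"
        } > "$out/cellSNP.tag.$tag.mtx"
    done
    gzip -c "$tmp/base" > "$out/cellSNP.base.vcf.gz"
    cp "$tmp/samples" "$out/cellSNP.samples.tsv"
    rm -rf "$tmp"
)

# Usage (paths taken from the commands above):
#   zcat ./scRNA_scATAC/cellSNP.cells.vcf.gz | cells_vcf_to_mtx ./scRNA_scATAC/mtx
```

The output folder mirrors the four files named in the question, so it could then be passed to vireo -c in place of the VCF, provided the AD/DP assumption holds for the input.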

huangyh09 · 2024-04-05T03:19:44Z

Hi, it looks like after concatenating, you got 908898 SNPs, which is quite a lot.

If your scATAC data has better coverage, you may consider demultiplexing with scATAC alone. The genotypes inferred there can also be used as input for demultiplexing the scRNA data if needed.

In either case, I have never tested these approaches, and the suggestions are based only on experience in other settings, so your results may differ.

Yuanhua
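
The two-step suggestion above could be sketched as follows. This is an untested outline, not a confirmed recipe: `run` and `demux_rna_with_atac_gt` are hypothetical helpers, the output folder names are made up, and it assumes that Vireo writes its estimated donor genotypes to GT_donors.vireo.vcf.gz in the output folder and accepts them through -d (donor file) with -t GT (genotype tag). Commands are only printed by default; set DRY_RUN=0 to execute them.

```shell
# Print each command by default; set DRY_RUN=0 to execute instead.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: demux_rna_with_atac_gt ATAC_DIR RNA_DIR N_DONORS
# ATAC_DIR and RNA_DIR are cellsnp-lite output folders.
demux_rna_with_atac_gt() (
    atac=$1; rna=$2; n=$3
    # Step 1: demultiplex scATAC alone; Vireo also writes the donor genotypes it estimated.
    run vireo -c "$atac" -N "$n" -o "$atac/vireo_out" --randSeed=1
    # Step 2: pass those genotypes to the scRNA run (-d donor VCF, -t genotype tag).
    run vireo -c "$rna" -d "$atac/vireo_out/GT_donors.vireo.vcf.gz" -t GT -N "$n" \
        -o "$rna/vireo_atac_gt"
)

demux_rna_with_atac_gt ./scATAC ./scRNA 3
```

The second run can only use sites that are present in both the scATAC-derived donor VCF and the scRNA cell data, so the benefit depends on how many SNPs the two assays share.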
