First of all, huge thanks for developing Vireo!
I've been testing it using a synthetic pool (3 donors), and I've noticed a high number of unassigned cells, particularly from one donor, based on scRNA-seq data alone. I found a potential solution by combining scRNA and scATAC data to increase the coverage, described in this #39 (comment):
"... you can use bcftools concat if you have *cells.vcf.gz (by using --genotype in cellsnp-lite). Alternatively, you may try combining the sparse matrices directly."
So I tried:
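1. Ran cellsnp-lite on scRNA and scATAC data separately, with --genotype.
2. Sorted and indexed the two cellSNP.cells.vcf.gz files generated in Step 1.
3. Concatenated the two sorted files (cellSNP.cells.vcf.sort.gz) with bcftools concat into a combined cellSNP.cells.vcf.gz file.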
When I ran Vireo separately on the scRNA and scATAC data (providing the cellsnp-lite output folders, rather than the cellSNP.cells.vcf.gz files), it worked well and usually finished in < 20 mins. However, when I demultiplexed using the combined cellSNP.cells.vcf.gz file, it ran for several hours and finally got the following error:
[vireo] Loading cell VCF file ...
[vireo] Demultiplex 18491 cells to 3 donors with 908898 variants.
Traceback (most recent call last):
File "/projects/Installs/python_virtualenv/vireo/bin/vireo", line 8, in <module>
sys.exit(main())
File "/projects/Installs/python_virtualenv/vireo/lib/python3.7/site-packages/vireoSNP/vireo.py", line 209, in main
nproc=options.nproc)
File "/projects/Installs/python_virtualenv/vireo/lib/python3.7/site-packages/vireoSNP/utils/vireo_wrap.py", line 76, in vireo_wrap
pool = multiprocessing.Pool(processes = nproc)
File "/linux-x86_64-centos7/python-3.7.2/lib/python3.7/multiprocessing/context.py", line 117, in Pool
from .pool import Pool
File "/linux-x86_64-centos7/python-3.7.2/lib/python3.7/multiprocessing/pool.py", line 17, in <module>
import queue
File "/linux-x86_64-centos7/python-3.7.2/lib/python3.7/queue.py", line 16, in <module>
from _queue import Empty
ImportError: /linux-x86_64-centos7/python-3.7.2/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so: failed to map segment from shared object: Cannot allocate memory
I'm hoping you could give me some suggestions:
Did I do it correctly?
Could you please provide more details on "Alternatively, you may try combining the sparse matrices directly"?
What's the best approach to combine scRNA and scATAC for demultiplexing?
Do you think combining scRNA and scATAC data can also improve doublet detection?
Thanks a lot for your time!
Hi,
It seems that the cellSNP.cells.vcf.gz file generated by concatenating the scRNA and scATAC cellSNP.cells.vcf.gz files using bcftools concat is too large (740M).
I wonder if it's possible to generate the cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx, cellSNP.base.vcf.gz, and cellSNP.samples.tsv files from the cellSNP.cells.vcf.gz file?
Thanks!
Hi, it looks like after concatenating, you got 908898 SNPs, which is quite a lot.
If your scATAC is better covered, you may consider demultiplexing just with scATAC. Also, the inferred genotype there can be used as input for demultiplexing scRNA if needed.
In either case, I have never tested these options; the suggestions are based only on experience in other settings, so your results may differ.
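For the "combining the sparse matrices directly" idea quoted above, here is a minimal, untested-with-Vireo sketch of what that could look like. It assumes both cellsnp-lite runs cover the same cell barcodes in the same column order (which bcftools concat also requires of its sample columns), and that each run's variant list has already been read into Python as keys such as (chrom, pos, ref, alt). The function name `combine_counts` and the toy data are hypothetical; in practice the matrices would presumably be loaded from cellSNP.tag.AD.mtx and cellSNP.tag.DP.mtx (for example with `scipy.io.mmread`) and the keys from cellSNP.base.vcf.gz.

```python
import numpy as np
from scipy import sparse


def combine_counts(keys_a, mat_a, keys_b, mat_b):
    """Sum two variant-by-cell count matrices over the union of their variants.

    keys_*: one hashable variant ID per matrix row, e.g. (chrom, pos, ref, alt).
    mat_*:  sparse matrices (variants x cells) with identical cell columns.
    This helper is illustrative only; it is not part of Vireo or cellsnp-lite.
    """
    if mat_a.shape[1] != mat_b.shape[1]:
        raise ValueError("both matrices must have the same cells (columns)")

    union = sorted(set(keys_a) | set(keys_b))
    row_of = {key: i for i, key in enumerate(union)}

    def lift(keys, mat):
        # Build a selector matrix that moves each input row to its union row.
        mat = sparse.csr_matrix(mat)
        rows = np.array([row_of[k] for k in keys], dtype=int)
        cols = np.arange(len(keys), dtype=int)
        ones = np.ones(len(keys), dtype=mat.dtype)
        selector = sparse.csr_matrix(
            (ones, (rows, cols)), shape=(len(union), len(keys))
        )
        return selector @ mat

    # Variants seen in both assays have their counts added together.
    combined = (lift(keys_a, mat_a) + lift(keys_b, mat_b)).tocsr()
    return union, combined


# Toy example: 2 variants per assay, 3 shared cells, one variant in common.
keys_rna = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
keys_atac = [("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
ad_rna = sparse.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))
ad_atac = sparse.csr_matrix(np.array([[4, 0, 0], [0, 0, 5]]))

union_keys, ad_combined = combine_counts(keys_rna, ad_rna, keys_atac, ad_atac)
print(union_keys)
print(ad_combined.toarray())
```

The same function would be applied once to the AD matrices and once to the DP matrices so that both stay aligned to the same variant order. Summing counts for shared variants is a design choice made here for illustration; whether pooled counts across scRNA and scATAC behave well in demultiplexing is exactly the part that has not been tested.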