10x Multiome RNA vs ATAC data from same cells confidently disagrees on donor identities #70
I tried simply rerunning the RNA demultiplexing, with the same result, and once more using --minMAF 0 instead of 0.1 in cellsnp-lite, which gave very different but still non-concordant results. My main concern is why it seems so confident about these seemingly mislabeled cells. Using --minMAF 0.1, I get a median of 1k variants per barcode with the ATAC data and 1.7k variants with the RNA data.
Thanks Thomas for sharing the experience. Agreed that the results on the RNA module are (probably) wrong. One potential reason is which SNPs are used, as you tried. Using Nonetheless, the RNA module usually works well. Another potential issue could be local optima, so you may consider trying a larger number of initializations, e.g.,
Yuanhua
@koefoeden, did you ever figure out what's going on here, and optimise it to work for multiome RNA? Thanks
@ollieeknight Unfortunately I haven't investigated it further, but I will start using it on a larger scale in the coming week, so I will likely get some new insight. Let's keep in touch! Are you also using the Multiome kit?
Yep, I also found out my issue. I was running
@ollieeknight Hmm, we haven't used cellbender or any other kind of pre-processing, so that should not be the case here. Also relevant to you @huangyh09: we just tried to demultiplex 8 additional 10x Genomics Multiome runs, and are again seeing the ATAC and GEX data confidently disagree on donor identities, now using the default --minMAF values. We will likely try a higher number of initializations, but shouldn't any disagreement due to local optima be reflected in the reported probabilities?
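One way to put numbers on the disagreement described here is to cross-tabulate the per-barcode calls from the two runs. The following is a minimal shell sketch, not something shipped with vireo: the function name `crosstab_donors` and the example paths are hypothetical, and it assumes each run wrote a tab-separated `donor_ids.tsv` whose header row contains `cell` and `donor_id` columns. As far as I understand, when no reference genotypes are supplied the donor labels are assigned independently in each run, so `donor0` in one run need not be `donor0` in the other. A clean one-to-one pattern between labels would suggest agreement up to relabeling, while counts spread over many label pairs would point to real discordance.

```shell
# Cross-tabulate per-barcode donor calls from two demultiplexing runs.
# Assumption: both inputs are tab-separated with a header row that
# names a "cell" column and a "donor_id" column.
# Hypothetical usage:
#   crosstab_donors atac_vireo/donor_ids.tsv rna_vireo/donor_ids.tsv
crosstab_donors() {
    awk -F '\t' '
        FNR == 1 {                       # header of each file: locate columns by name
            for (i = 1; i <= NF; i++) {
                if ($i == "cell") c = i
                if ($i == "donor_id") d = i
            }
            next
        }
        NR == FNR { first[$c] = $d; next }          # first file: remember call per barcode
        ($c in first) { n[first[$c] "\t" $d]++ }    # second file: count label pairs
        END { for (k in n) print k "\t" n[k] }      # label_run1, label_run2, n_cells
    ' "$1" "$2" | sort
}
```

Only barcodes present in both files are counted. Rows labeled `doublet` or `unassigned` are tabulated like any other label, so they can be filtered beforehand if only confident singlets are of interest.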
Damn, sorry that you couldn't use the same fix. Could you post the full commands you're using for cellsnp-lite and vireo, for both GEX and ATAC?
Hi @koefoeden, can you also provide the coverage information, e.g., the distribution of n_SNPs per cell, in both scRNA and scATAC? Do you see that the discordant cells have lower coverage in one of the modalities? Thanks both for continuing to discuss and troubleshoot this.
Yuanhua
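The per-cell coverage summary asked for above can be pulled from the demultiplexing output itself. Below is a minimal sketch under stated assumptions: the function name `n_vars_summary` is hypothetical, and the input is assumed to be a tab-separated `donor_ids.tsv` with a header row containing an `n_vars` column (the number of variants covered per barcode). Running it once on the ATAC output and once on the RNA output gives comparable summaries for the two modalities.

```shell
# Summarize the per-barcode variant count of one demultiplexing run.
# Assumption: the input is tab-separated with a header row that names
# an "n_vars" column.
# Hypothetical usage:
#   n_vars_summary atac_vireo/donor_ids.tsv
#   n_vars_summary rna_vireo/donor_ids.tsv
n_vars_summary() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "n_vars") col = i; next }
        col { print $col }               # emit the n_vars value of every barcode
    ' "$1" | sort -n | awk '
        { v[NR] = $1 }                   # values arrive sorted ascending
        END {
            if (NR == 0) exit 1          # no data rows or no n_vars column
            # median: middle value, or the mean of the two middle values
            med = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "n=%d min=%s median=%s max=%s\n", NR, v[1], med, v[NR]
        }
    '
}
```

To compare discordant and concordant cells specifically, the same function can be applied to a copy of the file that keeps the header plus only the barcodes of interest.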
It might help to limit your VCF file for RNA to gene regions only (exons + introns), so that variants outside gene regions do not have a negative impact on patient identification (you shouldn't have reads there).
# Create a BED file which contains the whole gene regions for all genes in the CellRangerARC index.
zcat /genomes/10xgenomics/CellRangerARC/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/genes/genes.gtf.gz \
| awk -F '\t' '$3 == "gene"' \
| bedtools merge \
> refdata-cellranger-arc-GRCh38-2020-A-2.0.0-gene-regions.bed
# Both BED files are the same, so we only need one to generate a 1000 Genomes file restricted to gene regions.
bcftools view \
-R refdata-cellranger-arc-GRCh38-2020-A-2.0.0-gene-regions.bed \
-O z -o 1000g_hg38_high_cov_snps_af_0.1_to_0.9.gene_regions_only.hg38.vcf.gz \
1000g_hg38_high_cov_snps_af_0.1_to_0.9.hg38.vcf.gz
@koefoeden were you able to figure this out? @ollieeknight is there a tutorial for how you used
I have 10X multiome data where the scATAC-seq component has the correct indices documented for the respective sample names; unfortunately, this is not the case for the scRNA-seq component. I am hoping to use the scATAC-seq component as a reference in some way to help identify what the correct sample names should be for the scRNA-seq component. Any insights or clues to help me get up to speed would be appreciated.
For all multiome data, I now run cellranger multi for the RNA component, and cellranger-atac for the ATAC component separately. Here is what I do for cellbender/cellsnp-lite/vireo. For scRNA-seq outputs from
Hi - thanks for a very cool tool!
So I'm trying to demultiplex a 10x Multiome run, and I tried demultiplexing the data using either the ATAC or the RNA BAM file. The issue is that they give completely different donor identities, while both are quite confident that they got it right. However, because we sequenced it with HTOs as well, we know that it is the ATAC which is right. Can you help me uncover what is going on here?
Please see the figures below, where I only focused on cells that were confidently called, using ATAC, HTO or RNA data.
Best, Thomas