10x Multiome RNA vs ATAC data from same cells confidently disagrees on donor identities #70

koefoeden · 2022-08-26T10:23:40Z

Hi - thanks for a very cool tool!

So I'm trying to demultiplex a 10x Multiome run, and tried to demultiplex the data using either the ATAC or the RNA bam file. The issue is that they give completely different donor identities, while both being quite confident that they got it right. Because we sequenced it with HTO's as well, we know however that it is the ATAC which is right. Can you help me uncover what is going on here??

Please see the figures below, where I only focused on cells that were confidently called, using ATAC, HTO or RNA data.

Best, Thomas

koefoeden · 2022-08-26T10:34:38Z

I tried merely rerunning the RNA-demultiplexing with the same result, and another time using --minMAF = 0 instead of 0.1 in cellsnp-lite - giving very different but still non-concordant results. My main concern is how it seems confident in these seemingly wrongly labeled cells?

Using the --minMAF 0.1 I get a median of 1k vars for the barcodes using the ATAC data, and 1.7k vars using the RNA data.

huangyh09 · 2022-08-27T06:33:43Z

Thanks Thomas for sharing the experience. Agreed that the results on the RNA module are (probably) wrong. One potential reason is the SNPs are used, as you tried. Using --minMAF = 0 is probably not ideal and --minMAF 0.1 (or even a bit higher, e.g., 0.15) would be a better threshold to identify SNPs with different genotypes between donors. Generally, ATAC may cover the whole gene body while RNA only covers the 3' or 5' of the transcript so may lose a proportion of informative SNPs.

Nonetheless, the RNA module usually works well. Another potential issue could local optima, so you may consider trying a large number of initializations, e.g., -M 200.

Yuanhua

ollieeknight · 2023-05-02T12:32:22Z

@koefoeden, did you ever figure out what's going on here, and optimise it to work for multiome RNA? Thanks

koefoeden · 2023-05-11T13:03:02Z

@ollieeknight Unfortunately I haven't investigated it further -but will just start using it in a "larger scale" in the coming week, so will likely get some new insight. Let's keep in touch! Are you also using the Multiome kit?

ollieeknight · 2023-05-31T10:55:39Z

Yep, I also found out my issue. I was running cellbender on each modality separately, to basically find the venn diagram of high quality of each modality and where they intersect - this was messing up the order of barcodes and once I set the order everything was much better, and works perfectly for both RNA and ATAC. hope this helps you?

koefoeden · 2023-06-04T19:18:01Z

@ollieeknight Hmm - we haven't used cellbender or any other kind of pre-processing - so should not be the case. Also relevant to you @huangyh09, we just tried to demultiplex 8 additional 10x Genomics Multiome runs, and are again experiencing that the ATAC and GEX data confidently disagree on donor identities, using now default --minMAF values. We will likely try with a higher number of initializations - but shouldn't any disagreement due to local optima be reflected in the reported probabilities?

ollieeknight · 2023-06-05T06:16:06Z

damn, sorry that you couldn't use the same fix. could you post the full command you're using for cellsnp-lite and vireo, for both GEx and ATAC?

huangyh09 · 2023-06-08T02:07:01Z

@ollieeknight Hmm - we haven't used cellbender or any other kind of pre-processing - so should not be the case. Also relevant to you @huangyh09, we just tried to demultiplex 8 additional 10x Genomics Multiome runs, and are again experiencing that the ATAC and GEX data confidently disagree on donor identities, using now default --minMAF values. We will likely try with a higher number of initializations - but shouldn't any disagreement due to local optima be reflected in the reported probabilities?

Hi @koefoeden can you also provide the coverage information, e.g., distribution of n_SNPs per cells, in both scRNA and scATAC. Do you see that the discordant cells have lower coverage in one of the modularity?

Thanks both for keeping discussing and troubleshooting this.

Yuanhua

ghuls · 2023-08-21T15:53:25Z

It might help to limit your VCF file for RNA to gene regions only (exons+introns) to avoid mutations out of gene regions having a negative impact on patient identification (as you shouldn't have reads there).

# Create BED file which contains the whole gene regions for all genes in CellRangerARC index.
zcat /genomes/10xgenomics/CellRangerARC/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/genes/genes.gtf.gz \
  | awk -F '\t' '$3 == "gene"' \
  | bedtools merge \
  > refdata-cellranger-arc-GRCh38-2020-A-2.0.0-gene-regions.bed

# Both BED files are the same, so we only need one to generate a 1000 Genomes file resticted to gene regions.
bcftools view \
    -R refdata-cellranger-arc-GRCh38-2020-A-2.0.0-gene-regions.bed \
    -O z -o 1000g_hg38_high_cov_snps_af_0.1_to_0.9.gene_regions_only.hg38.vcf.gz \
    1000g_hg38_high_cov_snps_af_0.1_to_0.9.hg38.vcf.gz

Pancreas-Pratik · 2024-12-16T06:16:22Z

@koefoeden were you able to figure this out?

@ollieeknight is there a tutorial for how you used cellblender together with vireo in your comment above (#70 (comment))? If there isn't could you at the very least just link the tutorials you "pieced" together, please?

I have 10X multiome data where the scATAC-seq component has the correct indices documented for the respective sample names, however this is, unfortunately, not the case for the scRNA-seq component. I am hoping to use the scATAC-seq component as a reference in some way to help identify what the correct sample names should be for the scRNA-seq component.

Any insights or clues to help me get up to speed would be appreciated.
Thank you very much in advance.

ollieeknight · 2024-12-18T11:32:35Z

@Pancreas-Pratik

For all multiome data, I now run cellranger multi for the RNA component, and cellranger-atac for the ATAC component separately. Here is what I do for cellbender/cellsnp-lite/vireo

For scRNA-seq outputs from `cellranger multi`

cellbender remove-background \
        --cuda \
        --input ${feature_matrix_path} \
        --output ${project_outs}/${library}/cellbender/output.h5

cellsnp-lite \
        -s ${project_outs}/${library}/outs/per_sample_outs/${library}/count/sample_alignments.bam \
        -b ${project_outs}/${library}/cellbender/output_cell_barcodes.csv \
        -O ${project_outs}/${library}/vireo \
        -R /opt/SNP/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz \
        --minMAF 0.1 \
        --minCOUNT 20 \
        --gzip \
        -p $(nproc)

vireo \
        -c ${project_outs}/${library}/vireo \
        -o ${project_outs}/${library}/vireo \
        -N $n_donors \
        -p $(nproc)

When I then analyse this in R, you need to change the barcode names, which I do with

libraries <- readRDS('llist_of_RNA_seurat_objects.rds')

barcode_mapping <- data.frame(
  gene_expression_barcodes = readLines('~/group/work/bin/cellranger-arc-2.0.2/lib/python/cellranger/barcodes/737K-arc-v1.txt.gz'),
  atac_barcodes = readLines('~/group/work/bin/cellranger-arc-2.0.2/lib/python/atac/barcodes/737K-arc-v1.txt.gz')
)

# We need to correct barcodes, as different ones are used for ATAC and GEX in DOGMA/MULTIOME
for (i in 1:length(libraries)) {
  libraries[[i]] <- RenameCells(libraries[[i]], new.names = (colnames(libraries[[i]]) %>% gsub(pattern = '-1', replacement = '')))
  
  new_names <- barcode_mapping$atac_barcodes[
    match(
      colnames(libraries[[i]]),
      barcode_mapping$gene_expression_barcodes
    )
  ]
  libraries[[i]] <- RenameCells(libraries[[i]], new.names = paste0(new_names, '_', i))
}

rm(barcode_mapping, new_names)

Hope this is helpful!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

10x Multiome RNA vs ATAC data from same cells confidently disagrees on donor identities #70

10x Multiome RNA vs ATAC data from same cells confidently disagrees on donor identities #70

koefoeden commented Aug 26, 2022 •

edited

Loading

koefoeden commented Aug 26, 2022

huangyh09 commented Aug 27, 2022

ollieeknight commented May 2, 2023

koefoeden commented May 11, 2023

ollieeknight commented May 31, 2023

koefoeden commented Jun 4, 2023

ollieeknight commented Jun 5, 2023

huangyh09 commented Jun 8, 2023

ghuls commented Aug 21, 2023 •

edited

Loading

Pancreas-Pratik commented Dec 16, 2024

ollieeknight commented Dec 18, 2024

10x Multiome RNA vs ATAC data from same cells confidently disagrees on donor identities #70

10x Multiome RNA vs ATAC data from same cells confidently disagrees on donor identities #70

Comments

koefoeden commented Aug 26, 2022 • edited Loading

koefoeden commented Aug 26, 2022

huangyh09 commented Aug 27, 2022

ollieeknight commented May 2, 2023

koefoeden commented May 11, 2023

ollieeknight commented May 31, 2023

koefoeden commented Jun 4, 2023

ollieeknight commented Jun 5, 2023

huangyh09 commented Jun 8, 2023

ghuls commented Aug 21, 2023 • edited Loading

Pancreas-Pratik commented Dec 16, 2024

ollieeknight commented Dec 18, 2024

For scRNA-seq outputs from cellranger multi

koefoeden commented Aug 26, 2022 •

edited

Loading

ghuls commented Aug 21, 2023 •

edited

Loading

For scRNA-seq outputs from `cellranger multi`