Using reads rather than UMIs to count SNPs in spatial transcriptomics data (10x Visium) #133

Open
wJDKnight opened this issue Aug 2, 2024 · 5 comments


@wJDKnight

I am calling variants in spatial transcriptomics data (10x Visium). Because spatial transcriptomics data have lower sequencing depth than single-cell data, I want to count every read in the BAM independently. To do this, I set --UMItag None, i.e., I changed the command from this (using the default UMI tag):

```shell
cellsnp-lite -s $OUT_BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p ${n_processes} --minMAF 0.05 --minCOUNT 20 --gzip --genotype
```

to this (with --UMItag None):

```shell
cellsnp-lite -s $OUT_BAM -b $BARCODE --UMItag None -O $OUT_DIR -R $REGION_VCF -p ${n_processes} --minMAF 0.05 --minCOUNT 20 --gzip --genotype
```

I expected a higher read depth (DP) in the output VCF, but instead the overall DP decreased.

Could this be caused by some filtering criterion? Also, when should I use --countORPHAN?

@hxj5
Collaborator

hxj5 commented Aug 2, 2024

Hi, thanks for the detailed feedback. The --exclFLAG option is probably what matters here. It filters reads by their BAM FLAG: any read with at least one of the mask bits set is skipped. The default is UNMAP,SECONDARY,QCFAIL when UMIs are used, and UNMAP,SECONDARY,QCFAIL,DUP otherwise.

In other words, when you set --UMItag None, reads whose FLAG marks them as duplicates are filtered out by default. To keep them, set --exclFLAG manually, e.g., --exclFLAG 772 (UNMAP 4 + SECONDARY 256 + QCFAIL 512).
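To make the FLAG arithmetic concrete, here is a minimal sketch that builds the mask values from the standard SAM FLAG bits (UNMAP 0x4, SECONDARY 0x100, QCFAIL 0x200, DUP 0x400) and applies the "skip if any mask bit is set" rule described above. The helper names `default_excl_flag` and `is_skipped` are hypothetical; this is not cellsnp-lite's code, only an illustration of the defaults as stated in this thread.

```python
# Standard SAM FLAG bits (from the SAM specification).
UNMAP = 0x4        # 4
SECONDARY = 0x100  # 256
QCFAIL = 0x200     # 512
DUP = 0x400        # 1024


def default_excl_flag(use_umi: bool) -> int:
    """Hypothetical helper mirroring the defaults described in this thread:
    UNMAP,SECONDARY,QCFAIL with UMIs; DUP is added when --UMItag None."""
    mask = UNMAP | SECONDARY | QCFAIL
    if not use_umi:
        mask |= DUP
    return mask


def is_skipped(flag: int, excl_mask: int) -> bool:
    """A read is skipped if any of the mask bits is set in its FLAG."""
    return (flag & excl_mask) != 0


if __name__ == "__main__":
    print(default_excl_flag(use_umi=True))   # 772
    print(default_excl_flag(use_umi=False))  # 1796
    dup_read = DUP | 0x10  # duplicate read on the reverse strand (FLAG 1040)
    print(is_skipped(dup_read, default_excl_flag(use_umi=False)))  # True
    print(is_skipped(dup_read, 772))                               # False
```

Under this reading, passing --exclFLAG 772 together with --UMItag None just restores the UMI-mode mask, so DUP-flagged reads are counted again.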

--countORPHAN is not recommended for paired-end sequencing data. Details of all the read-filtering options are in the manual.

@wJDKnight
Author

Thank you very much for the quick response. I will try the --exclFLAG option and report back later.

@wJDKnight
Author

With that option, the overall DP tripled compared with before, so it seems to work well. Thanks a lot; cellsnp-lite is a really nice tool.

wJDKnight reopened this on Aug 16, 2024
@wJDKnight
Author

Although including DUP reads gave a larger DP, I am wondering why excluding them decreases the DP. Here is an example of a locus with 4 reads in one UMI group.
[Screenshot, 2024-08-16 8:28:50 PM: example locus with 4 reads in one UMI group, scenarios A, B and C]
In scenario A, the DP for that locus will be 1. In C, it will be 4. I think B should be 2; am I right?
However, in the real data, I found that the DP in B is smaller than in A at every locus detected in both. How can that happen?

@hxj5
Collaborator

hxj5 commented Aug 17, 2024

Hi, the --exclFLAG option simply filters reads by checking whether the DUP bit is set in the SAM FLAG. In the example above, if all three "blue" reads are flagged as DUP, they will all be filtered and contribute 0 to the count instead of 1. Following this rule, the DP in scenario B can be anywhere between 0 (if all 4 reads are flagged DUP) and 4 (if none of them is), depending on their FLAGs.

Cellsnp-lite relies entirely on the FLAGs set by the upstream alignment tool. You may want to inspect the FLAGs of these reads to investigate the DP differences between the three scenarios.
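Since the screenshot is not reproduced here, the following is only a simplified model with made-up reads and FLAGs. It assumes scenario A is UMI mode (each UMI with at least one read passing the 772 mask counts once), B is --UMItag None with the default mask (DUP excluded), and C is --UMItag None with --exclFLAG 772. The `Read` class and the functions are hypothetical, and real cellsnp-lite UMI handling has more detail (e.g., per-cell grouping) than this sketch.

```python
from dataclasses import dataclass

UNMAP, SECONDARY, QCFAIL, DUP = 0x4, 0x100, 0x200, 0x400
MASK_UMI = UNMAP | SECONDARY | QCFAIL  # 772, default with UMIs
MASK_NO_UMI = MASK_UMI | DUP           # 1796, default with --UMItag None


@dataclass
class Read:
    umi: str   # hypothetical UMI value
    flag: int  # SAM FLAG


def passes(read: Read, mask: int) -> bool:
    """A read passes if none of the mask bits is set in its FLAG."""
    return (read.flag & mask) == 0


def dp_umi(reads, mask=MASK_UMI) -> int:
    """Scenario A (simplified): each UMI with at least one passing read counts once."""
    return len({r.umi for r in reads if passes(r, mask)})


def dp_reads(reads, mask) -> int:
    """Scenarios B and C: every passing read counts once."""
    return sum(1 for r in reads if passes(r, mask))


def scenarios(reads):
    return {
        "A": dp_umi(reads),
        "B": dp_reads(reads, MASK_NO_UMI),
        "C": dp_reads(reads, MASK_UMI),  # --UMItag None --exclFLAG 772
    }


if __name__ == "__main__":
    # One UMI group of 4 reads covering the locus; the FLAGs are invented.
    one_kept = [Read("U1", 0)] + [Read("U1", DUP)] * 3
    all_dup = [Read("U1", DUP)] * 4
    print(scenarios(one_kept))  # {'A': 1, 'B': 1, 'C': 4}
    print(scenarios(all_dup))   # {'A': 1, 'B': 0, 'C': 4}
```

Under this model, B can only fall below A when every read of some UMI group is DUP-flagged, so counting DUP-flagged reads at a locus (for example with samtools, `samtools view -c -f 1024 <bam> <region>` on an indexed BAM) would be one way to check.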
