No mitochondrial genes are included in the loom file (bug fixed, velocyto team please check this) #318
Hello Velocyto, the BAM file is from 10X and was generated by Cell Ranger v3.1. The reference genome is refdata-gex-mm10-2020-A.tar.gz from 10X, and the repeat annotation file is mm10_rmsk.gtf from UCSC. Is the problem that Cell Ranger v3.1 is too old and doesn't use the same chromosome names as refdata-gex-mm10-2020-A.tar.gz and mm10_rmsk.gtf? But when I checked the chromosome names in the BAM, I found chrM, the same as in refdata-gex-mm10-2020-A.tar.gz and mm10_rmsk.gtf. So why does the hint say the .bam file refers to a chromosome 'MT-'? Here is the code to check chromosome names in the BAM. |
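The original snippet for this check is not preserved in the thread. A minimal sketch of such a check, assuming pysam is installed for the BAM side (the helper names and file paths are hypothetical): it collects reference names from the BAM header and chromosome names from the first column of a GTF, then lists names that appear in only one of them.

```python
import gzip
from typing import Iterable, Set


def gtf_chromosomes(path: str) -> Set[str]:
    """Collect the chromosome names (first column) used in a GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    names = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment headers and blank lines
            names.add(line.split("\t", 1)[0])
    return names


def bam_chromosomes(path: str) -> Set[str]:
    """Collect the reference (chromosome) names from a BAM header.

    pysam is imported here so the GTF helper can be used without it.
    """
    import pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        return set(bam.references)


def compare(bam_names: Iterable[str], gtf_names: Iterable[str]) -> dict:
    """Report chromosome names present in one source but not the other."""
    bam_set, gtf_set = set(bam_names), set(gtf_names)
    return {
        "only_in_bam": sorted(bam_set - gtf_set),
        "only_in_gtf": sorted(gtf_set - bam_set),
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths): python check_chroms.py possorted_genome_bam.bam genes.gtf
    print(compare(bam_chromosomes(sys.argv[1]), gtf_chromosomes(sys.argv[2])))
```

If the mitochondrial chromosome shows up under different names in the two lists (for example chrM in one and chrMT in the other), that mismatch is the kind of incongruence discussed in this issue.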
Hello velocyto,
Since most people cannot rerun Cell Ranger, is it possible to fix this bug in the next release of velocyto.py? |
@hyjforesight --- Thank you so much for looking into this! Hopefully @gioelelm will have some time to recommend a way to do this more seamlessly. 🙌 My cellsorted_possorted and possorted bam and gtf files indeed have chrM instead of chrMT.
(1) Did you just use "find and replace", a script, or the bioSyntax extension in VS Code to rename chrM to chrMT in the reference genome folder, including the genome.fa, genome.fa.fai, genes.gtf, chrName.txt, and chrNameLength.txt files, with no changes to genes.pickle or the other files in the /refdata-gex-GRCh38-2020-A/star directory? (2) How did you change chrM to chrMT in rmsk.gtf? I guess it is similar to (1). (3) Next, as I understand it, we use the renamed files to redo cellranger count with these new GTFs, because "we should start from the fastq file to generate new bam file" (velocyto-team/velocyto.R#96). To do this, did you simply pass --transcriptome=/opt/refdata-gex-GRCh38-2020-A pointing to the renamed folder, with nothing else to do in cellranger for the renamed masked GTF, and then just pass this new masked GTF to run10x? Ref: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count Sorry for the basic questions. Thank you so much for pointing me here from my issue #320 (I still have the problem of it not finding the index file for the cellsorted bam, yet it still generated the loom file --- #321 🥵). |
Updated: For (1), use VS Code's Find and Replace. And yes, no other files (e.g., genes.pickle) contain "chrM". For (2), do the same as for (1). |
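For readers who prefer a script to VS Code's Find and Replace, here is a minimal sketch of the same rename, assuming the chrM to chrMT direction discussed in this thread (the function names and output file names are hypothetical). It changes only the chromosome column of GTF lines and the sequence name in FASTA headers, leaving everything else byte-for-byte intact.

```python
import re
from typing import Callable, Iterable, Iterator

OLD, NEW = "chrM", "chrMT"  # names from this thread; adjust if your files differ


def rename_gtf_lines(lines: Iterable[str], old: str = OLD, new: str = NEW) -> Iterator[str]:
    """Rename the chromosome in column 1 of GTF lines; other columns are untouched."""
    for line in lines:
        if not line.startswith("#"):
            fields = line.split("\t")
            if fields[0] == old:
                fields[0] = new
                line = "\t".join(fields)
        yield line


def rename_fasta_lines(lines: Iterable[str], old: str = OLD, new: str = NEW) -> Iterator[str]:
    """Rename a FASTA header such as '>chrM ...'; sequence lines are untouched."""
    for line in lines:
        match = re.match(r">(\S+)(.*)", line, re.DOTALL)
        if match and match.group(1) == old:
            line = ">" + new + match.group(2)  # keep any description and the newline
        yield line


def rewrite_file(src: str, dst: str, renamer: Callable[[Iterable[str]], Iterator[str]]) -> None:
    """Stream src through a renamer and write the result to a new file dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(renamer(fin))


if __name__ == "__main__":
    # Hypothetical file names; write to new paths so the originals are kept.
    rewrite_file("genes.gtf", "genes.chrMT.gtf", rename_gtf_lines)
    rewrite_file("genome.fa", "genome.chrMT.fa", rename_fasta_lines)
```

Restricting the match to the chromosome column (or header name) makes the rename idempotent: a second run leaves chrMT alone, whereas a global text replace of chrM would turn chrMT into chrMTT.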
Hi @denvercal1234GitHub, |
Thank you very much, @hyjforesight! Did you do some additional filtering of barcodes.tsv? Cell Ranger usually names it barcodes.tsv, not filtered_barcodes. Given my memory limitation, I needed to samtools sort the possorted_.. bam file before running velocyto run. Do you by chance know whether I can still use the index file (bam.bai) that came with the original possorted_ bam file, or do I need to index the manually sorted bam? |
@denvercal1234GitHub |
Hmm... I think the bam file from Cell Ranger is only sorted by position, and velocyto then sorts it by cell barcode (CB); that is why it names the result cellsorted_...bam. So if memory is an issue, it is recommended to create the cellsorted bam first and then run run10x. |
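A minimal sketch of doing the CB sort up front, assuming samtools is on the PATH. The thread says velocyto produces a cellsorted_ file itself; this helper is a hypothetical stand-in that builds a comparable `samtools sort -t CB` call so the sort can be run separately with explicit memory and thread limits.

```python
import os
import shlex
import subprocess
from typing import List


def cellsort_command(bam_path: str, threads: int = 4, mem_per_thread_mb: int = 2048) -> List[str]:
    """Build a samtools command that sorts a BAM by the CB (cell barcode) tag.

    `samtools sort -t TAG` orders reads by the tag value first, then by position.
    The output follows the cellsorted_<name> convention mentioned in this thread.
    """
    directory, name = os.path.split(bam_path)
    out_path = os.path.join(directory, "cellsorted_" + name)
    return [
        "samtools", "sort",
        "-t", "CB",                     # sort by the cell barcode tag
        "-O", "BAM",                    # output format
        "-@", str(threads),             # sorting/compression threads
        "-m", f"{mem_per_thread_mb}M",  # approximate memory per thread
        "-o", out_path,
        bam_path,
    ]


if __name__ == "__main__":
    import sys

    cmd = cellsort_command(sys.argv[1])
    print(shlex.join(cmd))         # show the exact command before running it
    subprocess.run(cmd, check=True)
```

Total memory is roughly threads times the per-thread value, so lowering either keeps the sort within a small machine's budget. Note that a CB-sorted BAM is not coordinate-sorted, so `samtools index` will not accept it.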
hello @denvercal1234GitHub, to be sure, let me try CB sorting separately and let you know whether the results are consistent. BTW, 16 GB of memory is enough for velocyto.py.
I hope the velocyto team can fix this bug in the next release: just change the search for chrMT to a search for chrM. |
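One hypothetical way to make such a lookup tolerant of both conventions, sketched here and not taken from velocyto's counter.py: map every mitochondrial alias to a single canonical key before matching BAM reference names against GTF chromosome names. The alias table and function names are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional

# Hypothetical alias table: not velocyto's code, just the names discussed in this thread.
MITO_ALIASES = {"chrM", "chrMT", "MT", "M"}
CANONICAL_MITO = "chrM"


def canonical_chrom(name: str) -> str:
    """Map any mitochondrial alias to one canonical name; leave others unchanged."""
    return CANONICAL_MITO if name in MITO_ALIASES else name


def build_lookup(gtf_chroms: Iterable[str]) -> Dict[str, str]:
    """Index the GTF's own chromosome names by their canonical form."""
    return {canonical_chrom(c): c for c in gtf_chroms}


def find_gtf_chrom(bam_chrom: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the GTF name matching a BAM reference name, or None if unannotated."""
    return lookup.get(canonical_chrom(bam_chrom))
```

Normalizing both sides to one key avoids hard-coding either chrM or chrMT, so the same code works for older and newer reference naming.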
Hi,
Do you have any suggestion? |
@AY-LIANG I am not sure, but between which outputs did the gene expression not match? What do you mean by 10X results? Do you mean between run and run10x? |
@denvercal1234GitHub I used scanpy to find the most highly expressed genes, and there are some differences between the loom file and the Cell Ranger result. It seems that some gene information is lost during the velocyto run. I guess these warnings might be the reason. |
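A minimal sketch for pinning down which genes differ, assuming you already have the gene names from both objects (for example `adata.var_names` from the Cell Ranger matrix and `ldata.var_names` from the loom file). The function name is hypothetical, and the default "mt-" prefix assumes mouse gene symbols ("MT-" is the usual human prefix).

```python
from typing import Dict, Iterable, List


def compare_genes(cellranger_genes: Iterable[str], loom_genes: Iterable[str],
                  mito_prefix: str = "mt-") -> Dict[str, List[str]]:
    """List genes in the Cell Ranger matrix that are absent from the loom file.

    mito_prefix is an assumption about the annotation: "mt-" for mouse, "MT-" for human.
    """
    loom_set = set(loom_genes)
    missing = sorted(g for g in set(cellranger_genes) if g not in loom_set)
    return {
        "missing": missing,
        "missing_mito": [g for g in missing if g.startswith(mito_prefix)],
    }
```

Calling `compare_genes(adata.var_names, ldata.var_names)` would show whether the missing genes are mostly mitochondrial (pointing to the chromosome-name issue in this thread) or spread across the genome.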
hi @AY-LIANG
import scvelo as scv
ldata = scv.read("filename.loom", cache=True)
# adata: AnnData object built from the Cell Ranger output
adata = scv.utils.merge(adata, ldata) |
@hyjforesight |
According to this comment, and since Cell Ranger has been updated over several years, the incongruences between .gtf and .bam are now gone, so this issue may be solved by deleting two lines of code in counter.py:
Thank you @0106WeiWeiDeng! I just tested your suggestion (deleting the two lines of code in counter.py) and the velocyto run seems fine; at least I have mitochondrial genes again.
Hi @AY-LIANG, I know this post is quite old, but I just want to add that this warning happens because the masking code checks, for every read in the bam file, whether the chromosome it is mapped to is present in the gtf file. Reads mapped to contigs like GL000194.1, which are in the fasta file but not in the gtf file, therefore generate the warning.
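A simplified illustration of the check described above, not velocyto's actual masking code (function names are hypothetical): each mapped read's chromosome is looked up in the set of annotated chromosomes, and reads on unannotated contigs are tallied. Reading the BAM assumes pysam is installed.

```python
from collections import Counter
from typing import Iterable, Iterator, Set


def unannotated_read_counts(read_chroms: Iterable[str], gtf_chroms: Set[str]) -> Counter:
    """Count reads per chromosome whose chromosome is absent from the GTF."""
    return Counter(chrom for chrom in read_chroms if chrom not in gtf_chroms)


def read_chroms_from_bam(path: str) -> Iterator[str]:
    """Yield the reference name of each mapped read (requires pysam)."""
    import pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam.fetch(until_eof=True):  # iterate without needing an index
            if not read.is_unmapped:
                yield read.reference_name
```

Running `unannotated_read_counts(read_chroms_from_bam("possorted_genome_bam.bam"), gtf_names)` (hypothetical path, with `gtf_names` collected from the GTF's first column) would show which contigs, such as GL000194.1, trigger the warning and how many reads each accounts for.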
Hello velocyto,
I converted a 10X bam file into a loom file with velocyto.py and used Scanpy for cell clustering. However, while processing the data with Scanpy, I found that no mito genes are included in the loom file, see below.
A similar question was raised here (velocyto-team/velocyto.R#96). I understand that "mito genes do not have introns, so no velocyto-type analysis can be done on these genes" (carlosf79). But the Scanpy processing is the preprocessing step before RNA velocity analysis.
In my understanding, converting a bam file to a loom file with velocyto.py is essentially re-doing the alignment to the reference genome while adding information on spliced and unspliced mRNAs.
If no mito genes are included in the loom file, does that mean velocyto's method for making the loom file is flawed? The strategy of velocyto.py seems to be "check whether each gene has spliced and unspliced mRNAs; if not, remove the gene", rather than "align the genes, then check whether each gene has spliced or unspliced mRNAs; if not, set that gene's spliced and unspliced counts to 0". If the first strategy is what actually happens, velocyto.py removes thousands of genes, because unspliced RNA can be detected for only 15-25% of genes in scRNA-seq, which makes this strategy somewhat biased.
So the loom file generated by velocyto cannot be used with Scanpy?
Thanks!
Best,
YJ