no mitochondrial genes are included in loom file (bug fixed, velocyto team please check this) #318

hyjforesight · 2021-11-12T02:36:56Z

Hello velocyto,
I transfer 10X bam file into loom file by velocyto.py and use Scanpy to do cell clustering. However, when doing the data processing by Scanpy, I found there are no mito genes included in loom file, see below.

There is a similar question raised here (velocyto-team/velocyto.R#96). I understand that "mito genes do not have introns, so no velocyto-type analysis can be done on these genes.----carlosf79". But the data processing by Scanpy is the pre-operation before RNA velocity analysis.

In my understanding, transferring bam file to loom file by velocyto.py is just re-doing the alignment to the reference genome by including the information of spliced and unspliced mRNAs.

If no mito genes were included in loom file, does it mean that the method to make loom file by velocyto is wrong, becasue the strategy of velocyto.py seems like "we check whether the genes have spliced and unspliced mRNAs, if not, removes the genes", rather than "we align the genes, and then check whether the genes have spliced or unspliced mRNAs, if not, mark this gene's spliced and unspliced mRNAs=0". If the first strategy is the truth, it means that velocyto.py removes thousands of genes because only 15-25% genes' unspliced RNAs can be detected in scRNA-seq, which means this strategy is a little biased.
So velocyto generated loom file cannot be used for Scanpy??

Thanks!
Best,
YJ

hyjforesight · 2021-11-15T17:10:29Z

Hello Velocyto,
When running velocyto.py, it shows:
"WARNING - The .bam file refers to a chromosome 'MT+' not present in the annotation (.gtf) file"
"WARNING - The .bam file refers to a chromosome 'MT-' not present in the annotation (.gtf) file"

BAM file is from 10X generated by Cell Ranger v3.1. Reference genome is refdata-gex-mm10-2020-A.tar.gz from 10X. Repeat annotation file is mm10_rmsk.gtf from UCSC.

Is it because cell ranger v3.1 is too old and doesn't have the same chromosome name as refdata-gex-mm10-2020-A.tar.gz and mm10_rmsk.gtf? But when I check the chromosome names in BAM, I found it is ChrM, the same as refdata-gex-mm10-2020-A.tar.gz and mm10_rmsk.gtf? Then why the hint shows the .bam file refers to a chromosome 'MT-'?

Here is the code to check chromosome names in bam.
$ samtools view /home/hyjforesight/mybam.bam | cut -f3 | perl -ne 'print unless $seen{$_}++' -
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY

hyjforesight · 2021-11-18T06:31:48Z

Hello velocyto,
I solved this bug.

it's not caused by samtools. Please check the response from samtools.
The .bam file refers to a chromosome 'MT+' not present in the annotation (.gtf) file. samtools/samtools#1545
it's probably caused by velocyto.py using chrMT to search chrM when doing alignment. chrM is the default name of mito chromosome in bam generated by 10X cell ranger by aligning fastq to the reference genome.
Solution: regenerate bam by cell ranger count pipeline, change all naming of chrM to chrMT in the reference genome folder, including genome.fa, genome.fa.fai, gene.gtf, chrName.txt, chrNamelength.txt files. I did all these renaming by VS Code. By these means, the bam file will include chrMT name. See below.
Use this new bam file for velocyto.py. Also change chrM to chrMT in rmsk.gtf. The new loom will include all infomation as 10X 3 file outputs did.
Enjoy the new loom file for downstream analysis. scVelo on the same UMAP of Scanpy.

Because most people cannot do cell ranger, is it possible to fix this bug in the next release of velocyto.py?
Thanks!
Best,
YJ

denvercal1234GitHub · 2021-11-20T13:40:01Z

@hyjforesight --- Thank you so much for looking into this! Hopefully @gioelelm will have some time to recommend a way to do this more seamlessly. 🙌

My cellsorted_possorted and possorted bam and gtf files indeed have chrM instead of chrMT

samtools view cellsorted_possorted_genome_bam.bam | cut -f3 | perl -ne 'print unless $seen{$_}++' -

(1) Did you just "find and replace", use some script, or bioSyntax extension in VS Code to change the naming of chrM to chrMT in the reference genome folder, including genome.fa, genome.fa.fai, genes.gtf, chrName.txt, chrNamelength.txt files? No change to genes.pickle nor other files in the /refdata-gex-GRCh38-2020-A/star directory?

(2) How did you change chrM to chrMT in rmsk.gtf? I guess it will be similar to how it is done in (1).

(3) Subsequently, I understand we will use the name-changed files to redo the cellranger count as "we should start from the fastq file to generate new bam file" with these new gtfs (velocyto-team/velocyto.R#96). To do this, did you simply use the --transcriptome=/opt/refdata-gex-GRCh38-2020-A \ with this new folder, and nothing needing to be done with the new name-changed masked gtf in cellranger, just needing to input this new masked gtf to run10x?

Ref: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count

Sorry for basic questions. Thank you so much for pointing me here in my issue #320 (I still have issue of it not finding the index file for the cellsorted bam yet it still generated the loom file --- #321 🥵).

denvercal1234GitHub · 2021-11-20T15:50:10Z

Updated:

For (1) --- Use VS Code to Find and Replace. And, yes, there is nothing "chrM" in any other files, e.g., genes.pickle etc.

For (2) --- Do similarly as (1)

hyjforesight · 2021-11-21T02:08:26Z

Hi @denvercal1234GitHub,
for (3), cellranger 6.1.2 outputs have no outs/analys folder, which is required for run10x command in velocyto.py. so i use this command velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf possorted_genome_bam.bam mm10_annotation.gtf with the new bam file, and chrM replaced gtf.
BTW, the new bam file is important. Start from fastq. Run cell ranger count pipeline by yourself (need 90GB memory and 200GB free disk). Get new bam. Then velocyto.py.
Thanks!
Best,
YJ

denvercal1234GitHub · 2021-11-21T13:21:57Z

Thank you very much, @hyjforesight! Did you do some additional filtering of the barcodes.tsv? Because usually cellranger will only name it barcodes.tsv, and not filtered_barcodes.

Given the memory limitation, I needed to samtools sort the possorted_.. bam file first before running velocyto run. Do you by chance know if I can still use the index file (bam.bai) that came originally with the possorted_ bam file? Or I will need to index the manually sorted bam?

hyjforesight · 2021-11-21T21:47:44Z

@denvercal1234GitHub
filtered matrices files are at \SC_RNA_COUNTER_CS\SC_MULTI_CORE\MULTI_GEM_WELL_PROCESSOR\COUNT_GEM_WELL_PROCESSOR_BASIC_SC_RNA_COUNTER\FILTER_BARCODES\fork0\files\filtered_matrices_mex
check the velocyto.py tutorial. no need to sort Cell Ranger generated bam file. it's already position sorted. just change the name to possorted_yourbamname.bam
The index file is not needed for velocyto.py

denvercal1234GitHub · 2021-11-21T22:31:31Z

Humm... I think the bam file from cellranger is actually only sorted by position, and velocyto then sorts it by cell (CB). That is why it renames it to cellsorted_...bam. That is why if memory is an issue, then it is recommended to cellsorted first then run run10x.

hyjforesight · 2021-11-22T15:53:30Z

hello @denvercal1234GitHub
Yor are right. cellranger did position sorting. the CB sorting will be done by velocyto. we use $ velotcyto run command, which requires the filtered barcodes.tsv, not $ velocyto run10x command. the CB sorting will be done internally according to this barcodes.tsv.
Check the velocyto run --help
-b, --bcfile FILE Valid barcodes file, to filter the bam. If --bcfile is not specified all the cell barcodes will be included.
Cell barcodes should be specified in the bcfile as the CB tag for each read

To make it sure, let me try CB sorting individually and let you know whether the results are consistent.

BTW, 16GB memory is enough for velocyto.py

hyjforesight · 2021-11-22T19:26:18Z

Updata: individual cell sorting generates errors. so no need to do this step. cell sorting will be done internally by $ velocyto run

hyjforesight · 2021-12-02T17:48:36Z

hope velocyto team can fix this BUG in the next release, just change searching chrMT to search chrM.

AY-LIANG · 2022-03-22T09:04:21Z

Hi,
I rename all chM to chMT in the reference genome folder,and run cellranger count to get a new bam file.
After velocyto,I use scanpy to analyse the loom file,but the gene expression does not match the 10x results well.
There is still some chromosome not found in gtf(refdata-gex-GRCh38-2020-A,download from 10x Genomics ) during velocyto

2022-03-22 11:18:44,717 - WARNING - The .bam file refers to a chromosome 'KI270442.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,717 - WARNING - The .bam file refers to a chromosome 'KI270442.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,725 - DEBUG - Marking up chromosome KI270729.1
2022-03-22 11:18:44,725 - WARNING - The .bam file refers to a chromosome 'KI270729.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,725 - WARNING - The .bam file refers to a chromosome 'KI270729.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,726 - DEBUG - Marking up chromosome GL000225.1
2022-03-22 11:18:44,726 - WARNING - The .bam file refers to a chromosome 'GL000225.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,726 - WARNING - The .bam file refers to a chromosome 'GL000225.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,729 - DEBUG - Marking up chromosome KI270743.1
2022-03-22 11:18:44,729 - WARNING - The .bam file refers to a chromosome 'KI270743.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,729 - WARNING - The .bam file refers to a chromosome 'KI270743.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,731 - DEBUG - Marking up chromosome GL000009.2
2022-03-22 11:18:44,731 - WARNING - The .bam file refers to a chromosome 'GL000009.2+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,733 - DEBUG - Marking up chromosome KI270722.1
2022-03-22 11:18:44,734 - WARNING - The .bam file refers to a chromosome 'KI270722.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,734 - WARNING - The .bam file refers to a chromosome 'KI270722.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,734 - DEBUG - Marking up chromosome GL000194.1
2022-03-22 11:18:44,734 - WARNING - The .bam file refers to a chromosome 'GL000194.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,834 - DEBUG - Marking up chromosome KI270742.1
2022-03-22 11:18:44,834 - WARNING - The .bam file refers to a chromosome 'KI270742.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,834 - WARNING - The .bam file refers to a chromosome 'KI270742.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,837 - DEBUG - Marking up chromosome GL000205.2
2022-03-22 11:18:44,837 - WARNING - The .bam file refers to a chromosome 'GL000205.2+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,877 - DEBUG - Marking up chromosome GL000195.1
2022-03-22 11:18:44,901 - DEBUG - Marking up chromosome KI270736.1
2022-03-22 11:18:44,901 - WARNING - The .bam file refers to a chromosome 'KI270736.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,901 - WARNING - The .bam file refers to a chromosome 'KI270736.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,902 - DEBUG - Marking up chromosome KI270733.1
2022-03-22 11:18:44,902 - WARNING - The .bam file refers to a chromosome 'KI270733.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,902 - WARNING - The .bam file refers to a chromosome 'KI270733.1-' not present in the annotation (.gtf) file

Do you have any suggestion?

denvercal1234GitHub · 2022-03-22T12:28:07Z

@AY-LIANG I am not sure, but between which that the gene expression did not match? What do you meant by 10X results? You meant between run and run10X?

AY-LIANG · 2022-03-22T12:37:21Z

@denvercal1234GitHub
10x result is the output of CellRanger,which is, these three files
barcodes.tsv.gz features.tsv.gz matrix.mtx.gz

I use scanpy to call the highest expression genes,and there are some differences between the loom file and CellRanger result.

It seems that some gene information is missing during velocyto.I guess these warnings might be the reason.

hyjforesight · 2022-03-22T15:03:04Z

hi @AY-LIANG
we can omit these GL KL chromosomes. it's useless.
velocyto removes some counts, check this theislab/scvelo#813
I don't use loom file for Scanpy. I use 10X matrics. when I need unspliced and spliced information. I just copy them from loom to the matrix.

ldata = scv.read(filename.loom, cache=True)
adata = scv.utils.merge(adata, ldata)

AY-LIANG · 2022-03-23T02:06:10Z

@hyjforesight
Thanks!It's very helpful

0106WeiWeiDeng · 2023-02-07T02:22:46Z

Acoording to this comment, and cellranger has been updated for serveral years, incongruences between .gtf and .bam are now removed, thus this issue may be solved by deleting two line of code of counter.py:
if chrom == "M":
chrom = "MT"
https://github.com/velocyto-team/velocyto.py/blob/0963dd2d/velocyto/counter.py#L282
https://github.com/velocyto-team/velocyto.py/blob/0963dd2d/velocyto/counter.py#L283

hernandezvargash · 2023-03-16T13:39:58Z

thank you @0106WeiWeiDeng I just tested your suggestion (deleting the two lines of code in counter.py) and the velocyto run seems to be fine, at least I have mitochondrial genes again.

shuailinli · 2023-06-06T23:31:51Z

Hi, I rename all chM to chMT in the reference genome folder,and run cellranger count to get a new bam file. After velocyto,I use scanpy to analyse the loom file,but the gene expression does not match the 10x results well. There is still some chromosome not found in gtf(refdata-gex-GRCh38-2020-A,download from 10x Genomics ) during velocyto

2022-03-22 11:18:44,717 - WARNING - The .bam file refers to a chromosome 'KI270442.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,717 - WARNING - The .bam file refers to a chromosome 'KI270442.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,725 - DEBUG - Marking up chromosome KI270729.1
2022-03-22 11:18:44,725 - WARNING - The .bam file refers to a chromosome 'KI270729.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,725 - WARNING - The .bam file refers to a chromosome 'KI270729.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,726 - DEBUG - Marking up chromosome GL000225.1
2022-03-22 11:18:44,726 - WARNING - The .bam file refers to a chromosome 'GL000225.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,726 - WARNING - The .bam file refers to a chromosome 'GL000225.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,729 - DEBUG - Marking up chromosome KI270743.1
2022-03-22 11:18:44,729 - WARNING - The .bam file refers to a chromosome 'KI270743.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,729 - WARNING - The .bam file refers to a chromosome 'KI270743.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,731 - DEBUG - Marking up chromosome GL000009.2
2022-03-22 11:18:44,731 - WARNING - The .bam file refers to a chromosome 'GL000009.2+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,733 - DEBUG - Marking up chromosome KI270722.1
2022-03-22 11:18:44,734 - WARNING - The .bam file refers to a chromosome 'KI270722.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,734 - WARNING - The .bam file refers to a chromosome 'KI270722.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,734 - DEBUG - Marking up chromosome GL000194.1
2022-03-22 11:18:44,734 - WARNING - The .bam file refers to a chromosome 'GL000194.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,834 - DEBUG - Marking up chromosome KI270742.1
2022-03-22 11:18:44,834 - WARNING - The .bam file refers to a chromosome 'KI270742.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,834 - WARNING - The .bam file refers to a chromosome 'KI270742.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,837 - DEBUG - Marking up chromosome GL000205.2
2022-03-22 11:18:44,837 - WARNING - The .bam file refers to a chromosome 'GL000205.2+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,877 - DEBUG - Marking up chromosome GL000195.1
2022-03-22 11:18:44,901 - DEBUG - Marking up chromosome KI270736.1
2022-03-22 11:18:44,901 - WARNING - The .bam file refers to a chromosome 'KI270736.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,901 - WARNING - The .bam file refers to a chromosome 'KI270736.1-' not present in the annotation (.gtf) file
2022-03-22 11:18:44,902 - DEBUG - Marking up chromosome KI270733.1
2022-03-22 11:18:44,902 - WARNING - The .bam file refers to a chromosome 'KI270733.1+' not present in the annotation (.gtf) file
2022-03-22 11:18:44,902 - WARNING - The .bam file refers to a chromosome 'KI270733.1-' not present in the annotation (.gtf) file

Do you have any suggestion?

Hi @AY-LIANG, I know this post is quite old, but I just want to add that this warning happens because the code for masking will check if every read in bam file and see if the chromosome it is mapped is in the gtf file or not. In the case of reads mapped to regions like GL000194.1, it is in fasta file, but not gtf file, so it will generate the warning.

This was referenced Nov 18, 2021

The .bam file refers to a chromosome 'MT+' not present in the annotation (.gtf) file. samtools/samtools#1545

Closed

Is it the right genome annotation file for cellranger 10X human to use? Chromosome MT+/- and others not present in the annotation #319

Closed

hyjforesight changed the title ~~no mitochondrial genes are included in loom file~~ no mitochondrial genes are included in loom file (bug fixed, velocyto team please check this) Nov 18, 2021

hyjforesight mentioned this issue Nov 18, 2021

mitochondrial genes velocyto-team/velocyto.R#96

Open

denvercal1234GitHub mentioned this issue Nov 20, 2021

Does loompy take into account mitochrondial genes in loom file output? linnarsson-lab/loompy#161

Open

hyjforesight closed this as completed Jan 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

no mitochondrial genes are included in loom file (bug fixed, velocyto team please check this) #318

no mitochondrial genes are included in loom file (bug fixed, velocyto team please check this) #318

hyjforesight commented Nov 12, 2021 •

edited

Loading

hyjforesight commented Nov 15, 2021

hyjforesight commented Nov 18, 2021 •

edited

Loading

denvercal1234GitHub commented Nov 20, 2021 •

edited

Loading

denvercal1234GitHub commented Nov 20, 2021

hyjforesight commented Nov 21, 2021 •

edited

Loading

denvercal1234GitHub commented Nov 21, 2021

hyjforesight commented Nov 21, 2021 •

edited

Loading

denvercal1234GitHub commented Nov 21, 2021

hyjforesight commented Nov 22, 2021 •

edited

Loading

hyjforesight commented Nov 22, 2021 •

edited

Loading

hyjforesight commented Dec 2, 2021 •

edited

Loading

AY-LIANG commented Mar 22, 2022

denvercal1234GitHub commented Mar 22, 2022 •

edited

Loading

AY-LIANG commented Mar 22, 2022

hyjforesight commented Mar 22, 2022

AY-LIANG commented Mar 23, 2022

0106WeiWeiDeng commented Feb 7, 2023

hernandezvargash commented Mar 16, 2023

shuailinli commented Jun 6, 2023

no mitochondrial genes are included in loom file (bug fixed, velocyto team please check this) #318

no mitochondrial genes are included in loom file (bug fixed, velocyto team please check this) #318

Comments

hyjforesight commented Nov 12, 2021 • edited Loading

hyjforesight commented Nov 15, 2021

hyjforesight commented Nov 18, 2021 • edited Loading

denvercal1234GitHub commented Nov 20, 2021 • edited Loading

denvercal1234GitHub commented Nov 20, 2021

hyjforesight commented Nov 21, 2021 • edited Loading

denvercal1234GitHub commented Nov 21, 2021

hyjforesight commented Nov 21, 2021 • edited Loading

denvercal1234GitHub commented Nov 21, 2021

hyjforesight commented Nov 22, 2021 • edited Loading

hyjforesight commented Nov 22, 2021 • edited Loading

hyjforesight commented Dec 2, 2021 • edited Loading

AY-LIANG commented Mar 22, 2022

denvercal1234GitHub commented Mar 22, 2022 • edited Loading

AY-LIANG commented Mar 22, 2022

hyjforesight commented Mar 22, 2022

AY-LIANG commented Mar 23, 2022

0106WeiWeiDeng commented Feb 7, 2023

hernandezvargash commented Mar 16, 2023

shuailinli commented Jun 6, 2023

hyjforesight commented Nov 12, 2021 •

edited

Loading

hyjforesight commented Nov 18, 2021 •

edited

Loading

denvercal1234GitHub commented Nov 20, 2021 •

edited

Loading

hyjforesight commented Nov 21, 2021 •

edited

Loading

hyjforesight commented Nov 21, 2021 •

edited

Loading

hyjforesight commented Nov 22, 2021 •

edited

Loading

hyjforesight commented Nov 22, 2021 •

edited

Loading

hyjforesight commented Dec 2, 2021 •

edited

Loading

denvercal1234GitHub commented Mar 22, 2022 •

edited

Loading