Gene name is incorrectly replaced by biotype for GTF of which the gene_name field is missing #628

Closed
shenyang1981 opened this issue Apr 29, 2019 · 1 comment
@shenyang1981

Hello,

I was trying to run STARSolo on a poorly annotated species, Macaca fascicularis. The GTF file was downloaded from Ensembl (v95). For genes that have no gene name annotated, STARSolo used the biotype as the gene name.

For example, gene "ENSMFAG00000010714" has a gene_id but no gene_name in the GTF file.

(from GTF:)

1 ensembl gene 1174457 1175395 . - . gene_id "ENSMFAG00000010714"; gene_version "1"; gene_source "ensembl"; gene_biotype "protein_coding";

In the genes.tsv file generated by STARSolo, this gene was shown as:

ENSMFAG00000010714 protein_coding

As a result, "protein_coding" was used as the gene name (row name) in the count matrix (matrix.mtx).

Please help fix this issue for poorly annotated species. It would also be nice to have an option to select gene_id or gene_name as the row name in the count matrix. Thanks a lot.
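
To illustrate the fallback being requested, here is a minimal Python sketch that reads the attribute column of the GTF line above and returns gene_name when it is present, otherwise gene_id. This is not STAR's code: the helper names `parse_attributes` and `gene_label` are hypothetical, and the sketch assumes Ensembl-style attributes of the form `key "value";`.

```python
import re

# Matches Ensembl-style GTF attribute pairs, e.g. gene_id "ENSMFAG00000010714";
_ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(attr_field):
    """Return the GTF attribute column as a dict of key -> value."""
    return dict(_ATTR_RE.findall(attr_field))


def gene_label(attr_field):
    """Hypothetical helper: gene_name if present and non-empty, else gene_id."""
    attrs = parse_attributes(attr_field)
    return attrs.get("gene_name") or attrs["gene_id"]


# The gene record quoted above, with tab-separated GTF columns.
line = (
    "1\tensembl\tgene\t1174457\t1175395\t.\t-\t.\t"
    'gene_id "ENSMFAG00000010714"; gene_version "1"; '
    'gene_source "ensembl"; gene_biotype "protein_coding";'
)

attrs_column = line.split("\t")[8]  # the 9th GTF column holds the attributes
label = gene_label(attrs_column)
print(label)
```

With this fallback, the record above would be labeled by its gene_id rather than by its biotype, while records that do carry a gene_name would keep it.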

Best,
Yang

@alexdobin alexdobin added the bug label Apr 29, 2019
@alexdobin
Owner

Hi Yang,

Thanks for reporting this problem; 2.7.1a should fix it.

Cheers
Alex
