scglue.data.get_gene_annotation gives many NaN values. #122
Comments
Same problem for me. Did you solve it?
I have the same problem, and it occurs in all 3 of my datasets. First I tried removing the rows for these NA genes from rna.var.loc[:, ["chrom", "chromStart", "chromEnd"]], and step 1 (data preprocessing) then ran without errors. But a new problem appeared at the start of step 2 (model training): I couldn't load the RNA-pp.h5ad files produced by step 1, because of the earlier deletion of these NA rows: Then I tried filling the NA values in "chrom" with "chrn" and the NA values in "chromStart" and "chromEnd" with random numbers, but it still failed when I ran: So how should this problem be handled? Around 10% of genes fail to get ranges assigned; what should be done with the NA rows for these genes? Thanks in advance.
Hi all, and thank you for the report! This is likely caused by the fact that the GTF file being used does not contain annotation for these genes. You may deal with this in two ways:
Let me know if these solutions work.
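To check whether missing GTF entries really are the cause, one option is to compare the gene names in the dataset against the gene_name attributes in the GTF file. The following is a minimal sketch using only the Python standard library; the helper names gtf_gene_names and missing_from_gtf are hypothetical and not part of scglue, and the sketch assumes a GENCODE-style GTF in which "gene" records carry a gene_name attribute.

```python
import gzip
import re

# Matches the gene_name attribute in the 9th GTF column, e.g. gene_name "Xkr4";
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def gtf_gene_names(gtf_path):
    """Collect gene_name values from the 'gene' records of a GTF file.

    Hypothetical helper; handles plain-text and gzip-compressed files.
    """
    opener = gzip.open if str(gtf_path).endswith(".gz") else open
    names = set()
    with opener(gtf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip header/comment lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "gene":
                continue
            match = GENE_NAME_RE.search(fields[8])
            if match:
                names.add(match.group(1))
    return names


def missing_from_gtf(dataset_genes, gtf_path):
    """Return the dataset gene names that have no 'gene' record in the GTF."""
    known = gtf_gene_names(gtf_path)
    return [gene for gene in dataset_genes if gene not in known]
```

With an AnnData object, the call might look like missing_from_gtf(rna.var_names, "annotation.gtf.gz"), where the file name is a placeholder. If the returned list matches the genes that received NaN coordinates, the GTF file and the dataset most likely come from different annotation sources or versions.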
Thanks for the reply. I tried 2 and it worked well!
# find NA, delete NA rows
rna.var.dropna(subset=["chromStart", "chromEnd"], inplace=True)
# delete NA rows in AnnData
# change the type to int
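An earlier comment in this thread reported that dropping rows from rna.var alone led to trouble loading the saved file. One possible reason is that editing var in place leaves it out of step with the expression matrix. A more conservative sketch is to build a boolean mask from the annotation columns and use it to subset the whole object. The helper annotated_gene_mask and the toy table below are hypothetical stand-ins for rna.var; only pandas is needed to run it.

```python
import pandas as pd


def annotated_gene_mask(var, columns=("chrom", "chromStart", "chromEnd")):
    """Return a boolean Series: True where every coordinate column is present."""
    return var[list(columns)].notna().all(axis=1)


# Toy stand-in for rna.var after annotation (hypothetical values)
var = pd.DataFrame(
    {
        "chrom": ["chr1", None, "chr2"],
        "chromStart": [100.0, float("nan"), 500.0],
        "chromEnd": [200.0, float("nan"), 900.0],
    },
    index=["Xkr4", "AL627309.1", "Sox17"],
)

mask = annotated_gene_mask(var)

# Keep annotated genes only, then cast the coordinates back to integers
kept = var.loc[mask].astype({"chromStart": int, "chromEnd": int})

# With a real AnnData object, the same mask can subset matrix and var together
# (assumes `rna` is the AnnData used in this thread):
# rna = rna[:, annotated_gene_mask(rna.var).to_numpy()].copy()
# rna.var = rna.var.astype({"chromStart": int, "chromEnd": int})
```

Subsetting the object rather than its var table keeps the number of genes consistent everywhere, which is an assumption about what went wrong above rather than a confirmed diagnosis.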
Thanks for getting back; 1 works well for me. For anyone encountering similar issues in the future: if you are working with data from 10x Genomics, you can download their reference file from their website.
Hello!
Thank you for developing this tool.
I wanted to reach out to see if something is wrong with my code. I'm using a dataset preprocessed with Seurat for GLUE and following the tutorial for scRNA and scATAC integration. When I run this code segment:
scglue.data.get_gene_annotation(
pbmc_rna, gtf="gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz",
gtf_by="gene_name"
)
many of the genes end up with NaN in chrom, chromStart, and chromEnd. The genes that fail to get ranges assigned seem to be those that start with AL, such as AL627309.1 and AL590822.1. How can I fix this issue? Thank you in advance.