This CL adds a new import for NCBI Gene. The data cleaning and testing are documented on [GitHub](datacommonsorg/data#1084). NCBI Gene is updated daily.

We included the following datasets in this import:

1. [NCBI Gene](https://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz)
2. [gene2pubmed](https://ftp.ncbi.nih.gov/gene/DATA/gene2pubmed.gz)
3. [gene_neighbors](https://ftp.ncbi.nih.gov/gene/DATA/gene_neighbors.gz)
4. [gene_orthologs](https://ftp.ncbi.nih.gov/gene/DATA/gene_orthologs.gz)
5. [gene_group](https://ftp.ncbi.nih.gov/gene/DATA/gene_group.gz)
6. [mim2gene_medgen](https://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen)
7. [gene2go](https://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz)
8. [gene2accession](https://ftp.ncbi.nih.gov/gene/DATA/gene2accession.gz)
9. [gene2ensembl](https://ftp.ncbi.nih.gov/gene/DATA/gene2ensembl.gz)
10. [generifs_basic](https://ftp.ncbi.nih.gov/gene/GeneRIF/generifs_basic.gz)

[NCBI Gene](https://www.ncbi.nlm.nih.gov/gene) is a comprehensive resource containing information about genes from a wide range of species. It serves as a central hub for gene-specific data, integrating information from various sources and providing links to other relevant resources. It includes:

- gene identification (e.g. official gene symbols, aliases, and cross-references to other databases)
- sequence information (e.g. genomic location and reference sequences (RefSeqs) for genomic DNA, transcripts, proteins, and mature peptides)
- functional information (e.g. gene function descriptions, associated pathways, related biological processes, orthologs, and related genes)
- phenotypic associations (i.e. links to phenotypes and diseases associated with the gene)
- links to relevant scientific papers (i.e. PubMed IDs)

"[NCBI Gene](https://www.ncbi.nlm.nih.gov/gene) supplies gene-specific connections in the nexus of map, sequence, expression, structure, function, citation, and homology data. Unique identifiers are assigned to genes with defining sequences, genes with known map positions, and genes inferred from phenotypic information. These gene identifiers are used throughout NCBI's databases and tracked through updates of annotation. Gene includes genomes represented by [NCBI Reference Sequences](https://www.ncbi.nlm.nih.gov/refseq/) (or RefSeqs) and is integrated for indexing and query and retrieval from NCBI's Entrez and [E-Utilities](https://www.ncbi.nlm.nih.gov/books/NBK25501/) systems. Gene comprises sequences from thousands of distinct taxonomic identifiers, ranging from viruses to bacteria to eukaryotes. It represents chromosomes, organelles, plasmids, viruses, transcripts, and millions of proteins."

PiperOrigin-RevId: 690868739