Updating Data Sources pages (#501)
beets authored Sep 10, 2024
1 parent fae5eb8 commit 0176775
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions datasets/Biomedical.md
@@ -144,6 +144,10 @@ The NIH NCBI gene info datasets from NCBI Gene for a subset of species contains
* _Xenopus laevis_


#### [NCBI Assembly](https://www.ncbi.nlm.nih.gov/assembly)
"The [NCBI Assembly database](https://www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project" (Kitts et al. 2016). In this import we include the metadata for all genome assemblies documented in `assembly_summary_genbank.txt` and `assembly_summary_refseq.txt`. Assemblies are stored in GenomeAssembly nodes whose information is integrated from both the GenBank and RefSeq datasets.
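
As a rough illustration of how metadata from the two summary files could be integrated into one record per assembly, here is a minimal Python sketch. It is not the import's actual code. It assumes the summary files are tab-separated, that the column names sit on a comment line starting with `#`, and that columns named `assembly_accession` and `gbrs_paired_asm` (the paired GenBank/RefSeq accession) are present. The function names and the sample rows, including the accessions, are hypothetical.

```python
import io


def read_summary(handle):
    """Parse an assembly summary table into a list of dicts.

    Assumes a tab-separated layout in which comment lines start with '#'
    and the last comment line before the data rows holds the column names.
    """
    header = None
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Keep the most recent comment line as the candidate header.
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def merge_assemblies(genbank_rows, refseq_rows):
    """Combine GenBank and RefSeq rows into one record per assembly.

    Records are keyed by GenBank accession. A RefSeq row is attached to its
    GenBank counterpart through the paired-accession column (assumed to be
    'gbrs_paired_asm'); RefSeq rows without a known pair get their own entry.
    """
    merged = {}
    for row in genbank_rows:
        merged[row["assembly_accession"]] = {"genbank": row, "refseq": None}
    for row in refseq_rows:
        paired = row.get("gbrs_paired_asm", "na")
        key = paired if paired in merged else row["assembly_accession"]
        merged.setdefault(key, {"genbank": None, "refseq": None})["refseq"] = row
    return merged


# Hypothetical sample data standing in for the two summary files.
GENBANK_TEXT = (
    "# assembly_accession\torganism_name\tgbrs_paired_asm\n"
    "GCA_000000001.1\tExample organism A\tGCF_000000001.1\n"
    "GCA_000000002.1\tExample organism B\tna\n"
)
REFSEQ_TEXT = (
    "# assembly_accession\torganism_name\tgbrs_paired_asm\n"
    "GCF_000000001.1\tExample organism A\tGCA_000000001.1\n"
)

nodes = merge_assemblies(
    read_summary(io.StringIO(GENBANK_TEXT)),
    read_summary(io.StringIO(REFSEQ_TEXT)),
)
```

Each value in `nodes` holds the GenBank row, the RefSeq row, or both, which mirrors the idea of a single GenomeAssembly node drawing on both datasets. Keying on the GenBank accession is a choice made for this sketch only.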


#### [NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy/)
NCBI Taxonomy "consists of a curated set of names and classifications for all of the source organisms represented in the International Nucleotide Sequence Database Collaboration (INSDC). The NCBI Taxonomy database contains a list of names that are determined to be nomenclaturally correct or valid (as defined according to the different codes of nomenclature), classified in an approximately phylogenetic hierarchy (depending on the level of knowledge regarding phylogenetic relationships of a given group) as well as a number of names that exist outside the jurisdiction of the codes. That is, it focuses on nomenclature and systematics, rather than documenting the description of taxa."
