Updating Data Sources pages #332

Merged
merged 1 commit into from
Aug 2, 2023
75 changes: 48 additions & 27 deletions datasets/Biomedical.md
@@ -11,6 +11,14 @@ parent: Data Sources
* TOC
{:toc}

### [Broad Institute](https://www.broadinstitute.org/resources-services-and-tools)

#### [GTEx Analysis V8](https://www.gtexportal.org/home/datasets)
The GTEx eGene and significant variant-gene association data were generated from samples "collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank." The single-tissue cis-eQTL data from the v8 release was used. Due to the size of the datasets only Skin - Not Sun Exposed and Skin - Sun Exposed are made available on the main graph. The data for all tissues can be accessed on the Biomedical Data Commons knowledge graph.

GTEx is an NIH human genomic data unrestricted-access data repository and the data was made available in compliance with [GTEx Data Release and Publication Policy](https://www.gtexportal.org/home/documentationPage#staticTextPublicationPolicy). GTEx outlines [how to cite](https://www.gtexportal.org/home/faq#citePortal) use of GTEx data in journal publication.


### [ELIXIR Core Data Resources](https://elixir-europe.org/platforms/data/core-data-resources)

#### [The Molecular INTeraction (MINT) Database](https://mint.bio.uniroma2.it/)
@@ -32,22 +40,33 @@ Data made available under: [ENCODE Data Use Policy for External Users](https://w
#### [ChEMBL](https://www.ebi.ac.uk/chembl/)
"ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs." It includes information on drugs at all stages of drug discovery.


#### [UniProt](https://www.uniprot.org/)
Data Commons includes protein sequence and functional information including protein interaction with chemical compounds maintained by the UniProt Consortium. The data is made available by the [Creative Commons Attribution (CC BY 4.0) License](https://creativecommons.org/licenses/by/4.0/). Further information on UniProt License and Disclaimer can be found [here](https://www.uniprot.org/help/license). The UniProt Consortium states [how to cite](https://www.uniprot.org/help/publications) UniProt data used in a journal article.

This data is made available under the [EMBL-EBI Terms of Use](https://www.ebi.ac.uk/about/terms-of-use/).


### [Genotype-Tissue Expression (GTEx)](https://www.gtexportal.org/home/)
### [International Committee on Taxonomy of Viruses (ICTV)](https://ictv.global/)

#### [GTEx Analysis V8](https://www.gtexportal.org/home/datasets)
The GTEx eGene and significant variant-gene association data were generated from samples "collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank." The single-tissue cis-eQTL data from the v8 release was used. Due to the size of the datasets only Skin - Not Sun Exposed and Skin - Sun Exposed are made available on the main graph. The data for all tissues can be accessed on the Biomedical Data Commons knowledge graph.
#### [Master Species List](https://ictv.global/msl)
The official, current virus taxonomy approved by the ICTV. To accomplish the task of organizing and maintaining this virus taxonomy, the ICTV is composed of 7 subcommittees covering Animal DNA viruses and Retroviruses, Animal dsRNA and ssRNA (-) viruses, Animal ssRNA (+) viruses, Bacterial viruses, Archaeal Viruses, Fungal and Protist viruses, and Plant viruses. The ICTV has established over 100 international Study Groups (SGs) covering all major virus families and genera.

GTEx is an NIH human genomic data unrestricted-access data repository and the data was made available in compliance with [GTEx Data Release and Publication Policy](https://www.gtexportal.org/home/documentationPage#staticTextPublicationPolicy). GTEx outlines [how to cite](https://www.gtexportal.org/home/faq#citePortal) use of GTEx data in journal publication.
#### [Virus Metadata Resource](https://ictv.global/vmr)
The ICTV chooses an exemplar virus for each species and the VMR provides a list of these exemplars. An exemplar virus serves as an example of a well-characterized virus isolate of that species and includes the GenBank accession number for the genomic sequence of the isolate as well as the virus name, isolate designation, suggested abbreviation, genome composition, and host source.
This data is made available under Creative Commons Attribution ShareAlike 4.0 International (CC BY-SA 4.0).


### [International Committee on Taxonomy of Viruses (ICTV)](https://ictv.global/)
### [Jensen Lab (University of Copenhagen)](https://jensenlab.org/resources/)

#### [Master Species List and Virus Metadata Resource](https://ictv.global/)
The official, current virus taxonomy approved by the International Committee on Taxonomy of Viruses (ICTV). This includes data from the Master Species List and the Virus Metadata Resource.
This data is made available under Creative Commons Attribution ShareAlike 4.0 International (CC BY-SA 4.0).
#### [DISEASES](https://diseases.jensenlab.org/Search)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. We further unify the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. For further details please refer to the following Open Access article about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831).


#### [Side Effect Resource (SIDER) 4.1](http://sideeffects.embl.de/)
SIDER is a database of adverse drug reactions. Available information includes side effect frequency, drug and side effect classifications, and links to further information, for example drug–target relations. However, this database uses the MedDRA ontology, which falls under the UMLS license and is limited to non-commercial use. Therefore, only the data available under the CC0 license (mappings of drug names, PubChem Compound IDs (CIDs), and ATC codes) are hosted. Data Commons hosts version 4.1 of SIDER, released on October 21, 2015. Information about citing SIDER can be found [here](http://sideeffects.embl.de/about/).

This data is made available under the [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).


### [New York Botanical Garden (NYBG)](http://sweetgum.nybg.org/science/)
@@ -56,20 +75,32 @@ This data is made available under Creative Commons Attribution ShareAlike 4.0 In
C. V. Starr Virtual Herbarium is a public specimen database with photos and detailed records about millions of plants, fungi, and algae.


### [Proteomics Standards Initiative](https://psidev.info/)
### [PharmGKB](https://www.pharmgkb.org/)

#### [PharmGKB Primary Data](https://www.pharmgkb.org/)
The Pharmacogenomics Knowledge Base, PharmGKB, is an interactive tool for researchers investigating how genetic variation affects drug response. The PharmGKB Web site, http://www.pharmgkb.org, displays genotype, molecular, and clinical knowledge integrated into pathway representations and Very Important Pharmacogene (VIP) summaries with links to additional external resources. Users can search and browse the knowledgebase by genes, variants, drugs, diseases, and pathways. The Primary Data contains summary information on chemicals, drugs, genes, genetic variants, and phenotypes.


#### [PharmGKB Relationships Data](https://www.pharmgkb.org/)
PharmGKB reports associations among chemicals, diseases, genes, and genetic variants, both within each category and across categories.

Data made available under Creative Commons Attribution-ShareAlike 4.0 Intergovernmental Organization (CC BY-SA 4.0 IGO) licence. Explicit licensing for PharmGKB can be viewed on the [download page](https://www.pharmgkb.org/downloads).

#### [HUPO-PSI Working Groups and Outputs](https://psidev.info/)
The Molecular Interactions Controlled Vocabulary from the HUPO Proteomics Standards Initiative working groups is "a structured controlled vocabulary for the annotation of experiments concerned with protein-protein interactions". The ontologies dictionary is represented in a tree structure in the [EMBL-EBI Ontology Lookup Service](https://www.ebi.ac.uk/ols/ontologies/mi). Data Commons includes three subsets of the ontologies: "interaction detection method", "interaction type" and "database citation", which are commonly used in protein-protein interactions.

Data made available under [Apache License 2.0](https://github.com/EBISPOT/OLS/blob/main/LICENSE). The license information of HUPO PSI can be found at the [Community Practice](http://www.psidev.info/sites/default/files/CommunityPractice-revised.doc). See also the [EBI terms of use](http://www.ebi.ac.uk/about/terms-of-use/).
### [Swiss Institute of Bioinformatics (SIB)](https://www.expasy.org/)

#### [Antibodies Chemically Defined (ABCD)](https://web.expasy.org/abcd/)
The ABCD database is part of a broader project, with the mission of promoting the widespread use of recombinant antibodies by academic researchers and, ultimately, the replacement of animal-produced antibodies. This concerted effort also includes the [Geneva Antibody Facility](https://www.unige.ch/medecine/antibodies/) (for discovery and production of antibodies) and the scientific journal [Antibody Reports](https://oap.unige.ch/journals/abrep) (publishing technical articles on antibody characterization). If you'd like to cite the ABCD database: Lima WC, Gasteiger E, Marcatili P, Duek P, Bairoch A, Cosson P. The ABCD database: a repository for chemically defined antibodies. [Nucleic Acids Res. 2020, 48:D261-D264.](https://academic.oup.com/nar/article/48/D1/D261/5549708)

### [Side Effect Resource (SIDER)](http://sideeffects.embl.de/)
[Terms and Conditions](https://www.statcan.gc.ca/en/reference/terms-conditions/general?MM=as).

#### [SIDER 4.1](http://sideeffects.embl.de/)
SIDER is a database of adverse drug reactions curated by the EMBL collaboration. "SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information include side effect frequency, drug and side effect classifications as well as links to further information, for example drug–target relations." Data Commons hosts version 4.1 of SIDER released on October 21, 2015.

This data is made available under the [Creative Commons Attribution-Noncommercial-Share Alike 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/). Information about citing SIDER can be found [here](http://sideeffects.embl.de/about/).
### [Temporary Data Commons Data](https://www.datacommons.org/)

#### [Temporary Gene Mappings](https://www.datacommons.org/)
This maps the new Gene dcid format (bio/<gene_symbol>) to the old, preexisting Gene dcid format (bio/<genome_assembly>_<gene_symbol>). These mappings are temporary until all data using the old method of Gene dcid generation has been updated.

Data is publicly available via Data Commons.


### [The Human Protein Atlas](https://www.proteinatlas.org/)
Expand Down Expand Up @@ -160,16 +191,6 @@ The All SNPs files were downloaded from the UCSC Table Browser on August 13, 201
The annotation data is made freely available under the UCSC Genome Browser [terms of use](https://genome.ucsc.edu/conditions.html). The UCSC Genome Browser states [how to cite](https://genome.ucsc.edu/cite.html) use of their data in a journal article publication.


### [UniProt](https://www.uniprot.org/)
Data Commons includes protein sequence and functional information including protein interaction with chemical compounds maintained by the UniProt Consortium.


#### [UniProt Controlled Vocabulary of Species](https://www.uniprot.org/docs/speclist)
UniProt’s Controlled Vocabulary of Species contains organism species UniProt identification codes, [NCBI Taxonomy database identifiers](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi), scientific names, common names, synonyms, and organism kingdoms.

The data is made available by the [Creative Commons Attribution (CC BY 4.0) License](https://creativecommons.org/licenses/by/4.0/). Further information on UniProt License and Disclaimer can be found [here](https://www.uniprot.org/help/license). The UniProt Consortium states [how to cite](https://www.uniprot.org/help/publications) UniProt data used in a journal article.


### [University of Maryland School of Medicine, Institute of Genome Sciences](https://www.igs.umaryland.edu/)

#### [Disease Ontology](https://disease-ontology.org/)
@@ -180,7 +201,7 @@ The data is made available under [C0 1.0 Universal (CC0 1.0) Public Domain Dedic

### [World Health Organization (WHO)](https://www.who.int/)

#### [ATC_Codes](https://www.whocc.no/atc_ddd_index/)
#### [ATC Codes](https://www.whocc.no/atc_ddd_index/)
Anatomical Therapeutic Chemical (ATC) is a hierarchical classification system for pharmacological substances. 'In the ATC classification system, the active substances are classified in a hierarchy with five different levels. The system has fourteen main anatomical/pharmacological groups or 1st levels. Each ATC main group is divided into 2nd levels which could be either pharmacological or therapeutic groups. The 3rd and 4th levels are chemical, pharmacological or therapeutic subgroups and the 5th level is the chemical substance. The 2nd, 3rd and 4th levels are often used to identify pharmacological subgroups when that is considered more appropriate than therapeutic or chemical subgroups.'

Data made available under [CC BY-NC-SA 3.0 IGO](https://www.who.int/about/policies/publishing/copyright).
28 changes: 28 additions & 0 deletions datasets/Demographics.md
@@ -27,6 +27,11 @@ Population Census and Statistics for Brazil.
Data Commons has imported variables related to demographics, in particular concerning literacy, work, housing, and religion from the Indian Census on the state, district, and city level.


### [Central Bureau of Statistics (CBS), Israel](https://www.cbs.gov.il/he/Pages/default.aspx)

#### [Israel Census](https://www.cbs.gov.il/en/Statistics/Pages/Generators/Time-Series-DataBank.aspx?level_1=4)
Demographics, health, and economy statistics for Israel at the country, district, and sub-district levels.

### [Colombia DANE National Administrative Department of Statistics](https://www.dane.gov.co)

#### [Colombia Census](https://www.dane.gov.co/index.php)
Expand Down Expand Up @@ -63,16 +68,28 @@ Data made publicly available under the standard [Google Terms of Service](https:
India Local Government Directory provides unique codes for revenue entities such as districts, villages, local government bodies.
[Copyright](https://lgdirectory.gov.in/copyRightPolicy.do), [Terms of Use](https://lgdirectory.gov.in/termsconditions.do).

### [Mexico National Institute of Statistics and Geography (INEGI)](https://www.inegi.org.mx/default.html)

#### [Mexico Census](https://en.www.inegi.org.mx/temas/)
Population Census and Statistics for Mexico at Country and State levels.

### [National Institute of Statistics and Censuses (INDEC)](https://www.indec.gob.ar)

#### [Argentina Census](https://www.indec.gob.ar)
Population Census and Statistics for Argentina.

### [Open Data for Africa](https://dataportal.opendataforafrica.org/)

#### [Kenya Census](https://kenya.opendataforafrica.org/)
Kenya Demographics, Health and Education data from Kenya National Bureau Of Statistics by country, county and towns and suburbs.
[Terms of use](https://kenya.opendataforafrica.org/gdlkmgb).

#### [Nigeria Statistics](https://nigeria.opendataforafrica.org)
Demographics, Health, Agriculture and Education Statistics for Nigeria.

#### [South Africa Census](https://southafrica.opendataforafrica.org/)
South Africa Demographics, Health and Education data from South Africa Data Portal by country, province and district municipality.

### [Opportunity Insights](https://opportunityinsights.org/)

#### [The Opportunity Atlas](https://opportunityinsights.org/data/)
Expand Down Expand Up @@ -100,6 +117,9 @@ Japan Demographics, Economy, Health, Education data from Portal Site of Official

### [Statistics Canada](https://www.statcan.gc.ca/en/start)

#### [Canada Statistics](https://www150.statcan.gc.ca/n1/en/type/data?MM=1)
Demographics, health, education, and economy statistics for Canada at the country and subnational levels.

#### [Population estimates](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710000501#tables)
Yearly population estimates for Canada.

Expand Down Expand Up @@ -154,6 +174,9 @@ Weekly and Annual cases of selected national notifiable (infectious and non-infe
Mortality counts for all US states and counties broken down by underlying cause of death, age, race, sex, and year.


#### [Wonder: Mortality, Underlying Cause Of Death](https://wonder.cdc.gov/ucd-icd10.html)
The Underlying Cause of Death database contains mortality data based on death certificates for U.S. residents.

#### [Wonder: Natality](https://wonder.cdc.gov/natality.html)
Includes "counts of live births occurring within the United States to U.S. residents. Counts can be obtained by a variety of demographic characteristics, such as state and county of residence, mother's race, and mother's age, and health and medical items, such as tobacco use, method of delivery, and congenital anomalies. The data are derived from birth certificates."

Expand Down Expand Up @@ -198,6 +221,11 @@ Population data for countries, capital cities, urban and rural areas not covered
[Terms of Use](http://data.un.org/Host.aspx?Content=UNdataUse).


### [United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA)](https://www.unocha.org/)

#### [Mexico Subnational Population Statistics](https://data.humdata.org/dataset/cod-ps-mex)
Population Census and Statistics for Mexico at Municipal level.

### [Wikimedia Foundation](https://wikimediafoundation.org/)

#### [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page)
17 changes: 17 additions & 0 deletions datasets/Economy.md
@@ -16,6 +16,11 @@ parent: Data Sources
#### [Australia Statistics](https://www.abs.gov.au/statistics)
Australia demographics, health, and economy statistics at the country, state, and territory levels and four levels of statistical areas.

### [Central Bureau of Statistics (CBS), Israel](https://www.cbs.gov.il/he/Pages/default.aspx)

#### [Israel Census](https://www.cbs.gov.il/en/Statistics/Pages/Generators/Time-Series-DataBank.aspx?level_1=4)
Demographics, health, and economy statistics for Israel at the country, district, and sub-district levels.

### [European Union (EU) Eurostat](https://ec.europa.eu/eurostat)

#### [Regional Statistics by NUTS Classification](https://ec.europa.eu/eurostat/)
@@ -40,6 +45,11 @@ The exchange rate of currency broken down by country, currency standardization t
Wage and salary data for Indian states.


### [Mexico National Institute of Statistics and Geography (INEGI)](https://www.inegi.org.mx/default.html)

#### [Mexico Census](https://en.www.inegi.org.mx/temas/)
Population Census and Statistics for Mexico at Country and State levels.

### [Organisation for Economic Co-operation and Development (OECD)](https://stats.oecd.org/)

#### [OECD Statistics](https://stats.oecd.org)
@@ -58,6 +68,13 @@ Japan Demographics, Economy, Health, Education data from Portal Site of Official
Data Commons includes variables related to poverty and unemployment in Indian states from the Reserve Bank of India.


### [Statistics Canada](https://www.statcan.gc.ca/en/start)

#### [Canada Statistics](https://www150.statcan.gc.ca/n1/en/type/data?MM=1)
Demographics, health, education, and economy statistics for Canada at the country and subnational levels.
[Terms and Conditions](https://www.statcan.gc.ca/en/reference/terms-conditions/general?MM=as).


### [U.S. Bureau of Economic Analysis (BEA)](https://www.bea.gov/)

#### [GDP by County, Metro, and Other Areas](https://apps.bea.gov/regional)