From 6c71a4f555b9c7ea01414885b1bd84f240638201 Mon Sep 17 00:00:00 2001 From: Keyur Shah Date: Tue, 12 Nov 2024 08:31:35 -0800 Subject: [PATCH] Updating Data Sources pages (#539) --- datasets/Agriculture.md | 5 ----- datasets/Biomedical.md | 16 ++++++++++++++-- datasets/Crime.md | 5 ----- datasets/Demographics.md | 11 ----------- datasets/Economy.md | 3 --- datasets/Education.md | 20 -------------------- datasets/Energy.md | 15 --------------- datasets/Environment.md | 28 ---------------------------- datasets/Health.md | 19 ------------------- 9 files changed, 14 insertions(+), 108 deletions(-) diff --git a/datasets/Agriculture.md b/datasets/Agriculture.md index d851fe819..ad776a502 100644 --- a/datasets/Agriculture.md +++ b/datasets/Agriculture.md @@ -11,11 +11,6 @@ parent: Data Sources * TOC {:toc} -### [Open Data for Africa](https://dataportal.opendataforafrica.org/) - -#### [Nigeria Statistics](https://nigeria.opendataforafrica.org) -Demographics, Health, Agriculture and Education Statistics for Nigeria. - ### [U.S. Department of Agriculture (USDA)](https://www.usda.gov/) #### [Agricultural Survey](https://quickstats.nass.usda.gov/) diff --git a/datasets/Biomedical.md b/datasets/Biomedical.md index bc9e2ead6..ba15400da 100644 --- a/datasets/Biomedical.md +++ b/datasets/Biomedical.md @@ -53,8 +53,16 @@ This data is made available under Creative Commons Attribution ShareAlike 4.0 In ### [Jensen Lab (University of Copenhagen)](https://jensenlab.org/resources/) -#### [DISEASES](https://diseases.jensenlab.org/Search) -DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. 
For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. +#### [DISEASES: Experiment](https://diseases.jensenlab.org/Search) +DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The experiments files further contain the source database, the source score, and the confidence score. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. 
+ + +#### [DISEASES: Knowledge](https://diseases.jensenlab.org/Search) +DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The knowledge files further contain the source database, the evidence type, and the confidence score. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. + + +#### [DISEASES: Textmining](https://diseases.jensenlab.org/Search) +DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The textmining files contain the z-score, the confidence score, and a URL to a viewer of the underlying abstracts. 
For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license. #### [Side Effect Resource (SIDER) 4.1](http://sideeffects.embl.de/) @@ -103,6 +111,10 @@ This data is made available through [openFDA terms of service](https://open.fda. "The [NCBI Assembly database](www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project" (Kitts et al. 2016). In this import we include the metadata for all genome assemblies documented in `assembly_summary_genbank.txt` and `assembly_summary_refseq.txt`. 
Assemblies are stored in GenomeAssembly nodes whose information is integrated from both the GenBank and RefSeq datasets. +#### [NCBI Gene](https://www.ncbi.nlm.nih.gov/gene) +"[NCBI Gene](https://www.ncbi.nlm.nih.gov/gene) supplies gene-specific connections in the nexus of map, sequence, expression, structure, function, citation, and homology data. Unique identifiers are assigned to genes with defining sequences, genes with known map positions, and genes inferred from phenotypic information. These gene identifiers are used throughout NCBI's databases and tracked through updates of annotation. Gene includes genomes represented by [NCBI Reference Sequences](https://www.ncbi.nlm.nih.gov/refseq/) (or RefSeqs) and is integrated for indexing and query and retrieval from NCBI's Entrez and [E-Utilities](https://www.ncbi.nlm.nih.gov/books/NBK25501/) systems. Gene comprises sequences from thousands of distinct taxonomic identifiers, ranging from viruses to bacteria to eukaryotes. It represents chromosomes, organelles, plasmids, viruses, transcripts, and millions of proteins." + + #### [NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy/) "NCBI Taxonomy "consists of a curated set of names and classifications for all of the source organisms represented in the International Nucleotide Sequence Database Collaboration (INSDC). The NCBI Taxonomy database contains a list of names that are determined to be nomenclaturally correct or valid (as defined according to the different codes of nomenclature), classified in an approximately phylogenetic hierarchy (depending on the level of knowledge regarding phylogenetic relationships of a given group) as well as a number of names that exist outside the jurisdiction of the codes. That is, it focuses on nomenclature and systematics, rather than documenting the description of taxa." 
diff --git a/datasets/Crime.md b/datasets/Crime.md index 19862f9b5..7d8d60f8d 100644 --- a/datasets/Crime.md +++ b/datasets/Crime.md @@ -11,11 +11,6 @@ parent: Data Sources * TOC {:toc} -### [National Institution for Transforming India.](https://niti.gov.in/) - -#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download) -Sustainable Development Goals India Index - ### [U.S. Bureau of Justice Statistics (BJS)](https://bjs.ojp.gov/) #### [National Prisoner Statistics (NPS) Program](https://bjs.ojp.gov/data-collection/national-prisoner-statistics-nps-program) diff --git a/datasets/Demographics.md b/datasets/Demographics.md index 8a3aad620..ed4b3185e 100644 --- a/datasets/Demographics.md +++ b/datasets/Demographics.md @@ -46,11 +46,6 @@ Israel Demographics, Health, Economy statistics for Israel at country, district #### [Ireland Census](https://www.cso.ie/en/statistics/) Ireland Demographics, Health and Economy data from Central Statistics Office(CSO) by country, county and city. -### [Colombia DANE National Administrative Department of Statistics](https://www.dane.gov.co) - -#### [Colombia Census](https://www.dane.gov.co/index.php) -Population Census and Statistics for Colombia at Country, Department and Municipality geo-levels. - ### [DataMeet](https://datameet.org/) #### [DataMeet Maps](http://projects.datameet.org/maps) @@ -128,18 +123,12 @@ India National Family Health Survey - Data on population dynamics and health ind #### [India Poverty Status](https://www.niti.gov.in/sites/default/files/2021-11/National_MPI_India-11242021.pdf) India poverty statistics - percentage of people below poverty line. 
-#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download) -Sustainable Development Goals India Index - ### [Open Data for Africa](https://dataportal.opendataforafrica.org/) #### [Kenya Census](https://kenya.opendataforafrica.org/) Kenya Demographics, Health and Education data from Kenya National Bureau Of Statistics by country, county and towns and suburbs. [Terms of use](https://kenya.opendataforafrica.org/gdlkmgb). -#### [Nigeria Statistics](https://nigeria.opendataforafrica.org) -Demographics, Health, Agriculture and Education Statistics for Nigeria. - #### [SouthAfrica Census](https://southafrica.opendataforafrica.org/) South Africa Demographics, Health and Education data from South Africa Data Portal by country, province and district municipality. diff --git a/datasets/Economy.md b/datasets/Economy.md index af228afb0..10271c6af 100644 --- a/datasets/Economy.md +++ b/datasets/Economy.md @@ -80,9 +80,6 @@ Population Census and Statistics for Mexico at Country and State levels. #### [India Poverty Status](https://www.niti.gov.in/sites/default/files/2021-11/National_MPI_India-11242021.pdf) India poverty statistics - percentage of people below poverty line. -#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download) -Sustainable Development Goals India Index - ### [OpenFIGI](https://www.openfigi.com/) #### [FIGI](https://www.openfigi.com/) diff --git a/datasets/Education.md b/datasets/Education.md index 43a815d55..3dce91665 100644 --- a/datasets/Education.md +++ b/datasets/Education.md @@ -22,11 +22,6 @@ Population Census and Statistics for Brazil. California schools performance data across different grade levels and sub-groups (like race, disability etc). 
[CAASPP](https://caaspp-elpac.ets.org/elpac/) -### [Colombia DANE National Administrative Department of Statistics](https://www.dane.gov.co) - -#### [Colombia Census](https://www.dane.gov.co/index.php) -Population Census and Statistics for Colombia at Country, Department and Municipality geo-levels. - ### [European Union (EU) Eurostat](https://ec.europa.eu/eurostat) #### [EuroStat Early Education and Training](https://ec.europa.eu/eurostat/web/education-and-training/database) @@ -54,32 +49,17 @@ Data on schools, such as dropout rate and access to computers and toilets, in In #### [Unified District Information System for Education (UDISE)](https://udiseplus.gov.in/#/home) The Unified District Information System for Education (UDISE), by the Ministry of Education, India, collects and provides data related to schools and their resources. -### [India National Sample Survey](https://mospi.gov.in/web/nss) -The India National Sample Survey (NSS) Organizes and conducts large scale all-India sample surveys on different population groups in diverse socio economic areas, such as employment, consumer expenditure, housing conditions and environment, literacy levels, health, nutrition, family welfare, etc. - -#### [India Literacy](https://www.mospi.gov.in/sites/default/files/publication_reports/Report_585_75th_round_Education_final_1507_0.pdf) -Data covering the literacy status and rate for states in India. -[Terms of Use](https://ndap.niti.gov.in/info?tab=termsandconditions) - ### [Mexico National Institute of Statistics and Geography(INEGI)](https://www.inegi.org.mx/default.html) #### [Mexico Census](https://en.www.inegi.org.mx/temas/) Population Census and Statistics for Mexico at Country and State levels. 
-### [National Institution for Transforming India.](https://niti.gov.in/) - -#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download) -Sustainable Development Goals India Index - ### [Open Data for Africa](https://dataportal.opendataforafrica.org/) #### [Kenya Census](https://kenya.opendataforafrica.org/) Kenya Demographics, Health and Education data from Kenya National Bureau Of Statistics by country, county and towns and suburbs. [Terms of use](https://kenya.opendataforafrica.org/gdlkmgb). -#### [Nigeria Statistics](https://nigeria.opendataforafrica.org) -Demographics, Health, Agriculture and Education Statistics for Nigeria. - #### [SouthAfrica Census](https://southafrica.opendataforafrica.org/) South Africa Demographics, Health and Education data from South Africa Data Portal by country, province and district municipality. diff --git a/datasets/Energy.md b/datasets/Energy.md index 6c780d6dc..681c42588 100644 --- a/datasets/Energy.md +++ b/datasets/Energy.md @@ -11,21 +11,6 @@ parent: Data Sources * TOC {:toc} -### [India Energy Dashboard](https://niti.gov.in/edm/) -The India Energy Dashboard is an open energy data portal for India that provides state-wise production and consumption of Electricity, Renewables, Coal, Oil and Gas. - -#### [India Energy Production and Consumption by States](https://niti.gov.in/edm/#stateOverview) -Historical data on energy generated and fuel quantiy consumed by different states in India. -[Disclaimer](https://niti.gov.in/edm/#help) - -### [Stanford University](https://www.stanford.edu/) - -#### [DeepSolar](http://web.stanford.edu/group/deepsolar/home) -Location and size of solar photovoltaic panels in the US based on satellite imagery. - -[Paper for Citation](https://www.cell.com/joule/fulltext/S2542-4351(18)30570-1). - - ### [U.S. 
Energy Information Administration (EIA)](https://www.eia.gov/) #### [Commercial Buildings Energy Consumption Survey (CBECS)](https://www.eia.gov/consumption/commercial/) diff --git a/datasets/Environment.md b/datasets/Environment.md index b7f96a5af..1e4db63d5 100644 --- a/datasets/Environment.md +++ b/datasets/Environment.md @@ -72,9 +72,6 @@ India's Central Pollution Control Board (CPCB) portal for Air Quality Management #### [India Air Quality Index](https://app.cpcbccr.com/AQI_India/) Air Quality Index and possible health impacts reported for states, cities and stations in India. -#### [India aqi pollutants](https://app.cpcbccr.com/AQI_India/) -India Air Quality Data contains mean values of various pollutants measured once in 4 hours along with other details like station name, state, city and date for the period. - ### [India Water Resources Information System](https://indiawris.gov.in/wris/#/) The Water Resources Information System (WRIS) is a repository of water resources and related data for India at national, state and district level. @@ -84,11 +81,6 @@ Water quality data measured at ground and surface water qualiy stations across I #### [WRIS India Rainfall](https://indiawris.gov.in/wris/#/DataDownload) WRIS India monthly rainfall data of district level. 
-### [National Institution for Transforming India.](https://niti.gov.in/) - -#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download) -Sustainable Development Goals India Index - ### [Organisation for Economic Co-operation and Development (OECD)](https://stats.oecd.org/) #### [Air and GHG emissions](https://data.oecd.org/air/air-and-ghg-emissions.htm) @@ -100,21 +92,9 @@ Population connected to the waste water treatment using different methods from t ### [Resources for the Future (RFF)](https://www.rff.org/) -#### [US Forecast Weather Variability - 0.25 degree resolution](https://www.rff.org/publications/data-tools/) -This dataset includes US forecast (till 2100) weather variability at 0.25 degree resolution, expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables. - -#### [US Forecast Weather Variability - County](https://www.rff.org/publications/data-tools/) -This dataset includes US county-level forecast (till 2100) weather variability expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables. These were aggregated from stats at 0.25 degree resolution by Data Commons. - #### [US Geo Grids for RFF](https://www.rff.org/publications/data-tools/) This dataset includes geo grid places in US at 0.25 degree resolution and 4km resolution. -#### [US Historical Weather Variability - 4km resolution](https://www.rff.org/publications/data-tools/) -This dataset includes US historical weather variability at 4 KM resolution, expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables. 
- -#### [US Historical Weather Variability - County](https://www.rff.org/publications/data-tools/) -This dataset includes US county-level historical weather variability expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables. These were aggregated from stats at 4 KM resolution by RFF. - #### [US Wildfire, Smoke and Drought statistics - County and State](https://www.rff.org/publications/data-tools/) This dataset incorporates statistics aggregated by RFF from the following sources: @@ -207,18 +187,10 @@ Includes data about various heat-stress-induced medical incidents. Air quality data collected from outdoor monitors on the county, CBSA, and site monitor level. -#### [EJSCREEN](https://www.epa.gov/ejscreen) -Environmental justice mapping tool based on environmental and demographic indicators. - - #### [Greenhouse Gas Reporting Program](https://www.epa.gov/enviro/greenhouse-gas-overview) Annual reporting of greenhouse gases from large emission sources. -#### [National Emissions Inventory](https://www.epa.gov/air-emissions-inventories/national-emissions-inventory-nei) -The National Emissions Inventory (NEI) is a comprehensive and detailed estimate of air emissions of criteria pollutants, hazardous pollutants and greenhouse gases from 188 OnRoad air emission sources (such as Mobile Sources Highway Vehicles Electricity and Mobile Sources Border Crossing), 248 NonRoad air emissions sources (such as Mobile Sources Off-highway Vehicle Gasoline and LPG Construction Mining Equipment), 703 NonPoint air emissions sources (such as Industrial Processes Oil Gas Exploration Production and LPG Distribution) and 5818 Point air emissions sources (such as Chemical Evaporation Organic Solvent Evaporation and External Combustion Electric Generation Boilers)at US County Level. 
- - #### [Superfund Sites](https://www.epa.gov/superfund) Site contamination data, hazard scores and more. diff --git a/datasets/Health.md b/datasets/Health.md index 330359bdf..72c8c5a12 100644 --- a/datasets/Health.md +++ b/datasets/Health.md @@ -31,11 +31,6 @@ Ireland Demographics, Health and Economy data from Central Statistics Office(CSO #### [Food Resources in California](https://controllerdata.lacity.org/dataset/Food-Resources-in-California/v2mg-qsxf/data) This dataset contains a list of resources such food pantries, food banks, and other food distribution sites throughout the state of California. The data is current as of May 1, 2020. Due to the coronavirus pandemic, many food pantries may change their hours or close temporarily. Please contact a food pantry prior to visiting them to confirm their operating hours and ensure you are in their service area. -### [Colombia DANE National Administrative Department of Statistics](https://www.dane.gov.co) - -#### [Colombia Census](https://www.dane.gov.co/index.php) -Population Census and Statistics for Colombia at Country, Department and Municipality geo-levels. - ### [Dartmouth Atlas Project](https://www.dartmouthatlas.org/) #### [Dartmouth Atlas of Health Care](https://www.dartmouthatlas.org/) @@ -117,18 +112,12 @@ Population Census and Statistics for Argentina. #### [India National Family Health Survey](https://ndap.niti.gov.in/dataset/6822) India National Family Health Survey - Data on population dynamics and health indicators as well as data on emerging issues in health and family welfare and associated domains. -#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download) -Sustainable Development Goals India Index - ### [Open Data for Africa](https://dataportal.opendataforafrica.org/) #### [Kenya Census](https://kenya.opendataforafrica.org/) Kenya Demographics, Health and Education data from Kenya National Bureau Of Statistics by country, county and towns and suburbs. 
[Terms of use](https://kenya.opendataforafrica.org/gdlkmgb). -#### [Nigeria Statistics](https://nigeria.opendataforafrica.org) -Demographics, Health, Agriculture and Education Statistics for Nigeria. - #### [SouthAfrica Census](https://southafrica.opendataforafrica.org/) South Africa Demographics, Health and Education data from South Africa Data Portal by country, province and district municipality. @@ -258,13 +247,5 @@ Data Commons includes variables related to demographics, energy, health, labor, #### [Coronavirus Disease (COVID-19) Dashboard](https://covid19.who.int/) The World Health Organization publishes national COVID-19 cases and death counts for countries across the world. Data Commons imports this data on a daily basis. - -#### [Global Health Observatory](https://www.who.int/data/gho/data/indicators/indicators-index) -Data Commons imports variables about a variety of health indicators at a country level. - - -#### [WHO Data](https://www.who.int/data) -The UN curated this import of data from the World Health Organization. - Data made available under [CC BY-NC-SA 3.0 IGO](https://www.who.int/about/policies/publishing/copyright).