Updating Data Sources pages (#539)
keyurva authored Nov 12, 2024
1 parent 4ba51ef commit 6c71a4f
Showing 9 changed files with 14 additions and 108 deletions.
5 changes: 0 additions & 5 deletions datasets/Agriculture.md
@@ -11,11 +11,6 @@ parent: Data Sources
* TOC
{:toc}

### [Open Data for Africa](https://dataportal.opendataforafrica.org/)

#### [Nigeria Statistics](https://nigeria.opendataforafrica.org)
Demographics, Health, Agriculture and Education Statistics for Nigeria.

### [U.S. Department of Agriculture (USDA)](https://www.usda.gov/)

#### [Agricultural Survey](https://quickstats.nass.usda.gov/)
16 changes: 14 additions & 2 deletions datasets/Biomedical.md
@@ -53,8 +53,16 @@ This data is made available under Creative Commons Attribution ShareAlike 4.0 In

### [Jensen Lab (University of Copenhagen)](https://jensenlab.org/resources/)

#### [DISEASES](https://diseases.jensenlab.org/Search)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.
#### [DISEASES: Experiment](https://diseases.jensenlab.org/Search)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The experiments files further contain the source database, the source score, and the confidence score. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.


#### [DISEASES: Knowledge](https://diseases.jensenlab.org/Search)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The knowledge files further contain the source database, the evidence type, and the confidence score. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.


#### [DISEASES: Textmining](https://diseases.jensenlab.org/Search)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The textmining files contain the z-score, the confidence score, and a URL to a viewer of the underlying abstracts. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.
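The column layout described in the three DISEASES channel entries can be illustrated with a small parser. This is a minimal Python sketch, not part of the DISEASES distribution: it assumes the channel files are tab-separated with no header row, and the helper `parse_diseases`, the field names, and the sample rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field

# Channel-specific columns that follow the four shared leading columns.
# Field names are illustrative; the files themselves are assumed to have no header.
EXTRA_COLUMNS = {
    "knowledge": ("source_db", "evidence_type", "confidence"),
    "experiments": ("source_db", "source_score", "confidence"),
    "textmining": ("z_score", "confidence", "url"),
}
NUMERIC_FIELDS = {"z_score", "confidence"}


@dataclass
class Association:
    gene_id: str
    gene_name: str
    disease_id: str
    disease_name: str
    extra: dict = field(default_factory=dict)


def parse_diseases(text, channel):
    """Parse a DISEASES channel file (assumed tab-separated) into Association records."""
    names = EXTRA_COLUMNS[channel]
    records = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines
        extra = {}
        for name, value in zip(names, row[4:]):
            extra[name] = float(value) if name in NUMERIC_FIELDS else value
        records.append(Association(*row[:4], extra=extra))
    return records


# Made-up rows for illustration only.
knowledge = parse_diseases(
    "GENE_ID_1\tGENEA\tDOID:0000001\tExample disease\tExampleDB\tCURATED\t4.5\n", "knowledge")
textmining = parse_diseases(
    "GENE_ID_2\tGENEB\tDOID:0000002\tOther disease\t3.2\t1.6\thttps://example.org/abstracts\n",
    "textmining")
print(knowledge[0].gene_name, knowledge[0].extra["confidence"])  # GENEA 4.5
```

Keeping the shared four columns as typed fields and the channel-specific columns in a dictionary lets one loader handle all three channels.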


#### [Side Effect Resource (SIDER) 4.1](http://sideeffects.embl.de/)
@@ -103,6 +111,10 @@ This data is made available through [openFDA terms of service](https://open.fda.
"The [NCBI Assembly database](https://www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project" (Kitts et al. 2016). In this import we include the metadata for all genome assemblies documented in `assembly_summary_genbank.txt` and `assembly_summary_refseq.txt`. Assemblies are stored in GenomeAssembly nodes whose information is integrated from both the GenBank and RefSeq datasets.
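The GenBank/RefSeq integration described above could be approached roughly as follows. This Python sketch is an assumption, not the import's actual code: it assumes each `assembly_summary_*.txt` file is tab-separated, that `##` lines are free-text comments, that the last `#`-prefixed line is the column header, and that a `gbrs_paired_asm` column links a GenBank (`GCA_`) accession to its RefSeq (`GCF_`) counterpart. The helper names and sample accessions are hypothetical.

```python
def read_assembly_summary(text):
    """Parse an assembly_summary-style TSV into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in text.splitlines():
        if line.startswith("##"):
            continue  # free-text comment
        if line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")  # column header line
            continue
        if header and line:
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def merge_assemblies(genbank_rows, refseq_rows):
    """Pair GenBank and RefSeq records via gbrs_paired_asm; unpaired records stand alone."""
    refseq_by_acc = {r["assembly_accession"]: r for r in refseq_rows}
    nodes, paired_refseq = [], set()
    for gb in genbank_rows:
        rs = refseq_by_acc.get(gb.get("gbrs_paired_asm", "na"))
        if rs is not None:
            paired_refseq.add(rs["assembly_accession"])
        nodes.append({"genbank": gb, "refseq": rs})
    for acc, rs in refseq_by_acc.items():
        if acc not in paired_refseq:
            nodes.append({"genbank": None, "refseq": rs})  # RefSeq-only assembly
    return nodes


# Made-up accessions for illustration only.
GENBANK = "\n".join([
    "## illustrative comment line",
    "#assembly_accession\tasm_name\tgbrs_paired_asm",
    "GCA_000000001.1\tAsmA\tGCF_000000001.1",
    "GCA_000000002.1\tAsmB\tna",
])
REFSEQ = "\n".join([
    "#assembly_accession\tasm_name\tgbrs_paired_asm",
    "GCF_000000001.1\tAsmA\tGCA_000000001.1",
    "GCF_000000009.1\tAsmZ\tna",
])
nodes = merge_assemblies(read_assembly_summary(GENBANK), read_assembly_summary(REFSEQ))
for node in nodes:
    gb, rs = node["genbank"], node["refseq"]
    print(gb and gb["assembly_accession"], rs and rs["assembly_accession"])
```

Each resulting dictionary stands in for one GenomeAssembly node; paired records share a node, while assemblies present in only one database get a node of their own.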


#### [NCBI Gene](https://www.ncbi.nlm.nih.gov/gene)
"[NCBI Gene](https://www.ncbi.nlm.nih.gov/gene) supplies gene-specific connections in the nexus of map, sequence, expression, structure, function, citation, and homology data. Unique identifiers are assigned to genes with defining sequences, genes with known map positions, and genes inferred from phenotypic information. These gene identifiers are used throughout NCBI's databases and tracked through updates of annotation. Gene includes genomes represented by [NCBI Reference Sequences](https://www.ncbi.nlm.nih.gov/refseq/) (or RefSeqs) and is integrated for indexing and query and retrieval from NCBI's Entrez and [E-Utilities](https://www.ncbi.nlm.nih.gov/books/NBK25501/) systems. Gene comprises sequences from thousands of distinct taxonomic identifiers, ranging from viruses to bacteria to eukaryotes. It represents chromosomes, organelles, plasmids, viruses, transcripts, and millions of proteins."
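Because Gene is indexed for retrieval through E-Utilities, a Gene query can be expressed as an ESearch request URL. The sketch below only builds the URL and sends no request; the `esearch_url` helper and the example query are illustrative, and NCBI's E-utilities documentation remains the authority on parameters and usage limits.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="gene", retmax=20):
    """Build an ESearch request URL (hypothetical helper; no request is sent)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = esearch_url("BRCA1[sym] AND human[orgn]")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return a JSON body whose `esearchresult.idlist` holds the matching Gene IDs.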


#### [NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy/)
NCBI Taxonomy "consists of a curated set of names and classifications for all of the source organisms represented in the International Nucleotide Sequence Database Collaboration (INSDC). The NCBI Taxonomy database contains a list of names that are determined to be nomenclaturally correct or valid (as defined according to the different codes of nomenclature), classified in an approximately phylogenetic hierarchy (depending on the level of knowledge regarding phylogenetic relationships of a given group) as well as a number of names that exist outside the jurisdiction of the codes. That is, it focuses on nomenclature and systematics, rather than documenting the description of taxa."
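Because every taxon in the hierarchy points to a parent, a lineage can be recovered by walking parent pointers up to the root. A minimal Python sketch, assuming the `nodes.dmp` layout of NCBI's taxonomy dump (fields separated by `\t|\t`, with tax_id, parent tax_id, and rank first, and the root pointing to itself); the toy tree and helper names are made up.

```python
def parse_nodes(text):
    """Map tax_id -> parent tax_id and tax_id -> rank from nodes.dmp-style lines."""
    parent, rank = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.rstrip("\t|").split("\t|\t")]
        tax_id = int(fields[0])
        parent[tax_id] = int(fields[1])
        rank[tax_id] = fields[2]
    return parent, rank


def lineage(tax_id, parent):
    """Walk parent pointers from a taxon up to the root (the node that is its own parent)."""
    path = [tax_id]
    while parent[tax_id] != tax_id:
        tax_id = parent[tax_id]
        if tax_id in path:
            raise ValueError("cycle in taxonomy")
        path.append(tax_id)
    return path[::-1]  # root first


# Toy tree in nodes.dmp layout (made-up IDs, not real NCBI taxa).
TOY_NODES = "\n".join([
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tdomain\t|",
    "20\t|\t10\t|\tgenus\t|",
    "30\t|\t20\t|\tspecies\t|",
])
parent, rank = parse_nodes(TOY_NODES)
print(lineage(30, parent), rank[30])  # [1, 10, 20, 30] species
```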

5 changes: 0 additions & 5 deletions datasets/Crime.md
@@ -11,11 +11,6 @@ parent: Data Sources
* TOC
{:toc}

### [National Institution for Transforming India](https://niti.gov.in/)

#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download)
Sustainable Development Goals India Index

### [U.S. Bureau of Justice Statistics (BJS)](https://bjs.ojp.gov/)

#### [National Prisoner Statistics (NPS) Program](https://bjs.ojp.gov/data-collection/national-prisoner-statistics-nps-program)
11 changes: 0 additions & 11 deletions datasets/Demographics.md
@@ -46,11 +46,6 @@ Israel Demographics, Health, Economy statistics for Israel at country, district
#### [Ireland Census](https://www.cso.ie/en/statistics/)
Ireland Demographics, Health and Economy data from the Central Statistics Office (CSO) by country, county and city.

### [Colombia DANE National Administrative Department of Statistics](https://www.dane.gov.co)

#### [Colombia Census](https://www.dane.gov.co/index.php)
Population Census and Statistics for Colombia at Country, Department and Municipality geo-levels.

### [DataMeet](https://datameet.org/)

#### [DataMeet Maps](http://projects.datameet.org/maps)
@@ -128,18 +123,12 @@ India National Family Health Survey - Data on population dynamics and health ind
#### [India Poverty Status](https://www.niti.gov.in/sites/default/files/2021-11/National_MPI_India-11242021.pdf)
India poverty statistics - percentage of people below poverty line.

#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download)
Sustainable Development Goals India Index

### [Open Data for Africa](https://dataportal.opendataforafrica.org/)

#### [Kenya Census](https://kenya.opendataforafrica.org/)
Kenya Demographics, Health and Education data from the Kenya National Bureau of Statistics by country, county, and towns and suburbs.
[Terms of use](https://kenya.opendataforafrica.org/gdlkmgb).

#### [Nigeria Statistics](https://nigeria.opendataforafrica.org)
Demographics, Health, Agriculture and Education Statistics for Nigeria.

#### [South Africa Census](https://southafrica.opendataforafrica.org/)
South Africa Demographics, Health and Education data from South Africa Data Portal by country, province and district municipality.

3 changes: 0 additions & 3 deletions datasets/Economy.md
@@ -80,9 +80,6 @@ Population Census and Statistics for Mexico at Country and State levels.
#### [India Poverty Status](https://www.niti.gov.in/sites/default/files/2021-11/National_MPI_India-11242021.pdf)
India poverty statistics - percentage of people below poverty line.

#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download)
Sustainable Development Goals India Index

### [OpenFIGI](https://www.openfigi.com/)

#### [FIGI](https://www.openfigi.com/)
20 changes: 0 additions & 20 deletions datasets/Education.md
@@ -22,11 +22,6 @@ Population Census and Statistics for Brazil.
California school performance data across different grade levels and sub-groups (such as race and disability).
[CAASPP](https://caaspp-elpac.ets.org/elpac/)

### [Colombia DANE National Administrative Department of Statistics](https://www.dane.gov.co)

#### [Colombia Census](https://www.dane.gov.co/index.php)
Population Census and Statistics for Colombia at Country, Department and Municipality geo-levels.

### [European Union (EU) Eurostat](https://ec.europa.eu/eurostat)

#### [EuroStat Early Education and Training](https://ec.europa.eu/eurostat/web/education-and-training/database)
@@ -54,32 +49,17 @@ Data on schools, such as dropout rate and access to computers and toilets, in In
#### [Unified District Information System for Education (UDISE)](https://udiseplus.gov.in/#/home)
The Unified District Information System for Education (UDISE), by the Ministry of Education, India, collects and provides data related to schools and their resources.

### [India National Sample Survey](https://mospi.gov.in/web/nss)
The India National Sample Survey (NSS) organizes and conducts large-scale, all-India sample surveys of different population groups across diverse socio-economic areas, such as employment, consumer expenditure, housing conditions and environment, literacy levels, health, nutrition, and family welfare.

#### [India Literacy](https://www.mospi.gov.in/sites/default/files/publication_reports/Report_585_75th_round_Education_final_1507_0.pdf)
Data covering the literacy status and rate for states in India.
[Terms of Use](https://ndap.niti.gov.in/info?tab=termsandconditions)

### [Mexico National Institute of Statistics and Geography (INEGI)](https://www.inegi.org.mx/default.html)

#### [Mexico Census](https://en.www.inegi.org.mx/temas/)
Population Census and Statistics for Mexico at Country and State levels.

### [National Institution for Transforming India](https://niti.gov.in/)

#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download)
Sustainable Development Goals India Index

### [Open Data for Africa](https://dataportal.opendataforafrica.org/)

#### [Kenya Census](https://kenya.opendataforafrica.org/)
Kenya Demographics, Health and Education data from the Kenya National Bureau of Statistics by country, county, and towns and suburbs.
[Terms of use](https://kenya.opendataforafrica.org/gdlkmgb).

#### [Nigeria Statistics](https://nigeria.opendataforafrica.org)
Demographics, Health, Agriculture and Education Statistics for Nigeria.

#### [South Africa Census](https://southafrica.opendataforafrica.org/)
South Africa Demographics, Health and Education data from South Africa Data Portal by country, province and district municipality.

15 changes: 0 additions & 15 deletions datasets/Energy.md
@@ -11,21 +11,6 @@ parent: Data Sources
* TOC
{:toc}

### [India Energy Dashboard](https://niti.gov.in/edm/)
The India Energy Dashboard is an open energy data portal for India that provides state-wise production and consumption of Electricity, Renewables, Coal, Oil and Gas.

#### [India Energy Production and Consumption by States](https://niti.gov.in/edm/#stateOverview)
Historical data on energy generated and fuel quantity consumed by different states in India.
[Disclaimer](https://niti.gov.in/edm/#help)

### [Stanford University](https://www.stanford.edu/)

#### [DeepSolar](http://web.stanford.edu/group/deepsolar/home)
Location and size of solar photovoltaic panels in the US based on satellite imagery.

[Paper for Citation](https://www.cell.com/joule/fulltext/S2542-4351(18)30570-1).


### [U.S. Energy Information Administration (EIA)](https://www.eia.gov/)

#### [Commercial Buildings Energy Consumption Survey (CBECS)](https://www.eia.gov/consumption/commercial/)
28 changes: 0 additions & 28 deletions datasets/Environment.md
@@ -72,9 +72,6 @@ India's Central Pollution Control Board (CPCB) portal for Air Quality Management
#### [India Air Quality Index](https://app.cpcbccr.com/AQI_India/)
Air Quality Index and possible health impacts reported for states, cities and stations in India.

#### [India AQI Pollutants](https://app.cpcbccr.com/AQI_India/)
India Air Quality Data contains mean values of various pollutants measured once every 4 hours, along with other details such as station name, state, city, and date for the period.
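Turning pollutant concentrations into an Air Quality Index usually involves a per-pollutant sub-index, computed by linear interpolation within a concentration breakpoint band, with the overall AQI taken as the highest sub-index. The Python sketch below illustrates that scheme; the breakpoint tables and pollutant names are hypothetical placeholders, not CPCB's official values.

```python
# Each band: (conc_low, conc_high, index_low, index_high).
# These tables are hypothetical; use the official CPCB breakpoints in practice.
HYPOTHETICAL_TABLES = {
    "pollutant_a": [(0, 30, 0, 50), (30, 60, 50, 100), (60, 90, 100, 200)],
    "pollutant_b": [(0, 40, 0, 50), (40, 80, 50, 100)],
}


def sub_index(conc, bands):
    """Linearly interpolate a concentration onto the index scale of its band."""
    for c_lo, c_hi, i_lo, i_hi in bands:
        if c_lo <= conc <= c_hi:
            return i_lo + (conc - c_lo) * (i_hi - i_lo) / (c_hi - c_lo)
    raise ValueError(f"concentration {conc} is outside the breakpoint table")


def overall_aqi(concentrations, tables):
    """Return (AQI, dominant pollutant): the maximum sub-index and the pollutant behind it."""
    scores = {p: sub_index(c, tables[p]) for p, c in concentrations.items()}
    dominant = max(scores, key=scores.get)
    return scores[dominant], dominant


aqi, dominant = overall_aqi({"pollutant_a": 45, "pollutant_b": 10}, HYPOTHETICAL_TABLES)
print(aqi, dominant)  # 75.0 pollutant_a
```

Taking the maximum rather than an average means a single badly elevated pollutant determines the reported category.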

### [India Water Resources Information System](https://indiawris.gov.in/wris/#/)
The Water Resources Information System (WRIS) is a repository of water resources and related data for India at national, state and district level.

@@ -84,11 +81,6 @@ Water quality data measured at ground and surface water quality stations across I
#### [WRIS India Rainfall](https://indiawris.gov.in/wris/#/DataDownload)
WRIS India monthly rainfall data of district level.

### [National Institution for Transforming India](https://niti.gov.in/)

#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download)
Sustainable Development Goals India Index

### [Organisation for Economic Co-operation and Development (OECD)](https://stats.oecd.org/)

#### [Air and GHG emissions](https://data.oecd.org/air/air-and-ghg-emissions.htm)
@@ -100,21 +92,9 @@ Population connected to the waste water treatment using different methods from t

### [Resources for the Future (RFF)](https://www.rff.org/)

#### [US Forecast Weather Variability - 0.25 degree resolution](https://www.rff.org/publications/data-tools/)
This dataset includes US forecast (through 2100) weather variability at 0.25-degree resolution, expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables.
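The variability measures named above are standard distribution moments. A minimal Python sketch of how they, and a consecutive-dry-days count, could be computed for one daily series. It uses population moments, non-excess kurtosis, and a 1 mm dry-day threshold (the ETCCDI convention); RFF's exact definitions are not stated here, so treat these choices as assumptions.

```python
import math


def moments(values):
    """Return population standard deviation, skewness, and (non-excess) kurtosis."""
    n = len(values)
    if n == 0:
        raise ValueError("empty series")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("zero variance: skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


def max_consecutive_dry_days(precip_mm, threshold_mm=1.0):
    """Length of the longest run of days with precipitation below the threshold (assumed 1 mm)."""
    longest = current = 0
    for p in precip_mm:
        current = current + 1 if p < threshold_mm else 0
        longest = max(longest, current)
    return longest


std, skew, kurt = moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(std, 4), skew, kurt)  # 1.4142 0.0 1.7
print(max_consecutive_dry_days([0.0, 0.5, 3.0, 0.0, 0.0, 0.0, 2.0]))  # 3
```

Subtracting 3 from the kurtosis value would give the "excess" convention used by some libraries.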

#### [US Forecast Weather Variability - County](https://www.rff.org/publications/data-tools/)
This dataset includes US county-level forecast (through 2100) weather variability expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables. These were aggregated by Data Commons from statistics at 0.25-degree resolution.
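The entry does not say how grid-level statistics were rolled up to counties. One plausible approach, shown here only as an assumed sketch, is an average of the grid-cell values weighted by each cell's overlap area with the county; the cell IDs, values, and areas below are made up.

```python
def area_weighted_mean(cell_values, overlap_km2):
    """Average per-cell statistics, weighting each cell by its overlap area with a county."""
    shared = [cell for cell in cell_values if overlap_km2.get(cell, 0) > 0]
    if not shared:
        raise ValueError("no grid cell overlaps the county")
    total_area = sum(overlap_km2[c] for c in shared)
    return sum(cell_values[c] * overlap_km2[c] for c in shared) / total_area


# Hypothetical per-cell standard deviations (deg C) and overlap areas (km^2).
cell_std = {"cell_1": 2.0, "cell_2": 4.0, "cell_3": 3.0}
overlap = {"cell_1": 300.0, "cell_2": 100.0}  # cell_3 lies outside the county
print(area_weighted_mean(cell_std, overlap))  # 2.5
```

Averaging standard deviations, skewness, or kurtosis this way approximates county-level variability rather than recomputing the moments from pooled daily data, which is one reason the aggregation method matters when comparing county and grid values.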

#### [US Geo Grids for RFF](https://www.rff.org/publications/data-tools/)
This dataset includes geo grid places in the US at 0.25-degree and 4 km resolutions.
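To attach a point location to one of these grid places, its coordinates can be snapped to a cell. This sketch assumes cells aligned to multiples of 0.25 degrees and identified by their southwest corner; the actual grid alignment and place identifiers used for this import may differ, and `grid_cell` is a hypothetical helper.

```python
import math


def grid_cell(lat, lon, res=0.25):
    """Return the southwest corner and center (lat, lon) of the res-degree cell containing a point."""
    south = math.floor(lat / res) * res
    west = math.floor(lon / res) * res
    return (south, west), (south + res / 2, west + res / 2)


corner, center = grid_cell(37.42, -122.08)
print(corner, center)  # (37.25, -122.25) (37.375, -122.125)
```

Using `math.floor` rather than `int` keeps negative longitudes (the whole contiguous US) in the correct cell.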

#### [US Historical Weather Variability - 4km resolution](https://www.rff.org/publications/data-tools/)
This dataset includes US historical weather variability at 4 km resolution, expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables.

#### [US Historical Weather Variability - County](https://www.rff.org/publications/data-tools/)
This dataset includes US county-level historical weather variability expressed as standard deviation, skewness and kurtosis for daily min/max temperature and precipitation. Additionally, it includes statistics for Heavy Precipitation Index and Consecutive Dry Days variables. These were aggregated by RFF from statistics at 4 km resolution.

#### [US Wildfire, Smoke and Drought statistics - County and State](https://www.rff.org/publications/data-tools/)
This dataset incorporates statistics aggregated by RFF from the following
sources:
@@ -207,18 +187,10 @@ Includes data about various heat-stress-induced medical incidents.
Air quality data collected from outdoor monitors on the county, CBSA, and site monitor level.


#### [EJSCREEN](https://www.epa.gov/ejscreen)
Environmental justice mapping tool based on environmental and demographic indicators.


#### [Greenhouse Gas Reporting Program](https://www.epa.gov/enviro/greenhouse-gas-overview)
Annual reporting of greenhouse gases from large emission sources.


#### [National Emissions Inventory](https://www.epa.gov/air-emissions-inventories/national-emissions-inventory-nei)
The National Emissions Inventory (NEI) is a comprehensive and detailed estimate of air emissions of criteria pollutants, hazardous pollutants and greenhouse gases from 188 OnRoad air emission sources (such as Mobile Sources Highway Vehicles Electricity and Mobile Sources Border Crossing), 248 NonRoad air emissions sources (such as Mobile Sources Off-highway Vehicle Gasoline and LPG Construction Mining Equipment), 703 NonPoint air emissions sources (such as Industrial Processes Oil Gas Exploration Production and LPG Distribution) and 5818 Point air emissions sources (such as Chemical Evaporation Organic Solvent Evaporation and External Combustion Electric Generation Boilers) at the US county level.


#### [Superfund Sites](https://www.epa.gov/superfund)
Site contamination data, hazard scores and more.
