Updating Data Sources pages #558

Merged
merged 5 commits into from
Dec 17, 2024
Changes from 4 commits
26 changes: 0 additions & 26 deletions datasets/Biomedical.md
@@ -64,19 +64,9 @@ DISEASES is a weekly updated web resource that integrates evidence on disease-ge
#### [DISEASES: Textmining](https://diseases.jensenlab.org/Search)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The textmining files contain the z-score, the confidence score, and a URL to a viewer of the underlying abstracts. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.
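
The column layout described above can be sketched as a small reader. This is only an illustration: it assumes the textmining file is tab-separated with no header row and that the seven columns appear in the order the description lists them. The `TextminingRow` type, the `parse_textmining` helper, and the sample row are all hypothetical and not taken from the DISEASES download.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class TextminingRow(NamedTuple):
    """One row of a textmining file, in the column order described above."""
    gene_id: str
    gene_name: str
    disease_id: str
    disease_name: str
    z_score: float
    confidence: float
    viewer_url: str


def parse_textmining(handle: TextIO) -> List[TextminingRow]:
    """Parse tab-separated rows; the 7-column, header-free layout is an assumption."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 7:
            continue  # skip rows that do not match the assumed layout
        gene_id, gene_name, disease_id, disease_name, z, conf, url = fields
        rows.append(
            TextminingRow(gene_id, gene_name, disease_id, disease_name,
                          float(z), float(conf), url)
        )
    return rows


# Made-up sample row, for illustration only (not real DISEASES data).
SAMPLE = "GENE0001\tEXAMPLE1\tDOID:0000\tExample disease\t5.2\t2.6\thttps://example.org/viewer\n"
rows = parse_textmining(io.StringIO(SAMPLE))
```

Keeping the z-score and the confidence score as separate numeric fields makes it straightforward to filter on either one when comparing evidence types.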


#### [Side Effect Resource (SIDER) 4.1](http://sideeffects.embl.de/)
SIDER is a database of adverse drug reactions. Available information includes side effect frequency, drug and side effect classifications as well as links to further information, for example drug–target relations. However, this database uses the MedDRA ontology, which is covered by the UMLS license and limited to non-commercial use. Therefore, only the data released under a CC0 license (mappings of PubChem Compound IDs (CIDs) and ATC Codes) is hosted. Data Commons hosts version 4.1 of SIDER, released on October 21, 2015. Information about citing SIDER can be found [here](http://sideeffects.embl.de/about/).

This data is made available under the [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).


### [New York Botanical Garden (NYBG)](https://sweetgum.nybg.org/science/)

#### [C. V. Starr Virtual Herbarium (Collaboration)](https://sweetgum.nybg.org/science/vh/learn-more/)
C. V. Starr Virtual Herbarium is a public specimen database with photos and detailed records about millions of plants, fungi, and algae.


### [PharmGKB](https://www.pharmgkb.org/)

#### [PharmGKB Primary Data](https://www.pharmgkb.org/)
@@ -97,14 +87,6 @@ The Human Protein Tissue Atlas contains information about the distribution of pr
This [dataset](https://www.proteinatlas.org/download/normal_tissue.tsv.zip) is available under [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/). Please also see their [Disclaimer](https://www.proteinatlas.org/about/disclaimer) and [Licence & Citation](https://www.proteinatlas.org/about/licence).


### [U.S. Adopted Names (USAN) Council](https://www.ama-assn.org/about/united-states-adopted-names/usan-council)

#### [USAN Stems](https://www.ama-assn.org/about/united-states-adopted-names/united-states-adopted-names-approved-stems)
USAN stems represent common stems for which chemical and/or pharmacologic parameters have been established. These council-approved stems and their definitions are recommended for use in coining new nonproprietary drug names belonging to an established series of related agents. USAN appropriately incorporates this established class stem system. By doing so, similar compounds maintain a common "family" name that provides immediate recognition.

This data is made available through [openFDA terms of service](https://open.fda.gov/license/).


### [U.S. National Institutes of Health: National Center for Biotechnology Information (NIH: NCBI)](https://www.ncbi.nlm.nih.gov/)

#### [NCBI Assembly](https://www.ncbi.nlm.nih.gov/assembly)
@@ -140,11 +122,3 @@ The Disease Ontology was developed as a project by the Institute of Genome Scien

The data is made available under [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://disease-ontology.org/resources/citing-do).


### [World Health Organization (WHO)](https://www.who.int/)

#### [ATC Codes](https://www.whocc.no/atc_ddd_index/)
Anatomical Therapeutic Chemical (ATC) is a hierarchical classification system for pharmacological substances. 'In the ATC classification system, the active substances are classified in a hierarchy with five different levels. The system has fourteen main anatomical/pharmacological groups or 1st levels. Each ATC main group is divided into 2nd levels which could be either pharmacological or therapeutic groups. The 3rd and 4th levels are chemical, pharmacological or therapeutic subgroups and the 5th level is the chemical substance. The 2nd, 3rd and 4th levels are often used to identify pharmacological subgroups when that is considered more appropriate than therapeutic or chemical subgroups.'
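
As a rough sketch of the five-level hierarchy, the snippet below splits a full ATC code into the prefix that identifies each level. The character layout it relies on (one letter, two digits, one letter, one letter, two digits) is not stated in the description above; it is the standard WHO code structure and is treated here as an assumption. The `atc_levels` helper is hypothetical.

```python
import re
from typing import Dict

# Assumed layout of a 5th-level ATC code: letter, 2 digits, letter, letter, 2 digits.
ATC_PATTERN = re.compile(r"([A-Z])(\d{2})([A-Z])([A-Z])(\d{2})")


def atc_levels(code: str) -> Dict[int, str]:
    """Return the code prefix identifying each of the five ATC levels."""
    normalized = code.strip().upper()
    match = ATC_PATTERN.fullmatch(normalized)
    if match is None:
        raise ValueError(f"not a full 5th-level ATC code: {code!r}")
    main, second, third, fourth, substance = match.groups()
    return {
        1: main,                                        # anatomical/pharmacological main group
        2: main + second,                               # pharmacological or therapeutic group
        3: main + second + third,                       # subgroup
        4: main + second + third + fourth,              # subgroup
        5: main + second + third + fourth + substance,  # chemical substance
    }


levels = atc_levels("A10BA02")  # A10BA02 is the ATC code for metformin
```

Because each level is a prefix of the next, grouping substances at the 2nd, 3rd or 4th level reduces to grouping by the corresponding prefix.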

Data made available under [CC BY-NC-SA 3.0 IGO](https://www.who.int/about/policies/publishing/copyright).

9 changes: 3 additions & 6 deletions datasets/Demographics.md
@@ -158,6 +158,9 @@ Japan Demographics, Economy, Health, Education data from Portal Site of Official

### [Stadt Zurich open data](https://data.stadt-zuerich.ch//)

#### [Zurich GeoCoordinates](https://data.stadt-zuerich.ch/dataset/geo_stadtkreise)
Administrative Area divisions of Zurich.

#### [Zurich population](https://data.stadt-zuerich.ch/)
Population of Zurich city contains structure and organization of the city administration

@@ -295,12 +298,6 @@ Statistics on relative deprivation in small areas in England.
#### [UK Open Geography Portal](https://geoportal.statistics.gov.uk/)
The Open Geography portal from the Office for National Statistics (ONS) provides free and open access to the definitive source of geographic products, web applications, story maps, services and APIs.

### [Unique Identification Authority of India](https://uidai.gov.in/)
Unique Identification Authority of India issues Aadhaar to all residents of India.

#### [India Aadhaar Dashboard](https://uidai.gov.in/aadhaar_dashboard)
Aadhaar dashboard has the details of unique identifier (Aadhaar) enrollment, update, authentication and KYC statistics.

### [United Nations (UN)](https://www.un.org/en/)

#### [UN OCHA Subnational Administrative Boundaries](https://data.humdata.org/)
36 changes: 8 additions & 28 deletions datasets/Environment.md
@@ -66,12 +66,6 @@ Data Commons includes relative measures of risk from the 18 natural hazards incl
A global inventory of glaciers, including surface areas.


### [India Central Pollution Control Board (CPCB)](https://app.cpcbccr.com)
India's Central Pollution Control Board (CPCB) portal for Air Quality Management.

#### [India Air Quality Index](https://app.cpcbccr.com/AQI_India/)
Air Quality Index and possible health impacts reported for states, cities and stations in India.

### [India Water Resources Information System](https://indiawris.gov.in/wris/#/)
The Water Resources Information System (WRIS) is a repository of water resources and related data for India at national, state and district level.

@@ -92,9 +86,6 @@ Population connected to the waste water treatment using different methods from t

### [Resources for the Future (RFF)](https://www.rff.org/)

#### [US Geo Grids for RFF](https://www.rff.org/publications/data-tools/)
This dataset includes geo grid places in US at 0.25 degree resolution and 4km resolution.

#### [US Wildfire, Smoke and Drought statistics - County and State](https://www.rff.org/publications/data-tools/)
This dataset incorporates statistics aggregated by RFF from the following
sources:
@@ -187,10 +178,18 @@ Includes data about various heat-stress-induced medical incidents.
Air quality data collected from outdoor monitors on the county, CBSA, and site monitor level.


#### [EJSCREEN](https://www.epa.gov/ejscreen)
Environmental justice mapping tool based on environmental and demographic indicators.


#### [Greenhouse Gas Reporting Program](https://www.epa.gov/enviro/greenhouse-gas-overview)
Annual reporting of greenhouse gases from large emission sources.


#### [National Emissions Inventory](https://www.epa.gov/air-emissions-inventories/national-emissions-inventory-nei)
The National Emissions Inventory (NEI) is a comprehensive and detailed estimate of air emissions of criteria pollutants, hazardous pollutants and greenhouse gases from 188 OnRoad air emission sources (such as Mobile Sources Highway Vehicles Electricity and Mobile Sources Border Crossing), 248 NonRoad air emissions sources (such as Mobile Sources Off-highway Vehicle Gasoline and LPG Construction Mining Equipment), 703 NonPoint air emissions sources (such as Industrial Processes Oil Gas Exploration Production and LPG Distribution) and 5818 Point air emissions sources (such as Chemical Evaporation Organic Solvent Evaporation and External Combustion Electric Generation Boilers) at US County Level.

Contributor:
I'm finding the stuff in parentheses hard to process; are these actually names of some departments or something? If not, they should just all be written in lowercase, and some of them need some kind of noun to indicate what they are.

Contributor:
Agreed about the capitalization. @ajaits do you know where these descriptions are coming from?

Contributor Author:
done.

gmechali marked this conversation as resolved.

#### [Superfund Sites](https://www.epa.gov/superfund)
Site contamination data, hazard scores and more.

@@ -260,31 +259,12 @@ Information related to the wildland fire management incidents and resources.

### [United States Geological Survey (USGS)](https://www.usgs.gov/)

#### [Advanced National Seismic System Comprehensive Earthquake Catalog (ComCat)](https://earthquake.usgs.gov/data/comcat)
Earthquake source parameters (e.g. hypocenters, magnitudes, phase picks and amplitudes) and other products (e.g. moment tensor solutions, macroseismic information, tectonic summaries, maps) produced by contributing seismic networks. Data Commons includes date, time, location, magnitudes, magnitude errors, depth, depth error, and review status of earthquakes of magnitude 3 or greater, from 1900 onward.


#### [National Water Use Data](https://waterdata.usgs.gov/nwis/wu)
Water use data for states and counties in the US, broken down by water source (ground water, surface water), water type (fresh water, saline water), and category of use (domestic, industrial, etc.).

[USGS Copyrights and Credits Terms of Service](https://www.usgs.gov/information-policies-and-instructions/copyrights-and-credits).


### [Wildland Fire Interagency Geospatial Services](https://data-nifc.opendata.arcgis.com/)

#### [WFIGS Wildland Fire Areas](https://data-nifc.opendata.arcgis.com/datasets/nifc::wfigs-wildland-fire-locations-full-history/about)
The Wildland Fire Interagency Geospatial Services (WFIGS) Group provides authoritative geospatial data products under the interagency Wildland Fire Data Program. This dataset provides areas for all reported wildland fires in the United States at county, state and country level.

#### [WFIGS Wildland Fire Locations](https://data-nifc.opendata.arcgis.com/datasets/nifc::wfigs-wildland-fire-locations-full-history/about)
Point Locations for all reported wildland fires in the United States.


#### [WFIGS Wildland Fire Perimeters](https://data-nifc.opendata.arcgis.com/datasets/nifc::wfigs-wildland-fire-perimeters-full-history/about)
The Wildland Fire Interagency Geospatial Services (WFIGS) Group provides authoritative geospatial data products under the interagency Wildland Fire Data Program. This dataset provides perimeters for all reported wildland fires in the United States. We simplify those perimeters by using the Ramer-Douglas-Peucker algorithm on geoJsonCoordinates with an epsilon of 0.01.
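
For readers unfamiliar with the simplification step, here is a minimal sketch of the Ramer-Douglas-Peucker algorithm applied to a list of coordinate pairs. It is a textbook version, not the import pipeline's actual code. The `rdp` function name, the sample line, and the reading of epsilon as a distance in the same units as the coordinates are assumptions.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        # a and b coincide (e.g. a closed ring): fall back to point distance
        return hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / hypot(dx, dy)


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Drop points that lie within epsilon of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = _distance_to_line(points[i], first, last)
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # avoid duplicating the shared point
    return [first, last]


# Hypothetical sample line; 0.01 mirrors the epsilon quoted above.
line = [(0.0, 0.0), (1.0, 0.001), (2.0, 0.0), (3.0, 1.0), (4.0, 0.0)]
simplified = rdp(line, 0.01)
```

In this sample only the point at `(1.0, 0.001)` is dropped, since it deviates from its neighbours' line by less than epsilon. The recursive form is kept for clarity; very long perimeters may call for an iterative variant to stay within Python's recursion limit.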

[Terms of Use](https://www.nwcg.gov/publications/pms936/nifs/public-distribution).


### [World Bank](https://www.worldbank.org/en/home)

#### [World Bank Datasets](https://data.worldbank.org)
5 changes: 0 additions & 5 deletions datasets/Health.md
@@ -26,11 +26,6 @@ Israel Demographics, Health, Economy statistics for Israel at country, district
#### [Ireland Census](https://www.cso.ie/en/statistics/)
Ireland Demographics, Health and Economy data from Central Statistics Office (CSO) by country, county and city.

### [City Controller, City of Los Angeles](https://controller.lacity.gov/)

#### [Food Resources in California](https://controllerdata.lacity.org/dataset/Food-Resources-in-California/v2mg-qsxf/data)
This dataset contains a list of resources such as food pantries, food banks, and other food distribution sites throughout the state of California. The data is current as of May 1, 2020. Due to the coronavirus pandemic, many food pantries may change their hours or close temporarily. Please contact a food pantry prior to visiting them to confirm their operating hours and ensure you are in their service area.

### [Dartmouth Atlas Project](https://www.dartmouthatlas.org/)

#### [Dartmouth Atlas of Health Care](https://www.dartmouthatlas.org/)