Small text updates #81

Merged: 2 commits, Oct 12, 2020
50 changes: 21 additions & 29 deletions Gemfile.lock
@@ -1,7 +1,7 @@
GEM
remote: https://rubygems.org/
specs:
activesupport (6.0.3.2)
activesupport (6.0.3.4)
concurrent-ruby (~> 1.0, >= 1.0.2)
i18n (>= 0.7, < 2)
minitest (~> 5.1)
@@ -19,7 +19,7 @@ GEM
concurrent-ruby (1.1.7)
dnsruby (1.61.4)
simpleidn (~> 0.1)
em-websocket (0.5.1)
em-websocket (0.5.2)
eventmachine (>= 0.12.9)
http_parser.rb (~> 0.6.0)
ethon (0.12.0)
@@ -29,35 +29,33 @@ GEM
faraday (1.0.1)
multipart-post (>= 1.2, < 3)
ffi (1.13.1)
font-awesome-sass (5.13.0)
sassc (>= 1.11)
forwardable-extended (2.6.0)
gemoji (3.0.1)
github-pages (207)
github-pages (209)
github-pages-health-check (= 1.16.1)
jekyll (= 3.9.0)
jekyll-avatar (= 0.7.0)
jekyll-coffeescript (= 1.1.1)
jekyll-commonmark-ghpages (= 0.1.6)
jekyll-default-layout (= 0.1.4)
jekyll-feed (= 0.13.0)
jekyll-feed (= 0.15.1)
jekyll-gist (= 1.5.0)
jekyll-github-metadata (= 2.13.0)
jekyll-mentions (= 1.5.1)
jekyll-mentions (= 1.6.0)
jekyll-optional-front-matter (= 0.3.2)
jekyll-paginate (= 1.1.0)
jekyll-readme-index (= 0.3.0)
jekyll-redirect-from (= 0.15.0)
jekyll-redirect-from (= 0.16.0)
jekyll-relative-links (= 0.6.1)
jekyll-remote-theme (= 0.4.1)
jekyll-remote-theme (= 0.4.2)
jekyll-sass-converter (= 1.5.2)
jekyll-seo-tag (= 2.6.1)
jekyll-sitemap (= 1.4.0)
jekyll-swiss (= 1.0.0)
jekyll-theme-architect (= 0.1.1)
jekyll-theme-cayman (= 0.1.1)
jekyll-theme-dinky (= 0.1.1)
jekyll-theme-hacker (= 0.1.1)
jekyll-theme-hacker (= 0.1.2)
jekyll-theme-leap-day (= 0.1.1)
jekyll-theme-merlot (= 0.1.1)
jekyll-theme-midnight (= 0.1.1)
@@ -68,14 +66,14 @@ GEM
jekyll-theme-tactile (= 0.1.1)
jekyll-theme-time-machine (= 0.1.1)
jekyll-titles-from-headings (= 0.5.3)
jemoji (= 0.11.1)
jemoji (= 0.12.0)
kramdown (= 2.3.0)
kramdown-parser-gfm (= 1.1.0)
liquid (= 4.0.3)
mercenary (~> 0.3)
minima (= 2.5.1)
nokogiri (>= 1.10.4, < 2.0)
rouge (= 3.19.0)
rouge (= 3.23.0)
terminal-table (~> 1.4)
github-pages-health-check (1.16.1)
addressable (~> 2.3)
@@ -116,32 +114,30 @@ GEM
rouge (>= 2.0, < 4.0)
jekyll-default-layout (0.1.4)
jekyll (~> 3.0)
jekyll-feed (0.13.0)
jekyll-feed (0.15.1)
jekyll (>= 3.7, < 5.0)
jekyll-font-awesome-sass (0.1.1)
font-awesome-sass (>= 4)
jekyll (>= 2.5, < 4.0)
jekyll-gist (1.5.0)
octokit (~> 4.2)
jekyll-github-metadata (2.13.0)
jekyll (>= 3.4, < 5.0)
octokit (~> 4.0, != 4.4.0)
jekyll-mentions (1.5.1)
jekyll-mentions (1.6.0)
html-pipeline (~> 2.3)
jekyll (>= 3.7, < 5.0)
jekyll-optional-front-matter (0.3.2)
jekyll (>= 3.0, < 5.0)
jekyll-paginate (1.1.0)
jekyll-readme-index (0.3.0)
jekyll (>= 3.0, < 5.0)
jekyll-redirect-from (0.15.0)
jekyll-redirect-from (0.16.0)
jekyll (>= 3.3, < 5.0)
jekyll-relative-links (0.6.1)
jekyll (>= 3.3, < 5.0)
jekyll-remote-theme (0.4.1)
jekyll-remote-theme (0.4.2)
addressable (~> 2.0)
jekyll (>= 3.5, < 5.0)
rubyzip (>= 1.3.0)
jekyll-sass-converter (>= 1.0, <= 3.0.0, != 2.0.0)
rubyzip (>= 1.3.0, < 3.0)
jekyll-sass-converter (1.5.2)
sass (~> 3.4)
jekyll-seo-tag (2.6.1)
@@ -158,8 +154,8 @@ GEM
jekyll-theme-dinky (0.1.1)
jekyll (~> 3.5)
jekyll-seo-tag (~> 2.0)
jekyll-theme-hacker (0.1.1)
jekyll (~> 3.5)
jekyll-theme-hacker (0.1.2)
jekyll (> 3.5, < 5.0)
jekyll-seo-tag (~> 2.0)
jekyll-theme-leap-day (0.1.1)
jekyll (~> 3.5)
@@ -193,7 +189,7 @@ GEM
jekyll (>= 3.3, < 5.0)
jekyll-watch (2.2.1)
listen (~> 3.0)
jemoji (0.11.1)
jemoji (0.12.0)
gemoji (~> 3.0)
html-pipeline (~> 2.2)
jekyll (>= 3.0, < 5.0)
@@ -211,7 +207,7 @@ GEM
jekyll (>= 3.5, < 5.0)
jekyll-feed (~> 0.9)
jekyll-seo-tag (~> 2.1)
minitest (5.14.1)
minitest (5.14.2)
multipart-post (2.1.1)
nokogiri (1.10.10)
mini_portile2 (~> 2.4.0)
@@ -225,7 +221,7 @@ GEM
rb-inotify (0.10.1)
ffi (~> 1.0)
rexml (3.2.4)
rouge (3.19.0)
rouge (3.23.0)
ruby-enum (0.8.0)
i18n
rubyzip (2.3.0)
@@ -235,8 +231,6 @@ GEM
sass-listen (4.0.0)
rb-fsevent (~> 0.9, >= 0.9.4)
rb-inotify (~> 0.9, >= 0.9.7)
sassc (2.4.0)
ffi (~> 1.9)
sawyer (0.8.2)
addressable (>= 2.3.5)
faraday (> 0.8, < 2.0)
@@ -261,9 +255,7 @@ PLATFORMS
DEPENDENCIES
github-pages
jekyll-feed (~> 0.6)
jekyll-font-awesome-sass
jekyll-redirect-from
jemoji

BUNDLED WITH
2.1.4
2 changes: 1 addition & 1 deletion contributing/adding_datasets.md
@@ -15,7 +15,7 @@ If you are seeking to contribute highly structured and clean data to the Data Co

### Cleaning the CSV

Sometimes the CSV needs some processing before it can be imported. There are no restrictions on your approach for this step, only requirements for its result.
Sometimes the CSV needs processing before it can be imported. There are no restrictions on your approach for this step, only requirements for its result.

1. Each [`StatisticalVariable`](https://datacommons.org/browser/StatisticalVariable) must have its own column for its observed value.
1. Each property in your schema must have its own column for its value, including the values of [`observationAbout`](https://datacommons.org/browser/observationAbout) and [`observationDate`](https://datacommons.org/browser/observationDate). ([`observationPeriod`](https://datacommons.org/browser/observationPeriod) is also helpful)
6 changes: 3 additions & 3 deletions contributing/index.md
@@ -6,10 +6,10 @@ has_children: true
---
# Contribute to Data Commons!

Data Commons has benefited greatly from our collaborations with different government organizations and academic institutions and are looking to expand the set of collaborative projects. In particular, we are looking for partner for:
Data Commons has benefited greatly from our collaborations with different government organizations and academic institutions and are looking to expand the set of collaborative projects. In particular, we are looking for partner for:

- [Create tools](#creating-a-new-tool): Build new tools or applications that bring the data in data commons to new categories of users.
- [Create new Curriculum](#sharing-analysis): Using Data Commons in data science and machine learning courses.
- [Create tools](#creating-a-new-tool): Build new tools or applications that bring the data in Data Commons to new categories of users.
- [Create new curriculum](#sharing-analysis): Using Data Commons in data science and machine learning courses.
- [Write documentation](#updating-documentation)


25 changes: 10 additions & 15 deletions data_model.md
@@ -1,30 +1,25 @@
---
layout: default
title: Data Model
nav_order: 2
---
---
layout: default
title: Data Models
nav_order: 2
---


##Data models
The data included in data commons, even today, covers a wider range of domains, ranging from time series about demographics and employment to hurricanes to election results to protein structures. There is an inherent tension between using domain specific data models versus a more expressive but likely verbose data model capable of covering the breadth of domains we hope to have in data commons.
# Data models
The data included in Data Commons, even today, covers a wider range of domains: ranging from time series about demographics and employment, to hurricanes, to election results, and to protein structures. There is an inherent tension between using domain specific data models versus a more expressive, but likely verbose, data model capable of covering the breadth of domains we hope to have in Data Commons.

It is important for there to have an underlying model capable of expressing the breadth of data we might have about a single entity. For example, consider a place (such as Cook County, IL). There are time series about this place (related to demographics, jobs, etc.), there are specific events (like winter storms), information about historic events, etc. We would like a single uniform schema and query API for a client to access all this data. At the same time, for many applications that access a narrower slice of the data, it would be convenient to use data models that enable more compact encodings and/or more standard query languages such as SQL. To accomplish this, we use an expressive though verbose base representation layer into which everything can be mapped. And on top of this we layer APIs which provide alternate views of the data in more specific data models.
It is important to have an underlying model capable of expressing the breadth of data we might have about a single entity. For example, consider a place such as [Cook County, IL](http://datacommons.org/place/geoId/17031). There are time series about this place (related to demographics, jobs, etc.), there are specific events (like winter storms), information about historic events, etc. We would like a single uniform schema and query API for a client to access all this data. At the same time, for many applications that access a narrower slice of the data, it would be convenient to use data models that enable more compact encodings and/or more standard query languages such as SQL. To accomplish this, we use an expressive though verbose 'base' representation layer into which everything can be mapped. And on top of this we layer APIs which provide alternate views of the data in more specific data models.

Data Commons also provides access to the data in the following different views:
The data model for the base layer is the one used by schema.org <link>. This models the world as a set of entities, with attributes and relationships between entities. There is a taxonomy of entities and each entity is an instance of at least one of the types in the taxonomy. The types and relations types are also entities. This kind of structure has its origins in knowledge representation systems such as KRL and Cyc and has recently found adoption under the name knowledge graph. The node api and sparql apis provide access to this view. The KG browser (raw graph view) allows one to browse through data commons in this view
Time series view provides a set of time series for combinations of entities and variables (StatVars, in data commons parlance). The DCGet api provides API access to this view of the data and the timeline tool allows one to browse data commons in this view
The relational view provides access to a subset of the data commons data as a set of relational tables in Big Query (coming soon). This makes it easier for users to combine their data with data from Data Commons.
1. The data model for the base layer is the one used by [Schema.org/](https://schema.org). This models the world as a set of entities, with attributes and relationships between entities. There is a taxonomy of entities and each entity is an instance of at least one of the types in the taxonomy. The types and relations types are also entities. This kind of structure has its origins in knowledge representation systems such as KRL and Cyc, and has recently found adoption under the name 'knowledge graph'. The [Node and SPARQL APIs](/api) provide access to this view. The [Data Commons Graph Browser](https://datacommons.org/browser) allows one to browse through Data Commons in this raw graph view
1. Time series view provides a set of time series for combinations of entities and variables ([Statistical Varables](https://datacommons.org/browser/StatisticalVariable), in Data Commons parlance). The [DCGet API](/api/sheets/get_variable.html) provides API access to this view of the data and the [Data Commons Timelines tool](https://datacommons.org/tools/timelines) allows one to browse Data Commons in this view.
1. The relational view provides access to a subset of the Data Commons data as a set of relational tables in Big Query (coming soon). This makes it easier for users to combine their data with data from Data Commons.

##Schemas
A single schema (to the extent possible) for all the data is one of data common’s main goals. We would like this schema to be web friendly in the sense that it is an extension of some of the most widely used schemas on the web for structured data. To this end, Data Commons is built on top of Schema.org. We make heavy use of some of Schema.org term (notably StatisticalPopulation and Observation) and extend Schema.org as required, introducing both general constructs (such as Intervals) and values for common attribute values (e.g., Ethnicities, EducationalAttainments, etc.).
## Schemas
A single schema (to the extent possible) for all the data is one of Data Common’s main goals. We would like this schema to be 'web friendly' in the sense that it is an extension of some of the most widely used schemas on the web for structured data. To this end, Data Commons is built on top of [https://schema.org](Schema.org). We make heavy use of some of Schema.org's terms (notably [StatisticalPopulation](https://schema.org/StatisticalPopulation) and [Observation](https://schema.org/Observation) and extend Schema.org as required, introducing both general constructs (such as Intervals) and values for common attribute values (e.g., [Ethnicities](http://browser.datacommons.org/browser/race), [EducationalAttainments](http://browser.datacommons.org/browser/educationalAttainment), etc.).

##CrossWalks
A significant part of the work in building Data Commons is in aligning terms used to refer to the same or overlapping concepts across different datasets. Certain kinds of terms have widely shared meaning, e.g., age, life expectancy. Others, such as educational attainment are measured differently across different regions. Sometimes, different data sets about the same topic will use slightly different definitions of a term (e.g., BLS vs Census on the definition of what it means to be employed) and in some cases, the same dataset might even change its definition over time. Even in these cases, for many applications that aim to perform comparisons, it is useful to have mappings or aggregations between these different terminologies.
## CrossWalks
A significant part of the work in building Data Commons is in aligning terms used to refer to the same or overlapping concepts across different datasets. Certain kinds of terms have widely shared meaning, e.g., [age](http://browser.datacommons.org/browser/age), [life expectancy](http://browser.datacommons.org/browser/lifeExpectancy). Others, such as [educational attainment](http://browser.datacommons.org/browser/educationalAttainment) are measured differently across different regions. Sometimes, different data sets about the same topic will use slightly different definitions of a term (e.g., [BLS](https://www.bls.gov/bls/employment.htm) vs [Census](https://www.census.gov/topics/employment.html) on the definition of what it means to be employed) and in some cases, the same dataset might even change its definition over time. Even in these cases, for many applications that aim to perform comparisons, it is useful to have mappings or aggregations between these different terminologies.

In Data Commons, to the extent possible, we preserve the original encodings. We also introduce new derived attributes/time series that capture mappings. We hope that this will enable useful applications for end users, while preserving the ability for researchers to explore implications of alternate mappings.
