Update data_model.md #449

Merged
merged 1 commit into from
Jul 17, 2024
26 changes: 13 additions & 13 deletions data_model.md
@@ -30,7 +30,7 @@ As a simple example, here are a set of nodes and edges that represent the follow
- Santa Clara county and Berkeley are contained in the state of California
- The latitude of Berkeley, CA is 37.8703

-![knowledge graph](/assets/images/dc/concept1.png){: width="600"}
+![knowledge graph]({{site.url}}/assets/images/dc/concept1.png){: width="600"}

Each node consists of some kind of entity or value, and each edge describes some kind of property. More specifically, each node consists of the following objects:

@@ -43,7 +43,7 @@ As in other knowledge graphs, each pair of connected nodes is a _triple_ consist

You can get all the information about a node and its edges by looking at the Knowledge Graph browser. If you know the [DCID](#unique-identifier-dcid) for a node, you can access it directly by typing <code>https://datacommons.org/browser/<var>DCID</var></code>. For example, here is the entry for the `City` node, available at [https://datacommons.org/browser/City](https://datacommons.org/browser/City):

-![KG browser](/assets/images/dc/concept2.png){: width="900"}
+![KG browser]({{site.url}}/assets/images/dc/concept2.png){: width="900"}
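
As a minimal sketch of the URL pattern described above, the following builds a Knowledge Graph browser link from a DCID. The helper name `browser_url` is hypothetical and not part of any Data Commons library; the only thing taken from the text is the `https://datacommons.org/browser/<DCID>` pattern.

```python
from urllib.parse import quote

BROWSER_BASE = "https://datacommons.org/browser/"


def browser_url(dcid: str) -> str:
    """Return the Knowledge Graph browser URL for a DCID (hypothetical helper)."""
    # Leave "/" unescaped, because some DCIDs contain a slash.
    return BROWSER_BASE + quote(dcid, safe="/")


print(browser_url("City"))
```
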

Every node entry shows a list of outgoing edges, or _properties,_ and incoming edges. [Properties](#property) are discussed in more detail below.
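
To make outgoing and incoming edges concrete, here is a small sketch that stores two of the example facts from the start of this page as triples and looks up the edges leaving or entering a node. The field names (`subject`, `predicate`, `obj`), the `latitude` property name, and the use of human-readable labels instead of real DCIDs are assumptions made for illustration only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # node the edge leaves
    predicate: str  # edge label, i.e. the property
    obj: str        # node or value the edge points to


# Human-readable labels stand in for real DCIDs in this sketch.
TRIPLES = [
    Triple("Santa Clara County", "containedInPlace", "California"),
    Triple("Berkeley", "containedInPlace", "California"),
    Triple("Berkeley", "latitude", "37.8703"),
]


def outgoing(node: str) -> list[Triple]:
    """Edges that start at `node` (its properties)."""
    return [t for t in TRIPLES if t.subject == node]


def incoming(node: str) -> list[Triple]:
    """Edges from other nodes that point at `node`."""
    return [t for t in TRIPLES if t.obj == node]


print([t.predicate for t in outgoing("Berkeley")])
print([t.subject for t in incoming("California")])
```

In this toy graph, California has no outgoing edges and two incoming ones, which mirrors how a node entry in the browser lists the two kinds of edges separately.
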

@@ -79,21 +79,21 @@ Note that not all statistical variables have observations for all places or othe

For example, inspecting [Health > Health Insurance (Household) > No Health Insurance > Households Without Health Insurance](https://datacommons.org/tools/statvar#sv=Count_Household_NoHealthInsurance) shows us that the statistical variable `Count_Household_NoHealthInsurance` is available in the United States at state, county, and city levels:

-![Stat Var Explorer](/assets/images/dc/concept4.png){: width="900"}
+![Stat Var Explorer]({{site.url}}/assets/images/dc/concept4.png){: width="900"}

On the other hand, the [Average Retail Price of Electricity](https://datacommons.org/tools/statvar#Quarterly_Average_RetailPrice_Electricity=&sv=Quarterly_Average_RetailPrice_Electricity), or `Quarterly_Average_RetailPrice_Electricity`, is only available at the state level in the US, not at the city or county level.

-![Stat Var Explorer](/assets/images/dc/concept5.png){: width="900"}
+![Stat Var Explorer]({{site.url}}/assets/images/dc/concept5.png){: width="900"}
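
The two examples above can be summarized as a lookup from a statistical variable to the place types that have observations for it. The table below is hand-written from the text, not fetched from Data Commons, and the names `AVAILABILITY` and `has_observations` are hypothetical.

```python
# Place types with observations, copied by hand from the two examples above.
AVAILABILITY = {
    "Count_Household_NoHealthInsurance": {"State", "County", "City"},
    "Quarterly_Average_RetailPrice_Electricity": {"State"},
}


def has_observations(variable: str, place_type: str) -> bool:
    """Return True if the variable is listed as available for the place type."""
    return place_type in AVAILABILITY.get(variable, set())


print(has_observations("Count_Household_NoHealthInsurance", "City"))
print(has_observations("Quarterly_Average_RetailPrice_Electricity", "City"))
```
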

## Unique identifier: DCID

Every node has a unique identifier, called a Data Commons ID, or DCID. In the [Knowledge Graph browser](https://datacommons.org/browser/), you can view the DCID for any node or edge. For example, the DCID for the city of Berkeley is `geoId/0606000`:

-![KG browser](/assets/images/dc/concept6.png){: width="600"}
+![KG browser]({{site.url}}/assets/images/dc/concept6.png){: width="600"}

DCIDs are not restricted to entities; statistical variables also have DCIDs. For example, the DCID for the Gini Index of Economic Activity is `GiniIndex_EconomicActivity`:

-![Stat Var Explorer](/assets/images/dc/concept7.png){: width="900"}
+![Stat Var Explorer]({{site.url}}/assets/images/dc/concept7.png){: width="900"}

### Find a DCID for an entity or variable

@@ -106,15 +106,15 @@ To find the DCID for a place:
1. Scroll to the **In Arcs** section to look up the places of interest.
1. If necessary, continue to drill down on links until you find the place of interest.

-![KG browser](/assets/images/dc/concept8.png){: width="900"}
+![KG browser]({{site.url}}/assets/images/dc/concept8.png){: width="900"}

To find the DCID for a statistical variable:

1. Open the Statistical Variable Explorer.
1. Search for the variable of interest, and optionally filter by data source and dataset.
1. Look under the heading for the DCID.

-![Stat Var Explorer](/assets/images/dc/concept9.png){: width="900"}
+![Stat Var Explorer]({{site.url}}/assets/images/dc/concept9.png){: width="900"}

To find the DCID for an entity or a variable programmatically, you can use the REST v2 [Resolve API](/api/rest/v2/resolve.html).
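
The following is a rough sketch of what a programmatic lookup might look like using only the Python standard library. The endpoint URL, the `X-API-Key` header, the `<-description->dcid` property expression, and the `entities`/`candidates` response shape are assumptions based on my reading of the REST v2 API, so check them against the Resolve API page before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Resolve API documentation.
RESOLVE_URL = "https://api.datacommons.org/v2/resolve"


def build_resolve_request(names, api_key):
    """Build a POST request asking for the DCIDs that match each name."""
    payload = {
        "nodes": list(names),
        # Assumed expression for "resolve a description to a DCID".
        "property": "<-description->dcid",
    }
    return urllib.request.Request(
        RESOLVE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def candidate_dcids(response: dict) -> dict[str, list[str]]:
    """Map each queried name to its candidate DCIDs (assumed response shape)."""
    result = {}
    for entity in response.get("entities", []):
        candidates = entity.get("candidates", [])
        result[entity.get("node")] = [c["dcid"] for c in candidates if "dcid" in c]
    return result


def resolve(names, api_key):
    """Send the request and return name -> candidate DCIDs. Needs network access."""
    with urllib.request.urlopen(build_resolve_request(names, api_key)) as resp:
        return candidate_dcids(json.load(resp))
```

Calling `resolve(["Berkeley"], api_key)` with a valid key would perform the lookup. A name can match several nodes, which is why the sketch returns a list of candidates per name rather than a single DCID.
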

@@ -126,7 +126,7 @@ Other properties are links to other entities, events, etc. In the Knowledge Graph

For example, in this node for the city of Addis Ababa, Ethiopia, the `typeOf` and `containedInPlace` edges link to other entities, namely `City` and `Ethiopia`, whereas all the other values are terminal.

-![KG browser](/assets/images/dc/concept10.png){: width="600"}
+![KG browser]({{site.url}}/assets/images/dc/concept10.png){: width="600"}


Note that the DCID for a property is the same as its name.
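
The distinction between properties that link to other nodes and properties with terminal values can be sketched as follows. The input shape (each value is either `{"dcid": ...}` or `{"value": ...}`) is an assumption chosen for illustration, as are the `name` property and the DCID `country/ETH` for Ethiopia; the helper `split_property_values` is hypothetical.

```python
def split_property_values(properties):
    """Separate values that reference other nodes from terminal values."""
    links, terminals = {}, {}
    for prop, values in properties.items():
        for value in values:
            if "dcid" in value:
                # The value is another node, identified by its DCID.
                links.setdefault(prop, []).append(value["dcid"])
            else:
                # The value is a terminal string or number.
                terminals.setdefault(prop, []).append(value["value"])
    return links, terminals


# Hand-written stand-in for the Addis Ababa node; not fetched from Data Commons.
addis_ababa = {
    "typeOf": [{"dcid": "City"}],
    "containedInPlace": [{"dcid": "country/ETH"}],
    "name": [{"value": "Addis Ababa"}],
}

links, terminals = split_property_values(addis_ababa)
print(links)
print(terminals)
```
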
@@ -139,7 +139,7 @@ For example, the value of the statistical variable [`Median Age of Female Popula

Time series made up of many observations underlie the data available in the [Timeline Explorer](https://datacommons.org/tools/visualization#visType=timeline) and timeline graphs. For example, here is the [median income in Berkeley, CA over a period of ten years](https://datacommons.org/tools/visualization#visType%3Dtimeline%26place%3DgeoId%2F0606000%26placeType%3DCensusZipCodeTabulationArea%26sv%3D%7B%22dcid%22%3A%22Median_Income_Person%22%7D), according to the US Census Bureau:

-![Timeline Explorer](/assets/images/dc/concept11.png){: width="900"}
+![Timeline Explorer]({{site.url}}/assets/images/dc/concept11.png){: width="900"}
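
One way to picture how observations form a time series is the sketch below: each observation ties a variable and an entity to a date and a value, and a time series is the date-ordered list of observations for one variable and entity pair. The `Observation` class and its field names are hypothetical, `geoId/06` is assumed to be the DCID for California, and the numeric values are placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    variable: str  # DCID of the statistical variable
    entity: str    # DCID of the place or other entity
    date: str      # ISO 8601 date string, e.g. "2011"
    value: float


def time_series(observations, variable, entity):
    """Return (date, value) pairs for one variable and entity, ordered by date."""
    matching = [
        o for o in observations if o.variable == variable and o.entity == entity
    ]
    # ISO 8601 strings of the same granularity sort chronologically as text.
    return [(o.date, o.value) for o in sorted(matching, key=lambda o: o.date)]


# Placeholder values for illustration only; not real data.
observations = [
    Observation("Median_Income_Person", "geoId/0606000", "2012", 2.0),
    Observation("Median_Income_Person", "geoId/0606000", "2011", 1.0),
    Observation("Median_Income_Person", "geoId/06", "2011", 3.0),
]

print(time_series(observations, "Median_Income_Person", "geoId/0606000"))
```
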


## Provenance, Source, Dataset
@@ -150,16 +150,16 @@ Every node and triple also have some important properties that indicate the orig
- [`Source`](https://datacommons.org/browser/Source): This is a property of both a provenance and a dataset, and is usually the name of the organization that provides the data or the schema. For example, for provenance [www.abs.gov.au](www.abs.gov.au), the source is the [Australian Bureau of Statistics](https://datacommons.org/browser/dc/s/AustralianBureauOfStatistics).
- [`Dataset`](https://datacommons.org/browser/Dataset): This is the name of a specific dataset provided by a source. Many sources provide multiple datasets. For example, the source Australian Bureau of Statistics provides two datasets: [Australia Statistics](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaStatistics) (not to be confused with the provenance above) and [Australia Subnational Administrative Boundaries](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaSubnationalAdministrativeBoundaries).

-![Knowledge graph](/assets/images/dc/concept12.png){: width="600"}
+![Knowledge graph]({{site.url}}/assets/images/dc/concept12.png){: width="600"}
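
The one-to-many relationship between a source and its datasets can be sketched with the Australian Bureau of Statistics example from the list above. The DCIDs are taken from the browser links in that list; the `Dataset` class and the `datasets_for_source` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    dcid: str
    name: str
    source: str  # DCID of the source that provides this dataset


ABS = "dc/s/AustralianBureauOfStatistics"

DATASETS = [
    Dataset(
        "dc/d/AustralianBureauOfStatistics_AustraliaStatistics",
        "Australia Statistics",
        ABS,
    ),
    Dataset(
        "dc/d/AustralianBureauOfStatistics_AustraliaSubnationalAdministrativeBoundaries",
        "Australia Subnational Administrative Boundaries",
        ABS,
    ),
]


def datasets_for_source(source_dcid: str) -> list[str]:
    """Return the names of all datasets provided by the given source."""
    return [d.name for d in DATASETS if d.source == source_dcid]


print(datasets_for_source(ABS))
```
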


Note that a given statistical variable may have multiple provenances, since many datasets define the same variables. You can see the list of all the data sources for a given statistical variable in the Statistical Variable Explorer. For example, the explorer shows multiple sources (censuses from India, Mexico, and Vietnam, the OECD, the World Bank, etc.) for the variable [Life Expectancy](https://datacommons.org/tools/statvar#LifeExpectancy_Person=&sv=LifeExpectancy_Person):

-![Stat Var Explorer](/assets/images/dc/concept13.png){: width="900"}
+![Stat Var Explorer]({{site.url}}/assets/images/dc/concept13.png){: width="900"}

You can see a list of all sources and datasets in several places:

- The [Data sources](/datasets/) pages on this site.
- The **Data source** and **Dataset** drop-down menus in the Statistical Variable Explorer.

-![Stat Var Explorer](/assets/images/dc/concept14.png){: width="600"}
+![Stat Var Explorer]({{site.url}}/assets/images/dc/concept14.png){: width="600"}