new overview docs #421

kmoscoe · 2024-06-17T19:22:29Z

No description provided.

kmoscoe · 2024-06-17T19:26:16Z

Staged at bullie.svl.corp.google.com:4000

beets

Thanks for sharing this Kara. Has this been reviewed by Prem and Guha?

how_to_use.md

index.md

kmoscoe · 2024-06-18T18:31:49Z

+Prem Ramaswami It's been reviewed by Prem, who told me to "just publish it". :-) Prem, should Guha review?

On Tue, Jun 18, 2024 at 11:23 AM Carolyn Au wrote:

Thanks for sharing this Kara. Has this been reviewed by Prem and Guha?

------------------------------

In how_to_use.md <#421 (comment)>:

> +nav_order: 1
> +---
> +
> +{: .no_toc}
> +# How to use Data Commons
> +
> +* TOC
> +{:toc}
> +
> +## Learn about the data in Data Commons
> +
> +To find out what data is available in Data Commons, check out the [Statistical Variable Explorer](https://datacommons.org/tools/statvar) and see the [Data sources](/datasets/index.html) pages.
> +
> +## Issue interactive data queries
> +
> +For quick analysis, use the search query bar on the home page or any of the visualization tools, such as the [Timeline](https://datacommons.org/tools/visualization#visType=timeline), [Scatter](https://datacommons.org/tools/visualization#visType%3Dscatter), and [Map](https://datacommons.org/tools/visualization#visType%3Dmap) explorers.

It could be helpful to also link to the homepage.

Also, to me, it reads as if they're also supposed to use the search query bar on the vis tools. Should it be ... or use any of the ... or ... or explore using any of the visualization tools ...

------------------------------

In how_to_use.md <#421 (comment)>:

> +
> +## Issue interactive data queries
> +
> +For quick analysis, use the search query bar on the home page or any of the visualization tools, such as the [Timeline](https://datacommons.org/tools/visualization#visType=timeline), [Scatter](https://datacommons.org/tools/visualization#visType%3Dscatter), and [Map](https://datacommons.org/tools/visualization#visType%3Dmap) explorers.
> +
> +If you want to issue SQL queries, and you have a Google Cloud Platform account, use BigQuery Studio on Data Commons data in [Analytics Hub](https://cloud.google.com/analytics-hub). See the [Data Commons in BigQuery](/bigquery/index.html) page for more details.
> +
> +## Issue programmatic data queries
> +
> +Data Commons publishes REST, Python, Pandas, Google Sheets, and SPARQL [APIs](/api/index.html). The APIs support both low-level exploration of the knowledge graph as well as higher-level statistical analysis of variables.
> +
> +The Python and pandas APIs provide convenient wrappers for calling the APIs; we have developed a set of [Google Colab tutorials](/tutorials/index.html) to help you get started with analysis. We have also developed a [Data science curriculum](/courseware/intro_data_science.html) featuring our API and data, currently in use at MIT.
> +
> +## Embed data analyses and visualizations in your site
> +
> +If you would like to include Data Commons result visualizations in your own website, we provide a [Javascript API](/api/web_components/index.html) that makes it a snap to embed various chart elements, such as scatter plots, maps, pie charts, and many more.

could we label the link Web Components API (it's not quite javascript only...)

------------------------------

In how_to_use.md <#421 (comment)>:

> +The Python and pandas APIs provide convenient wrappers for calling the APIs; we have developed a set of [Google Colab tutorials](/tutorials/index.html) to help you get started with analysis. We have also developed a [Data science curriculum](/courseware/intro_data_science.html) featuring our API and data, currently in use at MIT.
> +
> +## Embed data analyses and visualizations in your site
> +
> +If you would like to include Data Commons result visualizations in your own website, we provide a [Javascript API](/api/web_components/index.html) that makes it a snap to embed various chart elements, such as scatter plots, maps, pie charts, and many more.
> +
> +## Download data for offline analysis
> +
> +Data Commons provides several tools for downloading its data:
> +
> +- To preview and download for selected places and statistical variables, use the standalone [Data Download Tool](https://datacommons.org/tools/download) or click the **Download** link in any of the results pages of the visualization tools.
> +- To load data into Google Sheets for analysis and charting, install and run the Data Commons Google [Sheets add-on](/api/sheets/index.html).
> +
> +## Build machine learning models
> +
> +Data Commons data sets provide ideal training data for developing machine learning models and other data science applications. The [data science tutorials](/courseware/intro_data_science.html) show you how to use our APIs and data to get started.

We try not to use "data sets".. perhaps just Data Commons provides ideal training ...

------------------------------

In how_to_use.md <#421 (comment)>:

> +## Embed data analyses and visualizations in your site
> +
> +If you would like to include Data Commons result visualizations in your own website, we provide a [Javascript API](/api/web_components/index.html) that makes it a snap to embed various chart elements, such as scatter plots, maps, pie charts, and many more.
> +
> +## Download data for offline analysis
> +
> +Data Commons provides several tools for downloading its data:
> +
> +- To preview and download for selected places and statistical variables, use the standalone [Data Download Tool](https://datacommons.org/tools/download) or click the **Download** link in any of the results pages of the visualization tools.
> +- To load data into Google Sheets for analysis and charting, install and run the Data Commons Google [Sheets add-on](/api/sheets/index.html).
> +
> +## Build machine learning models
> +
> +Data Commons data sets provide ideal training data for developing machine learning models and other data science applications. The [data science tutorials](/courseware/intro_data_science.html) show you how to use our APIs and data to get started.
> +
> +## Contribute data to Data Commons

Can we point to this page instead, which lists other ways to contribute: https://docs.datacommons.org/contributing/

------------------------------

In index.md <#421 (comment)>:

> -One of the salient aspects of Data Commons is that it is not a repository of data sets. There are many great repositories (Dataverse, BQ public datasets, data.gov) that more than adequately address that need. Instead, it is a single unified database created by normalizing/aligning the schemas and entity references across these different datasets (to the extent possible). So, for example, if a researcher wants the population, violent crime rate and unemployment rate of a county, the researcher does not have to go to three different datasets (Census, FBI and BLS), but can instead, get it from a single database, using one schema, one API. Of course, she would want to know the provenance of the data, which is recorded with every data point, something enabled in the APIs.
> +Behind the scenes, Data Commons does the tedious work of finding data, understanding the data collection methodologies, cleaning the data, reconciling different formats and schemas, figuring out how to merge data about the same entity from different sources, and so on – saving organizations months of tedious, costly and error-prone work. Data Commons is not a repository of public datasets (such as Kaggle or Google Cloud BiqQuery Public Datasets). Instead, it is a single unified data source created by normalizing and aligning schemas and references to the same entities (such as cities, counties, organizations, etc.) across different datasets.
> +
> +For example, if you wanted to get [population stats, poverty and unemployment rates of a specific county](https://datacommons.org/place/geoId/06081), you don't need to go to three different datasets; instead, you can get the data from a single data source, using one schema, and one API. Data Commons is also used by Google Search whenever it can provide the most relevant statistical results to a query. For example, the top Google Search result for the query "what is the life expectancy of Vietnam" returns a Data Commons timeline graph and a link to the [Place page](https://datacommons.org/place/country/VNM?utm_medium=explore&mprop=lifeExpectancy&popt=Person&hl=en) for Vietnam:
> +
> +![Google Search query result](/assets/images/dcoverview1.png){: width="800"}
> +
> +## A standards-based knowledge graph, schema, and APIs
> +
> +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/) consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties.

rather than "makes heavy use of schema.org constructs", data commons is an extension of schema.org

------------------------------

In index.md <#421 (comment)>:

> +
> +## A standards-based knowledge graph, schema, and APIs
> +
> +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/) consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties.
> +
> +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.).
> +
> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations.
> +
> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons.
> +
> +
> +
> +## An open-source project and website platform
> +
> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg).

Can we drop "intended" and just say that it is a community-based resource? We do have external contributions...

------------------------------

In index.md <#421 (comment)>:

> +
> +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.).
> +
> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations.
> +
> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons.
> +
> +
> +
> +## An open-source project and website platform
> +
> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg).
> +
> +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world.
> +
> +In addition, Data Commons makes its data and visualizations accessible to any website through [REST](/api/rest/v2/index.html) and [Javascript](/api/web_components/index.html) APIs.

Javascript --> Web Components

------------------------------

In index.md <#421 (comment)>:

> +
> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations.
> +
> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons.
> +
> +
> +
> +## An open-source project and website platform
> +
> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg).
> +
> +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world.
> +
> +In addition, Data Commons makes its data and visualizations accessible to any website through [REST](/api/rest/v2/index.html) and [Javascript](/api/web_components/index.html) APIs.
> +
> +Finally, Data Commons provides an open-source, [customizable website](/custom_dc/index.html) implementation, for organizations that want to host their own version of a Data Commons website, using their own data and user interfaces.

is "customizable website" the best way to describe custom dc? could we call it [customizable implementation]

------------------------------

In index.md <#421 (comment)>:

> +
> +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/) consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties.
> +
> +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.).
> +
> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations.
> +
> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons.
> +
> +
> +
> +## An open-source project and website platform
> +
> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg).
> +
> +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world.

how about "data coverage" instead of "available data sources"

------------------------------

In index.md <#421 (comment)>:

> ---
> -# Why Data Commons?
> +{: .no_toc}
> +# What is Data Commons?

Should we point to the About page at all, for the Why?

…erview

beets

thanks for the quick updates!

how_to_use.md

index.md

kmoscoe · 2024-06-18T20:54:10Z

Feel free to point this at Guha once before publishing.

could we > call it [customizable implementation] > ------------------------------ > > In index.md > <#421 (comment)> > : > > > + > +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/) consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties. > + > +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.). > + > +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations. 
> + > +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons. > + > + > + > +## An open-source project and website platform > + > +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg). > + > +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world. > > how about "data coverage" instead of "available data sources" > ------------------------------ > > In index.md > <#421 (comment)> > : > > > --- > > -# Why Data Commons? > +{: .no_toc} > +# What is Data Commons? > > Should we point to the About page at all, for the Why?

-- Prem Ramaswami, Product Manager, DataCommons.org

kmoscoe · 2024-06-18T21:42:42Z

BTW, one of the odd things about the "About" page is that there is a sentence that says, "I am bullish about...". But there is no indication of who "I" is. Should there be some kind of byline about Guha there?

…

On Tue, Jun 18, 2024 at 1:54 PM Prem Ramaswami ***@***.***> wrote: Feel free to point this at Guha once before publishing.

kmoscoe · 2024-06-18T22:40:22Z

We should change this to 3rd person and remove the first person.

…

On Tue, Jun 18, 2024 at 5:42 PM Kara Moscoe ***@***.***> wrote: BTW, one of the odd things about the "About" page is that there is a sentence that says, "I am bullish about...". But there is no indication of who "I" is. Should there be some kind of byline about Guha there?
So, for example, if a researcher wants the population, violent crime rate and unemployment rate of a county, the researcher does not have to go to three different datasets (Census, FBI and BLS), but can instead, get it from a single database, using one schema, one API. Of course, she would want to know the provenance of the data, which is recorded with every data point, something enabled in the APIs. >>> +Behind the scenes, Data Commons does the tedious work of finding data, understanding the data collection methodologies, cleaning the data, reconciling different formats and schemas, figuring out how to merge data about the same entity from different sources, and so on – saving organizations months of tedious, costly and error-prone work. Data Commons is not a repository of public datasets (such as Kaggle or Google Cloud BiqQuery Public Datasets). Instead, it is a single unified data source created by normalizing and aligning schemas and references to the same entities (such as cities, counties, organizations, etc.) across different datasets. >>> + >>> +For example, if you wanted to get [population stats, poverty and unemployment rates of a specific county](https://datacommons.org/place/geoId/06081), you don't need to go to three different datasets; instead, you can get the data from a single data source, using one schema, and one API. Data Commons is also used by Google Search whenever it can provide the most relevant statistical results to a query. 
For example, the top Google Search result for the query "what is the life expectancy of Vietnam" returns a Data Commons timeline graph and a link to the [Place page](https://datacommons.org/place/country/VNM?utm_medium=explore&mprop=lifeExpectancy&popt=Person&hl=en) for Vietnam: >>> + >>> +![Google Search query result](/assets/images/dcoverview1.png){: width="800"} >>> + >>> +## A standards-based knowledge graph, schema, and APIs >>> + >>> +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/) consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties. >>> >>> rather than "makes heavy use of schema.org constructs", data commons >>> is an extension of schema.org >>> ------------------------------ >>> >>> In index.md >>> <#421 (comment)> >>> : >>> >>> > + >>> +## A standards-based knowledge graph, schema, and APIs >>> + >>> +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. 
To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/) consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties. >>> + >>> +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.). >>> + >>> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations. >>> + >>> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons. >>> + >>> + >>> + >>> +## An open-source project and website platform >>> + >>> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. 
Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg). >>> >>> Can we drop "intended" and just say that it is a community-based >>> resource? We do have external contributions... >>> ------------------------------ >>> >>> In index.md >>> <#421 (comment)> >>> : >>> >>> > + >>> +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.). >>> + >>> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations. >>> + >>> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons. >>> + >>> + >>> + >>> +## An open-source project and website platform >>> + >>> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg). 
>>> + >>> +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world. >>> + >>> +In addition, Data Commons makes its data and visualizations accessible to any website through [REST](/api/rest/v2/index.html) and [Javascript](/api/web_components/index.html) APIs. >>> >>> Javascript --> Web Components >>> ------------------------------ >>> >>> In index.md >>> <#421 (comment)> >>> : >>> >>> > + >>> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations. >>> + >>> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons. >>> + >>> + >>> + >>> +## An open-source project and website platform >>> + >>> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. 
Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg). >>> + >>> +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world. >>> + >>> +In addition, Data Commons makes its data and visualizations accessible to any website through [REST](/api/rest/v2/index.html) and [Javascript](/api/web_components/index.html) APIs. >>> + >>> +Finally, Data Commons provides an open-source, [customizable website](/custom_dc/index.html) implementation, for organizations that want to host their own version of a Data Commons website, using their own data and user interfaces. >>> >>> is "customizable website" the best way to describe custom dc? could we >>> call it [customizable implementation] >>> ------------------------------ >>> >>> In index.md >>> <#421 (comment)> >>> : >>> >>> > + >>> +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/) consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. 
The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties. >>> + >>> +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.). >>> + >>> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations. >>> + >>> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons. >>> + >>> + >>> + >>> +## An open-source project and website platform >>> + >>> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg). 
>>> + >>> +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world. >>> >>> how about "data coverage" instead of "available data sources" >>> ------------------------------ >>> >>> In index.md >>> <#421 (comment)> >>> : >>> >>> > --- >>> >>> -# Why Data Commons? >>> +{: .no_toc} >>> +# What is Data Commons? >>> >>> Should we point to the About page at all, for the Why? >>> >>> — >>> Reply to this email directly, view it on GitHub >>> <#421 (review)>, >>> or unsubscribe >>> <https://github.com/notifications/unsubscribe-auth/BHMM7UGITGPZI56JUXP5233ZIB3ILAVCNFSM6AAAAABJOTB4EWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCMRWGE3DOMZRGI> >>> . >>> You are receiving this because you authored the thread.Message ID: >>> <datacommonsorg/docsite/pull/421/review/2126167312 <(212)%20616-7312>@ >>> github.com> >>> >> > > -- > ================================== > Prem Ramaswami > Product Manager > DataCommons.org > Make Data Sing > ***@***.*** > Phone: +18579981598 <(857)%20998-1598> > ================================== > > > >


kmoscoe · 2024-06-18T22:56:08Z

You mean first person plural, i.e. "we"?

…

On Tue, Jun 18, 2024 at 3:40 PM Prem Ramaswami ***@***.***> wrote:

> We should change this to 3rd person and remove the first person.
The data model is based on the [Schema.org](https://www.schema.org) framework, an open framework used by over 40M websites; Data Commons makes heavy use of [Schema.org](https://www.schema.org/docs/schemas.html) constructs and extends the model as required, introducing both general constructs (such as intervals) and values for common properties. >>>> + >>>> +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/) allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.). >>>> + >>>> +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. The [Statistical Variable Explorer](https://datacommons.org/tools/statvar) allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/visualization) provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations. >>>> + >>>> +The knowledge graph is also mapped to relational tables that allow for [SQL querying](https://docs.datacommons.org/bigquery/) (requiring a [Google Cloud BigQuery](https://cloud.google.com/bigquery) account) and easier joining to other datasets outside of Data Commons. >>>> + >>>> + >>>> + >>>> +## An open-source project and website platform >>>> + >>>> +Data Commons is intended to be a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg). 
>>>> + >>>> +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs), the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664), [One.org](https://datacommons.one.org/), [TechSoup](https://publicdata.techsoup.org/), and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand the available data sources and welcome contributions from data owners around the world. >>>> >>>> how about "data coverage" instead of "available data sources" >>>> ------------------------------ >>>> >>>> In index.md >>>> <#421 (comment)> >>>> : >>>> >>>> > --- >>>> >>>> -# Why Data Commons? >>>> +{: .no_toc} >>>> +# What is Data Commons? >>>> >>>> Should we point to the About page at all, for the Why? >>>> >>>> — >>>> Reply to this email directly, view it on GitHub >>>> <#421 (review)>, >>>> or unsubscribe >>>> <https://github.com/notifications/unsubscribe-auth/BHMM7UGITGPZI56JUXP5233ZIB3ILAVCNFSM6AAAAABJOTB4EWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCMRWGE3DOMZRGI> >>>> . >>>> You are receiving this because you authored the thread.Message ID: >>>> <datacommonsorg/docsite/pull/421/review/2126167312 <(212)%20616-7312>@ >>>> github.com> >>>> >>> >> >> -- >> ================================== >> Prem Ramaswami >> Product Manager >> DataCommons.org >> Make Data Sing >> ***@***.*** >> Phone: +18579981598 <(857)%20998-1598> >> ================================== >> >> >> >> -- ================================== Prem Ramaswami Product Manager DataCommons.org Make Data Sing ***@***.*** Phone: +18579981598 <(857)%20998-1598> ==================================

kmoscoe · 2024-06-18T23:14:07Z

Yes. That too 😂

…

On Tue, Jun 18, 2024, 6:56 PM Kara Moscoe wrote:
> You mean second person plural, i.e. "we"?
>
> On Tue, Jun 18, 2024 at 3:40 PM Prem Ramaswami wrote:
>> We should change this to 3rd person and remove the first person.
>>
>> On Tue, Jun 18, 2024 at 5:42 PM Kara Moscoe wrote:
>>> BTW, one of the odd things about the "About" page is that there is a sentence that says, "I am bullish about...". But there is no indication of who "I" is. Should there be some kind of byline about Guha there?
>>>
>>> On Tue, Jun 18, 2024 at 1:54 PM Prem Ramaswami wrote:
>>>> Feel free to point this at Guha once before publishing.
>>>>
>>>> On Tue, Jun 18, 2024 at 2:31 PM Kara Moscoe wrote:
>>>>> It's been reviewed by Prem, who told me to "just publish it". :-) Prem, should Guha review?

beets · 2024-06-19T00:35:32Z

> BTW, one of the odd things about the "About" page is that there is a sentence that says, "I am bullish about...". But there is no indication of who "I" is. Should there be some kind of byline about Guha there?

oh my - we should drop that sentence

beets

thanks for the updates!

how_to_use.md

index.md

Co-authored-by: Carolyn Au

new overview docs

6b65016

kmoscoe requested review from shifucun and beets June 17, 2024 19:22

beets reviewed Jun 18, 2024

kmoscoe and others added 5 commits June 18, 2024 11:48

incorporated feedback from Carolyn

a49adbf

Merge branch 'master' into overview

1ed53c5

slight reorganization of second intro paragraph

fff59cd

Fix anchor

029e46f

Merge branch 'overview' of https://github.com/kmoscoe/docsite into overview

d4dad05

beets reviewed Jun 18, 2024

how_to_use.md

index.md

more feedback from Carolyn

fe5e933

kmoscoe requested a review from rvguha June 18, 2024 21:46

beets approved these changes Jun 19, 2024

how_to_use.md (outdated)

index.md

kmoscoe and others added 2 commits June 18, 2024 18:34

Update how_to_use.md

55bf9e3

Co-authored-by: Carolyn Au

Merge branch 'master' into overview

9d44174

kmoscoe merged commit 1fbc915 into datacommonsorg:master Jun 26, 2024
1 check passed

kmoscoe deleted the overview branch June 26, 2024 00:14
