Merge some additional changes from other branch (#454)
* integrate custom docs with new UI

* more edits

* use website wording for intro

* fix numbering in table

* rename and some edits

* rename manage_repo file, per Bo

* Merge.
kmoscoe authored Jul 22, 2024
1 parent 22b1433 commit ef83b96
Showing 10 changed files with 64 additions and 49 deletions.
2 changes: 1 addition & 1 deletion custom_dc/manage_repo.md → custom_dc/build_image.md
@@ -2,7 +2,7 @@
layout: default
title: Build and run a custom image
nav_order: 5
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
8 changes: 7 additions & 1 deletion custom_dc/custom_data.md
@@ -2,15 +2,21 @@
layout: default
title: Work with custom data
nav_order: 3
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
# Work with custom data

This page shows you how to format and load your own custom data into your local instance. This is step 2 of the [recommended workflow](/custom_dc/index.html#workflow).


* TOC
{:toc}


## Overview

Custom Data Commons provides a simple mechanism to import your own data, but it requires that the data be provided in a specific format and file structure.

- All data must be in CSV format, using the schema described below.
8 changes: 6 additions & 2 deletions custom_dc/custom_ui.md
@@ -2,15 +2,19 @@
layout: default
title: Customize the site
nav_order: 4
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
# Customize the site

This page shows you how to customize the UI of your local instance. This is step 3 of the [recommended workflow](/custom_dc/index.html#workflow).

* TOC
{:toc}

## Overview

The default custom Data Commons image provides a bare-bones UI that you will undoubtedly want to customize to your liking. Data Commons uses the Python [Flask](https://flask.palletsprojects.com/en/3.0.x/#) web framework and [Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/) HTML templates. If you're not familiar with these, the following documents are good starting points:

- [Flask Templates](https://flask.palletsprojects.com/en/3.0.x/tutorial/templates/)
@@ -80,4 +84,4 @@ Alternatively, if you have existing CSS and JavaScript files, put them

See [`server/templates/custom_dc/custom/new.html`](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/new.html) as an example.

Note: Currently, making changes to any of the files in the `static/` directory, even if you're testing locally, requires that you rebuild a local version of the repo to pick up the changes, as described in [Build a local image](/custom_dc/manage_repo.html#build-repo). We plan to fix this in the near future.
Note: Currently, making changes to any of the files in the `static/` directory, even if you're testing locally, requires that you rebuild a local version of the repo to pick up the changes, as described in [Build a local image](/custom_dc/build_image.html#build-repo). We plan to fix this in the near future.
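For reference, a rebuild-and-restart cycle along those lines might look like the following sketch; the image tag, Dockerfile path, and port are placeholder assumptions, not the exact values from the build instructions:

```shell
# Rebuild the local image so edits under static/ are baked in
# (tag and Dockerfile path are hypothetical placeholders; use the ones from the build instructions).
docker build -t datacommons-custom:local -f Dockerfile .

# Restart the container from the freshly built image (port mapping is also an assumption).
docker run -it -p 8080:8080 datacommons-custom:local
```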
6 changes: 5 additions & 1 deletion custom_dc/data_cloud.md
@@ -2,15 +2,19 @@
layout: default
title: Test data in Google Cloud
nav_order: 6
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
# Test data in Google Cloud

This page shows you how to store your custom data in a Google Cloud database and load it into a local instance. This is step 4 of the [recommended workflow](/custom_dc/index.html#workflow).

* TOC
{:toc}

## Overview

Once you have tested locally, you need to get your data into Google Cloud so you can test it remotely. You can continue to run the custom Data Commons instance locally, but retrieve data from the Cloud. In this scenario, the system is set up like this:

![setup3](/assets/images/custom_dc/customdc_setup3.png)
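As a rough sketch of what getting a CSV into Google Cloud Storage can look like (bucket name and file path are hypothetical placeholders; the actual procedure on this page may differ):

```shell
# Copy a prepared CSV into a Cloud Storage bucket (names are placeholders).
gsutil cp ./my_data.csv gs://my-datacommons-bucket/custom_data/my_data.csv

# Confirm the file landed where expected.
gsutil ls gs://my-datacommons-bucket/custom_data/
```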
8 changes: 6 additions & 2 deletions custom_dc/deploy_cloud.md
@@ -2,15 +2,19 @@
layout: default
title: Deploy a custom instance to Google Cloud
nav_order: 7
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
# Deploy a custom instance to Google Cloud

This page shows you how to create an artifact and run it in Google Cloud Run. This is step 5 of the [recommended workflow](/custom_dc/index.html#workflow).

* TOC
{:toc}

## System overview

When you are ready to launch your custom Data Commons site, we recommend hosting it on [Google Cloud Run](https://cloud.google.com/run/), a serverless solution that is by far the simplest and least expensive option and provides auto-scaling. This is the production setup:

![setup4](/assets/images/custom_dc/customdc_setup4.png)
@@ -31,7 +35,7 @@ You push a locally built Docker image to the [Google Cloud Artifact Registry](ht

This procedure creates a "dev" Docker package that you upload to the Google Cloud Artifact Registry, and then deploy to Google Cloud Run.

1. Build a local version of the Docker image, following the procedure in [Build a local image](/custom_dc/manage_repo.html#build-repo).
1. Build a local version of the Docker image, following the procedure in [Build a local image](/custom_dc/build_image.html#build-repo).
1. Authenticate to gcloud:

```shell
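# The rest of this code block is truncated in this diff view. A typical
# authentication sequence (assumed here, not the exact commands from the doc) is:
gcloud auth login

# If you will push images to the Artifact Registry, also let Docker use your
# gcloud credentials for the registry host (region is a placeholder):
gcloud auth configure-docker us-central1-docker.pkg.dev
```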
2 changes: 1 addition & 1 deletion custom_dc/faq.md
@@ -2,7 +2,7 @@
layout: default
title: Frequently asked questions
nav_order: 9
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
53 changes: 22 additions & 31 deletions custom_dc/index.md
@@ -1,37 +1,23 @@
---
layout: default
title: Custom Data Commons
title: Build your own Data Commons
nav_order: 90
has_children: true
---

{:.no_toc}
# Custom Data Commons
# Build your own Data Commons

* TOC
{:toc}

## Overview

Data Commons is an open source platform. Any organization can create a custom Data Commons instance with its own data, customized user interface and visualization tools.
A custom instance natively joins your data and the base Data Commons data (from datacommons.org) in a unified fashion. Your users can visualize and analyze the data seamlessly without the need for further data preparation.

A custom instance natively combines the base Data Commons data (from datacommons.org) and the custom data in a unified fashion. Users can generate visualizations and perform data analyses across base and custom datasets seamlessly.
You have full control over your own data and computing resources, with the ability to limit access to specific individuals or open it to the general public.

A custom Data Commons site is deployed in Google Cloud Platform (GCP). The owner has full control over data, computing resources and access. The site can be accessible by the general public or can be controlled to limited principals. When base data is joined with the custom instance data, it is pulled in from the base Data Commons site; custom data is never pushed to the base data store.

## Case studies

### Feeding America Data Commons

[Feeding America Data Commons](https://datacommons.feedingamerica.org/) provides access to data from [Map the Meal Gap](https://map.feedingamerica.org/), overlaid with data from a wide range of additional sources in a single portal under a common scheme. Combining datasets from the CDC and Map the Meal Gap, users can retrieve the relationship between heart health and food insecurity with a few clicks.

![fa](/assets/images/custom_dc/home-heart-food.png){: height="450" }

### India Data Commons

[India Data Commons](https://datacommons.iitm.ac.in/) is an effort by the Robert Bosch Center for Data Science and Artificial Intelligence, IIT Madras, to highlight India-specific data. India Data Commons features datasets published by Indian ministries and governmental organizations and makes them available through the Data Commons knowledge graph.

![iitm](/assets/images/custom_dc/iitm.png){: height="450" }
Note that each new Data Commons is deployed using Google Cloud Platform (GCP).

## Why use a custom Data Commons instance?

@@ -49,19 +35,23 @@ For the following use cases, a custom Data Commons instance is not necessary:
- You only want to make your own data available to the base public Data Commons site and don't need to test it. In this case, see the procedures in [Data imports](/import_dataset/index.html).
- You want to make the base public data or visualizations available in your own site. For this purpose, you can call the Data Commons APIs from your site; see [Data Commons web components](/api/web_components/index.html) for more details.

## Supported features

A custom Data Commons instance supports the following features:

- All of the same interactive tools as the base site, including the natural language query interface
- REST APIs --- no additional setup needed
- Python and Pandas API wrappers, and/or Spreadsheets --- requires additional setup and maintenance. If you would like to support these facilities, please contact us.
- Access controls to the site, using any supported Google Cloud Run mechanisms, such as Virtual Private Cloud, Cloud IAM, and so on. Please see the GCP [Restricting ingress for Cloud Run](https://cloud.google.com/run/docs/securing/ingress) for more information on these options.
## Comparison between base and custom Data Commons

The following are not supported:
| Feature | Base Data Commons | Custom Data Commons |
|--------------------------------------------------------------|--------------------|---------------------|
| Interactive tools (Exploration tools, Statistical Variable Explorer, etc.) | yes | yes |
| Natural language query interface | yes, using open-source models only<sup>1</sup> | yes, using Google AI technologies and models |
| REST APIs | yes | yes, no additional setup needed |
| Python and Pandas API wrappers | yes | yes, but requires additional setup<sup>2</sup> |
| BigQuery interface | yes | no |
| Google Spreadsheets | yes | yes, but requires additional setup<sup>2</sup> |
| Site access controls | n/a | yes, using any supported Cloud Run mechanisms<sup>3</sup> |
| Fine-grained data access controls<sup>4</sup> | n/a | no |

- Fine-grained data access controls; you cannot set access controls on specific data, only the entire custom site.
- Bigquery APIs
1. Open-source Python ML library, Sentence Transformers model, from [https://huggingface.co/sentence-transformers](https://huggingface.co/sentence-transformers).
1. If you would like to support these facilities, please contact us.
1. For example, Virtual Private Cloud, Cloud IAM, and so on. Please see the GCP [Restricting ingress for Cloud Run](https://cloud.google.com/run/docs/securing/ingress) for more information on these options.
1. You cannot set access controls on specific data, only the entire custom site.

## System overview

@@ -94,9 +84,10 @@ In terms of development time and effort, to launch a site with custom data in co

The cost of running a site on Google Cloud Platform depends on the size of your data, the traffic you expect to receive, and the amount of geographical replication you want. For a small dataset, we have found the cost comes out to roughly $100 per year. You can get more precise information and cost estimation tools at [Google Cloud pricing](https://cloud.google.com/pricing).

{: #workflow}
## Recommended workflow

1. Work through the [Quickstart](/custom_dc/quickstart.html) page to learn how to run a local Data Commons instance and load some sample custom data.
1. Work through the [Get started](/custom_dc/quickstart.html) page to learn how to run a local Data Commons instance and load some sample custom data.
1. Prepare your real-world custom data and load it in the local custom instance. Data Commons requires your data to be in a specific format. See [Work with custom data](/custom_dc/custom_data.html). If you are just testing custom data to add to the base Data Commons site, you don't need to go any further.
1. If you are launching your own Data Commons site and want to customize the look and feel of the site, see [Customize the site](/custom_dc/custom_ui.html).
1. If you are launching your own Data Commons site, upload your data to Google Cloud Platform and continue to use the local instance to test and validate the site. We recommend using Google Cloud Storage to store your data, and Google Cloud SQL to receive SQL queries from the local servers. See [Test data in Google Cloud](/custom_dc/data_cloud.html).
4 changes: 3 additions & 1 deletion custom_dc/launch_cloud.md
@@ -2,7 +2,7 @@
layout: default
title: Launch a custom site
nav_order: 8
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
@@ -11,6 +11,8 @@ parent: Custom Data Commons
* TOC
{:toc}

## Overview

When you are ready to launch your site to external traffic, there are many tasks you will need to perform, including:

- Configure your Cloud Service to serve external traffic, over SSL. GCP offers many options for this; see [Mapping custom domains](https://cloud.google.com/run/docs/mapping-custom-domains).
18 changes: 11 additions & 7 deletions custom_dc/quickstart.md
@@ -1,21 +1,25 @@
---
layout: default
title: Quickstart
title: Get started
nav_order: 2
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
# Quickstart
# Get started

This page shows you how to run a local custom Data Commons instance inside a Docker container and load sample custom data from a local SQLite database. A custom Data Commons instance uses code from the public open-source repo, available at [https://github.com/datacommonsorg/](https://github.com/datacommonsorg/).

This is step 1 of the [recommended workflow](/custom_dc/index.html#workflow).

* TOC
{:toc}

To start developing a custom Data Commons instance, we recommend that you develop your site and host your data locally. This uses a SQLite database to store custom data.
## System overview

![setup2](/assets/images/custom_dc/customdc_setup2.png)
The instructions in this page use the following setup:

This page shows you how to run a local custom Data Commons instance inside a Docker container, load sample custom data, and enable natural querying. A custom Data Commons instance uses code from the public open-source repo, available at [https://github.com/datacommonsorg/](https://github.com/datacommonsorg/).
![setup2](/assets/images/custom_dc/customdc_setup2.png)
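To give a flavor of what this looks like once everything is configured, a container launch might resemble the following sketch; the image name, environment file, and port are placeholder assumptions, not the values from the actual procedure below:

```shell
# Start a local custom Data Commons container
# (image name, env file, and port are hypothetical placeholders).
docker run -it -p 8080:8080 \
  --env-file ./custom_dc.env \
  gcr.io/example-project/datacommons-website-compose:stable

# Then browse to http://localhost:8080 to view the local site.
```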

## Prerequisites

@@ -67,7 +71,7 @@ cd website | <var>DIRECTORY</var>

Warning: Do not use any quotes (single or double) or spaces when specifying the values.

Note: If you are storing your source code in a public/open-source version control system, we recommend that you do not store the environment variables files containing secrets. Instead, store them locally only. If you are using Git/Github to manage your code, you can add the file name to the `.gitignore` file.
Note: If you are storing your source code in a public/open-source version control system, we recommend that you do not store the environment variable files containing secrets there. Instead, keep them local only. If you are using Git/GitHub, you can add the file to the `.gitignore` file.
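For example, assuming the secrets live in a file named `custom_dc/env.list` (a hypothetical name used here only for illustration), you could exclude it like this:

```shell
# Keep the local environment file with secrets out of version control
# (the file name is a placeholder for whatever you actually use).
echo "custom_dc/env.list" >> .gitignore
git add .gitignore
git commit -m "Ignore local environment file containing secrets"
```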

## About the downloaded files

4 changes: 2 additions & 2 deletions custom_dc/troubleshooting.md
@@ -2,7 +2,7 @@
layout: default
title: Troubleshooting
nav_order: 8
parent: Custom Data Commons
parent: Build your own Data Commons
---

{:.no_toc}
@@ -53,7 +53,7 @@ If you are building a local instance and get this error:
Step 7/62 : COPY mixer/go.mod mixer/go.sum ./
COPY failed: file not found in build context or excluded by .dockerignore: stat mixer/go.mod: file does not exist
```
You need to download/update additional submodules (derived from other repos). See [Build a local image](/custom_dc/manage_repo.html#build-repo).
You need to download/update additional submodules (derived from other repos). See [Build a local image](/custom_dc/build_image.html#build-repo).
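A common way to pull in those submodules is the standard Git command shown below; the repo's build instructions may wrap this in a helper script, so treat it as a sketch:

```shell
# From the root of the website repo, fetch and initialize all submodules
# (for example, the mixer repo referenced by the failing COPY step).
git submodule update --init --recursive
```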

## Data loading problems
