Update custom DC docs for single env.list file #456

Merged: 22 commits, Jul 25, 2024
Changes from 15 commits
10 changes: 5 additions & 5 deletions custom_dc/build_image.md
@@ -34,8 +34,8 @@ If you want to pick up the latest prebuilt version, do the following:
docker run -it \
-p 8080:8080 \
-e DEBUG=true \
--env-file $PWD/custom_dc/sqlite_env.list \
-v $PWD/custom_dc/sample:/userdata \
--env-file $PWD/custom_dc/env.list \
-v $PWD/custom_dc/sample:/$PWD/custom_dc/sample \
gcr.io/datcom-ci/datacommons-website-compose:latest
```

@@ -141,14 +141,14 @@ To upload and deploy the container to the Cloud, see [Deploy a custom instance t

## Run the container with the local SQLite database

To start the services using the locally built repo. If you have made changes to any of the UI components, be sure to map the `custom` directories to the Docker `workspace` directory.
Start the services using the locally built repo. If you have made changes to any of the UI components (or directories), be sure to map the `custom` directories (or alternative directories) to the Docker `workspace` directory.

<pre>
docker run -it \
--env-file $PWD/custom_dc/sqlite_env.list \
--env-file $PWD/custom_dc/env.list \
-p 8080:8080 \
-e DEBUG=true \
[-v $PWD/custom_dc/<var>CUSTOM_DATA_DIRECTORY</var>:/userdata \]
-v <var>OUTPUT_DIRECTORY</var>:/<var>OUTPUT_DIRECTORY</var> \
[-v $PWD/server/templates/custom_dc/custom:/workspace/server/templates/custom_dc/custom \]
[-v $PWD/static/custom_dc/custom:/workspace/static/custom_dc/custom \]
datacommons-website-compose:<var>DOCKER_TAG</var>
22 changes: 12 additions & 10 deletions custom_dc/custom_data.md
@@ -21,7 +21,7 @@ Custom Data Commons provides a simple mechanism to import your own data, but it

- All data must be in CSV format, using the schema described below.
- You must also provide a JSON configuration file, named `config.json`, to map the CSV contents to the Data Commons schema knowledge graph. The contents of the JSON file are described below.
- All CSV files and the JSON file must be in the same directory
- All CSV files and the JSON file _must_ be in the same directory
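As a quick sanity check before loading, you can confirm the layout meets these requirements. This is a minimal sketch against a throwaway demo directory; the path and file names below are examples, not part of the docs:

```shell
# Demo layout check: every CSV plus config.json must share one directory.
# /tmp/demo_data and observations.csv are made-up example names.
DATA_DIR=/tmp/demo_data
mkdir -p "$DATA_DIR"
touch "$DATA_DIR/config.json" "$DATA_DIR/observations.csv"
ls "$DATA_DIR"
```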

Examples are provided in [`custom_dc/sample`](https://github.com/datacommonsorg/website/tree/master/custom_dc/sample) and [`custom_dc/examples`](https://github.com/datacommonsorg/website/tree/master/custom_dc/examples) directories.

@@ -242,22 +242,24 @@ The `sources` section is optional. It encodes the sources and provenances associ

To load custom data uploaded to Google Cloud, see instead [Pointing the local Data Commons site to the Cloud data](/custom_dc/data_cloud.html) for procedures.

### Configure custom directories

If you are using a directory other than `custom_dc/sample` to store your CSV files, edit the `env.list` file as follows:
> **Reviewer:** They would need to update the file for the sample as well.
>
> **Author:** Oops, thanks for catching.

- Set the `OUTPUT_DIR` variable to the directory where your input files are stored. The load step will create a `datacommons` subdirectory under this directory.
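For example, the relevant line in `custom_dc/env.list` might look like this (the path is illustrative only):

```
# Illustrative value; no quotes, no spaces
OUTPUT_DIR=/home/me/website/custom_dc/my_data
```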
> **Reviewer:** Consider adding a short note to explain this dichotomy. Something along the lines of a separate utility for managing data coming soon where input and output dirs can be different.
>
> **Author:** Nah, let's not bother. Too much to explain now.


### Start the Docker container with local custom data {#docker-data}

Once you have your CSV files and config.json set up, use the following command to restart the Docker container, mapping your custom data directory to the Docker userdata directory.
Once you have configured everything, use the following command to restart the Docker container, mapping your output directory to the same path in Docker:

<pre>
docker run -it \
-p 8080:8080 \
-e DEBUG=true \
--env-file $PWD/custom_dc/sqlite_env.list \
-v $PWD/custom_dc/<var>CUSTOM_DATA_DIRECTORY</var>:/userdata \
[-v $PWD/custom_dc/<var>CUSTOM_DATA_DIRECTORY</var>/datacommons:/sqlite] \
--env-file $PWD/custom_dc/env.list \
-v <var>OUTPUT_DIRECTORY</var>:/<var>OUTPUT_DIRECTORY</var> \
gcr.io/datcom-ci/datacommons-website-compose:stable
</pre>

The optional `-v` flag preserves the SQLite data so it loads automatically when you restart the Docker container.
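As an illustration, with the sample data the filled-in command might look like the following. The concrete paths and the `stable` tag here are examples, not prescribed values:

```shell
# Illustrative: OUTPUT_DIRECTORY filled in as $PWD/custom_dc/sample
docker run -it \
  -p 8080:8080 \
  -e DEBUG=true \
  --env-file $PWD/custom_dc/env.list \
  -v $PWD/custom_dc/sample:$PWD/custom_dc/sample \
  gcr.io/datcom-ci/datacommons-website-compose:stable
```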

Every time you make changes to the CSV or JSON files, you should reload the data, as described below.

## Load custom data in SQLite
@@ -266,9 +268,9 @@ As you are iterating on changes to the source CSV and JSON files, you will need

You can load the new/updated data from SQLite using the `/admin` page on the site:

1. Optionally, in the `sqlite_env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
1. Optionally, in the `env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
1. Start the Docker container as usual, being sure to map the path to the directory containing the custom data (see command above).
1. With the services running, navigate to the `/admin page`. If a secret is required, enter it in the text field, and click **Load**. This runs a script inside the Docker container, that converts the CSV data into SQL tables, and generates embeddings in the container as well. The database is created as <code>custom_dc/<var>CUSTOM_DATA_DIRECTORY</var>/datacommons/datacommons.db</code> and embeddings are generated in <code>custom_dc/<var>CUSTOM_DATA_DIRECTORY</var>/datacommons/nl/</code>.
1. With the services running, navigate to the `/admin` page. If a secret is required, enter it in the text field, and click **Load**. This runs a script inside the Docker container, that converts the CSV data into SQL tables, and generates embeddings in the container as well. The database is created as <code><var>OUTPUT_DIRECTORY</var>/datacommons/datacommons.db</code> and embeddings are generated in <code><var>OUTPUT_DIRECTORY</var>/datacommons/nl/</code>.
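After a successful load you can confirm the artifacts exist. A sketch, assuming `OUTPUT_DIRECTORY` is `custom_dc/sample`:

```shell
# Confirm the database and embeddings were created (paths are illustrative)
ls custom_dc/sample/datacommons/datacommons.db
ls custom_dc/sample/datacommons/nl/
```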

## Inspect the SQLite database

@@ -277,7 +279,7 @@ If you need to troubleshoot custom data, it is helpful to inspect the contents o
To do so, from a terminal window, open the database:

<pre>
sqlite3 website/custom_dc/<var>CUSTOM_DATA_DIRECTORY</var>/datacommons/datacommons.db
sqlite3 <var>WEBSITE_ROOT</var>/<var>OUTPUT_DIRECTORY</var>/datacommons/datacommons.db
</pre>

> **Reviewer:** The output directory need not be under the website root. So the path can just be <var>OUTPUT_DIRECTORY</var>/datacommons/datacommons.db
>
> **Author:** oh, OK, thanks

This starts the interactive SQLite shell. To view a list of tables, at the prompt type `.tables`. The relevant table is `observations`.
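You can also query without entering the interactive shell. A sketch, assuming the database sits at `custom_dc/sample/datacommons/datacommons.db`:

```shell
# Non-interactive sketch: list tables, then peek at the observations table
sqlite3 custom_dc/sample/datacommons/datacommons.db ".tables"
sqlite3 custom_dc/sample/datacommons/datacommons.db \
  "SELECT * FROM observations LIMIT 5;"
```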
2 changes: 1 addition & 1 deletion custom_dc/custom_ui.md
@@ -53,7 +53,7 @@ HTML and CSS customization files are provided as samples to get you started. The
</tbody>
</table>

Note that the `custom` parent directory is customizable as the `FLASK_ENV` environment variable. You can rename the directory as desired and update the environment variable.
Note that the `custom` parent directory is customizable as the `FLASK_ENV` environment variable. You can rename the directory as desired and update the environment variable in `custom_dc/env.list`.
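For example, if you renamed the directory to `mysite` (an illustrative name, not one used by the docs), the corresponding line in `custom_dc/env.list` would be:

```
# Illustrative: directory renamed from "custom" to "mysite"
FLASK_ENV=mysite
```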

To enable the changes to be picked up by the Docker image, and allow you to refresh the browser for further changes, restart the Docker image with this additional flag to map the directories to the Docker workspace:

27 changes: 15 additions & 12 deletions custom_dc/data_cloud.md
@@ -21,6 +21,7 @@ Once you have tested locally, you need to get your data into Google Cloud so you

You will upload your CSV and JSON files to [Google Cloud Storage](https://cloud.google.com/storage), and the custom Data Commons importer will transform, store, and query the data in a [Google Cloud SQL](https://cloud.google.com/sql) database.


## Prerequisites

- A [GCP](https://console.cloud.google.com/welcome) billing account and project.
@@ -57,22 +58,22 @@ While you are testing, you can start with a single Google Cloud region; to be cl
1. For the **Location type**, choose the same regional options as for Cloud SQL above.
1. When you have finished setting all the configuration options, click **Create**.
1. In the **Bucket Details** page, click **Create Folder** to create a new folder to hold your data.
1. Name the folder as desired. Record the folder path as <code>gs://<var>BUCKET_NAME</var>/<var>FOLDER_PATH</var></code> for setting environment variables below. You can start with the sample data provided under `custom_dc/sample` and update to your own data later.
1. Name the folder as desired. Record the folder path as <code>gs://<var>BUCKET_NAME</var>/<var>FOLDER_PATH</var></code> for setting the `OUTPUT_DIR` environment variable below.
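To copy files into that folder from the command line, you can use `gsutil`; in this sketch the bucket and folder names are placeholders:

```shell
# Placeholder bucket/folder; copies the sample CSVs and config.json
gsutil cp custom_dc/sample/*.csv custom_dc/sample/config.json \
  gs://my-bucket/my-folder/
```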
> **Reviewer:** Should we mention the deprecated GCS_DATA_PATH variable for the benefit of partners using it right now?
>
> **Author:** No, it's mentioned in the file.


### Set up environment variables
### Set environment variables

1. Using your favorite editor, open `custom_dc/cloudsql_env.list`.
1. Enter the relevant values for `DC_API_KEY` and `MAPS_API_KEY`.
1. Using your favorite editor, open `custom_dc/env.list`.
1. Set `USE_SQLITE=false` and `USE_CLOUDSQL=true`
1. Set values for all of the following:

- `GCS_DATA_PATH`
- `CLOUDSQL_INSTANCE`
- `GOOGLE_CLOUD_PROJECT`
- `DB_NAME`
- `DB_USER`
- `DB_PASS`
- `OUTPUT_DIR`

See comments in the [`cloudsql_env.list`](https://github.com/datacommonsorg/website/blob/master/custom_dc/cloudsql_env.list) file for the correct format for each option.
See comments in the [`env.list`](https://github.com/datacommonsorg/website/blob/master/custom_dc/env.list) file for the correct format for each option.

1. Optionally, set an `ADMIN_SECRET` to use when loading the data through the `/admin` page later.
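Taken together, the Cloud SQL-related portion of `env.list` might look roughly like this. Every value below is a placeholder; check the comments in the file itself for the exact format of each option:

```
# All placeholder values -- consult the comments in env.list for formats
USE_SQLITE=false
USE_CLOUDSQL=true
GOOGLE_CLOUD_PROJECT=my-project
CLOUDSQL_INSTANCE=my-project:us-central1:my-instance
DB_NAME=datacommons
DB_USER=datacommons
DB_PASS=my-password
OUTPUT_DIR=gs://my-bucket/my-folder
ADMIN_SECRET=my-secret
```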

@@ -117,23 +118,25 @@ If you are prompted to install the Cloud Resource Manager API, press `y` to acce

If you have not made changes that require a local build, and just want to run the pre-downloaded image, from your repository root, run:

```shell
<pre>
docker run -it \
--env-file $PWD/custom_dc/cloudsql_env.list \
--env-file $PWD/custom_dc/env.list \
-v <var>OUTPUT_DIRECTORY</var>:/<var>OUTPUT_DIRECTORY</var> \
-p 8080:8080 \
-e DEBUG=true \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
-v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
gcr.io/datcom-ci/datacommons-website-compose:stable
```
</pre>

> **Reviewer:** For GCS paths, there is nothing to mount. So you can remove this.
>
> **Author:** Done.

#### Run with a locally built repo

If you have made local changes and have a [locally built repo](/custom_dc/manage_repo.html#build-repo), from the root of the repository, run the following:
If you have made local changes and have a [locally built repo](/custom_dc/build_image.html#build-repo), from the root of the repository, run the following:

<pre>
docker run -it \
--env-file $PWD/custom_dc/cloudsql_env.list \
--env-file $PWD/custom_dc/env.list \
-v <var>OUTPUT_DIRECTORY</var>:/<var>OUTPUT_DIRECTORY</var> \
-p 8080:8080 \
-e DEBUG=true \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
@@ -149,7 +152,7 @@ Each time you upload new versions of the source CSV and JSON files, you need to

You can load the new/updated data from Cloud Storage using the `/admin` page on the site:

1. Optionally, in the `cloudsql_env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
1. Optionally, in the `env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
1. Start the Docker container as described above.
1. With the services running, navigate to the `/admin` page. If a secret is required, enter it in the text field, and click **Load**.
This runs a script inside the Docker container, that converts the CSV data in Cloud Storage into SQL tables, and stores them in the Cloud SQL database you created earlier. It also generates embeddings in the Google Cloud Storage folder into which you uploaded the CSV/JSON files, in a `datacommons/nl/` subfolder.
4 changes: 2 additions & 2 deletions custom_dc/deploy_cloud.md
@@ -76,10 +76,10 @@ When it completes, verify that the container has been uploaded in the Cloud Cons

To deploy the image in Google Cloud Run, you need to create a Run service. Because we need to set the environment variables for the running image as options in the service, it is actually more convenient to create the service and deploy the image at the same time using `gcloud` rather than the Cloud Console. Once the service is created in this way, you can edit it and redeploy using the Console.

1. Copy the settings from the cloudsql_env.list file to a local environment variable:
1. Copy the settings from the `env.list` file to a local environment variable:

```shell
env_vars=$(awk -F '=' 'NF==2 {print $1"="$2}' custom_dc/cloudsql_env.list | tr '\n' ',' | sed 's/,$//')
env_vars=$(awk -F '=' 'NF==2 {print $1"="$2}' custom_dc/env.list | tr '\n' ',' | sed 's/,$//')
```
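To see what that pipeline produces, here is a quick self-contained demo against a throwaway file; the keys and values are made up:

```shell
# Demo: two fake entries; the pipeline joins them into a comma-separated list
printf 'DC_API_KEY=abc\nMAPS_API_KEY=xyz\n' > /tmp/env.list.demo
env_vars=$(awk -F '=' 'NF==2 {print $1"="$2}' /tmp/env.list.demo | tr '\n' ',' | sed 's/,$//')
echo "$env_vars"   # DC_API_KEY=abc,MAPS_API_KEY=xyz
```

Note that the `NF==2` condition skips any line that does not contain exactly one `=`, such as comments and blank lines.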

1. Create a new Cloud Run service and deploy the image in the Artifact Registry.
2 changes: 1 addition & 1 deletion custom_dc/launch_cloud.md
@@ -43,7 +43,7 @@ The following is a sample configuration that you can tune as needed. For additio
**Step 2: Set the environment variable**

1. When the Redis instance is created above, go to the **Instances > Redis** tab, look up your instance and note the **Primary Endpoint** IP address.
1. In `custom_dc/cloudsql_env.list`, set the value of the `REDIS_HOST` option to the IP address.
1. In `custom_dc/env.list`, set the value of the `REDIS_HOST` option to the IP address.
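For example (the IP address below is a placeholder for your instance's Primary Endpoint):

```
# Placeholder Primary Endpoint IP
REDIS_HOST=10.0.0.3
```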

**Step 3: Create the VPC connector**

25 changes: 11 additions & 14 deletions custom_dc/quickstart.md
@@ -63,15 +63,16 @@ When the downloads are complete, navigate to the root directory of the repo (e.g
cd website | <var>DIRECTORY</var>
</pre>

### Set API keys as environment variables
### Set environment variables

1. Using your favorite editor, open `custom_dc/sqlite_env.list`.
1. Using your favorite editor, open `custom_dc/env.list`.
1. Enter the relevant values for `DC_API_KEY` and `MAPS_API_KEY`.
1. For the `OUTPUT_DIR`, set it to `$PWD/custom_dc/sample/`.
1. Leave `ADMIN_SECRET` blank for now.

Warning: Do not use any quotes (single or double) or spaces when specifying the values.
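After these edits, the file might look roughly like this; the key values are placeholders for your own keys:

```
# Illustrative values only -- no quotes, no spaces
DC_API_KEY=your-data-commons-api-key
MAPS_API_KEY=your-maps-api-key
OUTPUT_DIR=$PWD/custom_dc/sample/
ADMIN_SECRET=
```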

Note: If you are storing your source code in a public/open-source version control system, we recommend that you do not store the environment variables files containing secrets. Instead, store them locally only. If you are using git/Github, you can add the file to the `.gitignore` file.
Note: If you are storing your source code in a public/open-source version control system, we recommend that you do not store the environment variables file containing secrets. Instead, store it locally only. If you are using git/Github, you can add the file name to the `.gitignore` file.
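If you are using git, a one-liner run from the repository root keeps the file out of commits:

```shell
# Append the env file to .gitignore so secrets stay local
echo "custom_dc/env.list" >> .gitignore
```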

## About the downloaded files

@@ -85,7 +86,7 @@ Note: If you are storing your source code in a public/open-source version contro
<tbody>
<tr>
<td width="300"><a href="https://github.com/datacommonsorg/website/tree/master/custom_dc/sample"><code>custom_dc/sample/</code></a></td>
<td>Sample supplemental data that is added to the base data in Data Commons. This page shows you how to easily load and view this data. The data is in CSV format and mapped to Data Commons entity definitions using the config.json file. </td>
<td>Sample supplemental data that is added to the base data in Data Commons. This page shows you how to easily load and view this data. The data is in CSV format and mapped to Data Commons entity definitions using the `config.json` file. </td>
</tr>
<tr>
<td><a href="https://github.com/datacommonsorg/website/tree/master/custom_dc/examples"><code>custom_dc/examples/</code></a></td>
@@ -100,12 +101,8 @@ Note: If you are storing your source code in a public/open-source version contro
<td>Contains customizable CSS file and default logo. To modify the styles or replace the logo, see <a href="custom_ui.html#styles">Customize Javascript and styles</a>.</td>
</tr>
<tr>
<td><a href="https://github.com/datacommonsorg/website/blob/master/custom_dc/sqlite_env.list"><code>custom_dc/sqlite_env.list</code></a></td>
<td>Contains environment variables for a development environment using SQLite as the database. For details of the variables, see the comments in the file.</td>
</tr>
<tr>
<td><a href="https://github.com/datacommonsorg/website/blob/master/custom_dc/cloudsql_env.list"><code>custom_dc/cloudsql_env.list</code></a></td>
<td>Contains environment variables for a development or production environment using Cloud SQL as the database. For details of the variables, see the comments in the file.</td>
<td><a href="https://github.com/datacommonsorg/website/blob/master/custom_dc/env.list"><code>custom_dc/env.list</code></a></td>
<td>Contains environment variables for the Data Commons services. For details of the variables, see the comments in the file.</td>
</tr>
</tbody>
</table>
@@ -119,19 +116,19 @@ Note: If you are storing your source code in a public/open-source version contro
docker run -it \
-p 8080:8080 \
-e DEBUG=true \
--env-file $PWD/custom_dc/sqlite_env.list \
-v $PWD/custom_dc/sample:/userdata \
--env-file $PWD/custom_dc/env.list \
-v $PWD/custom_dc/sample:/$PWD/custom_dc/sample \
gcr.io/datcom-ci/datacommons-website-compose:stable
```

> **Reviewer:** Remove the slash after the colon. Can you go through all docker run commands in the doc to make sure both sides of the colon are the same exact paths?
>
> **Author:** Done.

Note: If you are running on Linux, depending on whether you have created a ["sudoless" Docker group](https://docs.docker.com/engine/install/linux-postinstall/), you will need to preface every `docker` invocation with `sudo`.
Note: If you are running on Linux, depending on whether you have created a ["sudoless" Docker group](https://docs.docker.com/engine/install/linux-postinstall/), you may need to preface every `docker` invocation with `sudo`.

This command does the following:

- The first time you run it, downloads the latest stable Data Commons image, `gcr.io/datcom-ci/datacommons-website-compose:stable`, from the Google Cloud Artifact Registry, which may take a few minutes. Subsequent runs use the locally stored image.
- Starts a Docker container in interactive mode.
- Starts development/debug versions of the Web Server, NL Server, and Mixer, as well as the Nginx proxy, inside the container
- Maps the sample data to the Docker path `/userdata`, so the servers do not need to be restarted when you load the sample data
- Maps the sample data to a Docker path, so the servers do not need to be restarted when you load the sample data

### Stop and restart the services

2 changes: 1 addition & 1 deletion custom_dc/troubleshooting.md
@@ -65,7 +65,7 @@ There is a problem with how you have set up your CSV files and/or config.json fi

If the load page does not show any errors but data still does not load, try checking the following:

1. In the `_env.list` file you are using, check that you are not using single or double quotes around any of the values.
1. In the `env.list` file, check that you are not using single or double quotes around any of the values.
1. Check your Docker command line for invalid arguments. Often Docker won't give any error messages but failures will show up at runtime.
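A quick way to spot quoted values is `grep`; shown here against a throwaway demo file (the key and path are made up):

```shell
# Demo env file with one bad (quoted) value and one good value
printf 'DC_API_KEY="abc"\nMAPS_API_KEY=xyz\n' > /tmp/env.list.check
grep -n "[\"']" /tmp/env.list.check   # flags line 1
```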

## Website display problems