From 5aefa1c14fac92680f63bb886731da6fcc9cf901 Mon Sep 17 00:00:00 2001
From: kmoscoe <165203920+kmoscoe@users.noreply.github.com>
Date: Wed, 24 Jul 2024 17:34:22 -0700
Subject: [PATCH] Update custom DC docs for single env.list file (#456)

* integrate custom docs with new UI
* more edits
* use website wording for intro
* fix numbering in table
* rename and some edits
* rename manage_repo file, per Bo
* Merge.
* formatting edits
* updates per Keyur's feedback
* Fix typos
* fix nav order
* fix link to API key request form
* update form link
* update key request form and output dir env var
---
 custom_dc/build_image.md     | 22 +++++++++++-----------
 custom_dc/custom_data.md     | 22 ++++++++++++----------
 custom_dc/custom_ui.md       |  2 +-
 custom_dc/data_cloud.md      | 25 +++++++++++++------------
 custom_dc/deploy_cloud.md    |  6 +++---
 custom_dc/launch_cloud.md    |  2 +-
 custom_dc/quickstart.md      | 27 ++++++++++++---------------
 custom_dc/troubleshooting.md |  4 ++--
 8 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/custom_dc/build_image.md b/custom_dc/build_image.md
index a8dfff82f..1a0130b7b 100644
--- a/custom_dc/build_image.md
+++ b/custom_dc/build_image.md
@@ -30,14 +30,14 @@ If you want to pick up the latest prebuilt version, do the following:
    ```
 1. Rerun the container, specifying that repo as the argument to the `docker run` command:

-   ```shell
-   docker run -it \
-   -p 8080:8080 \
-   -e DEBUG=true \
-   --env-file $PWD/custom_dc/sqlite_env.list \
-   -v $PWD/custom_dc/sample:/userdata \
-   gcr.io/datcom-ci/datacommons-website-compose:latest
-   ```
+```shell
+docker run -it \
+-p 8080:8080 \
+-e DEBUG=true \
+--env-file $PWD/custom_dc/env.list \
+-v $PWD/custom_dc/sample:$PWD/custom_dc/sample \
+gcr.io/datcom-ci/datacommons-website-compose:latest
+```

 ## Build a local image {#build-repo}

@@ -141,14 +141,14 @@ To upload and deploy the container to the Cloud, see [Deploy a custom instance t

 ## Run the container with the local SQLite database

-To start the services using the locally built repo. If you have made changes to any of the UI components, be sure to map the `custom` directories to the Docker `workspace` directory.
+Start the services using the locally built repo. If you have made changes to any of the UI components, be sure to map the `custom` directories (or your alternative directories) to the Docker `workspace` directory.
 <pre>
 docker run -it \
---env-file $PWD/custom_dc/sqlite_env.list \
+--env-file $PWD/custom_dc/env.list \
 -p 8080:8080 \
 -e DEBUG=true \
-[-v $PWD/custom_dc/CUSTOM_DATA_DIRECTORY:/userdata \]
+-v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
 [-v $PWD/server/templates/custom_dc/custom:/workspace/server/templates/custom_dc/custom \]
 [-v $PWD/static/custom_dc/custom:/workspace/static/custom_dc/custom \]
 datacommons-website-compose:DOCKER_TAG
 </pre>

diff --git a/custom_dc/custom_data.md b/custom_dc/custom_data.md
index b5f79ed9a..5060e725d 100644
--- a/custom_dc/custom_data.md
+++ b/custom_dc/custom_data.md
@@ -21,7 +21,7 @@ Custom Data Commons provides a simple mechanism to import your own data, but it
 - All data must be in CSV format, using the schema described below.
 - You must also provide a JSON configuration file, named `config.json`, to map the CSV contents to the Data Commons schema knowledge graph. The contents of the JSON file are described below.
-- All CSV files and the JSON file must be in the same directory
+- All CSV files and the JSON file _must_ be in the same directory.

 Examples are provided in [`custom_dc/sample`](https://github.com/datacommonsorg/website/tree/master/custom_dc/sample) and [`custom_dc/examples`](https://github.com/datacommonsorg/website/tree/master/custom_dc/examples) directories.

@@ -242,22 +242,24 @@ The `sources` section is optional. It encodes the sources and provenances associ

 To load custom data uploaded to Google Cloud, see instead [Pointing the local Data Commons site to the Cloud data](/custom_dc/data_cloud.html) for procedures.

+### Configure custom directories
+
+Edit the `env.list` file as follows:
+- Set the `OUTPUT_DIR` variable to the directory where your input files are stored. The load step will create a `datacommons` subdirectory under this directory.
+
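For reference, a minimal `env.list` for local development might look like the sketch below. The variable names come from the steps in this patch; every value shown is a placeholder, not a working key or path.

```shell
# Hypothetical values for illustration only -- substitute your own.
DC_API_KEY=your-datacommons-api-key
MAPS_API_KEY=your-maps-api-key
# Directory containing your CSV files and config.json; the load step
# creates a datacommons/ subdirectory under it.
OUTPUT_DIR=/home/USERNAME/website/custom_dc/sample
# Leave blank to allow loading data without a secret.
ADMIN_SECRET=
```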
 ### Start the Docker container with local custom data {#docker-data}

-Once you have your CSV files and config.json set up, use the following command to restart the Docker container, mapping your custom data directory to the Docker userdata directory.
+Once you have configured everything, use the following command to restart the Docker container, mapping your output directory to the same path in Docker:

 <pre>
 docker run -it \
 -p 8080:8080 \
 -e DEBUG=true \
---env-file $PWD/custom_dc/sqlite_env.list \
--v $PWD/custom_dc/CUSTOM_DATA_DIRECTORY:/userdata \
-[-v $PWD/custom_dc/CUSTOM_DATA_DIRECTORY/datacommons:/sqlite] \
+--env-file $PWD/custom_dc/env.list \
+-v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
 gcr.io/datcom-ci/datacommons-website-compose:stable
 </pre>

-The optional `-v` flag preserves the SQLite data so it loads automatically when you restart the Docker container.
-
 Every time you make changes to the CSV or JSON files, you should reload the data, as described below.

 ## Load custom data in SQLite

@@ -266,9 +268,9 @@ As you are iterating on changes to the source CSV and JSON files, you will need

 You can load the new/updated data from SQLite using the `/admin` page on the site:

-1. Optionally, in the `sqlite_env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
+1. Optionally, in the `env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
 1. Start the Docker container as usual, being sure to map the path to the directory containing the custom data (see command above).
-1. With the services running, navigate to the `/admin page`. If a secret is required, enter it in the text field, and click **Load**. This runs a script inside the Docker container, that converts the CSV data into SQL tables, and generates embeddings in the container as well. The database is created as <code>custom_dc/CUSTOM_DATA_DIRECTORY/datacommons/datacommons.db</code> and embeddings are generated in <code>custom_dc/CUSTOM_DATA_DIRECTORY/datacommons/nl/</code>.
+1. With the services running, navigate to the `/admin` page. If a secret is required, enter it in the text field, and click **Load**. This runs a script inside the Docker container that converts the CSV data into SQL tables and generates embeddings, both inside the container. The database is created as <code>OUTPUT_DIRECTORY/datacommons/datacommons.db</code> and embeddings are generated in <code>OUTPUT_DIRECTORY/datacommons/nl/</code>.

 ## Inspect the SQLite database

@@ -277,7 +279,7 @@ If you need to troubleshoot custom data, it is helpful to inspect the contents o

 To do so, from a terminal window, open the database:

 <pre>
-sqlite3 website/custom_dc/CUSTOM_DATA_DIRECTORY/datacommons/datacommons.db
+sqlite3 OUTPUT_DIRECTORY/datacommons/datacommons.db
 </pre>

 This starts the interactive SQLite shell. To view a list of tables, at the prompt type `.tables`. The relevant table is `observations`.
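To sanity-check the loaded data beyond `.tables`, you can also run SQL from the command line; a minimal sketch (the `observations` table is named above; the columns it contains are whatever your schema defines, so inspect them with `.schema observations` first):

```shell
# Count the loaded rows and peek at a few of them, non-interactively.
sqlite3 OUTPUT_DIRECTORY/datacommons/datacommons.db \
  "SELECT COUNT(*) FROM observations; SELECT * FROM observations LIMIT 5;"
```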
diff --git a/custom_dc/custom_ui.md b/custom_dc/custom_ui.md
index b5ac97859..7beb2b5f7 100644
--- a/custom_dc/custom_ui.md
+++ b/custom_dc/custom_ui.md
@@ -53,7 +53,7 @@ HTML and CSS customization files are provided as samples to get you started. The

-Note that the `custom` parent directory is customizable as the `FLASK_ENV` environment variable. You can rename the directory as desired and update the environment variable.
+Note that the `custom` parent directory is customizable as the `FLASK_ENV` environment variable. You can rename the directory as desired and update the environment variable in `custom_dc/env.list`.

 To enable the changes to be picked up by the Docker image, and allow you to refresh the browser for further changes, restart the Docker image with this additional flag to map the directories to the Docker workspace:

diff --git a/custom_dc/data_cloud.md b/custom_dc/data_cloud.md
index 7d4668111..5664aaf7b 100644
--- a/custom_dc/data_cloud.md
+++ b/custom_dc/data_cloud.md
@@ -21,6 +21,7 @@ Once you have tested locally, you need to get your data into Google Cloud so you

 You will upload your CSV and JSON files to [Google Cloud Storage](https://cloud.google.com/storage), and the custom Data Commons importer will transform, store, and query the data in a [Google Cloud SQL](https://cloud.google.com/sql) database.

+
 ## Prerequisites

 - A [GCP](https://console.cloud.google.com/welcome) billing account and project.

@@ -57,22 +58,22 @@ While you are testing, you can start with a single Google Cloud region; to be cl
 1. For the **Location type**, choose the same regional options as for Cloud SQL above.
 1. When you have finished setting all the configuration options, click **Create**.
 1. In the **Bucket Details** page, click **Create Folder** to create a new folder to hold your data.
-1. Name the folder as desired. Record the folder path as <code>gs://BUCKET_NAME/FOLDER_PATH</code> for setting environment variables below. You can start with the sample data provided under `custom_dc/sample` and update to your own data later.
+1. Name the folder as desired. Record the folder path as <code>gs://BUCKET_NAME/FOLDER_PATH</code> for setting the `OUTPUT_DIR` environment variable below.

-### Set up environment variables
+### Set environment variables

-1. Using your favorite editor, open `custom_dc/cloudsql_env.list`.
-1. Enter the relevant values for `DC_API_KEY` and `MAPS_API_KEY`.
+1. Using your favorite editor, open `custom_dc/env.list`.
+1. Set `USE_SQLITE=false` and `USE_CLOUDSQL=true`.
 1. Set values for all of the following:
-   - `GCS_DATA_PATH`
    - `CLOUDSQL_INSTANCE`
    - `GOOGLE_CLOUD_PROJECT`
    - `DB_NAME`
    - `DB_USER`
    - `DB_PASS`
+   - `OUTPUT_DIR`

-   See comments in the [`cloudsql_env.list`](https://github.com/datacommonsorg/website/blob/master/custom_dc/cloudsql_env.list) file for the correct format for each option.
+   See comments in the [`env.list`](https://github.com/datacommonsorg/website/blob/master/custom_dc/env.list) file for the correct format for each option.

 1. Optionally, set an `ADMIN_SECRET` to use when loading the data through the `/admin` page later.
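Putting those pieces together, a Cloud-mode `env.list` might look like this sketch. All values are illustrative placeholders (the variable names come from the list above; check the comments in `env.list` for the exact formats, e.g. the Cloud SQL connection name):

```shell
# Hypothetical values for illustration only.
USE_SQLITE=false
USE_CLOUDSQL=true
GOOGLE_CLOUD_PROJECT=my-project-id
CLOUDSQL_INSTANCE=my-project-id:us-central1:my-instance
DB_NAME=datacommons
DB_USER=db-user
DB_PASS=db-password
OUTPUT_DIR=gs://BUCKET_NAME/FOLDER_PATH
ADMIN_SECRET=some-load-secret
```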
@@ -117,23 +118,23 @@ If you are prompted to install the Cloud Resource Manager API, press `y` to acce

 If you have not made changes that require a local build, and just want to run the pre-downloaded image, from your repository root, run:

-```shell
+<pre>
 docker run -it \
---env-file $PWD/custom_dc/cloudsql_env.list \
+--env-file $PWD/custom_dc/env.list \
 -p 8080:8080 \
 -e DEBUG=true \
 -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
 -v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
 gcr.io/datcom-ci/datacommons-website-compose:stable
-```
+</pre>

 #### Run with a locally built repo

-If you have made local changes and have a [locally built repo](/custom_dc/manage_repo.html#build-repo), from the root of the repository, run the following:
+If you have made local changes and have a [locally built repo](/custom_dc/build_image.html#build-repo), from the root of the repository, run the following:

 <pre>
 docker run -it \
---env-file $PWD/custom_dc/cloudsql_env.list \
+--env-file $PWD/custom_dc/env.list \
 -p 8080:8080 \
 -e DEBUG=true \
 -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \

@@ -149,7 +150,7 @@ Each time you upload new versions of the source CSV and JSON files, you need to

 You can load the new/updated data from Cloud Storage using the `/admin` page on the site:

-1. Optionally, in the `cloudsql_env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
+1. Optionally, in the `env.list` file, set the `ADMIN_SECRET` environment variable to a string that authorizes users to load data.
 1. Start the Docker container as described above.
 1. With the services running, navigate to the `/admin` page. If a secret is required, enter it in the text field, and click **Load**. This runs a script inside the Docker container, that converts the CSV data in Cloud Storage into SQL tables, and stores them in the Cloud SQL database you created earlier. It also generates embeddings in the Google Cloud Storage folder into which you uploaded the CSV/JSON files, in a `datacommons/nl/` subfolder.

diff --git a/custom_dc/deploy_cloud.md b/custom_dc/deploy_cloud.md
index 0a1cc8862..6a3bf3fba 100644
--- a/custom_dc/deploy_cloud.md
+++ b/custom_dc/deploy_cloud.md
@@ -76,13 +76,13 @@ When it completes, verify that the container has been uploaded in the Cloud Cons

 To deploy the image in Google Cloud Run, you need to create a Run service. Because we need to set the environment variables for the running image as options in the service, it is actually more convenient to create the service and deploy the image at the same time using `gcloud` rather than the Cloud Console. Once the service is created in this way, you can edit it and redeploy using the Console.

-1. Copy the settings from the cloudsql_env.list file to a local environment variable:
+1. Copy the settings from the `env.list` file to a local environment variable:

    ```shell
-   env_vars=$(awk -F '=' 'NF==2 {print $1"="$2}' custom_dc/cloudsql_env.list | tr '\n' ',' | sed 's/,$//')
+   env_vars=$(awk -F '=' 'NF==2 {print $1"="$2}' custom_dc/env.list | tr '\n' ',' | sed 's/,$//')
    ```

-1. Create a new Cloud Run service and deploy the image in the Artifact Registry. `
+1. Create a new Cloud Run service and deploy the image in the Artifact Registry.

 <pre>
 gcloud run deploy SERVICE_NAME \
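As context for the `awk` pipeline above: it turns the `KEY=VALUE` lines of `env.list` into a single comma-separated list, which is the shape `gcloud` expects for environment variables. A hedged sketch of how that variable is typically consumed (the full `gcloud run deploy` invocation is truncated in this hunk, so the flag below is an assumption based on standard `gcloud` usage, not the project's exact command):

```shell
# env_vars now holds something like:
#   DC_API_KEY=...,MAPS_API_KEY=...,USE_CLOUDSQL=true,...
echo "$env_vars"

# gcloud accepts such a list via --set-env-vars, e.g.:
#   gcloud run deploy SERVICE_NAME --set-env-vars "$env_vars" ...
```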
diff --git a/custom_dc/launch_cloud.md b/custom_dc/launch_cloud.md
index 26a13befe..80e7101c1 100644
--- a/custom_dc/launch_cloud.md
+++ b/custom_dc/launch_cloud.md
@@ -43,7 +43,7 @@ The following is a sample configuration that you can tune as needed. For additio
 **Step 2: Set the environment variable**

 1. When the Redis instance is created above, go to the **Instances > Redis** tab, look up your instance and note the **Primary Endpoint** IP address.
-1. In `custom_dc/cloudsql_env.list`, set the value of the `REDIS_HOST` option to the IP address.
+1. In `custom_dc/env.list`, set the value of the `REDIS_HOST` option to the IP address.

 **Step 3: Create the VPC connector**

diff --git a/custom_dc/quickstart.md b/custom_dc/quickstart.md
index bf2ff6e5e..5027271d7 100644
--- a/custom_dc/quickstart.md
+++ b/custom_dc/quickstart.md
@@ -27,7 +27,7 @@ The instructions in this page use the following setup:
 - If you are developing on Windows, install [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) (any distribution will do, but we recommend the default, Ubuntu), and enable [WSL 2 integration with Docker](https://docs.docker.com/desktop/wsl/).
 - Install [Docker Desktop/Engine](https://docs.docker.com/engine/install/).
 - Install [Git](https://git-scm.com/).
-- Get an API key for Data Commons by submitting the [Data Commons API key request form](https://docs.google.com/forms/d/e/1FAIpQLSePrkVfss9lUIHFClQsVPwPcAVWvX7WaZZyZjJWS99wRQNW4Q/viewform?resourcekey=0-euQU6Kly7YIWVRNS2p4zjw). The key is needed to authorize requests from your site to the base Data Commons site. Typical turnaround times are 24-48 hours.
+- Get an API key to authorize requests from your site to the base Data Commons by [filling out this form](https://docs.google.com/forms/d/e/1FAIpQLSeVCR95YOZ56ABsPwdH1tPAjjIeVDtisLF-8oDYlOxYmNZ7LQ/viewform?usp=dialog). Typical turnaround times are 24-48 hours.
 - Optional: Get a [Github](http://github.com) account, if you would like to browse the Data Commons source repos using your browser.

 ## One-time setup steps {#setup}

@@ -63,15 +63,16 @@ When the downloads are complete, navigate to the root directory of the repo (e.g

 cd website | DIRECTORY

-### Set API keys as environment variables
+### Set environment variables

-1. Using your favorite editor, open `custom_dc/sqlite_env.list`.
+1. Using your favorite editor, open `custom_dc/env.list`.
 1. Enter the relevant values for `DC_API_KEY` and `MAPS_API_KEY`.
+1. Set `OUTPUT_DIR` to the full path to the `sample` directory. For example, if you have cloned the repo directly to your home directory, this would be <code>/home/USERNAME/website/custom_dc/sample</code>.
 1. Leave `ADMIN_SECRET` blank for now.

 Warning: Do not use any quotes (single or double) or spaces when specifying the values.

-Note: If you are storing your source code in a public/open-source version control system, we recommend that you do not store the environment variables files containing secrets. Instead, store them locally only. If you are using git/Github, you can add the file to the `.gitignore` file.
+Note: If you are storing your source code in a public/open-source version control system, we recommend that you do not store the environment variables file containing secrets. Instead, store it locally only. If you are using git/Github, you can add the file name to the `.gitignore` file.
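For instance, from the repo root (a minimal sketch; adjust the path if your env file lives elsewhere):

```shell
# Keep the env file containing secrets out of version control.
echo "custom_dc/env.list" >> .gitignore
```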
 ## About the downloaded files

@@ -85,7 +86,7 @@ Note: If you are storing your source code in a public/open-source version contro

 <tr>
 <td><code>custom_dc/sample/</code></td>
-<td>Sample supplemental data that is added to the base data in Data Commons. This page shows you how to easily load and view this data. The data is in CSV format and mapped to Data Commons entity definitions using the config.json file.</td>
+<td>Sample supplemental data that is added to the base data in Data Commons. This page shows you how to easily load and view this data. The data is in CSV format and mapped to Data Commons entity definitions using the `config.json` file.</td>
 </tr>

@@ -100,12 +101,8 @@ Note: If you are storing your source code in a public/open-source version contro
 <td><code>custom_dc/examples/</code></td>
 <td>Contains customizable CSS file and default logo. To modify the styles or replace the logo, see Customize Javascript and styles.</td>
 </tr>
-<tr>
-<td><code>custom_dc/sqlite_env.list</code></td>
-<td>Contains environment variables for a development environment using SQLite as the database. For details of the variables, see the comments in the file.</td>
-</tr>
-<tr>
-<td><code>custom_dc/cloudsql_env.list</code></td>
-<td>Contains environment variables for a development or production environment using Cloud SQL as the database. For details of the variables, see the comments in the file.</td>
-</tr>
+<tr>
+<td><code>custom_dc/env.list</code></td>
+<td>Contains environment variables for the Data Commons services. For details of the variables, see the comments in the file.</td>
+</tr>
@@ -119,19 +116,19 @@ Note: If you are storing your source code in a public/open-source version contro
 docker run -it \
 -p 8080:8080 \
 -e DEBUG=true \
---env-file $PWD/custom_dc/sqlite_env.list \
--v $PWD/custom_dc/sample:/userdata \
+--env-file $PWD/custom_dc/env.list \
+-v $PWD/custom_dc/sample:$PWD/custom_dc/sample \
 gcr.io/datcom-ci/datacommons-website-compose:stable
 ```

-Note: If you are running on Linux, depending on whether you have created a ["sudoless" Docker group](https://docs.docker.com/engine/install/linux-postinstall/), you will need to preface every `docker` invocation with `sudo`.
+Note: If you are running on Linux, depending on whether you have created a ["sudoless" Docker group](https://docs.docker.com/engine/install/linux-postinstall/), you may need to preface every `docker` invocation with `sudo`.

 This command does the following:

 - The first time you run it, downloads the latest stable Data Commons image, `gcr.io/datcom-ci/datacommons-website-compose:stable`, from the Google Cloud Artifact Registry, which may take a few minutes. Subsequent runs use the locally stored image.
 - Starts a Docker container in interactive mode.
 - Starts development/debug versions of the Web Server, NL Server, and Mixer, as well as the Nginx proxy, inside the container
-- Maps the sample data to the Docker path `/userdata`, so the servers do not need to be restarted when you load the sample data
+- Maps the sample data to a Docker path, so the servers do not need to be restarted when you load the sample data.
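Once the container reports that the services are running, a quick smoke test from another terminal (a sketch; the port comes from the `-p 8080:8080` mapping above):

```shell
# Expect an HTTP 200 once the web server has finished starting up.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080
```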
 ### Stop and restart the services

diff --git a/custom_dc/troubleshooting.md b/custom_dc/troubleshooting.md
index 9acd4dbb7..f4394341d 100644
--- a/custom_dc/troubleshooting.md
+++ b/custom_dc/troubleshooting.md
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Troubleshooting
-nav_order: 8
+nav_order: 9
 parent: Build your own Data Commons
 ---

@@ -65,7 +65,7 @@ There is a problem with how you have set up your CSV files and/or config.json fi

 If the load page does not show any errors but data still does not load, try checking the following:

-1. In the `_env.list` file you are using, check that you are not using single or double quotes around any of the values.
+1. In the `env.list` file, check that you are not using single or double quotes around any of the values (see the example after this list).
 1. Check your Docker command line for invalid arguments. Often Docker won't give any error messages but failures will show up at runtime.
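To illustrate the quoting rule with hypothetical values:

```shell
# Correct: no quotes and no spaces around the = sign.
DC_API_KEY=abc123

# Incorrect: the quotes become part of the value and requests fail.
DC_API_KEY="abc123"
```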
 ## Website display problems