Alternate schema mode text, with standalone sections (#530)
* integrate custom docs with new UI

* more edits

* use website wording for intro

* fix numbering in table

* rename and some edits

* rename manage_repo file, per Bo

* Merge.

* formatting edits

* updates per Keyur's feedback

* Fix typos

* fix nav order

* fix link to API key request form

* update form link

* update key request form and output dir env var

* Revert to gerund

Though the style guide says to just use imperatives, "get started" just sounds weird. Also this is more consistent with "troubleshooting"

* new troubleshooting entry

* fix typo

* new data container procedures

* more work

* more work

* complete data draft

* more changes

* more changes

* more revisions

* update troubleshooting doc etc.

* new version of diagrams

* remove data loading problems troubleshooting entry; can't reproduce

* revert title change

* add example for not mixing entity types

* changes from Keyur

* add screenshots for GCP, and related changes

* fixed one image

* added screenshots for Cloud Run service

* resize images

* more changes from Keyur

* fix a tiny error

* delete unused images

* fix missing dash

* update build file

* adjust build command

* Revert "adjust build command"

This reverts commit 4ce0fb9.

* update docker file

* more fixes

* one last fix

* make links to Cloud Console open in a new page

* fixes to quickstart suggested by Prem

* one more change

* change from Keyur

* revise procedure

* merge

* add brief explanation of data model to quickstart

* slight wording tweak

* incorporate feedback from Keyur

* remove erroneous edit

* correct missing text

* more work on tasks for finding stuff

* merge

* update to use env.sample

* typo

* typo

* get file back in head shape

* fix file name

* add more detail about data security

* fix typo

* corrections from Keyur

* fix other mention of SQL queries

* add both data directories to docker run commands

* remove extra slash

* update feedback links

* tiny tweaks

* fixes from Hannah

* fix grammar

* remove redundant text

* add link for data requests

* second try

* fix link

* add template parameter back

* add link to issue tracker docs

* feedback from Keyur

* fix template parameter

* add doc for observation properties

* more edits

* corrections from Keyur

* one more change from Keyur

* Add update schema option.

* wording fixes

* alternate schema mode text

* resolve comments from Hannah

* more additions from Hannah

* final edits from Hannah
kmoscoe authored Oct 29, 2024
1 parent 617676b commit 9feb2c4
Showing 3 changed files with 56 additions and 8 deletions.
20 changes: 20 additions & 0 deletions custom_dc/custom_data.md
@@ -273,6 +273,8 @@ Edit the `env.list` file you created [previously](/custom_dc/quickstart.html#env

Once you have configured everything, use the following commands to run the data management container and restart the services container, mapping your input and output directories to the same paths in Docker.

#### Step 1: Start the data management container

In one terminal window, from the root directory, run the following command to start the data management container:

<pre>
@@ -283,6 +285,24 @@ docker run \
gcr.io/datcom-ci/datacommons-data:stable
</pre>

##### Start the data management container in schema update mode {#schema-update-mode}

If you started a container and received a `SQL check failed` error, a database schema update is needed. Restart the data management container with the optional environment variable `DATA_RUN_MODE=schemaupdate`. This mode updates the database schema without re-importing data or re-building natural language embeddings, and it is the quickest way to resolve a `SQL check failed` error at services container startup.

To do so, add the following line to the command above:

```
docker run \
...
-e DATA_RUN_MODE=schemaupdate \
...
gcr.io/datcom-ci/datacommons-data:stable
```

Once the job has run, go to step 2 below.
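For illustration, a complete schema-update command might look like the following sketch. The `env.list` location and the mounted directories are assumptions based on the quickstart setup; substitute the paths you actually use:

```shell
# Hypothetical full command; adjust the env file and volume paths to your setup.
docker run \
  --env-file $PWD/custom_dc/env.list \
  -e DATA_RUN_MODE=schemaupdate \
  -v $PWD/custom_dc/sample:/custom_dc/sample \
  gcr.io/datcom-ci/datacommons-data:stable
```

Because no data is re-imported in this mode, the job should finish noticeably faster than a full data load.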

#### Step 2: Start the services container

In another terminal window, from the root directory, run the following command to start the services container:

<pre>
35 changes: 29 additions & 6 deletions custom_dc/data_cloud.md
@@ -118,20 +118,31 @@ Now set environment variables:
As you are iterating on changes to the source CSV and JSON files, you can re-upload them at any time, either overwriting existing files or creating new folders. To load them into Cloud SQL, you run the Cloud Run job you created above.

### Step 2: Run the data management Cloud Run job {#run-job}

Now that everything is configured and your data is uploaded to Google Cloud Storage, simply run the data management Cloud Run job to convert the CSV data into tables in the Cloud SQL database and to generate the embeddings (in a `datacommons/nl` subfolder).

Every time you upload new input CSV or JSON files to Google Cloud Storage, you will need to rerun the job.

To run the job using the Cloud Console:

1. Go to [https://console.cloud.google.com/run/jobs](https://console.cloud.google.com/run/jobs){: target="_blank"} for your project.
1. From the list of jobs, click the link of the "datacommons-data" job you created above.
1. Click **Execute**. It will take several minutes for the job to run. You can click the **Logs** tab to view the progress.

When the job completes, see the next step to verify that the data has been loaded correctly.

#### Run the data management Cloud Run job in schema update mode {#schema-update-mode}

If you started a container and received a `SQL check failed` error, a database schema update is needed. Rerun the data management Cloud Run job with the optional environment variable `DATA_RUN_MODE=schemaupdate`. This mode updates the database schema without re-importing data or re-building natural language embeddings, and it is the quickest way to resolve a `SQL check failed` error at services container startup.

To run the job using the Cloud Console:

1. Go to [https://console.cloud.google.com/run/jobs](https://console.cloud.google.com/run/jobs){: target="_blank"} for your project.
1. From the list of jobs, click the link of the "datacommons-data" job you created above.
1. Optionally, select **Execute** > **Execute with overrides** and click **Add variable** to set a new variable with name `DATA_RUN_MODE` and value `schemaupdate`.
1. Click **Execute**. It will take several minutes for the job to run. You can click the **Logs** tab to view the progress.
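If you prefer the command line, the same override can likely be applied with the `gcloud` CLI. The job name and region below are assumptions based on the setup above; substitute your own:

```shell
# Assumes the job is named "datacommons-data" and runs in us-central1.
gcloud run jobs execute datacommons-data \
  --region us-central1 \
  --update-env-vars DATA_RUN_MODE=schemaupdate
```

You can then follow the execution in the **Logs** tab of the Cloud Console, or with `gcloud run jobs executions list`.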

### Inspect the Cloud SQL database {#inspect-sql}

To view information about the created tables:
@@ -181,7 +192,7 @@ gcloud auth application-default set-quota-project <var>PROJECT_ID</var>

If you are prompted to install the Cloud Resource Manager API, press `y` to accept.

### Step 3: Run the data management Docker container

From your project root directory, run:

@@ -199,6 +210,20 @@ The version is `latest` or `stable`.

To verify that the data is correctly created in your Cloud SQL database, use the procedure in [Inspect the Cloud SQL database](#inspect-sql) above.
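As a quick command-line sketch of that verification (the instance name, database name, and table shown here are assumptions; substitute the names from your own setup):

```shell
# Connect to the Cloud SQL instance (name assumed to be "datacommons").
gcloud sql connect datacommons --user=root --quiet

# Then, at the MySQL prompt, inspect the imported tables:
#   USE datacommons;
#   SHOW TABLES;
#   SELECT COUNT(*) FROM observations;
```

A nonzero row count suggests the data management run imported your CSV data.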

#### Run the data management Docker container in schema update mode

If you started a container and received a `SQL check failed` error, a database schema update is needed. Restart the data management container with the optional environment variable `DATA_RUN_MODE=schemaupdate` to minimize the startup time.

To do so, add the following line to the command above:

```
docker run \
...
-e DATA_RUN_MODE=schemaupdate \
...
gcr.io/datcom-ci/datacommons-data:stable
```

## Advanced setup (optional): Access Cloud data from a local services container

For testing purposes, if you wish to run the services Docker container locally but access the data in Google Cloud, use the following procedures.
@@ -211,7 +236,7 @@ To run a local instance of the services container, you will need to set all the

See the section [above](#gen-creds) for procedures.

### Step 3: Run the services Docker container

From the root directory of your repo, run the following command, assuming you are using a locally built image:

@@ -230,5 +255,3 @@ docker run -it \
</pre>

9 changes: 7 additions & 2 deletions custom_dc/troubleshooting.md
@@ -43,12 +43,17 @@ Failed to create metadata: failed to create secret manager client: google: could

This indicates that you have not specified API keys in the environment file. Follow procedures in [One-time setup steps](/custom_dc/quickstart.html#setup) to obtain and configure API keys.

{: #schema-check-failed}
### "SQL schema check failed"

This error indicates that there has been an update to the database schema. Re-run the data management job to update your schema as follows:

1. Rerun the data management Docker container, optionally adding the flag `-e DATA_RUN_MODE=schemaupdate` to the `docker run` command. This updates the database schema without re-importing data or re-building natural language embeddings.
1. Restart the services Docker container.

For full command details, see the following sections:
- For local services, see [Start the data management container in schema update mode](/custom_dc/custom_data.html#schema-update-mode).
- For services running on Google Cloud, see [Run the data management Cloud Run job in schema update mode](/custom_dc/data_cloud.html#schema-update-mode).
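For a local setup, the remedy sequence can be sketched as follows. The services image name, port mapping, and any `--env-file` or `-v` flags (elided here) are assumptions; take the exact flags from the commands in the linked sections:

```shell
# 1. Rerun the data management container in schema update mode
#    (add your usual --env-file and -v flags).
docker run \
  -e DATA_RUN_MODE=schemaupdate \
  gcr.io/datcom-ci/datacommons-data:stable

# 2. Restart the services container (image name and port assumed).
docker run \
  -p 8080:8080 \
  gcr.io/datcom-ci/datacommons-services:stable
```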

## Local build errors

