
Update custom DC docs for single env.list file #456

Merged (22 commits) on Jul 25, 2024

Conversation

@kmoscoe (Contributor) commented Jul 23, 2024:

No description provided.

@kmoscoe kmoscoe requested a review from keyurva July 23, 2024 03:00
custom_dc/build_image.md (outdated, resolved)
@@ -242,22 +242,24 @@ The `sources` section is optional. It encodes the sources and provenances associ

To load custom data uploaded to Google Cloud, see instead [Pointing the local Data Commons site to the Cloud data](/custom_dc/data_cloud.html) for procedures.

### Configure custom directories

If you are using a directory other than `custom_dc/sample` to store your CSV files, edit the `env.list` file as follows:
Contributor:
They would need to update the file for the sample as well.

Contributor Author (kmoscoe):
Oops, thanks for catching.

custom_dc/custom_data.md (outdated, resolved)
@@ -277,7 +279,7 @@ If you need to troubleshoot custom data, it is helpful to inspect the contents o
To do so, from a terminal window, open the database:

<pre>
- sqlite3 website/custom_dc/<var>CUSTOM_DATA_DIRECTORY</var>/datacommons/datacommons.db
+ sqlite3 <var>WEBSITE_ROOT</var>/<var>OUTPUT_DIRECTORY</var>/datacommons/datacommons.db
Contributor:
The output directory need not be under the website root. So the path can just be <var>OUTPUT_DIRECTORY</var>/datacommons/datacommons.db

Contributor Author (kmoscoe):
Oh, OK, thanks.
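The exchange above can be sketched as a quick inspection session. This assumes a hypothetical `$OUTPUT_DIRECTORY` shell variable holding whatever output directory you configured (which, per the review comment, need not be under the website root); the `observations` table name is illustrative and not confirmed by this PR:

```shell
# Sketch: inspect the SQLite database that the load step writes
# under the configured output directory.
DB="$OUTPUT_DIRECTORY/datacommons/datacommons.db"
sqlite3 "$DB" ".tables"                              # list the tables present
sqlite3 "$DB" "SELECT COUNT(*) FROM observations;"   # row count (table name illustrative)
```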

@@ -57,22 +58,22 @@ While you are testing, you can start with a single Google Cloud region; to be cl
1. For the **Location type**, choose the same regional options as for Cloud SQL above.
1. When you have finished setting all the configuration options, click **Create**.
1. In the **Bucket Details** page, click **Create Folder** to create a new folder to hold your data.
- 1. Name the folder as desired. Record the folder path as <code>gs://<var>BUCKET_NAME</var>/<var>FOLDER_PATH</var></code> for setting environment variables below. You can start with the sample data provided under `custom_dc/sample` and update to your own data later.
+ 1. Name the folder as desired. Record the folder path as <code>gs://<var>BUCKET_NAME</var>/<var>FOLDER_PATH</var></code> for setting the `OUTPUT_DIR` environment variable below.
Contributor:
Should we mention the deprecated GCS_DATA_PATH variable for the benefit of partners using it right now?

Contributor Author (kmoscoe):
No, it's mentioned in the file.
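The folder path recorded in the step above can be assembled into the `OUTPUT_DIR` value like this (a sketch; the bucket and folder names are illustrative):

```shell
# Sketch: build the GCS folder path recorded earlier and use it as
# the OUTPUT_DIR environment variable value.
BUCKET_NAME="my-datacommons-bucket"   # illustrative
FOLDER_PATH="sample"                  # illustrative
OUTPUT_DIR="gs://${BUCKET_NAME}/${FOLDER_PATH}"
echo "$OUTPUT_DIR"
```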

  docker run -it \
- --env-file $PWD/custom_dc/cloudsql_env.list \
+ --env-file $PWD/custom_dc/env.list \
  -v <var>OUTPUT_DIRECTORY</var>:/<var>OUTPUT_DIRECTORY</var> \
Contributor:
For GCS paths, there is nothing to mount. So you can remove this.

Contributor Author (kmoscoe):
Done.

- --env-file $PWD/custom_dc/sqlite_env.list \
- -v $PWD/custom_dc/sample:/userdata \
+ --env-file $PWD/custom_dc/env.list \
+ -v $PWD/custom_dc/sample:/$PWD/custom_dc/sample \
Contributor:
Remove the slash after the colon.

Can you go through all docker run commands in the doc to make sure both sides of the colon are the same exact paths?

Contributor Author (kmoscoe):
Done.
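The mount convention requested above (identical paths on both sides of the `-v` colon) can be sketched by deriving the mount argument from a single variable, so the two sides can never drift apart. The `echo` prints the command rather than running Docker, and the image name is illustrative:

```shell
# Sketch: one variable supplies both the host side and the container
# side of the -v bind mount, guaranteeing the paths match exactly.
OUTPUT_DIR="$PWD/custom_dc/sample"
MOUNT_ARG="${OUTPUT_DIR}:${OUTPUT_DIR}"
echo docker run -it \
  --env-file "$PWD/custom_dc/env.list" \
  -v "$MOUNT_ARG" \
  datacommons-website-compose   # image name illustrative
```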

@keyurva (Contributor) left a review comment:
Thanks for the updates!

### Configure custom directories

Edit the `env.list` file as follows:
- Set the `OUTPUT_DIR` variable to the directory where your input files are stored. The load step will create a `datacommons` subdirectory under this directory.
Contributor:
Consider adding a short note to explain this dichotomy. Something along the lines of a separate utility for managing data coming soon where input and output dirs can be different.

Contributor Author (kmoscoe):
Nah, let's not bother. Too much to explain now.
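Under the single-file scheme this PR documents, a consolidated `env.list` might look like the sketch below. Only `OUTPUT_DIR` is confirmed by this diff; the other key is an illustrative placeholder:

```shell
# env.list (sketch; values illustrative)
OUTPUT_DIR=/home/user/my-data   # input CSVs live here; the load step creates datacommons/ beneath it
DC_API_KEY=your-api-key-here    # illustrative placeholder
```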

@kmoscoe merged commit 5aefa1c into datacommonsorg:master on Jul 25, 2024. 1 check passed.
2 participants