Document data docker schema update mode #527

hqpho · 2024-10-22T20:55:03Z

This mode is added by datacommonsorg/website#4686. A subsequent PR to mixer will link the new docsite page directly from the schema check error message: datacommonsorg/mixer#1440

keyurva

LGTM modulo the decision on the env var.

keyurva · 2024-10-24T20:19:19Z

custom_dc/database_update.md

+
+While starting Data Commons services, you may see an error that starts with `SQL schema check failed`. This means your database schema must be updated for compatibility with the latest Data Commons services.
+
+You can update your database by running a data management job with the environment variable `SCHEMA_UPDATE_ONLY` set to `true`. This will alter your database without modifying already-imported data.


May need to be updated based on what we decide in https://github.com/datacommonsorg/website/pull/4686/files#r1815646141

hqpho · 2024-10-25T17:59:03Z

Will wait to commit this until the mode is available in the stable data container image.

Use the `DATA_RUN_MODE` environment variable to decide what mode to pass to run_stats.sh and whether to build embeddings. The mode `schemaupdate` for run_stats.sh is added by datacommonsorg/import#344, which this PR updates the import submodule to include. A docsite page will describe how to pass in this environment variable: datacommonsorg/docsite#527
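For anyone following along, the decision described above can be sketched roughly as below. This is an illustrative Python sketch only, not the actual container script. `plan_data_job` and `DataJobPlan` are hypothetical names. It assumes that `schemaupdate` skips the embeddings build, that an unset variable means a regular run, and that unrecognized values are rejected.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Mode name taken from the PR description above.
SCHEMA_UPDATE_MODE = "schemaupdate"


@dataclass(frozen=True)
class DataJobPlan:
    """Hypothetical summary of what a data management job would do."""
    run_stats_mode: Optional[str]  # None means "pass no explicit mode"
    build_embeddings: bool


def plan_data_job(env: Mapping[str, str]) -> DataJobPlan:
    """Decide the run_stats.sh mode and whether to build embeddings."""
    mode = env.get("DATA_RUN_MODE", "").strip()
    if mode == SCHEMA_UPDATE_MODE:
        # Assumption: a schema-only update does not need embeddings.
        return DataJobPlan(run_stats_mode=SCHEMA_UPDATE_MODE, build_embeddings=False)
    if mode:
        # Assumption: unknown modes fail loudly rather than run a full load.
        raise ValueError(f"Unsupported DATA_RUN_MODE: {mode!r}")
    # Unset or blank: regular data load, embeddings included.
    return DataJobPlan(run_stats_mode=None, build_embeddings=True)


if __name__ == "__main__":
    print(plan_data_job(os.environ))
```

Keeping the decision in one small function makes it easy to extend if more modes are added later.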

kmoscoe

I wouldn't actually include this as a separate page. It's too minor a detail. I would instead just put the text in the relevant existing sections. Can I send you an alternative PR as a proof of concept?

hqpho · 2024-10-28T18:39:12Z

> I wouldn't actually include this as a separate page. It's too minor a detail. I would instead just put the text in the relevant existing sections. Can I send you an alternative PR as a proof of concept?

My thought was to have it on one page for ease of linking from an error message! Is that alright? Maybe we can remove this page from the sidenav if we don't want to feel like we're cluttering the docsite?

kmoscoe · 2024-10-28T18:42:09Z

You could still link to a subsection using an anchor.

kmoscoe · 2024-10-28T18:44:44Z

Another thing I don't really like is repeating in entirety the startup procedures. It makes more sense to integrate this info into the existing procedures. Let me send you a PR so you can see what I mean.

hqpho · 2024-10-28T19:34:51Z

Kara, thanks for putting together an alternate approach! I understand the desire not to duplicate startup commands across multiple pages. That said, I worry that the other approach:

- Adds cognitive overhead for people running a typical data load command, trying to copy/paste from a doc page and having to learn about a new param that is almost never relevant to them
- Doesn't explain the mechanism by which the startup time is minimized, so it might come as a surprise to people what exactly the mode does (or rather doesn't) do

Another argument I'd make in favor of having schema update mode be its own page is that we may expand the mode variable in the future to support more different modes, in which case we can easily revise and extend that page.

@keyurva Do you want to weigh in as a decision tiebreaker here?

kmoscoe · 2024-10-28T20:24:40Z

> Kara, thanks for putting together an alternate approach (#529)! I understand the desire not to duplicate startup commands across multiple pages. That said, I worry that the other approach:
>
> - Adds cognitive overhead for people running a typical data load command, trying to copy/paste from a doc page and having to learn about a new param that is almost never relevant to them

OK, another possibility is to have a subheading about running in schema update mode.

> - Doesn't explain the mechanism by which the startup time is minimized, so it might come as a surprise to people what exactly the mode does (or rather doesn't) do

But that wasn't really given in your PR either. I can easily add some more info if you give it to me.

> Another argument I'd make in favor of having schema update mode be its own page is that we may expand the mode variable in the future to support more different modes, in which case we can easily revise and extend that page.

I don't like the idea of having pages determined by some random feature. They should be determined by the general stage in the workflow, which is the overall structure I've set up and which is linked from the landing page. Let's please not introduce additional pages for every new feature we add.


keyurva · 2024-10-28T20:53:41Z

Is there a middle ground where we don't have a separate page for it but are able to make it as self-sufficient as possible?

The typical workflow is likely going to be what Hannah described: a user starts the service, notices the schema failure with a link to the relevant section / page in the docsite and is primarily interested in resolving this failure in the quickest possible manner.

kmoscoe · 2024-10-28T21:13:33Z

Well, the quickest possible way would just say in the error message: "Restart the data management job, optionally adding the -e DATA_RUN_MODE=schemaupdate option for faster performance, and rerun the services job" or something like that. The middle ground would be a self-contained topic within the existing pages. Let me prepare a PR for you that would show that as a PoC, OK?
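As a rough illustration of the restart described above, the command could be assembled like this. It is a sketch under stated assumptions: `schema_update_command` is a hypothetical helper, and the image name and env file path are placeholders rather than values from this thread. Only `docker run`, `--env-file`, and `-e` are standard Docker CLI usage, and `DATA_RUN_MODE=schemaupdate` comes from the comment above.

```python
import shlex
from typing import List


def schema_update_command(image: str, env_file: str) -> List[str]:
    """Build a `docker run` argv for a data management job in schema update mode.

    `image` and `env_file` are caller-supplied placeholders (assumptions).
    """
    return [
        "docker", "run",
        "--env-file", env_file,              # existing job configuration
        "-e", "DATA_RUN_MODE=schemaupdate",  # option quoted in the comment above
        image,
    ]


if __name__ == "__main__":
    # Placeholder arguments; substitute the image and env file you already use.
    print(shlex.join(schema_update_command("<data-management-image>", "env.list")))
```

After the job finishes, the services container would be restarted as usual.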


* update submodule for release (datacommonsorg#4681)
* update NL goldens after mixer push (datacommonsorg#4680)
* Adds logging for autocomplete responses. (datacommonsorg#4678)
  Logs the response count for autocompletion. Staging is not showing any responses. Would like to better understand where the breakdown is occurring.
* Exit cdc_services/run.sh when any background process exits (datacommonsorg#4682)
  This makes startup errors in Mixer or NL servers more obvious.
  Bug: b/374820494
  Reference: https://docs.docker.com/engine/containers/multi-service_container/#use-a-wrapper-script
* update nodejs goldens (datacommonsorg#4685)
  goldens needed to be updated because of a bunch of recent data updates (data diffs can be seen here: datacommonsorg/mixer#1438, datacommonsorg/mixer#1439)
* Update submodules (datacommonsorg#4688)
* Pin transformers to 4.45.2 (datacommonsorg#4689)
  Also updates nl goldens
* Support schema update mode for data docker (datacommonsorg#4686)
  Use the `DATA_RUN_MODE` environment variable to decide what mode to pass to run_stats.sh and whether to build embeddings. The mode `schemaupdate` for run_stats.sh is added by datacommonsorg/import#344, which this PR updates the import submodule to include. A docsite page will describe how to pass in this environment variable: datacommonsorg/docsite#527
* Improves Typo recognition for autocomplete (datacommonsorg#4690)
  This PR modifies the scoring algorithm for place autocomplete to count a small score for non-exact matches, to account for one typo. With these changes, we will favor "San Diego" over "Dieppe" for the query "Sna Die".
  Prod: https://screenshot.googleplex.com/Bsx2BbyLZArbQuX
  Local with this change: https://screenshot.googleplex.com/9jHqKb2uHJLz37k
  Note that "Sne Die" will still go back to "Dieppe" because that's 2 typos, so San Diego is out even if it was returned by google Maps predictions: https://screenshot.googleplex.com/9LViJoVFni3Lui6
  Typo check done as a bag of letters with at most off by one. We do this check on top of the Google Maps Predictions which already take into account typo correction. This part is just to choose the best prediction from google maps.
  Doing this as part of gaps identified in place autocomplete: https://docs.google.com/document/d/15RVckX9ck5eyyhBHW8Nb9lmxPBDPMIeLbax14HbN-GI/edit?tab=t.0

---------

Co-authored-by: chejennifer
Co-authored-by: Gabriel Mechali
Co-authored-by: natalie

kmoscoe · 2024-10-29T19:56:27Z

Hey both, I sent you PR #530 to try to make the text more standalone. Please review; thanks!


hqpho · 2024-10-29T21:55:22Z

Subsumed by #530

hqpho added 3 commits October 21, 2024 23:57

Document data management schema update mode

d4cc7d3

Open links in new tab

a393081

Use the right command for local job w Cloud SQL

de61ea6

hqpho changed the title ~~Document the data docker schema update mode~~ Document data docker schema update mode Oct 22, 2024

This was referenced Oct 22, 2024

Add docsite link in schema check error message. datacommonsorg/mixer#1440

Merged

Support schema update mode for data docker datacommonsorg/website#4686

Merged

hqpho requested review from keyurva and kmoscoe October 22, 2024 20:59

keyurva approved these changes Oct 24, 2024

Change mode var name

8ad34b4

kmoscoe reviewed Oct 28, 2024

hqpho closed this Oct 29, 2024
