Document data docker schema update mode #527
LGTM modulo the decision on the env var.
custom_dc/database_update.md
Outdated
While starting Data Commons services, you may see an error that starts with `SQL schema check failed`. This means your database schema must be updated for compatibility with the latest Data Commons services.
You can update your database by running a data management job with the environment variable `SCHEMA_UPDATE_ONLY` set to `true`. This will alter your database without modifying already-imported data.
May need to be updated based on what we decide in https://github.com/datacommonsorg/website/pull/4686/files#r1815646141
Updated.
Will wait to commit this until the mode is available in the stable data container image.
Use the `DATA_RUN_MODE` environment variable to decide what mode to pass to run_stats.sh and whether to build embeddings. The mode `schemaupdate` for run_stats.sh is added by datacommonsorg/import#344, which this PR updates the import submodule to include. A docsite page will describe how to pass in this environment variable: datacommonsorg/docsite#527
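As a rough illustration of the branching described above, a container entrypoint could map `DATA_RUN_MODE` to a `run_stats.sh` mode and an embeddings decision along these lines. This is a minimal sketch, not the code from datacommonsorg/website#4686: the function names, the `default` mode placeholder, and the assumption that `schemaupdate` skips the embeddings build are all hypothetical.

```shell
#!/bin/sh
# Sketch only: NOT the actual data container entrypoint.
# Function names and the "default" mode name are placeholders.

# Print the mode to hand to run_stats.sh for a given DATA_RUN_MODE value.
resolve_stats_mode() {
  case "${1:-}" in
    schemaupdate) echo "schemaupdate" ;;
    "")           echo "default" ;;   # placeholder for the normal data-load mode
    *)            echo "unsupported DATA_RUN_MODE: $1" >&2; return 1 ;;
  esac
}

# Succeed (exit status 0) when embeddings should be built.
# Assumption: a schema-only update does not rebuild embeddings.
should_build_embeddings() {
  [ "${1:-}" != "schemaupdate" ]
}

# Report what would happen for the current environment, without running anything.
mode=$(resolve_stats_mode "${DATA_RUN_MODE:-}") || exit 1
echo "run_stats.sh mode: $mode"
if should_build_embeddings "${DATA_RUN_MODE:-}"; then
  echo "embeddings: build"
else
  echo "embeddings: skip"
fi
```

Rejecting unknown values rather than silently falling back to a full load is a design choice of this sketch; the real script may behave differently.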
I wouldn't actually include this as a separate page. It's too minor a detail. I would instead just put the text in the relevant existing sections. Can I send you an alternative PR as a proof of concept?
My thought was to have it on one page for ease of linking from an error message! Is that alright? Maybe we can remove this page from the sidenav if we don't want to feel like we're cluttering the docsite?
You could still link to a subsection using an anchor.
Another thing I don't really like is repeating in entirety the startup procedures. It makes more sense to integrate this info into the existing procedures. Let me send you a PR so you can see what I mean.
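Kara, thanks for putting together an alternate approach! I understand the desire not to duplicate startup commands across multiple pages. That said, I worry that the other approach:
- Adds cognitive overhead for people running a typical data load command, trying to copy/paste from a doc page and having to learn about a new param that is almost never relevant to them
- Doesn't explain the mechanism by which the startup time is minimized, so it might come as a surprise to people what exactly the mode does (or rather doesn't) do
Another argument I'd make in favor of having schema update mode be its own page is that we may expand the mode variable in the future to support more different modes, in which case we can easily revise and extend that page. @keyurva Do you want to weigh in as a decision tiebreaker here?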
On Mon, Oct 28, 2024 at 3:35 PM Hannah Pho ***@***.***> wrote:
> Kara, thanks for putting together an alternate approach (#529)! I understand the
> desire not to duplicate startup commands across multiple pages. That said,
> I worry that the other approach:
> - Adds cognitive overhead for people running a typical data load
> command, trying to copy/paste from a doc page and having to learn about a
> new param that is almost never relevant to them
OK, another possibility is to have a subheading about running in schema
update mode.
> - Doesn't explain the mechanism by which the startup time is
> minimized, so it might come as a surprise to people what exactly the mode
> does (or rather doesn't) do
But that wasn't really given in your PR either. I can easily add some more
info if you give it to me.
> Another argument I'd make in favor of having schema update mode be its own
> page is that we may expand the mode variable in the future to support more
> different modes, in which case we can easily revise and extend that page.
I don't like the idea of having pages determined by some random feature.
They should be determined by the general stage in the workflow, which is
the overall structure I've set up and which is linked from the landing
page. Let's please not introduce additional pages for every new feature we
add.
> @keyurva Do you want to weigh in as a decision tiebreaker here?
Is there a middle ground where we don't have a separate page for it but are able to make it as self-sufficient as possible? The typical workflow is likely going to be what Hannah described: a user starts the service, notices the schema failure with a link to the relevant section / page in the docsite and is primarily interested in resolving this failure in the quickest possible manner.
Well, the quickest possible way would just say in the error message:
"Restart the data management job, optionally adding the -e
DATA_RUN_MODE=schemaupdate option for faster performance, and rerun the
services job" or something like that.
The middle ground would be a self-contained topic within the existing
pages. Let me prepare a PR for you that would show that as a PoC, OK?
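For readers following the thread, the restart described in that suggested error message could be sketched as below. Only the `-e DATA_RUN_MODE=schemaupdate` option comes from the discussion; `DATA_IMAGE`, `ENV_FILE`, and the helper function are placeholders, and a real data management job command will likely need further options (for example volume mounts) that are not shown here. The sketch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. DATA_IMAGE and ENV_FILE are placeholders, not real names;
# substitute the values from your usual data management job command.
DATA_IMAGE="${DATA_IMAGE:-<your-data-image>}"
ENV_FILE="${ENV_FILE:-<your-env-file>}"

# Print a docker run command for the data management job.
# $1: optional run mode; empty means a normal data load.
data_job_command() {
  if [ -n "${1:-}" ]; then
    echo "docker run --env-file $ENV_FILE -e DATA_RUN_MODE=$1 $DATA_IMAGE"
  else
    echo "docker run --env-file $ENV_FILE $DATA_IMAGE"
  fi
}

# Show the normal invocation and the schema-update variant side by side.
data_job_command
data_job_command schemaupdate
```

After the data management job finishes, the services job would be rerun as usual.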
* update submodule for release (datacommonsorg#4681)
* update NL goldens after mixer push (datacommonsorg#4680)
* Adds logging for autocomplete responses. (datacommonsorg#4678)
  Logs the response count for autocompletion. Staging is not showing any responses. Would like to better understand where the breakdown is occurring.
* Exit cdc_services/run.sh when any background process exits (datacommonsorg#4682)
  This makes startup errors in Mixer or NL servers more obvious. Bug: b/374820494 Reference: https://docs.docker.com/engine/containers/multi-service_container/#use-a-wrapper-script
* update nodejs goldens (datacommonsorg#4685)
  goldens needed to be updated because of a bunch of recent data updates (data diffs can be seen here: datacommonsorg/mixer#1438, datacommonsorg/mixer#1439)
* Update submodules (datacommonsorg#4688)
* Pin transformers to 4.45.2 (datacommonsorg#4689)
  Also updates nl goldens
* Support schema update mode for data docker (datacommonsorg#4686)
  Use the `DATA_RUN_MODE` environment variable to decide what mode to pass to run_stats.sh and whether to build embeddings. The mode `schemaupdate` for run_stats.sh is added by datacommonsorg/import#344, which this PR updates the import submodule to include. A docsite page will describe how to pass in this environment variable: datacommonsorg/docsite#527
* Improves Typo recognition for autocomplete (datacommonsorg#4690)
  This PR modifies the scoring algorithm for place autocomplete to count a small score for non-exact matches, to account for one typo. With these changes, we will favor "San Diego" over "Dieppe" for the query "Sna Die". Prod: https://screenshot.googleplex.com/Bsx2BbyLZArbQuX Local with this change: https://screenshot.googleplex.com/9jHqKb2uHJLz37k
  Note that "Sne Die" will still go back to "Dieppe" because that's 2 typos, so San Diego is out even if it was returned by google Maps predictions: https://screenshot.googleplex.com/9LViJoVFni3Lui6
  Typo check done as a bag of letters with at most off by one. We do this check on top of the Google Maps Predictions which already take into account typo correction. This part is just to choose the best prediction from google maps. Doing this as part of gaps identified in place autocomplete: https://docs.google.com/document/d/15RVckX9ck5eyyhBHW8Nb9lmxPBDPMIeLbax14HbN-GI/edit?tab=t.0
---------
Co-authored-by: chejennifer
Co-authored-by: Gabriel Mechali
Co-authored-by: natalie
Hey both -- I sent you PR 530 (#530) to try to make the text more standalone. Please review; thanks!
Subsumed by #530
This mode is added by datacommonsorg/website#4686. A subsequent PR to mixer will link the new docsite page directly from the schema check error message: datacommonsorg/mixer#1440