Support schema update mode for data docker #4686
build/cdc_data/run.sh
Outdated
@@ -64,18 +64,30 @@ fi
# cd into simple importer dir to run the importer.
cd $WORKSPACE_DIR/import/simple

# Pick import mode based on value of $SCHEMA_UPDATE_ONLY.
if [[ $SCHEMA_UPDATE_ONLY == "true" ]]; then
What are your thoughts on asking users to set the `$MODE` variable themselves vs. having separate vars for each mode?
I like that this is future-proof: it lets us easily add different modes later! Updated.
@@ -32,6 +32,16 @@ if [[ $OUTPUT_DIR == "" ]]; then
exit 1
fi

if [[ $DATA_RUN_MODE != "" ]]; then
nit: Wanna call it `DATA_RUN_MODE` or just `RUN_MODE`? Your call.
Oh sorry, forgot to explain: I called it `DATA_RUN_MODE` because the same env.list file is typically used for both the data and services containers, so a plain `RUN_MODE` could be confusing if we ever want to add a mode for services.
Ah - good point! Makes sense.
Use the `DATA_RUN_MODE` environment variable to decide what mode to pass to run_stats.sh and whether to build embeddings. The `schemaupdate` mode for run_stats.sh is added by datacommonsorg/import#344; this PR updates the import submodule to include it. A docsite page will describe how to pass in this environment variable: datacommonsorg/docsite#527
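For illustration, here is a minimal sketch of how a single mode variable could drive both decisions. It is not the actual contents of build/cdc_data/run.sh: the function names `resolve_run_mode` and `should_build_embeddings`, the placeholder `DEFAULT_MODE` value, and the assumption that `schemaupdate` is the mode that skips the embeddings build are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; names and defaults below are assumptions,
# not taken from build/cdc_data/run.sh.

# Placeholder for whatever mode the importer uses when DATA_RUN_MODE is unset.
DEFAULT_MODE="default"

# Print the mode to hand to the importer.
# Returns 1 (with a message on stderr) for an unsupported value.
resolve_run_mode() {
  local requested="${1:-}"
  case "$requested" in
    "") echo "$DEFAULT_MODE" ;;
    schemaupdate) echo "schemaupdate" ;;
    *)
      echo "Unsupported DATA_RUN_MODE: $requested" >&2
      return 1
      ;;
  esac
}

# Succeed (exit status 0) when embeddings should be built for the given mode.
# Assumption: a schema-only update does not need embeddings.
should_build_embeddings() {
  [[ "$1" != "schemaupdate" ]]
}

# Example use: resolve the mode once, then branch on it.
if mode="$(resolve_run_mode "${DATA_RUN_MODE:-}")"; then
  echo "Import mode: $mode"
  if should_build_embeddings "$mode"; then
    echo "Would build embeddings"
  else
    echo "Would skip embeddings"
  fi
fi
```

Supporting another mode in this shape would mean adding one more `case` branch, which is the future-proofing benefit raised in the review thread.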
* update submodule for release (datacommonsorg#4681)
* update NL goldens after mixer push (datacommonsorg#4680)
* Adds logging for autocomplete responses. (datacommonsorg#4678)
  Logs the response count for autocompletion. Staging is not showing any responses. Would like to better understand where the breakdown is occurring.
* Exit cdc_services/run.sh when any background process exits (datacommonsorg#4682)
  This makes startup errors in Mixer or NL servers more obvious.
  Bug: b/374820494
  Reference: https://docs.docker.com/engine/containers/multi-service_container/#use-a-wrapper-script
* update nodejs goldens (datacommonsorg#4685)
  goldens needed to be updated because of a bunch of recent data updates (data diffs can be seen here: datacommonsorg/mixer#1438, datacommonsorg/mixer#1439)
* Update submodules (datacommonsorg#4688)
* Pin transformers to 4.45.2 (datacommonsorg#4689)
  Also updates nl goldens
* Support schema update mode for data docker (datacommonsorg#4686)
  Use the `DATA_RUN_MODE` environment variable to decide what mode to pass to run_stats.sh and whether to build embeddings. The mode `schemaupdate` for run_stats.sh is added by datacommonsorg/import#344, which this PR updates the import submodule to include. A docsite page will describe how to pass in this environment variable: datacommonsorg/docsite#527
* Improves Typo recognition for autocomplete (datacommonsorg#4690)
  This PR modifies the scoring algorithm for place autocomplete to count a small score for non-exact matches, to account for one typo. With these changes, we will favor "San Diego" over "Dieppe" for the query "Sna Die".
  Prod: https://screenshot.googleplex.com/Bsx2BbyLZArbQuX
  Local with this change: https://screenshot.googleplex.com/9jHqKb2uHJLz37k
  Note that "Sne Die" will still go back to "Dieppe" because that's 2 typos, so San Diego is out even if it was returned by Google Maps predictions: https://screenshot.googleplex.com/9LViJoVFni3Lui6
  Typo check done as a bag of letters with at most off by one. We do this check on top of the Google Maps Predictions which already take into account typo correction. This part is just to choose the best prediction from Google Maps.
  Doing this as part of gaps identified in place autocomplete: https://docs.google.com/document/d/15RVckX9ck5eyyhBHW8Nb9lmxPBDPMIeLbax14HbN-GI/edit?tab=t.0

---------

Co-authored-by: chejennifer
Co-authored-by: Gabriel Mechali
Co-authored-by: natalie