
Support schema update mode for data docker #4686

Merged: 5 commits, merged Oct 25, 2024

@hqpho (Contributor) commented Oct 22, 2024

Use the `DATA_RUN_MODE` environment variable to decide which mode to pass to run_stats.sh and whether to build embeddings. The `schemaupdate` mode for run_stats.sh was added in datacommonsorg/import#344; this PR updates the import submodule to include that change.

A docsite page will describe how to pass in this environment variable: datacommonsorg/docsite#527
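A minimal sketch of the two decisions described above, assuming a Bash entrypoint. The helper names `stats_mode_for` and `should_build_embeddings` and the default mode string `customdc` are hypothetical (the PR does not show them), and the sketch assumes `schemaupdate` is the only mode that skips embeddings.

```shell
#!/usr/bin/env bash
# Sketch only: maps DATA_RUN_MODE to (1) the mode passed to run_stats.sh and
# (2) whether embeddings are built. Names and the default are hypothetical.

# Hypothetical placeholder for the mode used when DATA_RUN_MODE is unset.
DEFAULT_STATS_MODE="customdc"

# Print the run_stats.sh mode for the given DATA_RUN_MODE value.
stats_mode_for() {
  if [[ -z "$1" ]]; then
    echo "$DEFAULT_STATS_MODE"
  else
    echo "$1"   # e.g. "schemaupdate" is passed through unchanged
  fi
}

# Return 0 (true) when embeddings should be built.
# Assumption: a schema-only update does not need embeddings.
should_build_embeddings() {
  [[ "$1" != "schemaupdate" ]]
}

MODE=$(stats_mode_for "$DATA_RUN_MODE")
echo "run_stats.sh mode: $MODE"
if should_build_embeddings "$DATA_RUN_MODE"; then
  echo "building embeddings"
else
  echo "skipping embeddings"
fi
```

With `DATA_RUN_MODE=schemaupdate` set, this prints the `schemaupdate` mode and skips embeddings; with the variable unset, it falls back to the placeholder default and builds them.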

```diff
@@ -64,18 +64,30 @@ fi
# cd into simple importer dir to run the importer.
cd $WORKSPACE_DIR/import/simple

# Pick import mode based on value of $SCHEMA_UPDATE_ONLY.
if [[ $SCHEMA_UPDATE_ONLY == "true" ]]; then
```
Contributor commented:

What are your thoughts on asking users to set the $MODE variable themselves vs having separate vars for each mode?

Contributor (author) replied:

I like that it's future-proof: it lets us easily add more modes later. Updated.
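The reviewer's point about a single mode variable can be sketched as follows: with one `DATA_RUN_MODE` value checked against a list, supporting a new mode means adding one word to the list rather than introducing another boolean variable and checking it against the existing ones. The array name, function names, and error text are hypothetical; only the non-empty guard mirrors the `if [[ $DATA_RUN_MODE != "" ]]` check in the PR's diff.

```shell
#!/usr/bin/env bash
# Sketch: one mode variable validated against a list of supported values.
# Adding a future mode is a one-word change to this (hypothetical) array.
SUPPORTED_DATA_RUN_MODES=(schemaupdate)

# Return 0 if $1 is one of the supported modes.
is_supported_mode() {
  local mode
  for mode in "${SUPPORTED_DATA_RUN_MODES[@]}"; do
    [[ "$1" == "$mode" ]] && return 0
  done
  return 1
}

# An empty value means "default run"; anything else must be supported.
check_data_run_mode() {
  if [[ -n "$1" ]] && ! is_supported_mode "$1"; then
    echo "Unsupported DATA_RUN_MODE: '$1'" >&2
    return 1
  fi
  return 0
}

check_data_run_mode "$DATA_RUN_MODE" || exit 1
```

Separate booleans would also allow contradictory combinations (two modes set to `true` at once), which a single variable rules out by construction.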

@hqpho hqpho requested a review from keyurva October 25, 2024 13:53
```diff
@@ -32,6 +32,16 @@ if [[ $OUTPUT_DIR == "" ]]; then
exit 1
fi

if [[ $DATA_RUN_MODE != "" ]]; then
```
Contributor commented:

nit: Wanna call it DATA_RUN_MODE or just RUN_MODE? Your call.

Contributor (author) replied:

Oh, sorry, I forgot to explain: I called it `DATA_RUN_MODE` because the same env.list file is typically used for both the data and services containers, so a plain `RUN_MODE` could be confusing if we ever add a mode for the services container.

Contributor replied:

Ah - good point! Makes sense.
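A hypothetical env.list excerpt illustrating the naming argument. Docker's `--env-file` option reads one `KEY=value` per line and treats lines starting with `#` as comments, so the same file can be passed to both containers; the `DATA_` prefix signals which container consumes the variable. The contents are illustrative, not copied from the project.

```shell
# Hypothetical env.list excerpt (Docker env-file format: KEY=value per line).
# Only the data container reads DATA_RUN_MODE; the services container ignores it.
DATA_RUN_MODE=schemaupdate
```

Each container would then be started with `docker run --env-file env.list <image>`, where `<image>` is a placeholder for the data or services image.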

@hqpho hqpho merged commit 499b981 into datacommonsorg:master Oct 25, 2024
8 checks passed
@hqpho hqpho deleted the dataDocker branch October 25, 2024 19:27
hqpho added a commit to hqpho/dc-website that referenced this pull request Oct 29, 2024
hqpho added a commit to hqpho/dc-website that referenced this pull request Oct 29, 2024
* update submodule for release (datacommonsorg#4681)

* update NL goldens after mixer push (datacommonsorg#4680)

* Adds logging for autocomplete responses. (datacommonsorg#4678)

Logs the response count for autocompletion. Staging is not showing any
responses. Would like to better understand where the breakdown is
occurring.

* Exit cdc_services/run.sh when any background process exits (datacommonsorg#4682)

This makes startup errors in Mixer or NL servers more obvious.

Bug: b/374820494
Reference:
https://docs.docker.com/engine/containers/multi-service_container/#use-a-wrapper-script
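The wrapper-script pattern referenced above can be sketched in Bash: start each server in the background, then call `wait -n` (Bash 4.3 or later), which returns as soon as any one background job exits, with that job's exit status. `start_mixer` and `start_nl` are hypothetical stand-ins, not the real commands in `cdc_services/run.sh`.

```shell
#!/usr/bin/env bash
# Sketch of a multi-service wrapper: exit when ANY background server exits.

start_mixer() {
  # Hypothetical long-running server; exec so the job's PID is the process itself.
  exec sleep 30
}

start_nl() {
  # Hypothetical server that fails during startup.
  sleep 1
  return 3
}

run_services() {
  start_mixer &
  start_nl &

  # Returns when the first background job exits, with that job's status.
  wait -n
  local status=$?

  # Stop whatever is still running, then reap it quietly.
  kill $(jobs -p) 2>/dev/null
  wait 2>/dev/null
  return "$status"
}
```

A real entrypoint would end with `run_services; exit $?`, so the container stops with the failing server's exit code instead of hanging with one server down.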

* update nodejs goldens (datacommonsorg#4685)

Goldens needed to be updated because of several recent data updates
(the data diffs can be seen in
datacommonsorg/mixer#1438 and
datacommonsorg/mixer#1439).

* Update submodules (datacommonsorg#4688)

* Pin transformers to 4.45.2 (datacommonsorg#4689)

Also updates NL goldens.

* Support schema update mode for data docker (datacommonsorg#4686)

* Improves Typo recognition for autocomplete (datacommonsorg#4690)

This PR modifies the scoring algorithm for place autocomplete to give
non-exact matches a small score, allowing for one typo.
With these changes, we favor "San Diego" over "Dieppe" for the
query "Sna Die".
Prod: https://screenshot.googleplex.com/Bsx2BbyLZArbQuX
Local with this change:
https://screenshot.googleplex.com/9jHqKb2uHJLz37k

Note that "Sne Die" will still resolve to "Dieppe" because it contains two
typos, so San Diego is ruled out even when Google Maps predictions
return it: https://screenshot.googleplex.com/9LViJoVFni3Lui6

The typo check treats each string as a bag of letters and allows them to
differ by at most one letter. This check runs on top of the Google Maps
predictions, which already account for typo correction; it only chooses
the best prediction from Google Maps.

This addresses gaps identified in place autocomplete:
https://docs.google.com/document/d/15RVckX9ck5eyyhBHW8Nb9lmxPBDPMIeLbax14HbN-GI/edit?tab=t.0
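One possible reading of the bag-of-letters check, sketched in Bash: count each letter in two words and sum the absolute count differences (the size of the multiset symmetric difference), accepting a pair when the result is at most 1. Under this reading, "Sna" versus "San" scores 0 and "Sne" versus "San" scores 2, consistent with the examples above. The function names, the word-by-word comparison, and this exact distance definition are assumptions; the actual autocomplete code may differ.

```shell
#!/usr/bin/env bash
# Sketch (assumed definition): bag-of-letters distance between two words.
# Letters are compared case-insensitively and order is ignored, so a
# transposition such as "sna" vs "san" costs nothing.
bag_distance() {
  local a="${1,,}" b="${2,,}" ch i diff=0
  local -A counts=()
  for (( i = 0; i < ${#a}; i++ )); do
    ch="${a:i:1}"
    [[ "$ch" == [a-z] ]] && (( counts[$ch]++ ))
  done
  for (( i = 0; i < ${#b}; i++ )); do
    ch="${b:i:1}"
    [[ "$ch" == [a-z] ]] && (( counts[$ch]-- ))
  done
  # Sum of |count difference| over every letter seen in either word.
  for ch in "${!counts[@]}"; do
    (( diff += counts[$ch] < 0 ? -counts[$ch] : counts[$ch] ))
  done
  echo "$diff"
}

# Hypothetical helper: true when the two words differ by at most one letter.
within_one_letter() {
  (( $(bag_distance "$1" "$2") <= 1 ))
}
```

A substitution ("Sne" for "San") costs 2 under this definition, one extra letter and one missing letter, which is why it falls outside the one-letter allowance.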

---------

Co-authored-by: chejennifer
Co-authored-by: Gabriel Mechali
Co-authored-by: natalie
2 participants