
Improves Typo recognition for autocomplete #4690

Merged: gmechali merged 7 commits into datacommonsorg:master from the improve_typo branch on Oct 28, 2024


@gmechali (Contributor) commented Oct 25, 2024

This PR modifies the scoring algorithm for place autocomplete to give a small score to non-exact matches, so that a single typo is tolerated.
With these changes, "San Diego" is favored over "Dieppe" for the query "Sna Die".
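A minimal Python sketch of this kind of re-ranking, under stated assumptions: query and prediction text are lower-cased and split on spaces and commas; a query word that is a prefix of a prediction word scores fully, and a query word whose letters match the same-length prefix of a prediction word as a bag of letters, with at most one letter left unmatched, scores a little. All function names, the weights `EXACT_MATCH_SCORE = 1.0` and `TYPO_MATCH_SCORE = 0.25`, the prediction strings, and tie-breaking by prediction order are hypothetical; this is not the PR's actual code.

```python
import re
from collections import Counter

# Hypothetical weights (not taken from the PR): a prefix match counts fully,
# a one-typo match counts only a little.
EXACT_MATCH_SCORE = 1.0
TYPO_MATCH_SCORE = 0.25


def split_words(text: str) -> list[str]:
    """Lower-cases the text and splits it on whitespace and commas."""
    return [w for w in re.split(r"[\s,]+", text.lower()) if w]


def letters_off_by(query_word: str, candidate_word: str) -> int:
    """Bag-of-letters distance: letters left unmatched on either side when
    comparing the query word with the same-length prefix of the candidate."""
    q = Counter(query_word)
    p = Counter(candidate_word[: len(query_word)])
    return sum((q - p).values()) + sum((p - q).values())


def word_score(query_word: str, candidate_word: str) -> float:
    """Scores one query word against one prediction word."""
    if candidate_word.startswith(query_word):
        return EXACT_MATCH_SCORE
    if letters_off_by(query_word, candidate_word) <= 1:
        return TYPO_MATCH_SCORE
    return 0.0


def score_prediction(query: str, prediction: str) -> float:
    """Sums, over query words, the best score against any prediction word."""
    pred_words = split_words(prediction)
    return sum(
        max((word_score(qw, pw) for pw in pred_words), default=0.0)
        for qw in split_words(query)
    )


def best_prediction(query: str, predictions: list[str]) -> str:
    """Picks the highest-scoring prediction. max() returns the first of several
    equal scores, so ties keep the order in which predictions arrived."""
    return max(predictions, key=lambda p: score_prediction(query, p))


# Hypothetical predictions, assumed to arrive with Dieppe listed first.
predictions = ["Dieppe, France", "San Diego, CA, USA"]
print(best_prediction("Sna Die", predictions))  # San Diego, CA, USA
print(best_prediction("Sne Die", predictions))  # Dieppe, France
```

In this sketch "Sna Die" scores 1.25 for San Diego against 1.0 for Dieppe, while "Sne Die" leaves both at 1.0 and the tie keeps the first prediction. Keeping the typo weight well below the exact-match weight means a single typo match always contributes less than an exact one.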
Prod: https://screenshot.googleplex.com/Bsx2BbyLZArbQuX
Local with this change: https://screenshot.googleplex.com/9jHqKb2uHJLz37k

Note that "Sne Die" still resolves to "Dieppe": it contains two typos, so San Diego is ruled out even when it is returned by the Google Maps predictions: https://screenshot.googleplex.com/9LViJoVFni3Lui6

The typo check compares words as bags of letters and allows them to be off by at most one letter. It runs on top of the Google Maps predictions, which already account for typo correction; its only purpose is to choose the best prediction from Google Maps.
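One way to read the bag-of-letters distance, assumed here for illustration since the PR does not spell out a formula: compare the letter counts of a query word $q$ with those of the equally long prefix $p$ of a candidate word, where $n_w(c)$ is the number of times letter $c$ occurs in $w$, and add up the mismatches.

$$
d(q, p) = \sum_{c} \bigl|\, n_q(c) - n_p(c) \,\bigr|
$$

$$
\begin{aligned}
d(\text{sna}, \text{san}) &= 0 \\
d(\text{sne}, \text{san}) &= \underbrace{|1 - 0|}_{\text{e}} + \underbrace{|0 - 1|}_{\text{a}} = 2
\end{aligned}
$$

With a limit of $d(q, p) \le 1$, "Sna" counts as a near match for "San" while "Sne" does not, which agrees with the "Sna Die" and "Sne Die" examples. When $q$ and $p$ have the same length, $d$ is always even, so under this reading the allowance of one letter only matters when the candidate word is shorter than the query word.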

This addresses one of the gaps identified in place autocomplete: https://docs.google.com/document/d/15RVckX9ck5eyyhBHW8Nb9lmxPBDPMIeLbax14HbN-GI/edit?tab=t.0

@gmechali requested a review from dwnoble October 25, 2024 18:06
@gmechali marked this pull request as ready for review October 25, 2024 18:06
@gmechali merged commit 204ee26 into datacommonsorg:master Oct 28, 2024
9 checks passed
hqpho pushed a commit to hqpho/dc-website that referenced this pull request Oct 29, 2024
hqpho added a commit to hqpho/dc-website that referenced this pull request Oct 29, 2024
* update submodule for release (datacommonsorg#4681)

* update NL goldens after mixer push (datacommonsorg#4680)

* Adds logging for autocomplete responses. (datacommonsorg#4678)

Logs the response count for autocompletion. Staging is not showing any
responses. Would like to better understand where the breakdown is
occurring.

* Exit cdc_services/run.sh when any background process exits (datacommonsorg#4682)

This makes startup errors in Mixer or NL servers more obvious.

Bug: b/374820494
Reference:
https://docs.docker.com/engine/containers/multi-service_container/#use-a-wrapper-script

* update nodejs goldens (datacommonsorg#4685)

Goldens needed to be updated because of several recent data updates
(data diffs can be seen here:
datacommonsorg/mixer#1438,
datacommonsorg/mixer#1439)

* Update submodules (datacommonsorg#4688)

* Pin transformers to 4.45.2 (datacommonsorg#4689)

Also updates NL goldens

* Support schema update mode for data docker (datacommonsorg#4686)

Use the `DATA_RUN_MODE` environment variable to decide what mode to pass
to run_stats.sh and whether to build embeddings. The mode `schemaupdate`
for run_stats.sh is added by
datacommonsorg/import#344, which this PR updates
the import submodule to include.

A docsite page will describe how to pass in this environment variable:
datacommonsorg/docsite#527

* Improves Typo recognition for autocomplete (datacommonsorg#4690)

---------

Co-authored-by: chejennifer
Co-authored-by: Gabriel Mechali
Co-authored-by: natalie
@gmechali deleted the improve_typo branch October 29, 2024 20:35