Add lang meta data in toponym for housenumber penalty #72
Background
We want to ensure that queries in a specific language are parsed correctly. One example is
1111 MD 760, Lusby, MD, USA
where MD means both Maryland (state) and mead (street-type suffix). In this particular configuration, both 1111 and 760 can be house numbers. That's why the first result is
But we know that in English-speaking countries the house number is never in second position, after the street name. That's why I decided to use this particularity to penalize unusual addresses based on their language.
How does it work?
With libpostal, we can know which language each word belongs to. We can then propagate this information from the libpostal classifications to their parents (the Street Classification in our case).
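The propagation step can be pictured with a small sketch. This is not the code from this PR: the `Span` class, its `langs` field, and `propagate_langs` are hypothetical names used only for illustration, and the language tags are assumed to have already been set by a libpostal-based classifier.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span of text carrying a classification and language tags."""
    text: str
    classification: str
    langs: set[str] = field(default_factory=set)
    children: list["Span"] = field(default_factory=list)


def propagate_langs(parent: Span) -> Span:
    """Copy every child's language tags up to the parent span."""
    for child in parent.children:
        parent.langs |= child.langs
    return parent


# Assumption: the classifier has tagged "MD" as an English toponym.
toponym = Span("MD", "toponym", langs={"en"})
street = Span("MD 760", "street", children=[toponym, Span("760", "numeric")])
propagate_langs(street)
print(street.langs)  # {'en'}
```

After this step the parent street span knows it contains an English word, which is the information the penalty check relies on.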
When this is done, we can check all the solutions and when we see an English Street Classification before a House Number Classification, we can apply a penalty to that solution. The default penalty is 5%.
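A minimal sketch of that check, under stated assumptions: `Classified` and `apply_language_penalty` are hypothetical names, and treating the penalty as a multiplicative reduction of the solution score is my assumption. Only the 5% default comes from the description above.

```python
from dataclasses import dataclass, field

DEFAULT_PENALTY = 0.05  # the 5% default mentioned above


@dataclass
class Classified:
    """Hypothetical classified part of a solution, in query order."""
    text: str
    classification: str  # e.g. "street" or "housenumber"
    langs: set[str] = field(default_factory=set)


def apply_language_penalty(score, parts, penalty=DEFAULT_PENALTY):
    """Lower the score when an English street precedes a house number."""
    seen_english_street = False
    for part in parts:
        if part.classification == "street" and "en" in part.langs:
            seen_english_street = True
        elif part.classification == "housenumber" and seen_english_street:
            # Assumption: the penalty scales the score down by 5%.
            return score * (1 - penalty)
    return score


# Two candidate solutions for "1111 MD 760".
usual = [Classified("1111", "housenumber"), Classified("MD 760", "street", {"en"})]
unusual = [Classified("1111 MD", "street", {"en"}), Classified("760", "housenumber")]
```

With these inputs, `apply_language_penalty(1.0, usual)` leaves the score unchanged, while the `unusual` ordering is scaled down, so the solution with the house number first ranks higher.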
Well... then?
For this PR I added the lang meta only for Toponyms (in order to fix MD). I made the same change for Prefix and Suffix in another branch, but I don't know if it's relevant. If we need this feature for other classifications, I will merge the other branch; for now, "keep it simple".

fixes: #60