Add lang meta data in toponym for housenumber penalty #72

Merged 1 commit on Nov 24, 2019

Joxit (Member) commented on Oct 8, 2019

Background

We want to ensure that queries in a specific language are parsed correctly. One example is 1111 MD 760, Lusby, MD, USA, where MD can mean both Maryland (a state) and mead (a street-type suffix).
In this configuration, both 1111 and 760 can be house numbers.

That's why the top-ranked solution is:

(0.94) ➜ [ { street: '1111 MD' },
  { housenumber: '760' },
  { locality: 'Lusby' },
  { region: 'MD' },
  { country: 'USA' } ]

But we know that in English-speaking countries the house number never comes second, after the street name. That's why I decided to use this convention to add a penalty to unusual addresses based on their language.

How does it work?

With libpostal we know which languages each word is used in. We can propagate this information from the libpostal classifications to their parents (the Street Classification in our case).
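
As a rough illustration of the propagation step, here is a minimal sketch. The object shapes, the `meta.langs` field and the `makeStreet` helper are hypothetical and are not the parser's real classes; they only show the idea of a parent collecting the languages of its children.

```javascript
// Hypothetical shapes, for illustration only (not the parser's real API).
// Each child may carry a `meta.langs` set of ISO 639-1 language codes.
function collectLangs(children) {
  const langs = new Set();
  for (const child of children) {
    const childLangs = (child.meta && child.meta.langs) || [];
    for (const lang of childLangs) {
      langs.add(lang);
    }
  }
  return langs;
}

// Build a parent "street" that inherits the languages of its children.
function makeStreet(children) {
  return {
    label: 'street',
    body: children.map((c) => c.body).join(' '),
    meta: { langs: collectLangs(children) }
  };
}

const street = makeStreet([
  { label: 'numeric', body: '1111' },
  { label: 'toponym', body: 'MD', meta: { langs: new Set(['en']) } }
]);

console.log(street.body, [...street.meta.langs]);
```

A set is used here so that a language seen on several children is only recorded once on the parent.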

Once this is done, we check every solution, and when an English Street Classification appears before a House Number Classification, we apply a penalty to that solution. The default penalty is 5%.
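
The check could look roughly like the sketch below. Everything in it is an assumption made for illustration: the solution shape (`pairs`, `start`, `langs`), the function name, the 0.9 score of the second solution, and in particular the choice to apply the penalty multiplicatively, which this PR description does not specify.

```javascript
// Illustrative sketch only: names and data shapes are hypothetical.
const NUMBER_FIRST_LANGS = new Set(['en']);
const DEFAULT_PENALTY = 0.05; // the 5% default

function applyHousenumberPenalty(solution, penalty = DEFAULT_PENALTY) {
  const street = solution.pairs.find((p) => p.label === 'street');
  const housenumber = solution.pairs.find((p) => p.label === 'housenumber');
  if (!street || !housenumber) {
    return solution.score; // nothing to compare
  }

  const streetComesFirst = street.start < housenumber.start;
  const isNumberFirstLang = (street.langs || []).some((l) => NUMBER_FIRST_LANGS.has(l));

  // Assumption: the penalty scales the score multiplicatively.
  return streetComesFirst && isNumberFirstLang
    ? solution.score * (1 - penalty)
    : solution.score;
}

// '1111 MD 760' read as street '1111 MD' followed by housenumber '760'.
const streetFirst = {
  score: 0.94,
  pairs: [
    { label: 'street', start: 0, langs: ['en'] },
    { label: 'housenumber', start: 8 }
  ]
};

// The same text read as housenumber '1111' followed by street 'MD 760'.
const numberFirst = {
  score: 0.9, // hypothetical score
  pairs: [
    { label: 'housenumber', start: 0 },
    { label: 'street', start: 5, langs: ['en'] }
  ]
};

console.log(applyHousenumberPenalty(streetFirst)); // lowered
console.log(applyHousenumberPenalty(numberFirst)); // unchanged
```

Because the score is only lowered and the solution is not discarded, a street-first address can still win when no better interpretation exists.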

What's next?

For this PR I added the lang meta only for Toponyms (in order to fix MD). I made the same change for Prefix and Suffix in another branch, but I don't know if it's relevant. If we need this feature for other classifications, I will merge the other branch; for now, "keep it simple".

fixes: #60

missinglink (Member) left a comment

Looks great.

I recently updated https://github.com/pelias/api/blob/master/middleware/localNamingConventions.js to include an updated list of countries that put the house number second.

I think in general it's a good idea. One concern is that users apply their local conventions when travelling overseas; for instance, a German user might write "Main St 100" when travelling to the USA.

I think the penalty model you've applied is perfect for this as it allows both styles of specifying the address while penalizing foreign conventions.

Joxit (Member, Author) commented on Nov 19, 2019

Yes, I used https://github.com/pelias/api/blob/master/middleware/localNamingConventions.js for this PR. However, the API uses ISO country codes, and here we need ISO language codes.

I tried to map countries to languages, which is why the penalty for French in numberFirstLangs is reduced: French is spoken and written in Switzerland, but house numbers are flipped there.
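
To make the idea of a reduced per-language penalty concrete, here is a small sketch. Only the name `numberFirstLangs` comes from this PR; its structure as a map of weights, the weight values, the `penaltyFor` helper and the rule of taking the most lenient weight are all assumptions for illustration.

```javascript
// Hypothetical weights: 1.0 means the full penalty, lower values soften it.
const numberFirstLangs = new Map([
  ['en', 1.0],
  ['fr', 0.5] // softened: French is also written where house numbers come second
]);

// Return the penalty for a street tagged with the given language codes.
// Assumption: when several languages match, the most lenient weight wins.
function penaltyFor(langs, basePenalty = 0.05) {
  const weights = langs
    .filter((lang) => numberFirstLangs.has(lang))
    .map((lang) => numberFirstLangs.get(lang));
  if (weights.length === 0) {
    return 0; // no number-first language: no penalty
  }
  return basePenalty * Math.min(...weights);
}

console.log(penaltyFor(['en']));
console.log(penaltyFor(['fr']));
console.log(penaltyFor(['de']));
```

Keying the table by language rather than by country is what forces this kind of compromise, since one language can span countries with opposite conventions.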

👍

Joxit merged commit f11875e into master on Nov 24, 2019
Joxit deleted the joxit/housenumber-penalty-by-lang branch on November 24, 2019 08:55

Successfully merging this pull request may close these issues.

US highway address