feat: Support hyphens as alternative spans (#56)
## Background

Parsing sometimes fails when words are not split well. A hyphen's main purpose is to glue words together. That means that when a hyphen is used, we can treat it like a simple space in order to get two separate words.

Treating hyphens only as spaces cannot be the final solution, however, because the hyphen is also meaningful in other cases. That's why I suggest taking advantage of our graphs and adding alternative ways to complete a phrase without hyphens.

## How does it work?

When we split a section, we first compute the spans on spaces only (as before) and then run a second pass on hyphens.

For example, splitting the section `10 Boulevard Saint-Germain Paris` gives `10`, `Boulevard`, `Saint-Germain`, `Paris`. Here is the graph:

![step1](https://user-images.githubusercontent.com/5153882/63770799-3472b500-c8d6-11e9-8ffd-953af4b0f59e.png)

With the hyphen step, we get `10`, `Boulevard`, `Saint-Germain`, `Paris`, `Saint`, `Germain`:

![step2](https://user-images.githubusercontent.com/5153882/63770925-83204f00-c8d6-11e9-94c6-357f8aa48b06.png)

Thanks to this, we will be able to parse phrases such as:

- `10 Boulevard Saint-Germain Paris`: which is `housenumber` + `street` (first solution without this PR 👎)
- `10 Boulevard Saint-Germain Paris`: which is `housenumber` + `street` + `locality` (first solution with this PR 👍)
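The two-pass split described above can be illustrated with a minimal Python sketch. It is not the code from this commit: the `Span` class and the `split_on_spaces`, `split_on_hyphens`, and `tokenize` helpers are hypothetical names chosen for illustration, and the graph linking itself is left out. It only shows how the hyphen pass adds alternative spans next to the space-delimited ones while keeping their character offsets in the section.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span: a piece of text plus its offsets in the section."""
    text: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


def split_on_spaces(section: str) -> list[Span]:
    # First pass: one span per whitespace-delimited word (as before).
    return [Span(m.group(), m.start(), m.end())
            for m in re.finditer(r"\S+", section)]


def split_on_hyphens(spans: list[Span]) -> list[Span]:
    # Second pass: for each hyphenated word, emit its parts as extra spans.
    extra = []
    for span in spans:
        if "-" not in span.text:
            continue
        for m in re.finditer(r"[^-]+", span.text):
            extra.append(Span(m.group(),
                              span.start + m.start(),
                              span.start + m.end()))
    return extra


def tokenize(section: str) -> list[Span]:
    # The original hyphenated span is kept; the parts are alternatives to it.
    words = split_on_spaces(section)
    return words + split_on_hyphens(words)


spans = tokenize("10 Boulevard Saint-Germain Paris")
print([s.text for s in spans])
# ['10', 'Boulevard', 'Saint-Germain', 'Paris', 'Saint', 'Germain']
```

Because `Saint-Germain` stays in the list alongside `Saint` and `Germain`, a graph built on these offsets can follow either the hyphenated path or the split path, which is what lets the parser consider both readings.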
Showing 12 changed files with 242 additions and 62 deletions.