support for comma delimited housenumber + street #29

missinglink · 2019-05-27T11:02:53Z

I've seen a few cases internationally where users insert a comma between every component of the address. I'm not sure if this is done manually or when joining cells in a spreadsheet.

This is actually great for most tokens because it helps us to avoid parsing ambiguities.
The issue arises when a comma is used between the housenumber and the street, so the parser will fail for an address such as:

1, Foo St, Foo, Bar, 411027

but pass for one where the first comma is not present:

1 Foo St, Foo, Bar, 411027

The code responsible for this is the TokenDistanceFilter, which should be modified to ignore section boundaries when considering adjacency.
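To make the failure mode concrete, here is a minimal Python sketch. It is not the parser's actual code: it assumes that sections are produced by splitting on commas and that adjacency is only evaluated between tokens in the same section. The names `split_sections` and `adjacent_in_section` are hypothetical.

```python
def split_sections(text):
    """Split an address into comma-delimited sections of whitespace-separated tokens."""
    return [part.split() for part in text.split(",") if part.strip()]


def adjacent_in_section(sections, left, right):
    """Return True if `left` is immediately followed by `right` within one section."""
    for tokens in sections:
        for a, b in zip(tokens, tokens[1:]):
            if a == left and b == right:
                return True
    return False


with_comma = split_sections("1, Foo St, Foo, Bar, 411027")
without_comma = split_sections("1 Foo St, Foo, Bar, 411027")

# The housenumber sits alone in its own section, so it has no in-section neighbour.
print(with_comma[0])                                   # ['1']
print(adjacent_in_section(with_comma, "1", "Foo"))     # False
print(adjacent_in_section(without_comma, "1", "Foo"))  # True
```

With the leading comma, "1" and "Foo" are never seen as neighbours, which is the situation a section-bound adjacency check would run into.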

missinglink · 2019-05-27T11:08:47Z

Off the top of my head there are two ways to accomplish this:

1. The prev and next graph nodes only apply within the same section, so we could consider changing this behaviour, or add a new graph relationship which links to spans across sections (we would need to consider the impact of this and the potential errors that might be caused by having an API like this).
2. Record a 'token position', so that each token is assigned a number starting from 0 and incrementing by one per token as we read from left to right. It would then be possible to write a query which finds a token by position (although this type of query is not currently trivial to write, as it would require iterating over all sections to locate a span in that way).
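The 'token position' idea could look roughly like the following Python sketch. It is an illustration under assumptions, not the parser's data model: the `Token` class and the `tokenize` and `find_by_position` helpers are hypothetical, and sections are again assumed to come from splitting on commas.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    position: int  # global left-to-right index, starting from 0


def tokenize(text):
    """Build comma-delimited sections whose tokens carry a global position."""
    sections = []
    position = 0
    for part in text.split(","):
        words = part.split()
        if not words:
            continue  # skip empty sections such as a trailing comma
        section = []
        for word in words:
            section.append(Token(word, position))
            position += 1
        sections.append(section)
    return sections


def find_by_position(sections, position):
    """Locate a token by global position by iterating over all sections."""
    for section in sections:
        for token in section:
            if token.position == position:
                return token
    return None


sections = tokenize("1, Foo St, Foo, Bar, 411027")
housenumber = sections[0][0]
neighbour = find_by_position(sections, housenumber.position + 1)
print(neighbour.text)  # Foo
```

The lookup has to walk every section, which reflects the cost mentioned above; an index keyed by position would avoid that at the price of extra bookkeeping.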

missinglink · 2019-05-27T11:11:07Z

One other approach would be to check for a prev relationship and, if that doesn't exist, check whether there is a previous span; if so, use the child:last node from that.
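A rough Python sketch of that fallback follows. It is only an approximation of the idea: the `Span` class, `build`, and `previous_token` are hypothetical names, and the child:last relationship is modelled simply as the last element of a section's `children` list.

```python
class Span:
    def __init__(self, text):
        self.text = text
        self.prev = None    # previous sibling within the same section, if any
        self.children = []  # for a section span: its tokens, in order


def build(text):
    """Build section spans whose children are linked by in-section prev pointers."""
    sections = []
    for part in text.split(","):
        words = part.split()
        if not words:
            continue  # skip empty sections
        section = Span(part.strip())
        for word in words:
            child = Span(word)
            if section.children:
                child.prev = section.children[-1]
            section.children.append(child)
        sections.append(section)
    return sections


def previous_token(sections, section_index, token):
    """Prefer the in-section prev; otherwise use the previous section's last child."""
    if token.prev is not None:
        return token.prev
    if section_index > 0:
        return sections[section_index - 1].children[-1]
    return None


sections = build("1, Foo St, Foo, Bar, 411027")
street_start = sections[1].children[0]
print(previous_token(sections, 1, street_start).text)  # 1
```

Tokens that already have an in-section neighbour behave as before, so only the first token of each section is affected by the fallback.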

missinglink added a commit that referenced this issue May 27, 2019

feat(relax_token_distance_filter): simple fix for #29

19c1e0d

missinglink mentioned this issue May 27, 2019

relax_token_distance_filter #30

Merged

missinglink closed this as completed in #30 May 27, 2019

missinglink added a commit that referenced this issue May 27, 2019

feat(relax_token_distance_filter): simple fix for #29 (#30)

490e47d
