Add filtering, normalisation and variants for postcodes #2757

Merged

lonvia merged 30 commits into osm-search:master from the filter-postcodes branch on Jun 24, 2022

@lonvia (Member) commented on Jun 24, 2022

This adds a special sanitizer and tokenizer for postcodes. The sanitizer filters the postcodes from the OSM tags so that only postcodes conforming to the country's official postcode format pass through. The tokenizer creates variants for postcodes that contain optional spaces.
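A minimal Python sketch of the variant idea, assuming a postcode is stored in a normalized form with single spaces and that every space in it is optional. The function `postcode_variants` is hypothetical and not part of Nominatim's tokenizer API; it only shows how keeping or dropping each space yields the spellings a search might use.

```python
from itertools import product
from typing import Set


def postcode_variants(postcode: str) -> Set[str]:
    """Return every spelling obtained by keeping or dropping each space.

    Hypothetical helper: assumes all spaces in the postcode are optional.
    """
    pieces = postcode.split()  # also collapses repeated whitespace
    if not pieces:
        return set()
    variants = set()
    # Each gap between two pieces is either a space or nothing.
    for separators in product((" ", ""), repeat=len(pieces) - 1):
        text = pieces[0]
        for sep, piece in zip(separators, pieces[1:]):
            text += sep + piece
        variants.add(text)
    return variants


print(sorted(postcode_variants("AB1 2CD")))  # ['AB1 2CD', 'AB12CD']
```

Both spellings of a space-separated postcode then map to the same entry, so a query without the space can still find it.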

Per-country postcode formats can be configured in settings/country_settings.yaml. There is a new section in the customization documentation that explains the details.
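The exact pattern syntax accepted in settings/country_settings.yaml is described in the customization documentation and is not reproduced here. As an illustration only, the sketch below assumes a simplified notation in which `d` stands for a digit, `l` for a letter, a space is optional, and any other character is literal. The dictionary `COUNTRY_POSTCODE_PATTERNS` and both functions are hypothetical.

```python
import re

# Hypothetical per-country settings, loosely mirroring a config entry.
# Assumed notation: 'd' = one digit, 'l' = one letter, ' ' = optional space,
# any other character is matched literally.
COUNTRY_POSTCODE_PATTERNS = {
    "de": "ddddd",
    "nl": "dddd ll",
    "ca": "ldl dld",
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate the assumed d/l notation into a regular expression."""
    parts = []
    for char in pattern:
        if char == "d":
            parts.append("[0-9]")
        elif char == "l":
            parts.append("[A-Z]")
        elif char == " ":
            parts.append(" ?")  # assumption: spaces are optional
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def is_valid_postcode(country: str, postcode: str) -> bool:
    """Check a postcode against its country's pattern (input is upper-cased)."""
    pattern = COUNTRY_POSTCODE_PATTERNS.get(country)
    if pattern is None:
        return False  # no postcode format known for this country
    return pattern_to_regex(pattern).fullmatch(postcode.strip().upper()) is not None


print(is_valid_postcode("nl", "1234ab"))  # True
```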

The main effect of this change is that only postcodes in the proper format end up in the location_postcode table. Postcodes with typos and other bad postcode entries no longer propagate to other objects. Nominatim also gives special treatment to words in a search request that are recognized as postcodes. This now works much better, because a word is recognized as a postcode based on whether it can be found in the location_postcode table.
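The search-side recognition can be pictured as a lookup against the set of postcodes that survived filtering. In this sketch a plain Python set stands in for the location_postcode table; `normalise`, `PostcodeRecogniser` and its methods are hypothetical names, and Nominatim's actual query processing is not modeled here.

```python
from typing import Iterable, List


def normalise(token: str) -> str:
    """Hypothetical normalization: drop all whitespace and upper-case."""
    return "".join(token.split()).upper()


class PostcodeRecogniser:
    """Recognizes postcodes by looking them up in a set of known postcodes.

    The set is a stand-in for the location_postcode table, which after this
    change only holds postcodes that passed the per-country format check.
    """

    def __init__(self, known_postcodes: Iterable[str]) -> None:
        self._known = {normalise(pc) for pc in known_postcodes}

    def is_postcode(self, token: str) -> bool:
        return normalise(token) in self._known

    def find_postcodes(self, query: str) -> List[str]:
        """Return query fragments of one or two words that are known postcodes."""
        words = query.replace(",", " ").split()
        found = []
        for i in range(len(words)):
            # Try single words and pairs, so "SW1A 1AA" is found as well.
            for j in (i + 1, i + 2):
                if j <= len(words):
                    candidate = " ".join(words[i:j])
                    if self.is_postcode(candidate):
                        found.append(candidate)
        return found


recogniser = PostcodeRecogniser(["10117", "SW1A 1AA"])
print(recogniser.find_postcodes("Hauptstrasse 5, 10117 Berlin"))  # ['10117']
print(recogniser.is_postcode("12345"))  # False: well-formed but not in the set
```

A well-formed postcode that is missing from the table is not recognized, which corresponds to the still-open item about recognizing unknown postcodes during search.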

Still to do are handling of hierarchical postcodes (#1011) and recognition of unknown postcodes during search (#1452).

Fixes #927. Fixes #1207.

lonvia added 30 commits June 23, 2022 23:42
The postcodes will only be removed as a 'computed postcode'; they are still searchable for the given object.

Adds patterns for countries that have simple numeric-only postcodes.

Moves postcodes that are either in countries without a postcode system or don't correspond to the local postcode pattern into a field for a normal address part. This makes them searchable, but not as a special address part. It has two consequences: they are no longer a skippable part of the address, and these postcodes cannot be searched on their own.

If the country code is not part of the mandatory output, the country code filter handles it correctly.

Now includes all postcodes that have optional parts.

Optional groups are not implemented yet.

Also documents the changes to the SQL functions of the tokenizer.

Includes smaller code fixes found by the tests.

update_postcodes_from_db() needs to do the full postcode treatment in order to derive the correct word table entries.

It can happen for bogus names and this will not get fixed anymore.

Hierarchical postcodes need a different treatment.
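The commit that moves non-conforming postcodes into a normal address field can be sketched as follows. The regexes, the `AddressPart` type with its `kind` labels, and `sanitize_postcodes` are all hypothetical; countries missing from `POSTCODE_REGEX` are treated as having no postcode system, which is an assumption of this sketch rather than Nominatim's configuration.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical per-country formats. A country missing from this table is
# treated as having no postcode system (assumption of this sketch).
POSTCODE_REGEX = {
    "de": re.compile(r"[0-9]{5}"),
    "nl": re.compile(r"[0-9]{4} ?[A-Z]{2}"),
}


@dataclass
class AddressPart:
    kind: str   # hypothetical labels: "postcode" or "other"
    value: str


def sanitize_postcodes(country: str, parts: List[AddressPart]) -> List[AddressPart]:
    """Keep conforming postcodes; demote the rest to ordinary address parts."""
    regex = POSTCODE_REGEX.get(country)
    result = []
    for part in parts:
        if part.kind != "postcode":
            result.append(part)
            continue
        candidate = " ".join(part.value.split()).upper()
        if regex is not None and regex.fullmatch(candidate):
            result.append(AddressPart("postcode", candidate))
        else:
            # Still searchable, but no longer treated as a postcode.
            result.append(AddressPart("other", part.value))
    return result


print(sanitize_postcodes("de", [AddressPart("postcode", "1011")]))
# [AddressPart(kind='other', value='1011')]
```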
@lonvia merged commit 3bf3b89 into osm-search:master on Jun 24, 2022
@lonvia deleted the filter-postcodes branch on June 24, 2022 at 19:09
Successfully merging this pull request may close these issues: "Check postcodes for format correctness" and "Postcode search requires a space to return results".