Add caching of address information for Nominatim export #850

Merged
17 commits merged on Nov 15, 2024

lonvia (Collaborator) commented Nov 15, 2024

This completely reworks how the import from Nominatim works: places are now imported by country. For each country, the name information for all places that can function as an address part is loaded into memory. After that, the places for the country are read in a single query and the address information is added from the cache. This saves us thousands of SQL queries for address lookups. Import time for the planet goes down from around 20h to 10h.
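The caching idea described above can be sketched roughly as follows. This is a minimal illustration only, not the code from this PR: it uses an in-memory SQLite database with a heavily simplified, hypothetical stand-in for the `placex` and `place_addressline` tables, and the function names and the `max_rank` cut-off are assumptions made for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a simplified, hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE placex (
            place_id INTEGER PRIMARY KEY,
            country_code TEXT,
            name TEXT,
            rank_address INTEGER
        );
        CREATE TABLE place_addressline (place_id INTEGER, address_place_id INTEGER);
        CREATE INDEX idx_placex_country ON placex(country_code);
        INSERT INTO placex VALUES
            (1, 'de', 'Berlin', 16),
            (2, 'de', 'Mitte', 20),
            (3, 'de', 'Unter den Linden', 26),
            (4, 'fr', 'Paris', 16),
            (5, 'fr', 'Rue de Rivoli', 26);
        INSERT INTO place_addressline VALUES (3, 1), (3, 2), (2, 1), (5, 4);
    """)
    return db


def load_address_cache(db, country, max_rank=25):
    """Load the names of all places in a country that may act as an address part."""
    rows = db.execute(
        "SELECT place_id, name, rank_address FROM placex"
        " WHERE country_code = ? AND rank_address <= ?",
        (country, max_rank),
    )
    return {place_id: (name, rank) for place_id, name, rank in rows}


def export_country(db, country):
    """Read all places of a country in one query and resolve addresses from the cache."""
    cache = load_address_cache(db, country)
    rows = db.execute(
        "SELECT p.place_id, p.name, a.address_place_id"
        " FROM placex p LEFT JOIN place_addressline a ON a.place_id = p.place_id"
        " WHERE p.country_code = ? ORDER BY p.place_id",
        (country,),
    )
    docs = {}
    for place_id, name, address_id in rows:
        doc = docs.setdefault(place_id, {"name": name, "address": []})
        # Dictionary lookup replaces what would otherwise be a per-place SQL query.
        if address_id in cache:
            doc["address"].append(cache[address_id])
    for doc in docs.values():
        # Order address parts from the largest area (lowest rank) to the smallest.
        doc["address"] = [n for n, _ in sorted(doc["address"], key=lambda e: e[1])]
    return docs
```

Calling `export_country(build_demo_db(), "de")` issues two queries for the whole country (one to fill the cache, one to read the places), regardless of how many places the country contains.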

Exporting by country requires an additional per-country index over placex. That means that you need write access to the database. Alternatively, pre-create the index with CREATE INDEX ON placex(country_code).

As a side effect of the country split, it is now also possible to read from the database in parallel threads. Use the new -j option for that. The usefulness of this option is somewhat limited: writing to the ES/OS database is still single-threaded and thus a limiting factor. Also, mixing data from different countries while writing results in quite a bit of bloat (180GB vs 200GB for a planet).
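The parallel-reader, single-writer arrangement described above can be illustrated with a small sketch. It is not the implementation from this PR: `read_country` is a hypothetical stand-in for the per-country database read, `num_readers` plays the role of the -j value, and the list `written` stands in for the ES/OS index.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def read_country(country, out):
    """Hypothetical stand-in for reading all places of one country from the database."""
    for i in range(3):
        out.put((country, f"{country}-place-{i}"))


def run_export(countries, num_readers):
    """Read countries in parallel threads while a single thread does all the writing."""
    out = queue.Queue(maxsize=100)
    written = []

    def writer():
        # The single writer is the bottleneck: documents from different
        # countries arrive interleaved and are written in arrival order.
        while True:
            item = out.get()
            if item is None:  # sentinel: all readers are done
                break
            written.append(item)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    with ThreadPoolExecutor(max_workers=num_readers) as pool:
        futures = [pool.submit(read_country, c, out) for c in countries]
        for future in futures:
            future.result()  # propagate any reader errors

    out.put(None)
    writer_thread.join()
    return written
```

With more than one reader, the order of `written` mixes countries, which mirrors the interleaving that the description above identifies as a source of index bloat.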

lonvia merged commit 42e0c39 into komoot:master Nov 15, 2024
4 checks passed
lonvia deleted the export-by-country branch November 15, 2024 19:50