Add caching of address information for Nominatim export #850

lonvia · 2024-11-15T13:24:27Z

This completely reworks how the import from Nominatim works: places are now imported country by country. For each country, the name information for all places that can function as an address part is loaded into memory. After that, the places for the country are read in a single query and the address information is added from the cache. This saves us thousands of SQL queries for address lookups. Import time for the planet goes down from around 20h to 10h.
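To illustrate the caching idea, here is a minimal sketch. It is not the code from this PR: the class `AddressCacheSketch` and its methods are hypothetical, and the per-country query is replaced by a plain map so that the example runs without a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a class from this PR.
class AddressCacheSketch {
    // place_id -> name of a place that can serve as an address part.
    // Keys are Long because place_ids need a 64-bit value.
    private final Map<Long, String> namesByPlaceId = new HashMap<>();

    // Replace the cache content with the address places of one country.
    // A real import would fill this from one query per country.
    void loadCountry(Map<Long, String> addressPlaces) {
        namesByPlaceId.clear();
        namesByPlaceId.putAll(addressPlaces);
    }

    // Resolve the address parts of one place from memory,
    // skipping ids that are not present in the cache.
    List<String> resolve(long[] addressPlaceIds) {
        List<String> names = new ArrayList<>();
        for (long id : addressPlaceIds) {
            String name = namesByPlaceId.get(id);
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Long, String> germany = new HashMap<>();
        germany.put(10L, "Berlin");
        germany.put(20L, "Germany");

        AddressCacheSketch cache = new AddressCacheSketch();
        cache.loadCountry(germany);

        // Id 99 is unknown and is silently skipped.
        System.out.println(cache.resolve(new long[] {10L, 20L, 99L}));
    }
}
```

The point of the design is that the cache is filled once per country and then answers every address lookup for that country without another round trip to the database.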

Exporting by country requires an additional per-country index over placex. The import creates it automatically, which means that you need write access to the database. Alternatively, pre-create the index with CREATE INDEX ON placex(country_code).

As a side effect of the country split, it is now also possible to read from the database in parallel threads. Use the new -j option for that. The usefulness of this option is somewhat limited: writing to the ES/OS database is still single-threaded and thus a limiting factor. Also, mixing data from different countries while writing results in quite a bit of bloat (180GB vs 200GB for a planet).
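The combination of parallel readers and a single writer can be sketched as follows. This is an assumption-laden illustration, not the implementation in this PR: `ParallelReadSketch`, `importCountries` and the string "documents" are hypothetical stand-ins, and the database reads are simulated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not a class from this PR.
class ParallelReadSketch {
    // Sentinel that tells the writer thread to stop.
    private static final String END = "\u0000END";

    // Read countries in parallel, write the results in a single thread.
    static List<String> importCountries(List<String> countries, int readerThreads) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        List<String> written = new ArrayList<>();

        // Single writer: stands in for the single-threaded ES/OS writer.
        Thread writer = new Thread(() -> {
            try {
                for (String doc = queue.take(); !doc.equals(END); doc = queue.take()) {
                    written.add(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // One task per country, executed by a fixed pool of reader threads.
        ExecutorService readers = Executors.newFixedThreadPool(readerThreads);
        for (String country : countries) {
            readers.submit(() -> {
                try {
                    // A real reader would run the per-country query here.
                    queue.put("place-from-" + country);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            readers.shutdown();
            readers.awaitTermination(1, TimeUnit.MINUTES);
            queue.put(END);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("import interrupted", e);
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> countries = new ArrayList<>();
        countries.add("de");
        countries.add("fr");
        countries.add("it");
        // The order of the output depends on thread scheduling.
        System.out.println(importCountries(countries, 2));
    }
}
```

Because all readers feed one queue, documents from different countries arrive interleaved at the writer, which is the mixing effect described above.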

lonvia added 17 commits November 12, 2024 10:58

automatically create a country index before import

0e43443

run import for each country separately

6d6ae5b

move handling of importer thread to App

f5c0d25

This way, the thread is only created once.

get rid of readEntireDatabase function

1887afd

enable multi-threading for reading from postgresql

fc3f5a3

make PlaceRowMapper a full class

ccb9245

initialise NominatimResult through static functions

a24c2c3

make OsmlineRowMapper a full class

30449fe

decouple NominatimConnector and NominatimUpdater class

8f692a2

The actual importer is now named NominatimImporter, while NominatimConnector becomes the class for the common code shared between the two classes.

clean code in readCountry

e941c98

precompute address rows on import

b2b6622

avoid duplication of completePlace function

d871945

add transaction manager and go back to manual commit

6b4dd50

With autocommit, server-side cursors don't work, slowing down the large queries. Without autocommit and without a transaction manager, all queries will be rolled back. So protect at least the writing queries.

place_ids need a 64-bit value

6f32d32

do not mutate names from the address cache

b50c3e4

clean style issues

b32c234

fix loading addresses for non-country places

23c18ef

lonvia merged commit 42e0c39 into komoot:master Nov 15, 2024
4 checks passed

lonvia deleted the export-by-country branch November 15, 2024 19:50
