Move external shapefiles to tables in the DB #4092

Merged: 4 commits, May 9, 2020

Conversation

pnorman
Collaborator

@pnorman pnorman commented Mar 26, 2020

This adds a script that loads files into the DB based on a
YAML file listing the data sources. The script can be run
while rendering is going on, as it swaps old tables with
new ones in a transaction.
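
As a rough illustration of the swap described above, the sketch below builds the SQL statements that would replace a live table with a freshly loaded one inside a single transaction. The schema names `loading` and `public` and the helper name `swap_statements` are assumptions made for this example; the script in this PR may be structured differently.

```python
def swap_statements(table, temp_schema="loading", target_schema="public"):
    """Return SQL that replaces target_schema.table with temp_schema.table.

    PostgreSQL DDL is transactional, so the drop and the schema move
    take effect together when the transaction commits.
    Table and schema names are assumed to be trusted (no escaping is done).
    """
    return [
        "BEGIN;",
        f'DROP TABLE IF EXISTS "{target_schema}"."{table}";',
        f'ALTER TABLE "{temp_schema}"."{table}" SET SCHEMA "{target_schema}";',
        "COMMIT;",
    ]
```

For example, `swap_statements("icesheet_polygons")` yields four statements that could be sent to the database in order by whatever client library is in use.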

Loading is done by using ogr2ogr to load into a temporary
schema, clustering, then doing the swap in a transaction. The status
of the tables is tracked in the external_data table, which
lists the last modified date of each table. This allows the
loading script to use conditional GETs and only download and
update for sources which have changed.
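
The conditional GET mentioned above might look roughly like the following sketch, which uses only the Python standard library. The function names are hypothetical, and the assumption that the saved value is an HTTP `Last-Modified` string read back from `external_data` is mine, not something this PR description confirms.

```python
import urllib.error
import urllib.request


def conditional_headers(last_modified):
    """Build request headers for a conditional GET.

    last_modified is the Last-Modified value saved from the previous
    download (assumed here to come from the external_data table), or None.
    """
    if last_modified is None:
        return {}
    return {"If-Modified-Since": last_modified}


def fetch_if_changed(url, last_modified=None):
    """Return (body, new_last_modified), or (None, last_modified) on HTTP 304."""
    request = urllib.request.Request(url, headers=conditional_headers(last_modified))
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the existing table can be kept
            return None, last_modified
        raise
```

A caller would skip the download, load, and swap steps whenever the returned body is `None`.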

Fixes #4089

Rendering tested with a planet DB in Antarctica and a local coastline.

If I were rewriting the Python script I'd probably do it differently now that I'm better, but it's existing code that works. I'm happy with its functionality; it's just internal refactoring I would do.

@pnorman pnorman marked this pull request as ready for review March 26, 2020 22:49
@pnorman
Collaborator Author

pnorman commented Mar 27, 2020

I'm going to run some benchmarking on this

@pnorman pnorman self-assigned this Mar 27, 2020
@pnorman
Collaborator Author

pnorman commented Mar 27, 2020

Averaged over zoom 0 to zoom 12, shapefiles in the DB take 514 minutes and shapefiles on disk take 521 minutes. This is about a 1% difference, but I'm not sure it's above my margin of error. In any case, performance isn't a concern with this PR.

@pnorman pnorman removed their assignment Mar 28, 2020
@jeisenbe
Collaborator

> Averaged over zoom 0 to zoom 12, shapefiles in the DB take 514 minutes, shapefiles on disk take 521 minutes.

Is that the time to render all metatiles at these 13 zoom levels, or just to render the oceans and icesheets?

@pnorman
Collaborator Author

pnorman commented Mar 28, 2020

> Is that the time to render all metatiles at these 13 zoom levels, or just to render the oceans and icesheets?

Everything.

@jeisenbe
Collaborator

For Docker, will ogr2ogr already be included in the container? No changes are needed other than the switch to scripts/get-external-data.py?

@pnorman
Collaborator Author

pnorman commented Mar 28, 2020

> For Docker, will ogr2ogr already be included in the container? No changes are needed other than the switch to scripts/get-external-data.py?

I added gdal-bin to the dockerfile, that should be enough. If someone wants to test it, that would be good.

@jeisenbe
Collaborator

I believe having the coastlines as ways in the database would make it easier to solve #712 (hide admin boundary lines on the coastline)

@jeisenbe
Collaborator

So I will need to remove the docker image and then rebuild the image to test this? Unfortunately that takes up a large amount of bandwidth, so I won't be able to do it until our airport is reopened again (hopefully sometime in mid-April or at least by May?).

@Adamant36 or @sommerluk - would you be able to test this PR with Docker?

@Adamant36
Contributor

> @Adamant36 or @sommerluk - would you be able to test this PR with Docker?

Sure. I'll give it a try over the weekend.

@sommerluk
Collaborator

No, I can't test it. Currently my bandwidth here is often 0.0 Mbit/s, and at best sometimes 0.9 Mbit/s. Sorry.

@pnorman
Collaborator Author

pnorman commented Mar 29, 2020

I'd like to target this for after 5.1

@jeisenbe
Collaborator

jeisenbe commented Apr 3, 2020

@Adamant36 - would you have a chance to test this over the weekend?

@Adamant36
Contributor

Unfortunately not. My VM setup broke and I'm probably going to wait until 20.04 comes out to set it up again so I don't have to deal with it twice. I can test it after that though.

@jeisenbe
Collaborator

jeisenbe commented Apr 4, 2020

Note that this would give us more options for fixing problems like #621, so it adds flexibility in rendering options, in addition to the improved performance.

@pnorman pnorman requested a review from imagico April 26, 2020 07:11
@jeisenbe
Collaborator

I'm back in the USA, so I can test this once the merge conflict is resolved.

pnorman added 2 commits April 26, 2020 13:41
@pnorman
Collaborator Author

pnorman commented Apr 27, 2020

Rebased.

@pnorman
Collaborator Author

pnorman commented Apr 28, 2020

I've added the new dependencies to the import dockerfile and readme

@jeisenbe
Collaborator

Great, this now works as expected. I was a little surprised that the shapefiles are no longer downloaded as separate files at all.

Will updating the shapefiles require manually running scripts/get-external-data.py or is this going to happen automatically?


```diff
-scripts/get-shapefiles.py
+scripts/get-external-data.py
```

This script downloads necessary files, generates and populates the *data* directory with all needed shapefiles, including indexing them through *shapeindex*.
Collaborator

This line does not seem to be correct now

@pnorman
Collaborator Author

pnorman commented Apr 28, 2020

> Will updating the shapefiles require manually running scripts/get-external-data.py or is this going to happen automatically?

Typically you'd automate this with a cron job
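
One way to set this up, sketched under several assumptions: a small wrapper (here a hypothetical file `update_external_data.py` in the repository root) runs the loading script, and cron calls the wrapper on a schedule. The schedule, paths, and wrapper name are illustrative only, and any options the script needs for a particular setup are omitted.

```python
import subprocess
import sys

# Example crontab entry (an assumption; adjust the path and schedule):
#   0 3 * * * cd /path/to/openstreetmap-carto && python3 update_external_data.py


def build_command(script="scripts/get-external-data.py"):
    """Return the command that runs the loading script with this interpreter."""
    return [sys.executable, script]


def run_update():
    """Run the loading script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_command(), check=True)
```

Calling `run_update()` at the bottom of the wrapper would make a failed update surface as a failed cron job rather than pass silently.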

@jeisenbe
Collaborator

Ideally we should document how to do this, though that doesn't need to be included in this PR.

I've tested this PR in Docker and would be ready to approve it, but I see a review by @imagico is pending.

@pnorman
Collaborator Author

pnorman commented Apr 30, 2020

> I've tested this PR in Docker and would be ready to approve it, but I see a review by @imagico is pending.

I had requested a review because I hadn't gotten any other response.

I'd go ahead and merge this now, but I want to do it at the start of a release cycle

Collaborator

@imagico imagico left a comment

Sorry for the delay, it seems to work fine.

What is a bit annoying is that with the coastlines in the database you have to duplicate the coastline data if you have several test databases (like for real data and abstract tests) and switch between them via localconfig.

@pnorman pnorman merged commit 025d9ce into gravitystorm:master May 9, 2020
@LorenzoStucchi

Hi, I followed all the instructions for Docker, but I still have the issues reported here, and there is no table called external_data in my gis database.

@Adamant36
Contributor

Adamant36 commented Jun 11, 2020

The shapefiles never loaded for me when I did a clean install and ran Kosmtik. It just errored out after the map loaded and said that the shapefiles were missing.

@LorenzoStucchi

So what could the error be?

Here is the log file.

@jeisenbe
Collaborator

Here's the relevant part of the log:

Trace
    at ProjectServer.raise (/usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:261:13)
    at /usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:75:30
    at /usr/lib/node_modules/kosmtik/node_modules/generic-pool/lib/generic-pool.js:283:11
    at loaded (/usr/lib/node_modules/kosmtik/node_modules/mapnik-pool/index.js:23:37)
Postgis Plugin: ERROR:  relation "icesheet_polygons" does not exist
LINE 1: SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "...
                                           ^
in executeQuery Full sql was: 'SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "way" IS NOT NULL LIMIT 1;'
  encountered during parsing of layer 'icesheet-poly' in Layer
[httpserver] /openstreetmap-carto/tile/4/7/7.png?t=1591820781712 500
Trace
    at ProjectServer.raise (/usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:261:13)
    at /usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:75:30
    at /usr/lib/node_modules/kosmtik/node_modules/generic-pool/lib/generic-pool.js:283:11
    at loaded (/usr/lib/node_modules/kosmtik/node_modules/mapnik-pool/index.js:23:37)
Postgis Plugin: ERROR:  relation "icesheet_polygons" does not exist
LINE 1: SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "...
                                           ^
in executeQuery Full sql was: 'SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "way" IS NOT NULL LIMIT 1;'
  encountered during parsing of layer 'icesheet-poly' in Layer
[httpserver] /openstreetmap-carto/tile/4/8/7.png?t=1591820781712 500
Postgis Plugin: ERROR:  relation "icesheet_polygons" does not exist
LINE 1: SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "...
                                           ^
in executeQuery Full sql was: 'SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "way" IS NOT NULL LIMIT 1;'
  encountered during parsing of layer 'icesheet-poly' in Layer
Trace
    at ProjectServer.raise (/usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:261:13)
    at /usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:75:30
    at /usr/lib/node_modules/kosmtik/node_modules/generic-pool/lib/generic-pool.js:283:11
    at loaded (/usr/lib/node_modules/kosmtik/node_modules/mapnik-pool/index.js:23:37)


2020-06-10 20:26:10.916 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-06-10 20:26:10.993 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2020-06-10 20:26:11.423 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-06-10 20:26:12.042 UTC [24] LOG:  database system was shut down at 2020-06-10 20:25:50 UTC
2020-06-10 20:26:12.311 UTC [1] LOG:  database system is ready to accept connections
2020-06-10 20:28:46.061 UTC [51] ERROR:  relation "icesheet_polygons" does not exist at character 36
2020-06-10 20:28:46.061 UTC [51] STATEMENT:  SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "way" IS NOT NULL LIMIT 1;

@jeisenbe
Collaborator

@LorenzoStucchi was this a new installation, or an update to a previous version which was working before?

@jeisenbe
Collaborator

@Adamant36 re: "The shape files never loaded for me when I did a clean install and ran Kosmtik":

When did this happen? Was it with the latest release?

@jeisenbe
Collaborator

jeisenbe commented Jun 11, 2020

Sorry, was it the latest commit in this PR, 025d9ce, or on the master branch, 5724017?

@LorenzoStucchi

@jeisenbe the last update is here.

> @LorenzoStucchi was this a new installation, or an update to a previous version which was working before?

It is a new installation on Docker on Windows. But I realised that I have to run the command docker-compose up import twice, because the first time I get this error:

Creating network "openstreetmap-carto_default" with the default driver
Creating openstreetmap-carto_db_1 ... done
Creating openstreetmap-carto_import_1 ... done
Attaching to openstreetmap-carto_import_1
import_1   | Waiting for PostgreSQL to be running
import_1   | Timeout while waiting for PostgreSQL to be running
import_1   | psql: could not translate host name "db" to address: Name or service not known
import_1   | createdb: could not connect to database template1: could not translate host name "db" to address: Name or service not known
import_1   | osm2pgsql version 1.2.1 (64 bit id space)
import_1   |
import_1   | Allocating memory for dense node cache
import_1   | Allocating dense node cache in one big chunk
import_1   | Allocating memory for sparse node cache
import_1   | Sharing dense sparse
import_1   | Node-cache: cache=512MB, maxblocks=8192*65536, allocation method=11
import_1   | Mid: pgsql, cache=512
import_1   | Connection to database failed: could not translate host name "db" to address: Name or service not known
import_1   |
import_1   | Error occurred, cleaning up
import_1   | DB writer thread failed due to ERROR: Connection to database failed: could not translate host name "db" to address: Name or service not known
import_1   |
import_1   |
import_1   | /usr/bin/env: 'python3\r': No such file or directory
openstreetmap-carto_import_1 exited with code 127
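
The `python3\r` in the last log line above usually means the script's first line ends in a Windows-style CRLF, so `env` looks for an interpreter literally named `python3\r`. That is a likely explanation rather than a confirmed diagnosis for this report. A minimal sketch for checking a file's contents, using a hypothetical helper name:

```python
def has_crlf_shebang(data: bytes) -> bool:
    """Return True if data starts with a shebang line that ends in CRLF."""
    first_line, newline, _rest = data.partition(b"\n")
    # A trailing b"\r" on the first line means the line ending was b"\r\n".
    return bool(newline) and first_line.startswith(b"#!") and first_line.endswith(b"\r")
```

It could be applied to the bytes of a script read in binary mode, for example the contents of scripts/get-external-data.py in the local checkout.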

@jeisenbe
Collaborator

@LorenzoStucchi did you check out the latest commit of the master branch on this repository, or did you check out the shapefiles_in_db branch in this PR?

@LorenzoStucchi

> @LorenzoStucchi did you check out the latest commit of the master branch on this repository, or did you check out the shapefiles_in_db branch in this PR?

@jeisenbe I used the master branch with git clone

@jeisenbe
Collaborator

OK, I've updated to the latest version of Docker, and now I am seeing the same errors as reported by @LorenzoStucchi and @Adamant36:

kosmtik_1 | Trace
kosmtik_1 |     at ProjectServer.raise (/usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:261:13)
kosmtik_1 |     at /usr/lib/node_modules/kosmtik/src/back/ProjectServer.js:75:30
kosmtik_1 |     at /usr/lib/node_modules/kosmtik/node_modules/generic-pool/lib/generic-pool.js:283:11
kosmtik_1 |     at loaded (/usr/lib/node_modules/kosmtik/node_modules/mapnik-pool/index.js:23:37)
kosmtik_1 | Postgis Plugin: ERROR: relation "icesheet_polygons" does not exist
kosmtik_1 | LINE 1: SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "...
kosmtik_1 |                                            ^
kosmtik_1 | in executeQuery Full sql was: 'SELECT ST_SRID("way") AS srid FROM icesheet_polygons WHERE "way" IS NOT NULL LIMIT 1;'
kosmtik_1 |   encountered during parsing of layer 'icesheet-poly' in Layer
kosmtik_1 | [httpserver] /openstreetmap-carto/tile/4/10/9.png?t=1591895615864 500
kosmtik_1 | [httpserver] /openstreetmap-carto/poll/ 200

The import process was otherwise fine: if I switch back to the latest release (v5.2.0) there is no problem, even without re-importing the data.

@jeisenbe
Collaborator

However, when on the last commit in this PR or on the latest commit on the master branch, I now notice this error after the import process completes:

/usr/bin/env: 'python3': No such file or directory

This is noted after the data is imported:

Osm2pgsql took 2s overall
node cache: stored: 23660(100.00%), storage efficiency: 42.62% (dense blocks: 1, sparse nodes: 23659), hit rate: 100.00%
/usr/bin/env: 'python3': No such file or directory

@jeisenbe
Collaborator

jeisenbe commented Jun 11, 2020

OK, I killed and removed all the docker processes and images, and tried starting again from the last commit in this PR. This worked fine.

To do this, run docker-compose kill, then run docker image ls to find the IDs of all Docker images.
Use docker image rm <id number of image> to remove each image.

Then you can use docker-compose up import to rebuild the whole thing: this will involve downloading everything again and will take quite a lot of time, especially the step INFO:root:Checking table water_polygons.

You might need to enter "y" at some point to say that you want to recreate the container.

During the import process, the shapefiles should be downloaded and imported, if this has not been done already.

Then run docker-compose up kosmtik as usual and wait for this to be rebuilt. You might again be prompted to type "y" to rebuild the Docker image.
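
The procedure above can be summarised as an ordered list of commands. The sketch below only builds that list; the helper name `reset_commands` is hypothetical, and the image IDs are assumed to be collected beforehand (for instance from the output of docker image ls). Note that removing every listed image also affects any other Docker projects on the machine.

```python
def reset_commands(image_ids):
    """Return the commands for the reset procedure described above, in order."""
    commands = [["docker-compose", "kill"]]
    # One "docker image rm" per image ID reported by "docker image ls".
    commands += [["docker", "image", "rm", image_id] for image_id in image_ids]
    commands.append(["docker-compose", "up", "import"])
    commands.append(["docker-compose", "up", "kosmtik"])
    return commands
```

Each entry could be passed to `subprocess.run(command, check=True)` so that the sequence stops at the first failing step.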

@jeisenbe
Collaborator

I have also tried this with the latest commit on the master branch, and it works on my setup.

@Adamant36 and @LorenzoStucchi could you try this and see if it works?

@LorenzoStucchi

I tried but did not succeed. Using version 5.2, I get this error at the end of docker-compose up import, with no request for confirmation.

Attaching to openstreetmap-carto_import_1
import_1   | Waiting for PostgreSQL to be running
import_1   | Timeout while waiting for PostgreSQL to be running
import_1   | psql: could not translate host name "db" to address: Name or service not known
import_1   | createdb: could not connect to database template1: could not translate host name "db" to address: Name or service not known
import_1   | osm2pgsql version 1.2.1 (64 bit id space)
import_1   |
import_1   | Osm2pgsql failed due to ERROR: Usage error. For further information see:
import_1   |    osm2pgsql -h|--help
import_1   |
openstreetmap-carto_import_1 exited with code 1

@jeisenbe
Collaborator

You mean that you checked out the tag v5.2.0 in git prior to attempting this?

If you are switching between v5.2.0 and the current master branch, I'm afraid you will need to take the steps outlined above in #4092 (comment): kill docker-compose, remove the relevant Docker image(s), and rebuild.

@LorenzoStucchi

I removed everything using the procedure explained above, and with v5.2.0 I still get the error described previously.

@Adamant36
Contributor

Reloading everything worked for me. Hopefully there will be a better solution at some point though.

@LorenzoStucchi

I retried with v5.2.0, and after a total reset of Docker it works. The reset can be done from the Docker dashboard: under the Troubleshoot panel there is a "Reset to factory defaults" command.

Successfully merging this pull request may close these issues.

Move shapefiles into database
7 participants