Partitioning #92

pnorman · 2013-10-04T10:24:26Z

It would be extremely useful to be able to partition tables on tag values to improve performance. The standard example is the polygon table, partitioned on building IS NULL. This would achieve gains greater than those from a partial index USING gist (way) WHERE building IS NULL, which already gives an 11% gain. See gravitystorm/openstreetmap-carto#207 (comment)


lonvia · 2013-10-04T11:13:44Z

I'd be careful with partitioned tables. If the partitioning condition is not part of the where clause of the query, all partitions need to be queried in parallel which tends to be rather expensive.

pnorman · 2013-10-04T11:41:58Z

Yes, for a table partitioned into tbl_1 and tbl_2 and a WHERE condition unrelated to the partitioning, the query is equivalent to SELECT ... FROM tbl_1 WHERE ... UNION ALL SELECT ... FROM tbl_2 WHERE ...; I've been advised that with the table sizes involved in OSM, a couple of partitions won't add significant overhead.
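To make the equivalence concrete, here is a minimal C sketch that spells out the expanded query for any number of partitions. The helper name build_union_all and its parameters are hypothetical, chosen only for illustration; this is neither PostgreSQL's planner nor osm2pgsql code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes into `out` the query that a WHERE clause
 * unrelated to the partitioning condition effectively turns into, i.e.
 * one SELECT per partition joined with UNION ALL.
 * Returns 0 on success, -1 if `out` is too small. */
int build_union_all(char *out, size_t out_size,
                    const char *columns, const char *where,
                    const char *const *tables, size_t n_tables)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n_tables; i++) {
        /* Every partition gets the same column list and WHERE clause. */
        int n = snprintf(out + used, out_size - used,
                         "%sSELECT %s FROM %s WHERE %s",
                         i == 0 ? "" : " UNION ALL ",
                         columns, tables[i], where);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1; /* output would have been truncated */
        used += (size_t)n;
    }
    return 0;
}
```

Calling it with the tables tbl_1 and tbl_2 yields one SELECT branch per partition, so the number of branches grows with the number of partitions whenever the condition gives no way to skip one.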

I also believe that if the distribution of other values varies significantly between the two partitions, and those values are indexed and used in queries, there can be benefits.

I might have a try at doing a test with manually partitioning an existing import to see what gains can be achieved.

apmon · 2013-10-17T03:18:36Z

I have had a branch of osm2pgsql ( https://github.com/apmon/osm2pgsql/tree/partitioning ) for quite some time, which allows for partitioning of the osm2pgsql tables according to arbitrary where clauses. Once the release is out and the parallelisation functionality has landed, I hope to look at this branch again and plan to merge it.

At the moment the partitionings are still defined in code and are not user changeable, but I hope to change this. They are already defined by a single data structure. E.g. the following partitions the polygon table into buildings and non-buildings, and the line table into highways and non-highways.

partitions [] = {
    { .name = "buildings", t_poly, "building is not null", "NEW.building is not null"},
    { .name = "nonbuildings", t_poly, "building is null", "NEW.building is null"},
    { .name = "highways", t_line, "highway is not null", "NEW.highway is not null"},
    { .name = "nonhighways", t_line, "highway is null", "NEW.highway is null"}
};

One can also use this to partition things into further sub-structures, as long as the combined where clauses are complete and mutually exclusive.
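The "complete and mutually exclusive" requirement can be illustrated with a small self-contained C sketch. Everything in it beyond the partition names and SQL conditions quoted above is an assumption made for illustration: the struct layouts, the C predicates that mirror the SQL conditions, and the function names are hypothetical and are not taken from the branch. The check only inspects sample rows, so it can reveal a gap or an overlap but cannot prove their absence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an imported row: only the two tags used in
 * the example. A NULL pointer means the tag is absent. */
struct row {
    const char *building;
    const char *highway;
};

enum table { t_poly, t_line };

/* Hypothetical partition entry: the SQL condition from the example plus
 * a C predicate (an assumption of this sketch) that mirrors it. */
struct partition {
    const char *name;
    enum table table;
    const char *condition;
    bool (*matches)(const struct row *);
};

static bool has_building(const struct row *r) { return r->building != NULL; }
static bool no_building(const struct row *r)  { return r->building == NULL; }
static bool has_highway(const struct row *r)  { return r->highway != NULL; }
static bool no_highway(const struct row *r)   { return r->highway == NULL; }

static const struct partition partitions[] = {
    { "buildings",    t_poly, "building is not null", has_building },
    { "nonbuildings", t_poly, "building is null",     no_building  },
    { "highways",     t_line, "highway is not null",  has_highway  },
    { "nonhighways",  t_line, "highway is null",      no_highway   },
};

/* Number of partitions of `table` whose predicate accepts the row. */
size_t count_matches(enum table table, const struct row *r)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof partitions / sizeof partitions[0]; i++)
        if (partitions[i].table == table && partitions[i].matches(r))
            n++;
    return n;
}

/* True if every sample row falls into exactly one partition of `table`.
 * Zero matches would mean the conditions are incomplete; two or more
 * would mean they overlap. */
bool covers_exactly_once(enum table table,
                         const struct row *rows, size_t n_rows)
{
    for (size_t i = 0; i < n_rows; i++)
        if (count_matches(table, &rows[i]) != 1)
            return false;
    return true;
}
```

A pair of conditions of the form "X is not null" and "X is null" passes this check for any row, which is why the two-way splits above are safe; finer sub-partitions need the same property across all of their conditions together.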

pnorman · 2013-10-24T05:15:30Z

I'm seeing significant speed gains from partitioning, greater than those from the threading branch.
With 24 processes on my test setup, a Europe import takes 15317s with 0.84.0, 13068s with threading (7f282b3), and 10967s with partitioning (3bbb617).

The gains are in processing relations, which goes from 3717s (base) to 428s (partitioning), and in indexes, which go from 5737s (base) to 3783s (partitioning).

I'm seeing a minor slowdown from 2422s to 2377s for pending ways, but that could be within error; I don't have a handle on the uncertainty for that number.
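For a rough sense of scale, the reported timings can be turned into relative reductions. The sketch below is plain arithmetic on the numbers quoted in this comment, taken at face value; the function and struct names are hypothetical. It works out to roughly a 28% shorter total import for partitioning against roughly 15% for threading.

```c
#include <stdio.h>

/* Percentage by which `after` is shorter than `before` (both in seconds). */
double reduction_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Timings quoted in the comment, in seconds. */
struct timing {
    const char *label;
    double before;
    double after;
};

static const struct timing reported[] = {
    { "total, threading vs 0.84.0",    15317.0, 13068.0 },
    { "total, partitioning vs 0.84.0", 15317.0, 10967.0 },
    { "relations, partitioning",        3717.0,   428.0 },
    { "indexes, partitioning",          5737.0,  3783.0 },
};

/* Prints one line per reported pair with its relative reduction. */
void print_reductions(void)
{
    for (size_t i = 0; i < sizeof reported / sizeof reported[0]; i++)
        printf("%-32s %5.1f%%\n", reported[i].label,
               reduction_percent(reported[i].before, reported[i].after));
}
```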

lonvia · 2013-10-24T08:03:09Z

The partitioning branch is based on an osm2pgsql version that is affected by #30, which should explain most of the time difference.

pnorman · 2015-02-21T11:57:40Z

I'm wondering how relevant this is with the multi backend - most cases where you're considering partitioning you'd also be able to use multiple tables.

It's probably possible to implement partitioning with the multi backend.

pnorman mentioned this issue Oct 21, 2013

make planet_osm_roads table optional and/or configurable #10

Closed

pnorman mentioned this issue Oct 28, 2014

Clustering strategies #208

Closed

pnorman closed this as completed Feb 21, 2015
