Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partitioning #92

Closed
pnorman opened this issue Oct 4, 2013 · 6 comments
Closed

Partitioning #92

pnorman opened this issue Oct 4, 2013 · 6 comments

Comments

@pnorman
Copy link
Collaborator

pnorman commented Oct 4, 2013

It would be extremely useful if it was possible to create partitioned tables on tag values to improve performance. The standard example of this is the polygon table and partitioning it on building IS NULL. This would achieve gains greater than a partial index ON gist (way) WHERE building IS NULL, which is already 11%. See gravitystorm/openstreetmap-carto#207 (comment)

@lonvia
Copy link
Collaborator

lonvia commented Oct 4, 2013

I'd be careful with partitioned tables. If the partitioning condition is not part of the where clause of the query, all partitions need to be queried in parallel which tends to be rather expensive.

@pnorman
Copy link
Collaborator Author

pnorman commented Oct 4, 2013

yes, for a table partitioned into tbl_1 and tbl_2 and a where condition unrelated to the partition, the query is equivalent to SELECT ... FROM tbl_1 WHERE ... UNION ALL SELECT ... FROM tbl_2 WHERE ...;. I've been advised that with the table sizes involved in OSM, a couple of partitions won't add significant overhead.

I also believe that if the distribution of other values within the two partitions varies significantly and those values are indexed and used in queries there can be benifits.

I might have a try at doing a test with manually partitioning an existing import to see what gains can be achieved.

@apmon
Copy link
Contributor

apmon commented Oct 17, 2013

I have a branch of osm2pgsql ( https://github.com/apmon/osm2pgsql/tree/partitioning ) for quite some time, which allows for partitioning of the osm2pgsql tables according to arbitrary where clauses. Once the release is out and the parallelisation functionality landed, I hope to look at this branch again and plan to merge it.

At the moment the partitionings are still defined in code and not user changeable, but I hope to change this and they are defined by a single data structure. e.g the following partitions the polygon table into buildings and non buildings and the line table into highways and non highways.

partitions [] = {
{ .name = "buildings", t_poly, "building is not null", "NEW.building is not null"},
{ .name = "nonbuildings", t_poly, "building is null", "NEW.building is null"},
{ .name = "highways", t_line, "highway is not null", "NEW.highway is not null"},
{ .name = "nonhighways", t_line, "highway is null", "NEW.highway is null"}
};

One can also use this to partition things into further sub-structures, as long as the combination of where clauses are complete and mutually exclusive.

@pnorman
Copy link
Collaborator Author

pnorman commented Oct 24, 2013

I'm showing significant speed gains from partitioning, greater than the threading branch.
With 24 processes on my test setup a europe import with 0.84.0 takes 15317s, with threading 7f282b3 13068s, with partitioning 3bbb617 10967s.

The gains are in processing relations going from 3717s base to 428s partitioning, indexes going from 5737s base to 3783s partitioning.

I'm showing a minor slowdown from 2422s to 2377s for pending ways, but that could be within error, I don't have a handle on the uncertainty for that number.

@lonvia
Copy link
Collaborator

lonvia commented Oct 24, 2013

The partitioning branch is based on an osm2pgsql version that is being hit by #30 which should explain most of the time difference.

@pnorman pnorman mentioned this issue Oct 28, 2014
3 tasks
@pnorman
Copy link
Collaborator Author

pnorman commented Feb 21, 2015

I'm wondering how relevant this is with the multi backend - most cases where you're considering partitioning you'd also be able to use multiple tables.

It's probably possible to implement partitioning with the multi backend.

@pnorman pnorman closed this as completed Feb 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants