Partitioning #92
I'd be careful with partitioned tables. If the partitioning condition is not part of the WHERE clause of the query, all partitions need to be queried in parallel, which tends to be rather expensive.
Yes. For a table partitioned into tbl_1 and tbl_2 and a WHERE condition unrelated to the partitioning, the query is equivalent to running it against tbl_1 and tbl_2 separately and combining the results. I also believe that if the distribution of other values differs significantly between the two partitions, and those values are indexed and used in queries, there can be benefits. I might try manually partitioning an existing import to see what gains can be achieved.
I have had a branch of osm2pgsql ( https://github.com/apmon/osm2pgsql/tree/partitioning ) for quite some time which allows partitioning the osm2pgsql tables according to arbitrary WHERE clauses. Once the release is out and the parallelisation functionality has landed, I hope to look at this branch again and plan to merge it. At the moment the partitionings are still defined in code and are not user-changeable, but I hope to change this; they are defined by a single data structure. E.g. the following partitions the polygon table into buildings and non-buildings and the line table into highways and non-highways:

partitions [] = {

One can also use this to partition into further sub-structures, as long as the combination of WHERE clauses is complete and mutually exclusive.
I'm seeing significant speed gains from partitioning, greater than those from the threading branch. Processing relations went from 3717s (base) to 428s (partitioning), and indexing went from 5737s to 3783s. Pending ways barely changed (2422s and 2377s), which could be within error; I don't have a handle on the uncertainty for that number.
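For scale, the relative changes implied by these timings work out as follows (plain arithmetic on the reported numbers only; the pending-ways figure is given as a magnitude because it may be within noise):

$$
\frac{3717\ \text{s}}{428\ \text{s}} \approx 8.7\times \ \text{(relations)}, \qquad
\frac{5737 - 3783}{5737} = \frac{1954}{5737} \approx 34\% \ \text{(indexes)}, \qquad
\frac{|2422 - 2377|}{2422} = \frac{45}{2422} \approx 1.9\% \ \text{(pending ways)}
$$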
The partitioning branch is based on an osm2pgsql version that is affected by #30, which should explain most of the time difference.
I'm wondering how relevant this is with the multi backend: in most cases where you'd consider partitioning, you could also use multiple tables. It's probably possible to implement partitioning with the multi backend.
It would be extremely useful to be able to create partitioned tables on tag values to improve performance. The standard example is the polygon table, partitioned on building IS NULL. This would achieve gains greater than those of a partial index ON gist (way) WHERE building IS NULL, which already gives 11%. See gravitystorm/openstreetmap-carto#207 (comment).