-
-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow a limit to be set on expiry of line segments #2185
Conversation
Accidentally or deliberately moving a node a long way creates a long line segment which can expire large numbers of tiles causing disruption to the rendering process as osm2pgsql either runs out of memory or generates a huge list of tiles to expire. Limiting expiry of long line segments prevents that and also helps reduce disruption of the resulting rendered maps with long lines that are almost certainly incorrect.
The PR is clean and technically looks okay to me. But I am concerned with a few things:
I think we should clearly think about the two issues raised here, one is about protecting "the map", the other is about protecting "the system". osm2pgsql (potentially) running out of memory is bad and needs addressing. And if an excessive number of tiles in the expire list creates some situation where other parts of the system are likely going to fail, than that needs addressing, too. I'd rather see those issues solved in a different way though that's more directly tied to the problem, i.e. the excessive number of tiles, not to just one case which this PR addresses. And I am not sure whether osm2pgsql is the right place to solve system stability issues (or if there is even an issue), but if that is the case we should talk about that. To solve this problem we probably need some limits in the number of tiles that can show up in the expire list or so. But the goal of that would not be to protect "the map", but to make the system more robust. Protecting "the map" from changes in the data that we don't want to see rendered is a different issue we should not conflate with the first. |
Generally speaking, I'd be okay with accepting this as an operational quick-fix. The code is minimally invasive and in a part of osm2pgsql that's on the way out anyway. So we don't have to maintain that stuff forever. My main concern here is with joto's point 4. The change not only prevents expiry when the vandalism occurs but also when the vandalism is fixed. This may not be a much of a problem on the server with average load. However, especially on the osm.org servers with their very high load, this may lead to a lot of stale tiles where the vandalism remains visible because another change caused the tiles to be rerendered while the vandalism was active. Opinions? I wonder if it wouldn't be better to attack the problem a bit earlier and not allow to lines with very long segments to be written to the output tables. Sadly, this is only possible with the flex output right now, so we'd need the osm-carto port for flex first. But once done it can be flexibly handled in style, so that a sensible maximum length can be set depending on the type of object (allow more for administrative boundaries, less for highways). |
Administrative note: if accepted, this change would need backporting. We are not in a state right now to create a release from master. |
Maybe (and I would note that I'm suggesting this strctly as a quick hack) simply abort processing (aka exit osm2pgsql) when a new linestring would be generated that exceeds a certain length. As long as the input to osm2pgsql at some point covers both vandalism and revert, osm2pgsql would then continue as normal. |
@simonpoole Something like this would be easily possible when the flex output is used. It allows you to get the linestring length in Lua and then you can do any check you want and simply call Of course this would mean switching OSM Carto to flex, but fortunately we are getting near that. There is an issue and PR. |
A new release of osm2pgsql is imminent, so we need to decide what happens to the this PR. As said above, I would be okay to merge if this is essential to the operations of the OSMF servers. However, I do share some of @joto's concerns that this isn't really addressing the root cause and thus only working around one very specific form of vandalism, which may never occur again in this form. @tomhughes what is the status here? I understand that we have switched the servers to expire on metatile level. This should address the memory issue (#2190) and the issue of being stuck for hours writing out the expired tiles. Would that be sufficient for now? The other possible advantage I can see is that the render servers wouldn't end up with basically all tiles marked as outdated. Was that a concern? Was the additional load on rerendering causing issues? We do know that the chance will not prevent vandalised tiles from appearing. It just will be on less tiles. And after the change, it will be necessary to manually expire all affected tiles. Probably not much of a change in the case of global vandalism. However I can see where this might become a major pain when we run into the case where a user accidentally draws a single line over half the globe and then globally all tiles rendered in a certain timespan need to be invalidated because finding just the ones that touch the single line is difficult. Any other pros and cons? |
We are now expiring at the metatile level, yes. The idea of the length limit was really twofold, to reduce the amount of time we spent re-rendering tiles and to reduce the visibility of the damage. I was kind of thinking that we wouldn't then need to do a global dirty after the revert but I guess that we probably would as you say. I don't think we've really had an incident that allows to test the effectiveness of either change though. Maybe we should drop this for now and see how things go with the other change - at least then the revert should correctly dirty the tiles that are changed avoiding the need for a global dirty. |
Okay, sounds good to me. I'll close this for now and we can revisit if necessary. |
Accidentally or deliberately moving a node a long way creates a long line segment which can expire large numbers of tiles causing disruption to the rendering process as osm2pgsql either runs out of memory or generates a huge list of tiles to expire.
Limiting expiry of long line segments prevents that and also helps reduce disruption of the resulting rendered maps with long lines that are almost certainly incorrect.