Performance testing framework #1287

Open
gravitystorm opened this issue Feb 2, 2015 · 19 comments

@gravitystorm
Owner

Cartography is more important than performance, but performance is much harder to test! We've been heavily reliant on @pnorman to run performance tests on his setup, but I'd like to open that up so that more people can get involved and so that we can automate it as much as possible.

There are a whole load of hardware considerations that make cross-server test results impossible to compare, but we should concentrate on being able to compare two different commits and give relative answers. It would be useful to have:

  • A test that can be run locally and takes around 10 minutes to run. This lets people sanity-check any complex SQL / attachments / refactorings.
  • A test that can be run on EC2, takes less than an hour, and costs less than a dollar to run. This could be used as a smoke test for pull requests.

The tests should be roughly indicative of real-world rendering patterns, e.g. city-biased and mid-zoom biased.

@pnorman
Collaborator

pnorman commented Feb 27, 2015

> A test that can be run on EC2, takes less than an hour, and costs less than a dollar to run. This could be used as a smoke test for pull requests.

My full-world tests actually take <1h to run (on fast hardware). They do cost more than $1 to run on EC2.

> mid-zoom biased

For total rendering time, I believe it's high-zoom biased.

@pnorman
Collaborator

pnorman commented May 3, 2015

> A test that can be run on EC2, takes less than an hour, and costs less than a dollar to run. This could be used as a smoke test for pull requests.

EC2 is subject to about an order of magnitude performance variation between instances. For a particular benchmark configuration on Amazon, repeated many times on different instances configured the same way, the 10th-percentile performance is 587 TPS and the 90th-percentile performance is 2537 TPS.

A style change that cut rendering throughput by a factor of two would be pretty catastrophic, but it couldn't be distinguished from instance-to-instance variation on EC2.
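
As a rough check on that claim, the sketch below uses only the two percentile figures quoted above. The 2x slowdown is a hypothetical example value, and the comparison rule (a change is visible across instances only if it exceeds the spread between them) is a simplifying assumption, not a statistical test.

```python
# 10th and 90th percentile throughput quoted above for identically
# configured EC2 instances.
P10_TPS = 587
P90_TPS = 2537


def spread_ratio(low: float, high: float) -> float:
    """Ratio between a fast and a slow instance of the same configuration."""
    return high / low


def is_distinguishable(slowdown_factor: float, low: float, high: float) -> bool:
    """Simplified rule: a slowdown is visible across instances only if it
    is larger than the hardware spread itself."""
    return slowdown_factor > spread_ratio(low, high)


ratio = spread_ratio(P10_TPS, P90_TPS)
print(f"p90/p10 instance spread: {ratio:.2f}x")
# Hypothetical example: a style change that halves throughput.
print("2x slowdown visible:", is_distinguishable(2.0, P10_TPS, P90_TPS))
```

The spread between the 10th and 90th percentile alone is more than 4x, so a 2x regression measured on two different instances would be lost in the noise.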

@kocio-pl
Collaborator

Do we already have some tools/scripts to test the rendering speed, even roughly? I think Kosmtik automation could be a nice option. We could, for example, craft some export URLs with a big enough bbox and execute them for many zoom levels automatically, but maybe there is a more elegant way of doing it.

@pnorman
Collaborator

pnorman commented Jul 13, 2017

> Do we already have some tools/scripts to test the rendering speed, even roughly?

render_list with a list of tiles from production is the standard way.

> I think Kosmtik automation could be a nice option.

There are enough differences that it's not a great option, except for catching horrible performance failures.

> We could, for example, craft some export URLs with a big enough bbox and execute them for many zoom levels automatically, but maybe there is a more elegant way of doing it.

It's essential the workload is realistic.

The method I've used lately has been to randomly sample running queries and ignore the time spent in Mapnik.

@kocio-pl
Collaborator

Do you have any scripts that other people could use to compare results, or is it just manual testing?

@pnorman
Collaborator

pnorman commented Jul 14, 2017

I haven't needed any scripts; it's all been one-line command-line stuff.

@kocio-pl
Collaborator

Could you share it anyway? Even if it's short, we don't have any standard tools to measure performance and compare results at the moment.

@pnorman
Collaborator

pnorman commented Jul 24, 2017

For normal stuff:

    echo tile_list | render_list -n <N> -l 256 -f

For testing monthly rerendering:

    render_list -n <N> -l 256 --all -f -z 0 -Z 12
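
For anyone who wants to wrap these one-liners so that runs can be timed and repeated, here is a minimal sketch. It uses only the render_list flags shown above. The function names, the choice to feed a tile list file on stdin, and the idea of timing the whole process are assumptions for illustration, not something described in this thread.

```python
import shlex
import subprocess
import time
from typing import List, Optional


def render_list_cmd(threads: int, max_load: int = 256, all_tiles: bool = False,
                    min_zoom: Optional[int] = None,
                    max_zoom: Optional[int] = None) -> List[str]:
    """Build a render_list command line from the flags used above."""
    cmd = ["render_list", "-n", str(threads), "-l", str(max_load), "-f"]
    if all_tiles:
        cmd.append("--all")
    if min_zoom is not None:
        cmd += ["-z", str(min_zoom)]
    if max_zoom is not None:
        cmd += ["-Z", str(max_zoom)]
    return cmd


def timed_run(cmd: List[str], tile_list_path: Optional[str] = None) -> float:
    """Run the command and return wall-clock seconds.

    Assumption: when a tile list is given, it is a file fed on stdin.
    """
    start = time.monotonic()
    if tile_list_path is None:
        subprocess.run(cmd, check=True)
    else:
        with open(tile_list_path, "rb") as tiles:
            subprocess.run(cmd, stdin=tiles, check=True)
    return time.monotonic() - start


# Only print the commands here; timed_run() needs a working renderd setup.
print(shlex.join(render_list_cmd(4)))
print(shlex.join(render_list_cmd(4, all_tiles=True, min_zoom=0, max_zoom=12)))
```

The thread count of 4 stands in for `<N>` and is an arbitrary example value.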

@kocio-pl
Collaborator

kocio-pl commented Mar 20, 2018

I'm not sure, but I think this message relates to osm-carto PostgreSQL performance testing (comparison of current setup with partitioned tables):

https://lists.openstreetmap.org/pipermail/dev/2018-March/030168.html

@kocio-pl
Collaborator

kocio-pl commented Apr 21, 2018

There's a tool called render_speedtest in the renderd package (it can be built directly from https://github.com/openstreetmap/mod_tile). It tries to run a thorough test. Here is an example snippet from a test running on my virtual machine with 1 thread (the default value):

Zoom(9) Now rendering 4 tiles
Rendered 4 tiles in 1.34 seconds (2.98 tiles/s)

Zoom(10) Now rendering 12 tiles
Rendered 12 tiles in 5.78 seconds (2.08 tiles/s)

Zoom(11) Now rendering 36 tiles
Rendered 36 tiles in 11.45 seconds (3.14 tiles/s)

Zoom(12) Now rendering 120 tiles
Rendered 120 tiles in 37.66 seconds (3.19 tiles/s)

Zoom(13) Now rendering 456 tiles
Rendered 456 tiles in 163.32 seconds (2.79 tiles/s)

Zoom(14) Now rendering 1702 tiles
Rendered 1702 tiles in 667.46 seconds (2.55 tiles/s)
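
To compare two such runs, the output can be reduced to per-zoom numbers. The sketch below parses only the two line shapes visible in the snippet above; the regular expressions and function names are assumptions, and other render_speedtest output lines are simply ignored.

```python
import re
from typing import Dict, Tuple

ZOOM_LINE = re.compile(r"Zoom\((\d+)\) Now rendering (\d+) tiles")
DONE_LINE = re.compile(r"Rendered (\d+) tiles in ([\d.]+) seconds")


def parse_speedtest(text: str) -> Dict[int, Tuple[int, float]]:
    """Map each zoom level to (tiles rendered, seconds taken)."""
    results: Dict[int, Tuple[int, float]] = {}
    zoom = None
    for line in text.splitlines():
        match = ZOOM_LINE.search(line)
        if match:
            zoom = int(match.group(1))
            continue
        match = DONE_LINE.search(line)
        if match and zoom is not None:
            results[zoom] = (int(match.group(1)), float(match.group(2)))
            zoom = None
    return results


def overall_rate(results: Dict[int, Tuple[int, float]]) -> float:
    """Total tiles divided by total seconds across all zoom levels."""
    tiles = sum(count for count, _ in results.values())
    seconds = sum(secs for _, secs in results.values())
    return tiles / seconds


SAMPLE = """\
Zoom(9) Now rendering 4 tiles
Rendered 4 tiles in 1.34 seconds (2.98 tiles/s)
Zoom(10) Now rendering 12 tiles
Rendered 12 tiles in 5.78 seconds (2.08 tiles/s)
Zoom(11) Now rendering 36 tiles
Rendered 36 tiles in 11.45 seconds (3.14 tiles/s)
Zoom(12) Now rendering 120 tiles
Rendered 120 tiles in 37.66 seconds (3.19 tiles/s)
Zoom(13) Now rendering 456 tiles
Rendered 456 tiles in 163.32 seconds (2.79 tiles/s)
Zoom(14) Now rendering 1702 tiles
Rendered 1702 tiles in 667.46 seconds (2.55 tiles/s)
"""

results = parse_speedtest(SAMPLE)
for zoom, (tiles, seconds) in sorted(results.items()):
    print(f"z{zoom}: {tiles / seconds:.2f} tiles/s")
print(f"overall: {overall_rate(results):.2f} tiles/s")
```

In this sample, z14 alone accounts for roughly three quarters of the total rendering time.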

@pnorman
Collaborator

pnorman commented Apr 22, 2018

> There's a tool called render_speedtest from renderd package

Don't render down to z14. Any rendering test that tries to render the world past z12 will give distorted results.

@kocio-pl
Collaborator

kocio-pl commented May 1, 2018

What causes this distortion?

@pnorman
Collaborator

pnorman commented May 28, 2018

> What causes this distortion?

The fact that it doesn't represent a realistic workload. All performance testing needs to test something that matters, and the time to render the world on z13+ doesn't matter because no one does it. In particular, the average complexity of metatiles will be different than a tile server's workload, as will be the balance between different zooms. There are 4x as many z14 tiles as z13 tiles, but not 4x as many z14 tiles rendered as z13 tiles rendered.

https://planet.openstreetmap.org/tile_logs/renderd/renderd.yevaud.20150503.log.xz is an old log of what is actually rendered, taking into account the tile CDN and the renderd tile store.
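
The arithmetic behind the 4x figure, as a small sketch. It assumes the standard slippy-map scheme in which zoom z has 4^z tiles worldwide; the function names are made up for illustration.

```python
def tiles_at_zoom(zoom: int) -> int:
    """Number of tiles covering the world at a zoom level (2^z by 2^z grid)."""
    return 4 ** zoom


def cumulative_tiles(max_zoom: int) -> int:
    """Number of tiles covering the world from z0 up to and including max_zoom."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


through_z12 = cumulative_tiles(12)
through_z14 = cumulative_tiles(14)
deep_share = (tiles_at_zoom(13) + tiles_at_zoom(14)) / through_z14

print(f"z13 tiles: {tiles_at_zoom(13):,}")
print(f"z14 tiles: {tiles_at_zoom(14):,}")
print(f"share of z13+z14 in a z0-z14 world render: {deep_share:.1%}")
```

A uniform world render down to z14 is therefore dominated by z13 and z14 tiles, which is the imbalance described above: a real tile server does not render those zooms in proportion to their tile counts.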

@matthijsmelissen
Collaborator

> EC2 is subject to about an order of magnitude performance variation between instances. For a particular benchmark configuration on Amazon, repeated many times on different instances configured the same way, the 10th-percentile performance is 587 TPS and the 90th-percentile performance is 2537 TPS.

This is very surprising. It's not something we've encountered at my job (web application performance testing). I would be interested in understanding this issue better. @pnorman, do you remember what instance type this was?

@pnorman
Collaborator

pnorman commented Jun 26, 2018

> This is very surprising. It's not something we've encountered at my job (web application performance testing). I would be interested in understanding this issue better. @pnorman, do you remember what instance type this was?

Those aren't my test results; they came from a comprehensive test comparing lots of different cloud options. gp2 storage had just come out.

I'm sure it's gotten better, but there's still going to be variation, and before benchmarking I'd want to test the machine with fio for disk and something else for CPU.

@kocio-pl
Collaborator

What if we use a smarter testing pattern, like "old - new - old - new - old - new..." on the same machine (instead of old and new each being run once, on different machines and at different times)? That would help to compare tests more directly and avoid non-systematic errors.

Do we have a machine for such testing? Maybe Travis could be used as a first line of defense?
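
A minimal sketch of that alternating pattern, assuming each variant can be wrapped in a callable that renders a fixed workload. The two workloads below are stand-ins that just burn CPU; the function names and the use of the median are illustrative choices, not an agreed method.

```python
import statistics
import time
from typing import Callable, Dict, List


def interleaved_benchmark(variants: Dict[str, Callable[[], None]],
                          rounds: int) -> Dict[str, List[float]]:
    """Run the variants in alternating order (old, new, old, new, ...) so
    that slow drift on the machine affects all of them roughly equally."""
    timings: Dict[str, List[float]] = {name: [] for name in variants}
    for _ in range(rounds):
        for name, run in variants.items():
            start = time.perf_counter()
            run()
            timings[name].append(time.perf_counter() - start)
    return timings


def median_ratio(timings: Dict[str, List[float]],
                 baseline: str, candidate: str) -> float:
    """Median time of the candidate relative to the baseline."""
    return statistics.median(timings[candidate]) / statistics.median(timings[baseline])


# Placeholder workloads; in practice each would render the same tile list
# with the old or the new style loaded.
def old_style() -> None:
    sum(range(20_000))


def new_style() -> None:
    sum(range(40_000))


timings = interleaved_benchmark({"old": old_style, "new": new_style}, rounds=5)
print(f"new/old median time ratio: {median_ratio(timings, 'old', 'new'):.2f}")
```

Using the median rather than the mean keeps a single disturbed run from dominating the comparison.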

@Sjord
Contributor

Sjord commented Dec 18, 2019

As a partial solution, could we run EXPLAIN on all queries and check the cost and query plan reported by postgres? That would be cheap to run and catch a part of the performance problems with queries.

@pnorman
Collaborator

pnorman commented Dec 19, 2019

> As a partial solution, could we run EXPLAIN on all queries and check the cost and query plan reported by postgres? That would be cheap to run and catch a part of the performance problems with queries.

How much data would you load for this?

@Sjord
Contributor

Sjord commented Dec 27, 2019

I am thinking of using a small European country, such as Portugal. I made a script that runs EXPLAIN on all queries in the MML. Even running it on Luxembourg would catch the addresses sequential scan (#3937):

Commit 05dc392, just before the fix:

Total cost 35698
Most expensive addresses 34821.11

Commit e66889e with the fix:

Total cost 884
Most expensive turning-circle-casing 318.99

However, even though it's quite successful in this case, these kinds of bugs seem quite rare to me, and I doubt the script brings great benefits in monitoring performance in general.
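
The script itself isn't shown in this thread, so the following is only a sketch of how such a summary could be produced. It assumes each layer's query has already been run through EXPLAIN (FORMAT JSON) and that the reported total is the sum of the per-layer top-level plan costs. The layer names and cost values in the sample are hypothetical.

```python
import json
from typing import Dict, Tuple


def total_cost(explain_json: str) -> float:
    """Top-level estimated cost from EXPLAIN (FORMAT JSON) output."""
    plans = json.loads(explain_json)
    return float(plans[0]["Plan"]["Total Cost"])


def summarize(layer_plans: Dict[str, str]) -> Tuple[float, str, float]:
    """Return (sum of costs, most expensive layer, its cost)."""
    costs = {layer: total_cost(plan) for layer, plan in layer_plans.items()}
    worst = max(costs, key=costs.get)
    return sum(costs.values()), worst, costs[worst]


# Hypothetical plans, trimmed to the fields used here.
SAMPLE_PLANS = {
    "layer-a": '[{"Plan": {"Node Type": "Index Scan", "Total Cost": 12.5}}]',
    "layer-b": '[{"Plan": {"Node Type": "Seq Scan", "Total Cost": 300.25}}]',
}

total, worst_layer, worst_cost = summarize(SAMPLE_PLANS)
print(f"Total cost {total}")
print(f"Most expensive {worst_layer} {worst_cost}")
```

Planner costs are estimates in arbitrary units, so they are only meaningful when compared between two commits against the same database.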
