Heap usage increased #3520
So, it turns out that the issue started happening earlier than the modules commit. From the last good run on March 10, note the apm-server version reported: "apm-server version 8.0.0 built on 6 March [2800966]"
From the job run on March 11: "apm-server version 8.0.0 built on 10 March [64c4140]"
Commit 64c4140 immediately precedes the modules commit. It seems that the hey-apm job runs with whatever the latest snapshot image is. Perhaps we should build and publish our own nightly image specifically for hey-apm? (CC @elastic/observablt-robots) The only thing in there that looks suspicious is d5a0c46, but that only applies to spans AFAIK; I don't see why error allocations would be affected by it.
Sure; having a Docker image of every valid commit stored in our Docker registry is part of our incremental deployments effort.
So far I've been unable to reproduce the difference locally. Just to recap, the change is apparently somewhere in the range between those two builds. Looking through the changes, the only one that could plausibly explain it is d5a0c46 (adding all metadata fields to spans), but as mentioned above, workloads that do not involve spans shouldn't be affected. My current hypothesis is that load from the hey-apm benchmark jobs is queuing up and spilling over into subsequent jobs: hey-apm currently has a fixed 60s cooldown between jobs, but does not restart apm-server or explicitly wait for it to quiesce. I'm going to look at modifying hey-apm to wait until the server's queue is empty before proceeding.
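As a rough illustration of that idea, here is a minimal sketch of a quiesce check. It assumes apm-server is running with `apm-server.expvar.enabled: true` so that `/debug/vars` exposes the libbeat monitoring counters, and that the relevant gauge is `libbeat.pipeline.events.active`; both the endpoint configuration and the exact metric key are assumptions, not confirmed details of how hey-apm was eventually changed.

```go
// Sketch: block until apm-server's publishing pipeline drains before starting
// the next benchmark run. Assumes expvar is enabled on the server and that the
// metric key below exists; both are assumptions.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func waitForQuiesce(baseURL string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(baseURL + "/debug/vars")
		if err != nil {
			return err
		}
		var vars map[string]interface{}
		err = json.NewDecoder(resp.Body).Decode(&vars)
		resp.Body.Close()
		if err != nil {
			return err
		}
		// Events still held in the publishing pipeline (assumed metric key).
		if active, ok := vars["libbeat.pipeline.events.active"].(float64); ok && active == 0 {
			return nil
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("apm-server did not quiesce within %s", timeout)
}

func main() {
	if err := waitForQuiesce("http://localhost:8200", 2*time.Minute); err != nil {
		fmt.Println("warning:", err)
	}
}
```

Polling a published metric like this would avoid restarting apm-server between jobs, at the cost of depending on the metric name staying stable across versions.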
That probably happens, yes. I wouldn't know why it happens more after March 11, though. In any case, I would generally take the CI hey-apm results with a grain of salt (and yes, it's hard to reproduce locally). I can try to dig a bit into that spans commit.
I never did get to the bottom of the increase, but with a series of refactoring and optimisation PRs we're now at or above (in some cases significantly above) the previous performance. (Ignore the dip in ingestion rate towards the end; that was caused by enabling continuous profiling, which interfered with benchmarking.) We still have some room for improvement, but I think we can close this for now and continue improving as a matter of course. I've opened an issue about better control of the load-testing environment, which should also enable us to turn on continuous profiling: elastic/hey-apm#167
There has been a non-negligible increase in heap allocations since March 11. This coincides with #3418.
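For context, a minimal sketch of one way to quantify a heap-allocation regression for a fixed workload in Go is below: compare `runtime.MemStats.TotalAlloc` before and after replaying the same payload against each build (or use `go test -benchmem` plus benchstat). The `sendEvents` helper here is hypothetical and only stands in for driving a fixed intake workload; this is not how hey-apm or the CI job measures heap usage.

```go
// Sketch: measure total heap allocations for a fixed workload so two builds
// can be compared. Illustrative only.
package main

import (
	"fmt"
	"runtime"
)

var sink [][]byte

// sendEvents is a hypothetical stand-in for replaying a fixed set of
// intake events against the code path under test.
func sendEvents(n int) {
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, 1024)) // placeholder allocation
	}
	sink = nil
}

func main() {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	sendEvents(100000)

	runtime.GC()
	runtime.ReadMemStats(&after)

	// TotalAlloc is cumulative bytes allocated for heap objects, so the
	// delta approximates the allocation cost of the workload.
	fmt.Printf("heap allocated by workload: %d bytes\n", after.TotalAlloc-before.TotalAlloc)
}
```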