Reduce chunk write queue memory usage 2 #10874
Conversation
See #10873 (comment) for the first part of the story. After applying the fix from #10873 and then the fix from this PR at ~14:00 to ingester-zone-b, we got:
We can see that applying both fixes (#10873 and this PR) reduced the memory usage of the new chunk mapper to the same values as the old chunk mapper. (In this case, each zone had 20 TSDBs open altogether.)
This PR reimplements chan chunkWriteJob with custom buffered queue that should use less memory, because it doesn't preallocate entire buffer for maximum queue size at once. Instead it allocates individual "segments" with smaller size. As elements are added to the queue, they fill individual segments. When elements are removed from the queue (and segments), empty segments can be thrown away. This doesn't change memory usage of the queue when it's full, but should decrease its memory footprint when it's empty (queue will keep max 1 segment in such case). Signed-off-by: Peter Štibraný <[email protected]>
This looks good, thanks!
However, I don't see the mentioned improvement in the second graph: #10874 (comment). Do you mean those small steps down when the load was potentially smaller?
Anyway, it's great to reduce the allocation when the queue is smaller, so LGTM. Thanks 💪🏽
Just FYI, the only negative I see is that linked lists can generally thrash CPU caches when iterating, and we iterate jobs to process them. However, given that each chunk write job requires a disk write, so higher I/O, the linked-list overhead probably doesn't matter. Did you maybe check the impact of this on latency for writing those chunks (the queue filling up quicker)? I wonder if we even have a means to measure this currently. 🤔
Thank you for review!
The improvement is in the yellow line. At first you can see it being better (lower) than green (zone-a), but worse than blue (zone-c). After applying the fix from this PR (roughly at 14 UTC on the graph), it got to the same level as blue (zone-c, running the old chunk mapper).
We iterate over a full segment first (a slice of 8192 jobs) before jumping (via the linked list) to the next segment. As you point out, each chunk job performs I/O. I haven't checked the impact on the latency of writing the chunks, though.
Signed-off-by: Peter Štibraný <[email protected]>
Restarted Windows test flake.
Happy to merge, unless you want to check it @codesome
1 less PR to review for me? Yes please!
This PR reimplements `chan chunkWriteJob` with a custom buffered queue that should use less memory, because it doesn't preallocate the entire buffer for the maximum queue size at once. Instead it allocates individual "segments" with a smaller size. As elements are added to the queue, they fill individual segments. When elements are removed from the queue (and segments), empty segments can be thrown away. This doesn't change the memory usage of the queue when it's full, but decreases its memory footprint when the queue is empty (the queue will keep at most 1 segment in that case).
This PR comes from grafana/mimir-prometheus#247. We use it in Grafana Mimir to reduce memory usage in Mimir clusters with thousands of open TSDBs in single process.
Related to #10873.