Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: improve latency when streaming series from Prometheus #3146

Merged
merged 1 commit into from
Sep 15, 2020

Conversation

simonpasquier
Copy link
Contributor

@simonpasquier simonpasquier commented Sep 9, 2020

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

I've found that when requesting many series (in the order of ten
thousands), the Thanos sidecar spends half of its time computing the
number of received series. To calculate the number of series, it needs
to build a label-based identifier for each chunked series and compare it
with the previous identifier. Eventually this number is only used for
logging and tracing so it doesn't feel like it's worth the penalty.

This change adds an histogram metric,
thanos_sidecar_prometheus_store_received_frames, which tracks the number
of frames per request received from the Prometheus remote read API
(buckets: 10, 100, 1000, 10000, 100000). It can be used to evaluate
whether expensive Series requests are performed.

Verification

CPU profile before this PR (half of the time is spent in prompb.(*Label).String)

thanos_sidecar_before_cpu_profile

CPU profile after this PR

thanos_sidecar_after_cpu_profile

CPU graph showing the 50% improvement (the patched version of the Thanos sidecar has been deployed around 15:50)

https://snapshot.raintank.io/dashboard/snapshot/mufpH3k2yRJUYLM284SXESwVLG0Ns14i?orgId=2

@simonpasquier simonpasquier force-pushed the improve-prometheus-store branch from 8242808 to 6389103 Compare September 9, 2020 13:52
@kakkoyun kakkoyun requested a review from bwplotka September 9, 2020 14:22
@simonpasquier simonpasquier force-pushed the improve-prometheus-store branch 4 times, most recently from ae9dbe4 to 74efa25 Compare September 10, 2020 07:05
@simonpasquier
Copy link
Contributor Author

simonpasquier commented Sep 10, 2020

prometheus-operator/prometheus-operator#3457 is the prometheus-operator PR including this change.
wrong place :)

I've found that when requesting many series (in the order of ten
thousands), the Thanos sidecar spends half of its time computing the
number of received series. To calculate the number of series, it needs
to build a label-based identifier for each chunked series and compare it
with the previous identifier. Eventually this number is only used for
logging and tracing so it doesn't feel like it's worth the penalty.

This change adds an histogram metric,
`thanos_sidecar_prometheus_store_received_frames`, which tracks the
number of frames per request received from the Prometheus remote read
API (buckets: 10, 100, 1000, 10000, 100000). It can be used to evaluate
whether expensive Series requests are being performed.

Signed-off-by: Simon Pasquier <[email protected]>
@simonpasquier simonpasquier force-pushed the improve-prometheus-store branch from 74efa25 to f2ab444 Compare September 15, 2020 11:20
@simonpasquier
Copy link
Contributor Author

@bwplotka friendly ping :)

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow.

Amazing. Just curious how you found this? (: Just putting timers manually?

EDIT: Nvm, read your description. Thanks!

@bwplotka bwplotka merged commit 1078fe7 into thanos-io:master Sep 15, 2020
@bwplotka
Copy link
Member

Thanks, amazingly spotted ❤️

@simonpasquier simonpasquier deleted the improve-prometheus-store branch September 21, 2020 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants