Proxy: Query goroutine leak when `store.response-timeout` is set
#7618
Merged
cincinnat force-pushed the query-goroutine-leak branch 4 times, most recently from 910a242 to a4b9301 on August 9, 2024 13:10
MichaHoffmann approved these changes on Aug 9, 2024
Lgtm, thank you!
saswatamcode approved these changes on Aug 11, 2024
@cincinnat could you kindly rebase on latest main? We had a CI issue, which seems to be fixed. Want to merge this on green 🙂
time.AfterFunc() returns a time.Timer object whose C field is nil, according to the documentation. A goroutine blocks forever on reading from a `nil` channel, leading to a goroutine leak on random slow queries.

Signed-off-by: Mikhail Nozdrachev <[email protected]>
cincinnat force-pushed the query-goroutine-leak branch from a4b9301 to d23ca27 on August 13, 2024 07:04
saswatamcode approved these changes on Aug 13, 2024
saswatamcode pushed a commit to saswatamcode/thanos that referenced this pull request on Aug 13, 2024

…nos-io#7618)

time.AfterFunc() returns a time.Timer object whose C field is nil, according to the documentation. A goroutine blocks forever on reading from a `nil` channel, leading to a goroutine leak on random slow queries.

Signed-off-by: Mikhail Nozdrachev <[email protected]>
saswatamcode added a commit that referenced this pull request on Aug 13, 2024

* Proxy: Query goroutine leak when `store.response-timeout` is set (#7618)

  time.AfterFunc() returns a time.Timer object whose C field is nil, according to the documentation. A goroutine blocks forever on reading from a `nil` channel, leading to a goroutine leak on random slow queries.

  Signed-off-by: Mikhail Nozdrachev <[email protected]>

* pkg/clientconfig: fix TLS configs with only CA (#7634)

  065e3dd introduced a regression: TLS configurations for Thanos Ruler query and alerting with only a CA file failed to load. For instance, the following snippet is a valid query configuration:

  ```
  - static_configs:
    - prometheus.example.com:9090
    scheme: https
    http_config:
      tls_config:
        ca_file: /etc/ssl/cert.pem
  ```

  The test fixtures (CA, certificate and key files) are copied from prometheus/common and are valid until 2072.

  Signed-off-by: Simon Pasquier <[email protected]>

* Cut patch release v0.36.1

  Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix failing e2e test (#7620)

  Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
  Signed-off-by: Saswata Mukherjee <[email protected]>

---------

Signed-off-by: Mikhail Nozdrachev <[email protected]>
Signed-off-by: Simon Pasquier <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
Co-authored-by: Mikhail Nozdrachev <[email protected]>
Co-authored-by: Simon Pasquier <[email protected]>
Co-authored-by: Harry John <[email protected]>
saswatamcode added a commit that referenced this pull request on Aug 14, 2024

* CHANGELOG: Mark 0.36 as in progress

  Signed-off-by: Michael Hoffmann <[email protected]>

* Cut release candidate v0.36.0-rc.0 (#7490)

  Signed-off-by: Michael Hoffmann <[email protected]>

* Cut release candidate 0.36.0 rc.1 (#7510)

  * *: fix server grpc histograms (#7493)

    Signed-off-by: Michael Hoffmann <[email protected]>

  * Close endpoints after the gRPC server has terminated (#7509)

    Endpoints are currently closed as soon as we receive a SIGTERM or SIGINT. This causes in-flight queries to get cancelled since outgoing connections get closed instantly. This commit moves the endpoints.Close call after the grpc server shutdown to make sure connections are available as long as the server is running.

    Signed-off-by: Filip Petkovski <[email protected]>

  * Cut release candidate v0.36.0-rc.1

    Signed-off-by: Michael Hoffmann <[email protected]>

  ---------

  Signed-off-by: Michael Hoffmann <[email protected]>
  Signed-off-by: Filip Petkovski <[email protected]>
  Co-authored-by: Filip Petkovski <[email protected]>

* Cut release v0.36.0 (#7578)

  Signed-off-by: Michael Hoffmann <[email protected]>

* Cut patch release `v0.36.1` (#7636)

  * Proxy: Query goroutine leak when `store.response-timeout` is set (#7618)

    time.AfterFunc() returns a time.Timer object whose C field is nil, according to the documentation. A goroutine blocks forever on reading from a `nil` channel, leading to a goroutine leak on random slow queries.

    Signed-off-by: Mikhail Nozdrachev <[email protected]>

  * pkg/clientconfig: fix TLS configs with only CA (#7634)

    065e3dd introduced a regression: TLS configurations for Thanos Ruler query and alerting with only a CA file failed to load. For instance, the following snippet is a valid query configuration:

    ```
    - static_configs:
      - prometheus.example.com:9090
      scheme: https
      http_config:
        tls_config:
          ca_file: /etc/ssl/cert.pem
    ```

    The test fixtures (CA, certificate and key files) are copied from prometheus/common and are valid until 2072.

    Signed-off-by: Simon Pasquier <[email protected]>

  * Cut patch release v0.36.1

    Signed-off-by: Saswata Mukherjee <[email protected]>

  * Fix failing e2e test (#7620)

    Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
    Signed-off-by: Saswata Mukherjee <[email protected]>

  ---------

  Signed-off-by: Mikhail Nozdrachev <[email protected]>
  Signed-off-by: Simon Pasquier <[email protected]>
  Signed-off-by: Saswata Mukherjee <[email protected]>
  Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
  Co-authored-by: Mikhail Nozdrachev <[email protected]>
  Co-authored-by: Simon Pasquier <[email protected]>
  Co-authored-by: Harry John <[email protected]>

---------

Signed-off-by: Michael Hoffmann <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Mikhail Nozdrachev <[email protected]>
Signed-off-by: Simon Pasquier <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
Co-authored-by: Michael Hoffmann <[email protected]>
Co-authored-by: Filip Petkovski <[email protected]>
Co-authored-by: Mikhail Nozdrachev <[email protected]>
Co-authored-by: Simon Pasquier <[email protected]>
Co-authored-by: Harry John <[email protected]>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request on Aug 22, 2024

* Proxy: Query goroutine leak when `store.response-timeout` is set (thanos-io#7618)

  time.AfterFunc() returns a time.Timer object whose C field is nil, according to the documentation. A goroutine blocks forever on reading from a `nil` channel, leading to a goroutine leak on random slow queries.

  Signed-off-by: Mikhail Nozdrachev <[email protected]>

* pkg/clientconfig: fix TLS configs with only CA (thanos-io#7634)

  065e3dd introduced a regression: TLS configurations for Thanos Ruler query and alerting with only a CA file failed to load. For instance, the following snippet is a valid query configuration:

  ```
  - static_configs:
    - prometheus.example.com:9090
    scheme: https
    http_config:
      tls_config:
        ca_file: /etc/ssl/cert.pem
  ```

  The test fixtures (CA, certificate and key files) are copied from prometheus/common and are valid until 2072.

  Signed-off-by: Simon Pasquier <[email protected]>

* Cut patch release v0.36.1

  Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix failing e2e test (thanos-io#7620)

  Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
  Signed-off-by: Saswata Mukherjee <[email protected]>

---------

Signed-off-by: Mikhail Nozdrachev <[email protected]>
Signed-off-by: Simon Pasquier <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
Co-authored-by: Mikhail Nozdrachev <[email protected]>
Co-authored-by: Simon Pasquier <[email protected]>
Co-authored-by: Harry John <[email protected]>
`time.AfterFunc()` returns a `time.Timer` object whose `C` field is nil, according to the documentation. A goroutine blocks forever on reading from a `nil` channel, leading to a goroutine leak on random slow queries for Thanos.

This goroutine leak would be most apparent for busy services with `query.promql-engine=thanos`, where goroutines tend to get stuck in batches, thanks to the wide usage of `sync.Once` by the engine.

Changes

Verification