Add receiver downscale endpoint #88
pkg/server/http/http.go
Outdated
@@ -17,6 +15,11 @@ import (
	toolkit_web "github.com/prometheus/exporter-toolkit/web"
	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
	"net/http"
Please follow the Go import-ordering convention: https://google.github.io/styleguide/go/best-practices.html#import-ordering
refactored
cmd/thanos/receive.go
Outdated
@@ -311,6 +311,8 @@ func runReceive(
		httpserver.WithGracePeriod(time.Duration(*conf.httpGracePeriod)),
		httpserver.WithTLSConfig(*conf.httpTLSConfig),
	)
	var lastDownscalePrepareTimestamp *int64 = nil
This implementation assumes the pod won't restart. To preserve the state across restarts, we might need to persist it on a local PVC too. Any thoughts?
Yes, but it's safe to lose this state. The rollout operator only downscales if the returned timestamp is more than 30 seconds old. If we lose the state, the endpoint returns the current timestamp, so a downscale will not happen accidentally. In the worst case, a scheduled downscale is delayed by 30 seconds.
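The reasoning above can be sketched roughly as follows. This is not the PR's code: the `downscaleState` type, the `prepare` and `safeToDownscale` helpers, and the use of Unix seconds are assumptions made for illustration; only the 30-second window and the "return the current timestamp when no state exists" behavior come from the comment.

```go
package main

import "fmt"

// graceSeconds mirrors the 30-second window mentioned in the thread.
const graceSeconds = 30

// downscaleState is a hypothetical holder for the in-memory timestamp.
// A nil preparedAt models a fresh process (first call or after a restart).
type downscaleState struct {
	preparedAt *int64
}

// prepare returns the Unix time at which downscale preparation was first
// observed. With no stored value it records and returns now, which simply
// restarts the waiting window.
func (s *downscaleState) prepare(now int64) int64 {
	if s.preparedAt == nil {
		ts := now
		s.preparedAt = &ts
	}
	return *s.preparedAt
}

// safeToDownscale is an assumed operator-side check: proceed only when the
// reported timestamp is more than graceSeconds old.
func safeToDownscale(preparedAt, now int64) bool {
	return now-preparedAt > graceSeconds
}

func main() {
	var s downscaleState
	start := int64(1700000000)

	ts := s.prepare(start)
	fmt.Println(safeToDownscale(ts, start+10)) // false: still inside the window
	fmt.Println(safeToDownscale(ts, start+31)) // true: window has elapsed

	s = downscaleState{} // simulate a pod restart that loses the state
	ts = s.prepare(start + 31)
	fmt.Println(safeToDownscale(ts, start+31)) // false: the window starts over
}
```

Losing the state can only make the check more conservative, which is why persisting it to a PVC is optional in this sketch.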
pkg/receive/multitsdb.go
Outdated
@@ -106,6 +106,14 @@ func (t *MultiTSDB) SkipMatchExternalLabels() {
	t.skipMatchExternalLabels = true
}

func (t *MultiTSDB) GetTenants() map[string]*tenant {
	return t.tenants
This map is protected by mtx. A call site of GetTenants() can hold a reference to the map and read or write it while other goroutines are updating it. That would be a data race.
It's safer to return a deep copy of the map if copying is not too expensive.
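A minimal sketch of the copy-under-lock idea, assuming a simplified stand-in for MultiTSDB. The `registry` type and the `tenantsSnapshot` and `add` methods are hypothetical names. Note that the copy here is shallow: the map itself is duplicated, but the `*tenant` values are still shared, so a true deep copy would also need to clone each tenant.

```go
package main

import (
	"fmt"
	"sync"
)

// tenant is a placeholder for the real per-tenant type.
type tenant struct{ name string }

// registry is a hypothetical, simplified stand-in for MultiTSDB.
type registry struct {
	mtx     sync.RWMutex
	tenants map[string]*tenant
}

// tenantsSnapshot copies the map while holding the read lock, so callers
// never touch the map that writers mutate. The *tenant values are shared.
func (r *registry) tenantsSnapshot() map[string]*tenant {
	r.mtx.RLock()
	defer r.mtx.RUnlock()
	out := make(map[string]*tenant, len(r.tenants))
	for id, t := range r.tenants {
		out[id] = t
	}
	return out
}

// add inserts a tenant under the write lock.
func (r *registry) add(id string) {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.tenants[id] = &tenant{name: id}
}

func main() {
	r := &registry{tenants: map[string]*tenant{}}
	r.add("a")
	snap := r.tenantsSnapshot()
	r.add("b") // does not affect the earlier snapshot
	fmt.Println(len(snap), len(r.tenantsSnapshot())) // 1 2
}
```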
Refactored the code.
pkg/server/http/http.go
Outdated
func RegisterDownscale[K comparable, V any](s *Server, m map[K]V, mtx *sync.RWMutex, t *int64) {
	s.mux.Handle("/-/downscale", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mtx.RLock()
		n := len(m)
Don't pass in the map if only its size is needed. Pass in the size instead.
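The size of the map changes dynamically while the handler is running, and Go maps have reference semantics, so the handler has to read the length at request time rather than receive a fixed number at registration.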
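One way to reconcile the two comments above is to pass neither the map nor a fixed integer, but a function that reads the size on demand. The sketch below contrasts an `int` captured once with a `func() int` evaluated later; the `tenantSet` type and its methods are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// tenantSet is a hypothetical owner of the map and its lock.
type tenantSet struct {
	mtx sync.RWMutex
	m   map[string]struct{}
}

func newTenantSet() *tenantSet {
	return &tenantSet{m: map[string]struct{}{}}
}

// add inserts an ID under the write lock.
func (s *tenantSet) add(id string) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.m[id] = struct{}{}
}

// size reads the current length under the read lock.
func (s *tenantSet) size() int {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	return len(s.m)
}

func main() {
	s := newTenantSet()
	s.add("a")

	snapshot := s.size() // an int, fixed at the moment of the call
	live := s.size       // a func() int, re-evaluated on every call

	s.add("b")
	fmt.Println(snapshot, live()) // 1 2
}
```

A handler that receives `live` sees the current count on every request, and the map and its mutex stay private to their owner.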
Refactored the code.
cf878f5 to ea0d891
Signed-off-by: Yuchen Wang <[email protected]>
pkg/receive/multitsdb.go
Outdated
func (t *MultiTSDB) GetTenantsLen() int {
	t.mtx.RLock()
	defer t.mtx.RUnlock()
	return len(t.tenants)
Should we log how many tenants are left and which ones they are?
nit: add unit test?
cmd/thanos/receive.go
Outdated
@@ -7,8 +7,10 @@ import (
	"context"
	"fmt"
	"net"
	"net/http"
indent
LGTM. It would be nice if you could add unit tests, especially for the state after a prune.
The endpoint exposes the number of TSDBs; the rollout operator will patch the StatefulSet (sts) if n_tsdb=0.
related PRs:
https://github.com/databricks/universe/pull/753653
databricks/rollout-operator#7
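To make the description concrete, here is a rough sketch of an endpoint that reports a TSDB count and of the check a caller might apply to it. Only the `/-/downscale` path comes from the diff; the plain-text response body, the `downscaleHandler`, `probe`, and `readyToDownscale` names, and the parsing logic are assumptions, and the real endpoint and operator may use a different format.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// downscaleHandler reports the current TSDB count as plain text.
// The response format is an assumption made for this sketch.
func downscaleHandler(count func() int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		fmt.Fprint(w, strconv.Itoa(count()))
	})
}

// probe issues an in-memory GET against the handler and returns the body.
func probe(h http.Handler) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/-/downscale", nil)
	h.ServeHTTP(rec, req)
	return rec.Body.String()
}

// readyToDownscale is a hypothetical caller-side check: only a body that
// parses to exactly zero TSDBs allows the downscale to proceed.
func readyToDownscale(body string) bool {
	n, err := strconv.Atoi(body)
	return err == nil && n == 0
}

func main() {
	n := 2
	h := downscaleHandler(func() int { return n })
	fmt.Println(readyToDownscale(probe(h))) // false: two TSDBs remain

	n = 0 // e.g. after all tenants have been pruned
	fmt.Println(readyToDownscale(probe(h))) // true
}
```

Driving the handler through `httptest` in this way is also one option for the unit tests requested in the review.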