-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
app: push readyz in background #867
Conversation
app/monitoringapi.go
Outdated
@@ -73,6 +73,24 @@ func wireMonitoringAPI(life *lifecycle.Manager, addr string, localNode *enode.Lo | |||
life.RegisterStart(lifecycle.AsyncBackground, lifecycle.StartMonitoringAPI, httpServeHook(server.ListenAndServe)) | |||
life.RegisterStop(lifecycle.StopMonitoringAPI, lifecycle.HookFunc(server.Shutdown)) | |||
|
|||
go func() { | |||
ticker := time.NewTicker(time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can discuss on the duration part, what will be the right duration to push metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metrics are not pushed, they are pull. So you do not need to update more than pull period, which is 15s. Suggest making this 10s.
case <-ctx.Done(): | ||
return | ||
case <-ticker.C: | ||
syncing, err := beaconNodeSyncing(ctx, eth2Cl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoided logging errors here, since that would spam the logs
@@ -73,6 +73,24 @@ func wireMonitoringAPI(life *lifecycle.Manager, addr string, localNode *enode.Lo | |||
life.RegisterStart(lifecycle.AsyncBackground, lifecycle.StartMonitoringAPI, httpServeHook(server.ListenAndServe)) | |||
life.RegisterStop(lifecycle.StopMonitoringAPI, lifecycle.HookFunc(server.Shutdown)) | |||
|
|||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest extracting a function isReady(ctx) bool
which you call here and in newReadyHandler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add a test for isReady
please 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should ONLY have a async "isReady" resolver, and then readyz
just reads the latest value..?
Codecov Report
@@ Coverage Diff @@
## main #867 +/- ##
==========================================
- Coverage 54.73% 54.69% -0.04%
==========================================
Files 113 114 +1
Lines 12079 12265 +186
==========================================
+ Hits 6611 6708 +97
- Misses 4505 4587 +82
- Partials 963 970 +7
Continue to review full report at Codecov.
|
) | ||
for { | ||
select { | ||
case <-ctx.Done(): | ||
return ctx.Err() | ||
case res := <-results: | ||
if res.Error != nil { | ||
continue | ||
errCount++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added error counter to return error if are unable to ping quorum number of peers
app/monitoringapi.go
Outdated
case <-ctx.Done(): | ||
return | ||
case <-ticker.Chan(): | ||
mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remember to decrease mutex locked scope to absolute minimum, and remember to NEVER do IO (disk/network) calls while a lock is held.
app/monitoringapi.go
Outdated
actual int | ||
errCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest: okCount, errCount
app/monitoringapi_internal_test.go
Outdated
|
||
// We wrap the Advance() calls with blockers to make sure that the ticker | ||
// can go to sleep and produce ticks without time passing in parallel. | ||
clock.BlockUntil(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BlockUntil only works for calls to Sleep
and After
, not Timer.Chan().. so you'll need to another way.
app/monitoringapi_internal_test.go
Outdated
} | ||
} | ||
|
||
func TestStartCheckerPingFail(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lots of duplication in these tests, can't we make it a table test rather?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah sure
app/monitoringapi.go
Outdated
readyErr = errors.New("beacon node not synced") | ||
readyzGauge.Set(0) | ||
} else if peersReady(ctx, peerIDs, tcpNode) != nil { | ||
readyErr = errors.New("couldn't ping all peers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest using error sentinels (var errReadyNoPing = errors.New("blah blah")
) and use them for comparison in tests
readyErrFunc := startReadyChecker(ctx, tcpNode, eth2Cl, peerIDs, clockwork.NewRealClock()) | ||
mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) { | ||
readyErr := readyErrFunc() | ||
if readyErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can inline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah but i avoided this because it needs to send response as error string which will result in calling readErrFunc() twice and again asking for mutex.
app/monitoringapi.go
Outdated
func startReadyChecker(ctx context.Context, tcpNode host.Host, eth2Cl eth2client.NodeSyncingProvider, peerIDs []peer.ID, clock clockwork.Clock) func() error { | ||
var ( | ||
mu sync.Mutex | ||
readyErr error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest initialising the error with non-nil, "errReadyUninit = "ready check uninitialised"
app/monitoringapi.go
Outdated
mu.Lock() | ||
readyErr = errReadySyncing | ||
mu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather do locking in one place.
suggest:
... } else if syncing {
err = errReadySyncing
readyzGauge.Set(0)
} ...
mu.Lock()
readyErr = err
mu.Unlock()
This change pushed readyz metrics every 1 second. Earlier we were pushing this metric only when someone calls readyz endpoint. If no-one calls that endpoint it will show inactive on readyz grafana panel.
category: bug
ticket: #880