-
Notifications
You must be signed in to change notification settings - Fork 188
worker, ha: increase keepalive TTL to 1 minute, and to 30 minutes if relay enabled #1405
Conversation
84f43d6
to
17ff623
Compare
pkg/ha/keepalive.go
Outdated
// create a new lease with new TTL, and overwrite original KV | ||
cliCtx, cancel := context.WithTimeout(ctx, etcdutil.DefaultRequestTimeout) | ||
defer cancel() | ||
lease, err = cli.Grant(cliCtx, newTTL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The revokeLease function should be called for the old lease, like line 104.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lease will expire if not keepalive, so I think it's not needed
PTAL @lichunzhu @3pointer |
Our ha integration test still passed after increasing keepalive ttl... 🤔 |
I guess it's gracefully exited (not kill -9) so revoke lease function called. will check today |
dm/worker/server.go
Outdated
@@ -588,6 +588,13 @@ func (s *Server) startWorker(cfg *config.SourceConfig) error { | |||
return err | |||
} | |||
startRelay = !relayStage.IsDeleted && relayStage.Expect == pb.Stage_Running | |||
// change keepalive TTL to 30 minutes if it's default value | |||
// is relayStage is not running, we choose to change keepalive here instead of at relayStage switching | |||
if s.cfg.KeepAliveTTL == defaultKeepAliveTTL { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if s.cfg.KeepAliveTTL == defaultKeepAliveTTL { | |
if s.cfg.KeepAliveTTL < relayEnabledKeepAliveTTL { |
What if I set KeepAliveTTL
to 31s
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should left a chance if user doesn't want this feature, so only increase TTL when default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think we should add a configuration
named relayEnabledKeepAliveTTL
. In the current way, I can't set keepAliveTTL
and increase relayKeepAliveTTL
at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we can try to revoke the last lease when we "re-keepalive". It doesn't matter even if the job is failed.
Rest LGTM
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM |
@zeminzhou, Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments. See the corresponding SIG page for more information. Related SIG: migrate(slack). |
/run-all-tests |
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-2.0 in PR #1426 |
What problem does this PR solve?
make keepalive more rubost
What is changed and how it works?
after this PR there're two keepalive TTL: one for relay and one for non-relay. They are 30 minutes and 1 minute seperately by default.
If relay task is assigned, change it to relay-keepalive-ttl, when no relay tasks, chage it to keepalive-ttl
Check List
Tests
Code changes
Side effects
Related changes