etcdserver: move rpc defrag notifier into backend. #16959
base: main
Conversation
server/storage/backend/backend.go (outdated diff)
if notifier == nil {
	return
}
Why this check? It doesn't protect against nil pointer exceptions. It will work for

b.SubscribeDefragNotifier(nil)

but not for

var notifier *healthNotifier
b.SubscribeDefragNotifier(notifier)

because nil != (*healthNotifier)(nil).
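For illustration, here is a minimal, self-contained sketch (hypothetical names, not code from this PR) of the typed-nil gotcha being pointed out: a nil *healthNotifier stored in an interface value does not compare equal to nil, so the guard only catches an untyped nil argument.

package main

import "fmt"

// DefragNotifier stands in for the notifier interface; names are assumptions.
type DefragNotifier interface {
	DefragStarted()
}

type healthNotifier struct{}

func (h *healthNotifier) DefragStarted() {}

func subscribe(n DefragNotifier) {
	// Guards only an untyped nil: a nil *healthNotifier stored in the
	// interface value is non-nil here, so the check is bypassed.
	if n == nil {
		fmt.Println("untyped nil: subscription skipped")
		return
	}
	fmt.Printf("subscribed, n == nil is %v\n", n == nil) // false even for a typed nil
}

func main() {
	subscribe(nil) // skipped by the check

	var notifier *healthNotifier
	subscribe(notifier) // NOT skipped: the interface holds (*healthNotifier)(nil)
}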
This is to protect the line

mc := adapter.MaintenanceServerToMaintenanceClient(v3rpc.NewMaintenanceServer(s, nil))
@@ -459,6 +490,9 @@ func (b *backend) defrag() error {
	// lock database after lock tx to avoid deadlock.
	b.mu.Lock()
	defer b.mu.Unlock()
	// send notifications after acquiring the lock.
	b.defragStarted()
Notifying listeners under the lock is a simple and correct way to fix the race of two goroutines calling Defrag at once. However, the current implementation comes with one flaw: it introduces an external call under a lock. The backend just calls and blocks on the notifiers without knowing what they actually do, so we should be very careful about introducing a blocking call here. Do you know whether the call .SetServingStatus(allGRPCServices, healthpb.HealthCheckResponse_SERVING) is blocking?
The SetServingStatus call is not blocking. It sends the update to a channel, but it clears the channel first, and the lock is only held around simple operations.
// SetServingStatus is called when need to reset the serving status of a service
// or insert a new service entry into the statusMap.
func (s *Server) SetServingStatus(service string, servingStatus healthpb.HealthCheckResponse_ServingStatus) {
s.mu.Lock()
defer s.mu.Unlock()
if s.shutdown {
logger.Infof("health: status changing for %s to %v is ignored because health service is shutdown", service, servingStatus)
return
}
s.setServingStatusLocked(service, servingStatus)
}
func (s *Server) setServingStatusLocked(service string, servingStatus healthpb.HealthCheckResponse_ServingStatus) {
s.statusMap[service] = servingStatus
for _, update := range s.updates[service] {
// Clears previous updates, that are not sent to the client, from the channel.
// This can happen if the client is not reading and the server gets flow control limited.
select {
case <-update:
default:
}
// Puts the most recent update to the channel.
update <- servingStatus
}
}
Do you think we should make the defragStarted call in a separate goroutine to remove such concerns?
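For reference, a rough sketch (assumed names and fields, not the PR's implementation) of what dispatching the notification from a separate goroutine could look like: the subscriber list is copied while the caller still holds the lock, and the notifier calls run concurrently so they cannot block defrag.

package main

import (
	"fmt"
	"sync"
	"time"
)

// DefragNotifier is a stand-in for the notifier interface discussed above.
type DefragNotifier interface {
	DefragStarted()
}

type backend struct {
	mu        sync.Mutex
	notifiers []DefragNotifier
}

// defragStarted copies the subscriber list and notifies from a separate
// goroutine, so a slow or blocking notifier cannot stall defrag while b.mu is held.
func (b *backend) defragStarted() {
	notifiers := make([]DefragNotifier, len(b.notifiers))
	copy(notifiers, b.notifiers)
	go func() {
		for _, n := range notifiers {
			n.DefragStarted()
		}
	}()
}

type printNotifier struct{}

func (printNotifier) DefragStarted() { fmt.Println("defrag started") }

func main() {
	b := &backend{notifiers: []DefragNotifier{printNotifier{}}}
	b.mu.Lock()
	b.defragStarted() // returns immediately; the notification runs concurrently
	b.mu.Unlock()
	time.Sleep(10 * time.Millisecond) // crude wait so the goroutine can print
}

The tradeoff with this shape is that the serving-status change may land slightly after defrag has actually begun, which has to be weighed against the cost of an external call under the lock.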
Signed-off-by: Siyuan Zhang <[email protected]>
Force-pushed from 8529110 to 50c9868.
close/reopen to re-trigger all arm64 workflows
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
/remove-lifecycle stale
@siyuanfoundation: The following tests failed.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
The problem with the current implementation is:

Based on these reasons, I think it is necessary to move the defrag notifiers further back into the backend.
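As a rough sketch of the direction this argues for (assumed names and structure, not the actual diff), the backend could own the subscriber list and fire the start/finish notifications around its own defrag, so callers outside the storage layer cannot race:

package backend

import "sync"

// DefragNotifier is assumed to be the interface the healthNotifier implements.
type DefragNotifier interface {
	DefragStarted()
	DefragFinished()
}

type backend struct {
	mu              sync.Mutex
	defragNotifiers []DefragNotifier
}

// SubscribeDefragNotifier registers a notifier; an untyped nil is ignored
// (see the typed-nil caveat discussed earlier in the thread).
func (b *backend) SubscribeDefragNotifier(n DefragNotifier) {
	if n == nil {
		return
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	b.defragNotifiers = append(b.defragNotifiers, n)
}

func (b *backend) defrag() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	// Fire notifications after acquiring the lock so two concurrent Defrag
	// calls cannot interleave start/finish events.
	for _, n := range b.defragNotifiers {
		n.DefragStarted()
	}
	defer func() {
		for _, n := range b.defragNotifiers {
			n.DefragFinished()
		}
	}()
	// ... actual bbolt defragmentation elided ...
	return nil
}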