Clean up imported nodes/services/checks as needed #13367
// deletedNodeChecks tracks node checks that were not present in the latest response.
// A single node check will be attached to all service instances of a node, so this
// deduplication prevents issuing multiple deregistrations for a single check.
deletedNodeChecks = make(map[nodeCheckTuple]struct{})
Conveniently, after I re-activate mesh-gateway-only mode there will be no replicated node checks to worry about, since the checks on a service will be flattened into a single summary check per service instance.
Wouldn't this still apply to non-mesh exports?
Right, unless we wanted to squash those checks as well.
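The deduplication discussed in this thread can be sketched as follows. This is a minimal illustration, not the code from the PR: the fields of nodeCheckTuple and the instanceCheck type are assumptions made up for the example. It relies only on the fact that a node-level check has no service ID and shows up once per service instance on the node.

```go
package main

import "fmt"

// nodeCheckTuple mirrors the key type named in the diff above.
// Its fields are an assumption made for this sketch.
type nodeCheckTuple struct {
	node    string
	checkID string
}

// instanceCheck is a hypothetical flattened view of one check as observed
// on one service instance. Node-level checks have an empty serviceID and
// repeat across every instance on the same node.
type instanceCheck struct {
	node      string
	serviceID string
	checkID   string
}

// dedupeNodeChecks returns each stale node-level check exactly once,
// in first-seen order, so only one deregistration is issued per check.
func dedupeNodeChecks(stale []instanceCheck) []nodeCheckTuple {
	seen := make(map[nodeCheckTuple]struct{})
	var out []nodeCheckTuple
	for _, c := range stale {
		if c.serviceID != "" {
			continue // service-level checks are handled per instance
		}
		key := nodeCheckTuple{node: c.node, checkID: c.checkID}
		if _, ok := seen[key]; ok {
			continue // already queued via another instance on this node
		}
		seen[key] = struct{}{}
		out = append(out, key)
	}
	return out
}

func main() {
	stale := []instanceCheck{
		{node: "node-1", checkID: "serf-health"}, // seen via instance api-1
		{node: "node-1", checkID: "serf-health"}, // seen via instance api-2
		{node: "node-1", serviceID: "api-1", checkID: "api-1-http"},
	}
	for _, c := range dedupeNodeChecks(stale) {
		fmt.Printf("deregister node check %s on %s\n", c.checkID, c.node)
	}
}
```

Using a comparable struct as the map key keeps the lookup cheap and avoids building string keys by hand.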
agent/rpc/peering/replication.go
Outdated
	PeerName: peerName,
})
if err != nil {
	return fmt.Errorf("failed to deregister service: %w", err)
Do we want to fail fast here vs. using merr and grabbing all the raft apply errors? Wondering if it is meaningful to output node/service/check IDs in the error.
Hmmm, I'm a bit hesitant to use merr for raft apply errors because the error could get really unwieldy. For example, if leadership is lost and all raft applies fail because of it, there could be thousands of errors.
Thoughts @rboyer ?
A raft apply will almost always work. If it doesn't, that is almost certainly due to leadership loss, so we'd want fast failure at the first raft error, not a collection of errors displayed at the end.
The comments really helped me review and understand what's going on in this PR! 👍
agent/rpc/peering/replication.go
Outdated
// All services on the node were deleted, so the node is also cleaned up.
err = s.Backend.Apply().CatalogDeregister(&structs.DeregisterRequest{
	Node: string(node),
You need to specify the partition for the node using an empty-namespace, populated-partition entmeta here.
Ah yep good catch
I'll add some ent tests in a separate PR
There were a few tiny plumbing issues I noticed but otherwise, nice.
agent/rpc/peering/replication.go
Outdated
	PeerName: peerName,
})
if err != nil {
	ident := fmt.Sprintf("partition:%s/peer:%s/node:%s/service_id:%s", csn.Service.PartitionOrDefault(), peerName, csn.Node.Node, csn.Service.ID)
You are missing the namespace here.
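One way to make the identifier harder to get wrong is to build it in a single place. The sketch below is hypothetical: the resourceIdent type and the "ns:" label are assumptions for illustration, and the actual fix may simply add another format verb to the existing fmt.Sprintf call.

```go
package main

import "fmt"

// resourceIdent is a hypothetical helper type that gathers every field
// needed to identify an imported service instance in an error message.
type resourceIdent struct {
	Partition string
	Namespace string
	Peer      string
	Node      string
	ServiceID string
}

// String renders the identifier in the same slash-separated style as the
// diff above, with the namespace included. The "ns:" label is an assumption.
func (r resourceIdent) String() string {
	return fmt.Sprintf("partition:%s/ns:%s/peer:%s/node:%s/service_id:%s",
		r.Partition, r.Namespace, r.Peer, r.Node, r.ServiceID)
}

func main() {
	ident := resourceIdent{
		Partition: "default",
		Namespace: "default",
		Peer:      "my-peer",
		Node:      "node-1",
		ServiceID: "api-1",
	}
	err := fmt.Errorf("failed to deregister service %q", ident)
	fmt.Println(err)
}
```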
LGTM with a non-blocking comment about some missing namespaces in error output. Also, you need a rebase, I believe.
Previously, imported data would never be deleted. As nodes/services/checks were registered and deregistered, resources deleted from the exporting cluster would accumulate in the imported cluster. This commit makes updates to replication so that whenever an update is received for a service name we reconcile what was present in the catalog against what was received. This handleUpdateService method can handle both updates and deletions.
db7d8a8 to c96847c
Description
Previously, imported data would never be deleted. As
nodes/services/checks were registered and deregistered, resources
deleted from the exporting cluster would accumulate in the imported
cluster.
This commit makes updates to replication so that whenever an update is
received for a service name we reconcile what was present in the catalog
against what was received.
This handleUpdateService method can handle both updates and deletions.
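The reconciliation described above amounts to a set difference between what is stored and what was just received. The sketch below shows the idea in isolation. It is not the handleUpdateService implementation: the reconcile function and the use of plain string IDs are simplifying assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile compares the instance IDs currently stored for a service name
// with the IDs in the latest response. Everything received is upserted;
// anything stored but no longer received is scheduled for removal.
// An empty response therefore removes every stored instance, which is how
// one code path can cover both updates and deletions.
func reconcile(stored, received []string) (upsert, remove []string) {
	want := make(map[string]struct{}, len(received))
	for _, id := range received {
		want[id] = struct{}{}
	}

	upsert = append(upsert, received...)
	for _, id := range stored {
		if _, ok := want[id]; !ok {
			remove = append(remove, id)
		}
	}

	// Sort for deterministic output in this example.
	sort.Strings(upsert)
	sort.Strings(remove)
	return upsert, remove
}

func main() {
	stored := []string{"api-1", "api-2", "api-3"}
	received := []string{"api-2", "api-4"}

	upsert, remove := reconcile(stored, received)
	fmt.Println("upsert:", upsert)
	fmt.Println("remove:", remove)
}
```

In the PR itself the same comparison has to cover nodes and checks as well as service instances, with a node cleaned up only once all of its services are gone.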
Testing & Reproduction steps
PR Checklist
external facing docs updated