
Clean up imported nodes/services/checks as needed #13367

Merged: @freddygv merged 4 commits into main from peering/service-cleanup on Jun 13, 2022

Conversation

freddygv (Contributor) commented on Jun 6, 2022

Description

Previously, imported data would never be deleted. As nodes/services/checks were registered and deregistered, resources deleted from the exporting cluster would accumulate in the importing cluster.

This commit updates replication so that whenever an update is received for a service name, we reconcile what was already present in the catalog against what was received.

The handleUpdateService method handles both updates and deletions.
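For illustration, a minimal Go sketch of the reconcile-on-update flow described above; the Instance type, the deregister callback, and the package name are hypothetical stand-ins rather than the actual handleUpdateService signature:

package peeringsketch

import "fmt"

// Instance stands in for an imported service instance (hypothetical type).
type Instance struct {
	Node      string
	ServiceID string
}

// reconcileService compares what the catalog currently holds for a service
// name against the latest replication response and deregisters anything the
// exporting cluster no longer has. Registration of new or changed instances
// is omitted; this sketch shows only the cleanup half.
func reconcileService(stored, received map[string]Instance, deregister func(Instance) error) error {
	for id, inst := range stored {
		if _, ok := received[id]; ok {
			continue // still exported; keep it
		}
		if err := deregister(inst); err != nil {
			// Fail fast on the first error (see the review discussion below).
			return fmt.Errorf("failed to deregister service instance %q: %w", id, err)
		}
	}
	return nil
}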

Testing & Reproduction steps

  • Added unit tests for various cases. Ran them in both OSS and ENT.

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

@freddygv requested review from a team, eculver, rboyer, kisunji, and acpana, and removed the request for a team on June 6, 2022 15:50
@freddygv added the pr/no-changelog label (PR does not need a corresponding .changelog entry) on Jun 6, 2022
// deletedNodeChecks tracks node checks that were not present in the latest response.
// A single node check will be attached to all service instances of a node, so this
// deduplication prevents issuing multiple deregistrations for a single check.
deletedNodeChecks = make(map[nodeCheckTuple]struct{})
Member:

Conveniently, after I re-activate mesh-gateway-only mode there will be no node checks replicated to worry about, since the checks on a service will be flattened into a single summary check per service instance.

freddygv (PR author):

Wouldn't this still apply to non-mesh exports?

Member:

Right, unless we wanted to squash those checks as well.
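For illustration, a small self-contained sketch of the deduplication described in the quoted deletedNodeChecks comment; the tuple fields and the surrounding helper are hypothetical, chosen only to show the set semantics:

package peeringsketch

// nodeCheckTuple identifies a node-level check once per node, regardless of
// how many service instances on that node carry it (hypothetical fields).
type nodeCheckTuple struct {
	Node    string
	CheckID string
}

// collectDeletedNodeChecks gathers node checks that were absent from the
// latest response. Because the map acts as a set, a node check attached to
// many service instances still yields a single entry, and therefore a single
// deregistration later on.
func collectDeletedNodeChecks(candidates []nodeCheckTuple, stillPresent map[nodeCheckTuple]struct{}) map[nodeCheckTuple]struct{} {
	deleted := make(map[nodeCheckTuple]struct{})
	for _, tuple := range candidates {
		if _, ok := stillPresent[tuple]; ok {
			continue // the latest response still includes this check
		}
		deleted[tuple] = struct{}{} // inserting the same tuple twice is a no-op
	}
	return deleted
}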

Additional review threads on agent/rpc/peering/replication.go were resolved (one marked outdated).
PeerName: peerName,
})
if err != nil {
return fmt.Errorf("failed to deregister service: %w", err)
Contributor:

Do we want to fail fast here vs. using merr and grabbing all the raft apply errors? Wondering if it would be meaningful to output node/service/check IDs in the error.

freddygv (PR author):

Hmmm, I'm a bit hesitant to use merr for raft apply errors because the combined error could get really unwieldy. For example, if leadership is lost and all raft applies fail because of it, there could be thousands of errors.

Thoughts @rboyer ?

Member:

Almost all of the time a raft apply will work. If it doesn't, it's almost certainly due to leadership loss, so we'd want fast failure at the first raft error rather than collecting errors and displaying them at the end.
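For illustration, a sketch of the fail-fast approach settled on here, carrying identifying detail in the wrapped error as suggested; the request type and apply callback are hypothetical stand-ins, not Consul's actual raft-apply plumbing:

package peeringsketch

import "fmt"

// deregistration is a hypothetical stand-in for a catalog deregister request.
type deregistration struct {
	PeerName  string
	Node      string
	ServiceID string
}

// deregisterAll stops at the first failed apply instead of accumulating a
// multierror: if leadership is lost, every remaining apply would fail for the
// same reason, so one descriptive error is more useful than thousands.
func deregisterAll(reqs []deregistration, apply func(deregistration) error) error {
	for _, req := range reqs {
		if err := apply(req); err != nil {
			ident := fmt.Sprintf("peer:%s/node:%s/service_id:%s", req.PeerName, req.Node, req.ServiceID)
			return fmt.Errorf("failed to deregister %s: %w", ident, err)
		}
	}
	return nil
}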

A review thread on agent/rpc/peering/stream_test.go was resolved (outdated).
kisunji (Contributor) commented on Jun 6, 2022

The comments really helped me review and understand what's going on in this PR! 👍


// All services on the node were deleted, so the node is also cleaned up.
err = s.Backend.Apply().CatalogDeregister(&structs.DeregisterRequest{
Node: string(node),
Member:

You need to specify the partition for the node using an empty-namespace, populated-partition entmeta here.

freddygv (PR author):

Ah yep good catch

freddygv (PR author):

I'll add some ent tests in a separate PR
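For illustration, a tiny sketch of what an empty-namespace, populated-partition entmeta means here; the struct and constructor are simplified stand-ins for Consul's enterprise metadata types, not the real API:

package peeringsketch

// EnterpriseMeta is a simplified stand-in for Consul's entmeta (assumed fields).
type EnterpriseMeta struct {
	Partition string
	Namespace string
}

// nodeEntMetaInPartition builds the entmeta suggested above: nodes are scoped
// to a partition but not to a namespace, so only the partition is populated
// and the namespace is left empty.
func nodeEntMetaInPartition(partition string) EnterpriseMeta {
	return EnterpriseMeta{Partition: partition, Namespace: ""}
}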

rboyer (Member) left a comment:

There were a few tiny plumbing issues I noticed, but otherwise, nice.

@freddygv requested a review from rboyer on June 7, 2022 18:05
PeerName: peerName,
})
if err != nil {
ident := fmt.Sprintf("partition:%s/peer:%s/node:%s/service_id:%s", csn.Service.PartitionOrDefault(), peerName, csn.Node.Node, csn.Service.ID)
Member:

You are missing the namespace here.
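For illustration, a sketch of the quoted ident line with the namespace included; the NamespaceOrDefault accessor is assumed by analogy with the PartitionOrDefault call already present:

// Sketch only: same shape as the quoted line, with the namespace added.
// NamespaceOrDefault is assumed to exist by analogy with PartitionOrDefault.
ident := fmt.Sprintf("partition:%s/ns:%s/peer:%s/node:%s/service_id:%s",
	csn.Service.PartitionOrDefault(), csn.Service.NamespaceOrDefault(),
	peerName, csn.Node.Node, csn.Service.ID)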

rboyer (Member) left a comment:

LGTM, with a non-blocking comment about some missing namespaces in the error output. Also, I believe you need a rebase.

freddygv added 4 commits June 13, 2022 11:41
@freddygv force-pushed the peering/service-cleanup branch from db7d8a8 to c96847c on June 13, 2022 17:41
@freddygv merged commit 71b2545 into main on Jun 13, 2022
@freddygv deleted the peering/service-cleanup branch on June 13, 2022 17:52
freddygv added a commit that referenced this pull request Jun 15, 2022
@kisunji added the theme/cluster-peering label (Related to Consul's cluster peering feature) on Jul 5, 2022