Fix issue with peer stream node cleanup. #17235

hashi-derek · 2023-05-08T13:19:38Z

This commit fixes three problems that are closely related because of their proximity in the code.

1. The peerstream uses node IDs in several places to decide which nodes, services, and checks should be cleaned up or created. VM deployments with agents will likely always have a node ID, but agentless deployments use synthetic nodes and do not populate the field. As a result, in consul-k8s deployments all services were likely bundled into the same synthetic node in some code paths (but not all), which caused strange behavior. The Node.Node field should be used as the unique identifier instead, because it should always be populated.
2. The peerstream cleanup process for unused nodes uses an incorrect query for node deregistration. This query is NOT namespace-aware, so a node (and its services) is deregistered prematurely whenever it has zero default-namespace services and one or more non-default-namespace services registered on it. The issue is hard to find because of the incorrect logic described in item 1, and because the affected services must be co-located on the same node as the service currently being deregistered for it to occur.
3. The stream tracker did not distinguish between services in different namespaces and could therefore report incorrect counts. It now uses the full service name to avoid conflicts and return correct results.
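A minimal Go sketch of the failure mode behind item 1. The node and service types and the groupByID / groupByName helpers are hypothetical simplifications, not Consul's actual catalog structs: when agentless synthetic nodes leave the ID empty, keying by ID merges every service into a single bucket, while keying by the node name keeps each node separate.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a catalog node. For agentless
// (consul-k8s) synthetic nodes, ID may be empty; Node (the name) is set.
type node struct {
	ID   string
	Node string
}

// service is a hypothetical service instance attached to a node.
type service struct {
	Name string
	Node node
}

// groupByID buckets services by node ID. Every node with an empty ID
// lands in the same "" bucket, which is the collision described above.
func groupByID(svcs []service) map[string][]string {
	out := make(map[string][]string)
	for _, s := range svcs {
		out[s.Node.ID] = append(out[s.Node.ID], s.Name)
	}
	return out
}

// groupByName buckets services by Node.Node, so each synthetic node
// keeps its own services.
func groupByName(svcs []service) map[string][]string {
	out := make(map[string][]string)
	for _, s := range svcs {
		out[s.Node.Node] = append(out[s.Node.Node], s.Name)
	}
	return out
}

func main() {
	svcs := []service{
		{Name: "web", Node: node{Node: "k8s-sync-a"}},
		{Name: "api", Node: node{Node: "k8s-sync-b"}},
	}
	// Prints "1 2": one merged bucket by ID, two buckets by name.
	fmt.Println(len(groupByID(svcs)), len(groupByName(svcs)))
}
```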
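A similar sketch for item 2, assuming a hypothetical in-memory catalog (serviceKey, catalog, and shouldDeregisterNode are illustrative names, not Consul's state-store API). The point is that the "is this node now unused?" decision has to count services in every namespace, not just the default one.

```go
package main

import "fmt"

// serviceKey is a hypothetical identifier for a registered service.
type serviceKey struct {
	Namespace string
	Name      string
}

// catalog is a hypothetical map of node name -> services on that node.
type catalog map[string][]serviceKey

// defaultNSCount mimics the buggy check: only default-namespace
// services are counted.
func defaultNSCount(c catalog, nodeName string) int {
	n := 0
	for _, s := range c[nodeName] {
		if s.Namespace == "default" {
			n++
		}
	}
	return n
}

// shouldDeregisterNode only allows removal once the node has no
// services left in any namespace.
func shouldDeregisterNode(c catalog, nodeName string) bool {
	return len(c[nodeName]) == 0
}

func main() {
	c := catalog{"node-1": {{Namespace: "team-a", Name: "billing"}}}
	// The buggy count sees zero services and would deregister the node.
	fmt.Println(defaultNSCount(c, "node-1") == 0)
	// The namespace-aware check keeps it.
	fmt.Println(shouldDeregisterNode(c, "node-1"))
}
```

Running it prints true, then false.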
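For item 3, a sketch of why keying the tracker by a namespace-qualified service name matters. The tracker type, its fields, and serviceName are hypothetical simplifications, not Consul's actual stream tracker or structs.ServiceName.

```go
package main

import "fmt"

// serviceName is a hypothetical namespace-qualified service identifier.
// It is a comparable struct, so it can be used directly as a map key.
type serviceName struct {
	Namespace string
	Name      string
}

// tracker records services seen on a peering stream, keyed two ways
// to contrast the buggy and fixed behavior.
type tracker struct {
	byBareName map[string]struct{}      // collides across namespaces
	byFullName map[serviceName]struct{} // one entry per namespaced service
}

func newTracker() *tracker {
	return &tracker{
		byBareName: make(map[string]struct{}),
		byFullName: make(map[serviceName]struct{}),
	}
}

// add records a service under both keying schemes.
func (t *tracker) add(sn serviceName) {
	t.byBareName[sn.Name] = struct{}{}
	t.byFullName[sn] = struct{}{}
}

func main() {
	trk := newTracker()
	trk.add(serviceName{Namespace: "default", Name: "web"})
	trk.add(serviceName{Namespace: "team-a", Name: "web"})
	// Prints "1 2": the bare-name map undercounts.
	fmt.Println(len(trk.byBareName), len(trk.byFullName))
}
```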


hashi-derek added backport/1.14 backport/1.15 labels May 8, 2023

Add changelog.

28a83da

hashi-derek requested a review from DanStough May 8, 2023 13:26

hashi-derek marked this pull request as ready for review May 8, 2023 16:17

Change changelog wording.

4feb116

DanStough approved these changes May 8, 2023


hashi-derek merged commit 50ef6a6 into main May 8, 2023

hashi-derek deleted the derekm/NET-3007/fix-peer-stream-cleanup branch May 8, 2023 18:13

This was referenced May 8, 2023

Backport of Fix issue with peer stream node cleanup. into release/1.14.x #17246

Closed

Backport of Fix issue with peer stream node cleanup. into release/1.15.x #17247

Merged

hashi-derek added backport/1.14 and removed backport/1.14 labels May 8, 2023

hc-github-team-consul-core mentioned this pull request May 8, 2023

Backport of Fix issue with peer stream node cleanup. into release/1.14.x #17248

Merged

p-linnane mentioned this pull request Jun 1, 2023

consul 1.15.3 Homebrew/homebrew-core#132601

Merged
