[profiles] DAP slow rollout on DS creation #1365
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1365      +/-   ##
==========================================
- Coverage   47.29%   47.21%   -0.09%
==========================================
  Files         217      217
  Lines       18710    18968     +258
==========================================
+ Hits         8849     8955     +106
- Misses       9399     9544     +145
- Partials      462      469       +7
Flags with carried forward coverage won't be shown.
... and 7 files with indirect coverage changes
Overall I think the logic is good; I just left some suggestions about naming to make the names more intuitive and/or concise. (There may be a couple of other places that can be updated in the same vein.)
A suggestion for the future, especially if we add it to a CRD: perhaps we can call it CreationStrategy, to be parallel to the existing UpdateStrategy.
for node, hasCorrectProfileLabel := range nodesThatMatchProfile {
	if useSlowStart {
		if hasCorrectProfileLabel {
			profileStatus.SlowStart.NodesLabeled++
LabeledNodesCount
Are you suggesting to change the CRD value as well?
NodesLabeled int32 `json:"nodesLabeled"`
PodsReady -> ReadyPodsCount and MaxUnavailable -> MaxUnavailableCount? We don't use Count for the DaemonSetStatus, so I tried to base the names off of that format.
I see, thanks for explaining. Yes, I was thinking we could change the CRD, so that it is more intuitive that it's a number and not a list. If you prefer a couple of letters less, we could use num instead? NumReadyPods, NumMaxUnavailable, NumLabeledNodes? Either is OK with me.
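To make the "change the CRD value as well?" question concrete: with Go's encoding/json, the serialized key comes from the struct tag, not from the Go field name. The sketch below uses a hypothetical slowStartStatus type (not the operator's actual type) in which the field is renamed to NumLabeledNodes while the tag is left as nodesLabeled. The serialized output keeps the old key until the tag is changed too.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// slowStartStatus is a hypothetical stand-in for the status type discussed
// above. Only the Go field name has been changed; the JSON tag is untouched.
type slowStartStatus struct {
	NumLabeledNodes int32 `json:"nodesLabeled"`
}

// encode returns the JSON form of the status, or an empty string on error.
func encode(s slowStartStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// The key in the output follows the tag, so it is still "nodesLabeled".
	fmt.Println(encode(slowStartStatus{NumLabeledNodes: 2}))
}
```

Renaming the serialized field as well would mean editing the tag, which changes what existing clients see.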
pkg/agentprofile/agent_profile.go
Outdated
for node, hasCorrectProfileLabel := range nodesThatMatchProfile {
	if useSlowStart {
		if hasCorrectProfileLabel {
			profileStatus.SlowStart.NodesLabeled++
		} else {
			if numNodesToLabel <= 0 {
				continue
			}
			numNodesToLabel--
			profileStatus.SlowStart.NodesLabeled++
		}
	}

	profileAppliedByNode[node] = types.NamespacedName{
		Namespace: profile.Namespace,
		Name:      profile.Name,
	}
}
for node, hasCorrectProfileLabel := range nodesThatMatchProfile {
	if useSlowStart {
		if hasCorrectProfileLabel {
			profileStatus.SlowStart.NodesLabeled++
		} else {
			if numNodesToLabel <= 0 {
				continue
			}
			numNodesToLabel--
			profileStatus.SlowStart.NodesLabeled++
		}
	}
	profileAppliedByNode[node] = types.NamespacedName{
		Namespace: profile.Namespace,
		Name:      profile.Name,
	}
}

for node, hasCorrectProfileLabel := range nodesThatMatchProfile {
	if useSlowStart {
		if hasCorrectProfileLabel {
			profileStatus.SlowStart.NodesLabeled++
		} else if numNodesToLabel > 0 {
			profileAppliedByNode[node] = types.NamespacedName{
				Namespace: profile.Namespace,
				Name:      profile.Name,
			}
			numNodesToLabel--
			profileStatus.SlowStart.NodesLabeled++
		}
	} else {
		profileAppliedByNode[node] = types.NamespacedName{
			Namespace: profile.Namespace,
			Name:      profile.Name,
		}
	}
}
Would that work, so that nodes whose labels are already correct are not applied again?
All nodes should be added to profileAppliedByNode if a profile applies to them. Otherwise, their profile label will be removed later in the reconcile.
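The constraint described in this reply can be sketched in isolation. The helper below, labelNodes, is a hypothetical, simplified stand-in for the loop under review, using a plain map[string]bool in place of the operator's types. Already-labeled nodes always stay in the applied set; unlabeled nodes are added only while the budget lasts.

```go
package main

import "fmt"

// labelNodes is a hypothetical simplification of the slow-start loop.
// matching maps a node name to whether it already carries the profile label.
// budget is how many additional nodes may be labeled in this pass.
// It returns the set of nodes the profile is applied to and the labeled count.
func labelNodes(matching map[string]bool, budget int) (map[string]bool, int) {
	applied := make(map[string]bool, len(matching))
	labeled := 0
	for node, hasLabel := range matching {
		if !hasLabel {
			if budget <= 0 {
				// Out of budget: leave this node for a later reconcile.
				continue
			}
			budget--
		}
		// Nodes that already have the label are kept in the applied set,
		// so that their label is not removed later.
		labeled++
		applied[node] = true
	}
	return applied, labeled
}

func main() {
	matching := map[string]bool{"a": true, "b": true, "c": false, "d": false}
	applied, labeled := labelNodes(matching, 1)
	// "a" and "b" are always present; exactly one of "c" or "d" joins them.
	fmt.Println(len(applied), labeled, applied["a"], applied["b"])
}
```

Go map iteration order is unspecified, so which unlabeled node gets picked varies between runs; only the counts are stable.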
I find this code a bit challenging to read and well worth refactoring at some point.
eb551a9 to ae14ff2
profileAppliedByNode, err = agentprofile.ProfileToApply(logger, &profile, nodeList, profileAppliedByNode, now)
maxUnavailable := agentprofile.GetMaxUnavailable(logger, dda, &profile, len(nodeList))
profileAppliedByNode, err = agentprofile.ApplyProfile(logger, &profile, nodeList, profileAppliedByNode, now, maxUnavailable)
🟠 Code Vulnerability
Potential memory range aliasing. Avoid using the memory reference. (...read more)
Implicit memory aliasing in for loops refers to a scenario in Go programming when two or more pointers reference the same location in memory, creating unexpected side effects. This often results in a common mistake amongst Go programmers due to the 'range' clause.
Consider this example, where a slice of pointers is created in a loop:
data := []int{1, 2, 3}
pointers := make([]*int, 3)
for i, v := range data {
	pointers[i] = &v
}
You might expect the 'pointers' slice to hold addresses of elements in 'data' slice, but that's not the case. In each iteration of the loop, variable 'v' gets a new value but its memory address doesn't change because it's a loop variable. As a result, each element in 'pointers' slice points to the same memory location - the address of the loop variable 'v'. The final value of 'v' is '3', and since all pointers point to 'v', dereferencing the pointers would yield '3' regardless of the pointer's index in the slice.
To avoid implicit memory aliasing in for loops in Go, you should address the actual elements in the original data structure, like so:
data := []int{1, 2, 3}
pointers := make([]*int, 3)
for i := range data {
	pointers[i] = &data[i]
}
In this example, each pointer in the 'pointers' slice correctly points to the respective element in the 'data' slice.
Learn More
According to the linked stack overflow issue, should be fine with go 1.22
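For context on this reply: since Go 1.22, each loop iteration gets its own copy of the range variable, so taking its address no longer aliases across iterations. On older toolchains, indexing into the slice sidesteps the issue on any Go version. The sketch below uses a hypothetical profile type and helper name; whether the functions in this PR retain the pointer at all is not shown in the thread.

```go
package main

import "fmt"

// profile is a hypothetical stand-in for the profile type iterated in the PR.
type profile struct {
	Name string
}

// pointersByIndex returns pointers to the elements of items themselves.
// It takes the address of items[i] rather than of a loop variable, so the
// result does not depend on the loop-variable semantics of the Go version.
func pointersByIndex(items []profile) []*profile {
	out := make([]*profile, len(items))
	for i := range items {
		out[i] = &items[i]
	}
	return out
}

func main() {
	items := []profile{{Name: "a"}, {Name: "b"}, {Name: "c"}}
	for _, p := range pointersByIndex(items) {
		fmt.Println(p.Name)
	}
}
```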
@@ -349,8 +349,8 @@ func (r *Reconciler) profilesToApply(ctx context.Context, logger logr.Logger, no

sortedProfiles := agentprofile.SortProfiles(profilesList.Items)
for _, profile := range sortedProfiles {

	profileAppliedByNode, err = agentprofile.ProfileToApply(logger, &profile, nodeList, profileAppliedByNode, now)
	maxUnavailable := agentprofile.GetMaxUnavailable(logger, dda, &profile, len(nodeList))
🟠 Code Vulnerability
Potential memory range aliasing. Avoid using the memory reference. (...read more)
According to the linked stack overflow issue, should be fine with go 1.22
Certain things can be refactored and test coverage improved, but functionally this looks good to me. Given that this functionality is behind the SlowStart check, we can merge and take care of those items in the next release.
* First pass slow rollout dap
* Fix autogenerated files
* Remove timeout max_unavailable env vars
* review suggestions
* unblock rollout after completion
* no slow start after completion
What does this PR do?
When DSs are first created, they spin up pods on all nodes matching their selector/affinity. With the node agent and profiles, this can be problematic in large clusters, where hundreds to thousands of node agent pods must be spun up at once. This PR progressively labels the nodes that profiles target, which slows down the initial rollout. Future updates to the DS will follow the update strategy defined in the DS.
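As a rough illustration of progressive labeling, the sketch below computes a per-reconcile labeling budget and simulates a rollout. The helper nodesToLabel and its formula (max unavailable minus the nodes labeled but not yet ready) are assumptions made for illustration, not the operator's implementation. The parameter names only echo the NodesLabeled, PodsReady, and MaxUnavailable status fields mentioned in the review.

```go
package main

import "fmt"

// nodesToLabel is a hypothetical helper. It assumes the budget for one
// reconcile is maxUnavailable minus the nodes that are labeled but whose
// pods are not ready yet, and never returns a negative number.
func nodesToLabel(maxUnavailable, nodesLabeled, podsReady int) int {
	inFlight := nodesLabeled - podsReady
	if inFlight < 0 {
		inFlight = 0
	}
	budget := maxUnavailable - inFlight
	if budget < 0 {
		return 0
	}
	return budget
}

func main() {
	const totalNodes, maxUnavailable = 10, 3
	labeled, ready := 0, 0
	for round := 1; labeled < totalNodes; round++ {
		n := nodesToLabel(maxUnavailable, labeled, ready)
		if remaining := totalNodes - labeled; n > remaining {
			n = remaining
		}
		labeled += n
		fmt.Printf("round %d: labeled %d of %d nodes\n", round, labeled, totalNodes)
		// Simplifying assumption: every labeled node has a ready pod
		// before the next reconcile.
		ready = labeled
	}
}
```

Under these assumptions the simulated rollout labels at most three new nodes per round instead of all ten at once.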
Motivation
https://datadoghq.atlassian.net/browse/CECO-1382