-
Notifications
You must be signed in to change notification settings - Fork 255
NMI retries and ticker for periodic sync reconcile #272
Conversation
pkg/crd/crd.go
Outdated
@@ -267,7 +267,7 @@ func (c *Client) ListPodIds(podns, podname string) (*[]aadpodid.AzureIdentity, e | |||
|
|||
var matchedIds []aadpodid.AzureIdentity | |||
for _, v := range azAssignedIDList.(*aadpodid.AzureAssignedIdentityList).Items { | |||
if v.Spec.Pod == podname && v.Spec.PodNamespace == podns { | |||
if v.Spec.Pod == podname && v.Spec.PodNamespace == podns && v.Status.Status == "Assigned" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these states a constant ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my initial plan, but the only confusion is IdentityCreated is an int (EventType) in the v1 pkg.
pkg/mic/mic.go
Outdated
glog.Info("Sync thread started.") | ||
var event aadpodid.EventType | ||
for { | ||
select { | ||
case <-exit: | ||
return | ||
case event = <-c.EventChannel: | ||
glog.V(6).Infof("Received event: %v", event) | ||
case <-ticker.C: | ||
glog.V(6).Infof("Running sync retry loop") | ||
} | ||
|
||
stats.Init() | ||
// This is the only place where the AzureAssignedIdentity creation is initiated. | ||
begin := time.Now() | ||
workDone := false | ||
glog.V(6).Infof("Received event: %v", event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will event show here ?
msg := fmt.Sprintf("no AzureAssignedIdentity found for pod:%s/%s", podns, podname) | ||
logger.Errorf("%s, %+v", msg, err) | ||
http.Error(w, msg, http.StatusForbidden) | ||
http.Error(w, msg, http.StatusNotFound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hosthandler and mishandler looks identical through most parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please unify the new code which you are adding into one single function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add this as a TODO and refactor in the next PR. Have a few questions on why we check filter identities using force namespaced only in 1 handler and not another. We can go over that during the review.
pkg/nmi/server/server.go
Outdated
logger.Warningf("failed to get assigned ids for pod:%s/%s, retrying attempt: %d", podns, podname, attempt) | ||
time.Sleep(listPodIDsRetryIntervalInSeconds * time.Second) | ||
} | ||
return nil, fmt.Errorf("getting assigned identities for pod %s/%s failed after %d attempts. Error: %v", podns, podname, maxAttempts, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this error propagate to the caller in the application ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this will be propagated if it fails after retries.
pkg/nmi/server/server.go
Outdated
} | ||
attempt++ | ||
logger.Warningf("failed to get assigned ids for pod:%s/%s, retrying attempt: %d", podns, podname, attempt) | ||
time.Sleep(listPodIDsRetryIntervalInSeconds * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we break if we go >= attempt
for { | ||
podIDs, err = kubeClient.ListPodIds(podns, podname) | ||
if err == nil && len(*podIDs) != 0 { | ||
if len(rqClientID) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are two kinds of error, one is that after a few seconds, we still did not find the AzureAssignedIdentity and another is that it never moved to Assigned. Not able to gather that different from retry attempts and varying time durations for them..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry attempts are not different based on the states. The final desired state is Assigned
. So I think this encompasses all possible scenarios.
update retry
Ran 10 of 10 Specs in 4998.515 seconds |
@@ -111,7 +111,7 @@ var _ = Describe("Kubernetes cluster using aad-pod-identity", func() { | |||
fmt.Println("\nTearing down the test environment...") | |||
|
|||
// Ensure a clean cluster after the end of each test | |||
cmd := exec.Command("kubectl", "delete", "AzureIdentity,AzureIdentityBinding,AzureAssignedIdentity", "--all") | |||
cmd := exec.Command("kubectl", "delete", "AzureIdentity,AzureIdentityBinding", "--all") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing azureassignedidentity from the delete command. It should be deleted by MIC once the identity and binding are removed and it's a good test to ensure we reliably delete the assigned identities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that is the exception we should add the check at the end to see if no AzureAssignedIdentities exists and fail the test if it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already ensure the length of azureassignedidentities is equal to 0 at the end of AfterEach
.
@@ -141,7 +141,7 @@ e2e: | |||
|
|||
.PHONY: unit-test | |||
unit-test: | |||
go test $(shell go list ./... | grep -v /test/e2e) -v | |||
go test -count=1 $(shell go list ./... | grep -v /test/e2e) -v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the usage of count here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to ensure the test runs at least once instead of showing the cached result.
pkg/crd/crd.go
Outdated
@@ -268,7 +268,11 @@ func (c *Client) ListPodIds(podns, podname string) (*[]aadpodid.AzureIdentity, e | |||
var matchedIds []aadpodid.AzureIdentity | |||
for _, v := range azAssignedIDList.(*aadpodid.AzureAssignedIdentityList).Items { | |||
if v.Spec.Pod == podname && v.Spec.PodNamespace == podns { | |||
matchedIds = append(matchedIds, *v.Spec.AzureIdentityRef) | |||
for _, desiredState := range assignedIDStates { | |||
if v.Status.Status == desiredState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings.EqualFold ? or compareState function which we can change as per version ?
pkg/nmi/server/server.go
Outdated
func listPodIDsInCreatedStateWithRetries(ctx context.Context, kubeClient k8s.Client, logger *log.Entry, podns, podname, rqClientID string, maxAttempts int) (*[]aadpodid.AzureIdentity, error) { | ||
// created and assigned states are added to statesToCheck. This is to ensure if the call is made after the assigned identity is in assigned state | ||
// for the pod, we shouldn't fail on the hard check for created state. Both states are accepted values for passing first check. | ||
return listPodIDsWithRetry(ctx, kubeClient, logger, podns, podname, rqClientID, []string{aadpodid.AssignedIDCreated, aadpodid.AssignedIDAssigned}, maxAttempts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are both state assigned ids added when the function name indicates we are going to gather Created ones.
pkg/nmi/server/server.go
Outdated
logger.Errorf("%s, %+v", msg, err) | ||
http.Error(w, msg, http.StatusForbidden) | ||
http.Error(w, msg, http.StatusNotFound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have to do it two times ? Shouldn't we check and do accordingly ?
cmd/mic/main.go
Outdated
@@ -24,6 +26,8 @@ func main() { | |||
flag.StringVar(&cloudconfig, "cloudconfig", "", "Path to cloud config e.g. Azure.json file") | |||
flag.BoolVar(&forceNamespaced, "forceNamespaced", false, "Forces namespaced identities, binding, and assignment") | |||
flag.BoolVar(&versionInfo, "version", false, "Prints the version information") | |||
flag.StringVar(&syncRetryInterval, "syncRetryInterval", "3600s", "The interval in seconds at which sync loop should periodically check for errors and reconcile.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to duration var
Ran 11 of 11 Specs in 3839.814 seconds
SUCCESS! -- 11 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestAADPodIdentity (3839.81s)
PASS
ok github.com/Azure/aad-pod-identity/test/e2e 3839.837s |
for _, v := range azAssignedIDList.(*aadpodid.AzureAssignedIdentityList).Items { | ||
if v.Spec.Pod == podname && v.Spec.PodNamespace == podns { | ||
matchedIds = append(matchedIds, *v.Spec.AzureIdentityRef) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add a comment in the next PR indicating the backward compatibility aspects.
@@ -85,6 +85,8 @@ metadata: | |||
name: nmi | |||
namespace: {{.Namespace}} | |||
spec: | |||
updateStrategy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the significance of this ?
} | ||
logger.Warningf("failed to get assigned ids for pod:%s/%s in ASSIGNED state, retrying attempt: %d", podns, podname, attempt) | ||
} | ||
return nil, fmt.Errorf("getting assigned identities for pod %s/%s in ASSIGNED state failed after %d attempts. Error: %v", podns, podname, 2*maxAttemptsPerCheck, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have the duration in the error logs on how much it tried to do retry.
Reason for Change:
404
instead of403
when no valid assigned ids found after retries.Issue Fixed:
fixes #257
Notes for Reviewers: