-
Notifications
You must be signed in to change notification settings - Fork 255
Performance Improvement - reduce identity assignment time #219
Conversation
pkg/mic/mic.go
Outdated
fmt.Sprintf("Lookup of node %s for pod %s resulted in error %v", createID.Spec.NodeName, createID.Name, err)) | ||
continue | ||
} | ||
err = c.CloudClient.CheckUserMSI(id.Spec.ResourceID, node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the new interface? Can remove just do the right thing?
@kkmsft I was able to validate #145 with my changes. Created close to 44 pods with 2 assigned identities. With version 1.3 it took about 29m to create all assigned identities and assign msi to node. With my changes, it takes only about a minute since it's batching all the updates with a single call. I've a lot of refactoring of my PR still though. |
Thank you @aramase. Good to know that it's moving in the right direction. After our initial discussion on the approach have not looked deeply into the changes. Please let me know when you want me to start looking at the changes. Also, any chance (I know I am being greedy here ;-)) you can convert that test to an e2e test so that we can track any regression in future. |
@kkmsft Sure, I'll add that as an e2e. Still working on testing a couple of things. I'll need to refactor the PR. I'll let you know when it's ready for review. |
@@ -112,6 +115,59 @@ func withInspection() autorest.PrepareDecorator { | |||
} | |||
} | |||
|
|||
// CheckUserMSI checks if the identity has been assigned to the node | |||
func (c *Client) CheckUserMSI(userAssignedMSIID string, node *corev1.Node) bool { | |||
idH, _, err := c.getIdentityResource(node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arent' these calls costly ? in the sense that they have to ARM every time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kkmsft You're right! we can avoid the multiple calls to ARM by just getting the list once and then validating existence/non-existence of all assigned ids based on that list.
pkg/mic/mic.go
Outdated
@@ -261,24 +275,48 @@ func (c *Client) Sync(exit <-chan struct{}) { | |||
|
|||
glog.V(5).Infof("Initiating assigned id creation for pod - %s, binding - %s", createID.Spec.Pod, binding.Name) | |||
|
|||
err = c.createAssignedIdentityDeps(&createID, id, node) | |||
err = c.createAssignedIdentity(&createID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can perform this create when we start looking at the node map ? Since we don't track state of the assigned identities, we could have a large time gap between the time the assignment is created and then the node assignment was made. Is that a possibility ? If so, can we mitigate it some how without additional state ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kkmsft This mimics the current logic. Currently, we create the assigned identity, then make the request for node assignment. The timegap still remains the same with this change because we are batching all the assignments and making the single call. So the time for assignment still remains the same. There shouldn't be a long time gap here.
pkg/mic/mic_test.go
Outdated
@@ -159,9 +163,11 @@ func (c *TestCloudClient) CompareMSI(nodeName string, userIDs []string) bool { | |||
|
|||
func (c *TestCloudClient) PrintMSI() { | |||
for key, val := range c.ListMSI() { | |||
fmt.Printf("\nNode name: %s\n", key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this debugging logs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll clean that up.
Tests passed -
|
pkg/mic/mic.go
Outdated
glog.V(5).Infof("Initiating assigned id creation for pod - %s, binding - %s", createID.Spec.Pod, binding.Name) | ||
|
||
if isUserAssignedMSI { | ||
err := c.createAssignedIdentity(&createID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assigned identity will only be created once the identity assignment on the node is successfully complete.
pkg/mic/mic.go
Outdated
fmt.Sprintf("Binding %s removed from node %s for pod %s", removedBinding.Name, delID.Spec.NodeName, delID.Spec.Pod)) | ||
|
||
// remove assigned identity crd from cluster | ||
if err := c.removeAssignedIdentity(&delID); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, removing the assigned identity only when the identity has been successfully removed from the node
@@ -231,7 +231,10 @@ var _ = Describe("Kubernetes cluster using aad-pod-identity", func() { | |||
identityValidatorName := fmt.Sprintf("identity-validator-%d", i) | |||
|
|||
setUpIdentityAndDeployment(keyvaultIdentity, fmt.Sprintf("%d", i)) | |||
time.Sleep(5 * time.Second) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the sleep, so the test can actually proceed once the assigned identity exists.
pkg/cloudprovider/cloudprovider.go
Outdated
} | ||
|
||
// AppendOrRemoveUserMSI will batch process the removal and addition of ids | ||
func (c *Client) AppendOrRemoveUserMSI(addUserAssignedMSIIDs []string, removeUserAssignedMSIIDs []string, node *corev1.Node) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to split this into two separate functions, one which adds and another which removes and then we can add individually add UTs for them ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or may be naming can be UpdateUserMSI - AppendOrRemove feels like CreateOrUpdate which is doing a different thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the method
pkg/mic/mic.go
Outdated
|
||
func (c *Client) appendToRemoveListForNode(nodeMap map[string]trackUserAssignedMSIIds, resourceID string, node *corev1.Node) { | ||
if trackList, ok := nodeMap[node.Name]; ok { | ||
if trackList.removeUserAssignedMSIIDs != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check necessary ?
pkg/mic/mic.go
Outdated
if err != nil { | ||
// check which all identity assignment failed | ||
// remove those assigned identities | ||
// TODO check with identity team if CreateOrUpdate results in error, is it all or some failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this before the PR gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent out an email.
pkg/mic/mic.go
Outdated
continue | ||
} | ||
idList, err := c.getUserMSIListForNode(node) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan here. If we just log and continue then idList can be having random values..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getUserMSIListForNode returns nil on error. So we'll fail all of them.
pkg/mic/mic.go
Outdated
|
||
err = c.CloudClient.AssignUserMSI(id.Spec.ResourceID, node) | ||
nodeIdentityList := make(map[string][]string) | ||
for _, n := range nodesWithError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we crash or get restarted at this time ?
@@ -626,6 +626,8 @@ func setUpIdentityAndDeployment(azureIdentityName, suffix, replicas string) { | |||
ok, err = daemonset.WaitOnReady(nmiDaemonSet) | |||
Expect(err).NotTo(HaveOccurred()) | |||
Expect(ok).To(Equal(true)) | |||
|
|||
time.Sleep(30 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this sleep in the states PR because there we can start waiting for n assigned identities with desired state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran 9 of 10 Specs in 2221.232 seconds
SUCCESS! -- 9 Passed | 0 Failed | 0 Pending | 1 Skipped
--- PASS: TestAADPodIdentity (2221.23s)
PASS
ok github.com/Azure/aad-pod-identity/test/e2e 2221.249s
5ecf29d
to
20e4e14
Compare
wip batch process createorupdate update logic update events fix events generate set to use unique values update interface reorder check flow update to using one get per node update tests add e2e test for testing scale perf create assigned identity before assignment add unit tests for UpdateUserMSI interface
} | ||
|
||
// getListOfIdsToDelete will go over the delete list to determine if the id is required to be deleted | ||
// only user assigned identity not in use are added to the remove list for the node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: two sentences.
Ran 10 of 10 Specs in 2353.337 seconds |
resolves
#145
#181
#217
Add states to the assigned identities
Note