fix(status): fetch volume status using controller podIP #112
Conversation
Force-pushed from 858c571 to be019a0
Signed-off-by: shubham <[email protected]>
Can you please verify the following scenarios where a node containing both controller and replica is brought down:
Tried the above scenario:
Provided a few questions.
if err != nil {
	// log err only, as controller must be in container creating state
	// don't return err as it will dump a stack trace unnecessarily
	logrus.Infof("failed to get controller pod ip for volume %s: %s", instance.Name, err.Error())
Shouldn't it be a warning instead of info?
This happens mostly when the controller pod is in the ContainerCreating state. Having it as info is good enough.
add field selector to pod list for Running controller pods
fix issue with race between pending replica PVC deletion and bound PVC
Signed-off-by: shubham <[email protected]>
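A minimal sketch in Go of the field-selector part of that commit, assuming current client-go and a hypothetical openebs.io/controller=jiva-controller label (the repo's actual selectors may differ):

	package main

	import (
		"context"

		corev1 "k8s.io/api/core/v1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"k8s.io/client-go/kubernetes"
	)

	// listRunningControllerPods lists only controller pods whose phase is
	// Running, so pods still in ContainerCreating are never picked up when
	// resolving the controller pod IP for stats.
	func listRunningControllerPods(cs kubernetes.Interface, ns string) (*corev1.PodList, error) {
		return cs.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{
			LabelSelector: "openebs.io/controller=jiva-controller", // hypothetical label
			FieldSelector: "status.phase=Running",                  // the field selector this commit adds
		})
	}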
Observed another issue with replica movement on node deletion: the replica pod and PVC get deleted again and again.
This was happening because the new pending pod does not have the node name annotation on it. Added a check for that as well, which fixes the issue.
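A rough sketch of that guard, assuming a hypothetical openebs.io/replica-node annotation carrying the scheduled node name (the real annotation key in the repo may differ):

	package main

	import corev1 "k8s.io/api/core/v1"

	// replicaNodeAnnotation is a hypothetical key; the repo's actual
	// annotation name may differ.
	const replicaNodeAnnotation = "openebs.io/replica-node"

	// shouldDeleteReplica reports whether a pending replica pod (and its
	// PVC) may be cleaned up. A freshly created pending pod has no node
	// name annotation yet; skipping it breaks the repeated delete loop.
	func shouldDeleteReplica(pod *corev1.Pod) bool {
		if pod.Status.Phase != corev1.PodPending {
			return false
		}
		nodeName, ok := pod.Annotations[replicaNodeAnnotation]
		return ok && nodeName != ""
	}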
Signed-off-by: shubham <[email protected]>
lgtm
LGTM
Signed-off-by: shubham <[email protected]>
This PR addresses 2 issues:
Reason: Fetching the stats using the service IP repeatedly can cause DNS issues, which fail occasionally and cause cycling of the volume status.
Fix: Use the controller pod IP to fetch the stats, and maintain an updated map of all volumes and their controller pod IPs.
The above fix will help here as well.
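A minimal sketch of the pod-IP map idea, assuming a hypothetical stats endpoint at port 9501 with path /v1/stats; the names and layout here are illustrative, not the PR's actual implementation:

	package main

	import (
		"fmt"
		"net/http"
		"sync"
	)

	// podIPMap caches the controller pod IP per volume so stats calls
	// bypass the service DNS name entirely.
	var (
		mu       sync.RWMutex
		podIPMap = map[string]string{} // volume name -> controller pod IP
	)

	// updatePodIP refreshes the cached controller pod IP for a volume,
	// e.g. after listing Running controller pods as sketched above.
	func updatePodIP(volume, ip string) {
		mu.Lock()
		defer mu.Unlock()
		podIPMap[volume] = ip
	}

	// fetchStats queries the controller directly by pod IP instead of the
	// service IP, avoiding the occasional DNS failures described above.
	func fetchStats(volume string) (*http.Response, error) {
		mu.RLock()
		ip := podIPMap[volume]
		mu.RUnlock()
		return http.Get(fmt.Sprintf("http://%s:9501/v1/stats", ip)) // hypothetical port/path
	}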
Tests covered manually: