fix(status): fetch volume status using controller podIP #112

Merged
merged 5 commits into from
Jul 13, 2021

Conversation

shubham14bajpai
Contributor

@shubham14bajpai shubham14bajpai commented Jul 12, 2021

Signed-off-by: shubham [email protected]

This PR addresses two issues:

  • Volume status rapidly cycling between Unknown and Ready:
    Reason: Repeatedly fetching the stats through the service IP can cause DNS issues, so requests fail occasionally and the status cycles.
    Fix: Use the controller pod IP to fetch the stats, and maintain an up-to-date map of volumes to pod IPs.
  • Log flooding from failed TCP requests:
    The above fix helps here as well.
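
As a rough illustration of the "updated map for all volumes and pod IPs" idea, the sketch below keeps a concurrency-safe cache from volume name to controller pod IP. The type `podIPCache` and its methods are hypothetical names chosen for this example and are not taken from the PR; the actual change may structure this differently.

```go
package main

import "sync"

// podIPCache is a hypothetical cache mapping a volume name to the last
// known IP of its controller pod. It is an assumption for illustration,
// not the data structure used by the operator.
type podIPCache struct {
	mu  sync.RWMutex
	ips map[string]string
}

func newPodIPCache() *podIPCache {
	return &podIPCache{ips: make(map[string]string)}
}

// Set records the controller pod IP for a volume. An empty IP removes the
// entry, for example while the controller pod is being recreated.
func (c *podIPCache) Set(volume, ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ip == "" {
		delete(c.ips, volume)
		return
	}
	c.ips[volume] = ip
}

// Get returns the cached IP and whether an entry exists for the volume.
func (c *podIPCache) Get(volume string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	ip, ok := c.ips[volume]
	return ip, ok
}
```

With a cache like this, a reconcile loop could refresh the entry whenever the controller pod changes and skip the stats request entirely when `Get` reports no entry, rather than dialing an address that is known to be stale.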

Tests covered manually:

  • volume creation and check status
  • multiple controller pod restarts
  • multiple replica pod restarts

Signed-off-by: shubham <[email protected]>
@payes
Contributor

payes commented Jul 12, 2021

Can you please verify the following scenarios where a node containing both controller and replica is brought down:

  1. Delete the controller pod, so that now 2 controller pods are present. One in deleting state and the other in Running/ContainerCreating state.
  2. Verify if we are able to fetch the status even when one replica is down.

@payes payes requested a review from mittachaitu July 12, 2021 15:21
@shubham14bajpai
Contributor Author

Tried the above scenario:

  1. While there were two controller pods, one Running and the other Terminating, the status was Unknown; the old pod finally terminated after 3 minutes.
  2. Once only a single controller pod was running, it was able to read the status of 2 replicas, and eventually the 3rd replica came up.
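
The two-controller-pod window described above suggests why pod selection matters when reading the pod IP. A minimal sketch, assuming the goal is to use only a controller pod that is Running, not being deleted, and already has an IP; the `podInfo` type and `pickControllerPodIP` function are hypothetical stand-ins for the Kubernetes pod object and the operator's lookup. Note that `kubectl` shows a pod under deletion as Terminating even though its phase may still be Running, so the sketch tracks deletion separately.

```go
package main

import "errors"

// podInfo is a hypothetical, trimmed-down view of a pod holding only the
// fields this sketch needs.
type podInfo struct {
	Name        string
	Phase       string // e.g. "Pending", "Running"
	Terminating bool   // true once a deletion timestamp is set
	IP          string
}

var errNoControllerPod = errors.New("no running controller pod with an IP")

// pickControllerPodIP returns the IP of the first pod that is Running,
// not terminating, and has an IP assigned. It returns an error when no
// such pod exists, for example while the new pod is still being created.
func pickControllerPodIP(pods []podInfo) (string, error) {
	for _, p := range pods {
		if p.Phase == "Running" && !p.Terminating && p.IP != "" {
			return p.IP, nil
		}
	}
	return "", errNoControllerPod
}
```

Under these assumptions the status would stay Unknown only while no eligible pod exists, and would recover as soon as the replacement controller pod reports Running with an IP.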

Contributor

@mittachaitu mittachaitu left a comment

Provided a few questions.

.github/workflows/build.yaml (resolved)
cmd/manager/main.go (resolved)
if err != nil {
	// Only log the error: the controller pod is presumably still in the
	// container creating state. Don't return the error, as that would dump
	// an unnecessary stack trace.
	logrus.Infof("failed to get controller pod ip for volume %s: %s", instance.Name, err.Error())
Contributor

Shouldn't this be a warning instead of info?

Contributor Author

@shubham14bajpai shubham14bajpai Jul 13, 2021

This mostly happens when the controller pod is in the container creating state, so logging it at info level is good enough.

pkg/controllers/jivavolume_controller.go (outdated, resolved)
add field selector to pod list for Running controller pods
fix issue with race between pending replica pvc deletion and bound

Signed-off-by: shubham <[email protected]>
@shubham14bajpai
Contributor Author

Observed another issue with replica movement on node deletion: the replica pod and PVC get deleted again and again.

$ k get pods -w
NAME                                                              READY   STATUS        RESTARTS   AGE
jiva-operator-99f6fbccb-72dkt                                     1/1     Running       0          3m24s
openebs-jiva-csi-controller-0                                     5/5     Running       0          111m
openebs-jiva-csi-node-lxtn2                                       3/3     Running       0          111m
openebs-jiva-csi-node-mq7qp                                       3/3     Running       0          74m
openebs-jiva-csi-node-mz9p6                                       3/3     Running       0          111m
openebs-localpv-provisioner-85485d7b48-9vknk                      1/1     Running       0          113m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5g27rm   2/2     Running       0          71s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5h8cms   2/2     Terminating   0          78m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-0               1/1     Running       1          74m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Running       2          83m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-2               1/1     Running       2          83m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5h8cms   2/2     Terminating   0          80m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5h8cms   2/2     Terminating   0          80m
openebs-jiva-csi-node-lxtn2                                       3/3     Terminating   0          113m
openebs-jiva-csi-node-lxtn2                                       3/3     Terminating   0          113m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Terminating   2          84m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Terminating   2          84m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
openebs-jiva-csi-node-rsgvt                                       0/3     Pending       0          0s
openebs-jiva-csi-node-rsgvt                                       0/3     Pending       0          0s
openebs-jiva-csi-node-rsgvt                                       0/3     ContainerCreating   0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Pending             0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Pending             0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     ContainerCreating   0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Completed           0          3s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Terminating         0          3s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Terminating         0          3s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending             0          7s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     ContainerCreating   0          7s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Running             0          21s
openebs-jiva-csi-node-rsgvt                                       3/3     Running             

This was happening because the new pending pod does not have the node name annotation on it. Added a check for that as well, which fixes this issue.
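
One way to picture the added check, as a sketch under stated assumptions rather than the operator's actual code: a pending replica pod is only considered for cleanup when it already carries a node name and that node no longer exists. The names `replicaPod` and `shouldCleanUpReplica`, and the use of a plain `NodeName` field, are hypothetical.

```go
package main

// replicaPod is a hypothetical, reduced view of a replica pod.
type replicaPod struct {
	Name     string
	Phase    string // e.g. "Pending", "Running"
	NodeName string // empty until a node name is recorded for the pod
}

// shouldCleanUpReplica reports whether a replica pod (and its PVC) should
// be deleted so that it can move to another node. A pending pod with no
// node name yet is left alone; treating it as stranded would delete the
// freshly created pod and repeat the cycle seen in the watch output.
func shouldCleanUpReplica(p replicaPod, liveNodes map[string]bool) bool {
	if p.Phase != "Pending" {
		return false
	}
	if p.NodeName == "" {
		return false // no node recorded yet, nothing to conclude
	}
	return !liveNodes[p.NodeName]
}
```

The guard on the empty node name is the part that corresponds to the fix described above; the rest of the function only provides enough context to make the check testable.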

Contributor

@prateekpandey14 prateekpandey14 left a comment

lgtm

Contributor

@mittachaitu mittachaitu left a comment

LGTM

@prateekpandey14 prateekpandey14 merged commit 7b569d1 into openebs-archive:master Jul 13, 2021
shubham14bajpai added a commit to shubham14bajpai/jiva-operator that referenced this pull request Jul 13, 2021
Development

Successfully merging this pull request may close these issues.

jiva volume status shows unknown upon continuous restart of replica in a specific node
4 participants