fix(status): fetch volume status using controller podIP #112

shubham14bajpai · 2021-07-12T10:18:54Z

Signed-off-by: shubham [email protected]

This PR addresses two issues:

1. Rapidly cycling Unknown and Ready status
   Reason: Repeatedly fetching the stats through the service IP can cause DNS issues; the lookup fails occasionally, which makes the status cycle.
   Fix: Use the controller pod IP to fetch the stats, and maintain an up-to-date map of volumes to pod IPs.
2. Log flooding from failed TCP requests
   The above fix helps here as well.
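The "updated map for all volumes and pod IPs" idea can be pictured roughly as below. This is only a minimal sketch, not the operator's code: the `podIPCache` type, its methods, and the sample volume name and address are all hypothetical, and it assumes the map is shared between reconcile loops and therefore needs a lock.

```go
package main

import (
	"fmt"
	"sync"
)

// podIPCache is a hypothetical, concurrency-safe map from volume name to the
// last known controller pod IP. It is an illustration, not the operator's type.
type podIPCache struct {
	mu  sync.RWMutex
	ips map[string]string
}

func newPodIPCache() *podIPCache {
	return &podIPCache{ips: make(map[string]string)}
}

// Set records the controller pod IP for a volume. An empty IP removes the
// entry so that a stale address is never reused.
func (c *podIPCache) Set(volume, ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ip == "" {
		delete(c.ips, volume)
		return
	}
	c.ips[volume] = ip
}

// Get returns the cached IP and whether one is known for the volume.
func (c *podIPCache) Get(volume string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	ip, ok := c.ips[volume]
	return ip, ok
}

func main() {
	cache := newPodIPCache()
	cache.Set("pvc-example", "10.0.0.12") // sample values, not from the PR
	if ip, ok := cache.Get("pvc-example"); ok {
		fmt.Println("fetch stats from controller pod at", ip)
	}
}
```

Clearing the entry on an empty IP is a design choice made for this sketch: it forces callers to notice that the controller pod has no address yet instead of dialing an old one.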

Tests covered manually:

volume creation and check status
multiple controller pod restarts
multiple replica pod restarts

Signed-off-by: shubham <[email protected]>

payes · 2021-07-12T15:21:11Z

Can you please verify the following scenarios where a node containing both controller and replica is brought down:

Delete the controller pod, so that now 2 controller pods are present. One in deleting state and the other in Running/ContainerCreating state.
Verify if we are able to fetch the status even when one replica is down.

pkg/controllers/jivavolume_controller.go

shubham14bajpai · 2021-07-12T16:19:20Z

Can you please verify the following scenarios where a node containing both controller and replica is brought down:

Delete the controller pod, so that now 2 controller pods are present. One in deleting state and the other in Running/ContainerCreating state.

Verify if we are able to fetch the status even when one replica is down.

Tried the above scenario:

While there were two controller pods, one in Running and the other in Terminating state, the status was Unknown; the terminating pod finally went away after 3 minutes.
Once only a single controller pod was running, it was able to read the status of 2 replicas, and eventually the 3rd replica came up.
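One way to reason about the two-controller-pod window is sketched below. It is an assumption-laden illustration, not what this PR implements: the `pod` struct and `pickControllerPod` are hypothetical stand-ins for the real Kubernetes types. It relies on one known Kubernetes behavior: a pod that `kubectl` shows as Terminating has a deletion timestamp set while its phase can still be Running, so the phase alone does not distinguish the old pod from the new one.

```go
package main

import "fmt"

// pod is a hypothetical, trimmed-down view of a controller pod.
type pod struct {
	Name     string
	Phase    string // e.g. "Running" or "Pending"
	Deleting bool   // true when a deletion timestamp is set
	IP       string
}

// pickControllerPod returns the first pod that is Running, not being
// deleted, and already has an IP. The boolean is false when none qualifies.
func pickControllerPod(pods []pod) (pod, bool) {
	for _, p := range pods {
		if p.Deleting || p.Phase != "Running" || p.IP == "" {
			continue
		}
		return p, true
	}
	return pod{}, false
}

func main() {
	pods := []pod{
		{Name: "ctrl-old", Phase: "Running", Deleting: true, IP: "10.0.0.1"},
		{Name: "ctrl-new", Phase: "Running", IP: "10.0.0.2"},
	}
	if p, ok := pickControllerPod(pods); ok {
		fmt.Println("using", p.Name, p.IP)
	} else {
		fmt.Println("no usable controller pod; status stays Unknown")
	}
}
```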

mittachaitu

Provided a few questions.

.github/workflows/build.yaml

cmd/manager/main.go

mittachaitu · 2021-07-13T06:14:00Z

pkg/controllers/jivavolume_controller.go

+		if err != nil {
+			// log err only, as controller must be in container creating state
+			// don't return err as it will dump stack trace unnecessarily
+			logrus.Infof("failed to get controller pod ip for volume %s: %s", instance.Name, err.Error())


Shouldn't this be a warning instead of info?

This happens mostly when the controller pod is in the ContainerCreating state. Having it as info is good enough.

pkg/controllers/jivavolume_controller.go


shubham14bajpai · 2021-07-13T08:52:45Z

Observed another issue with replica movement on node deletion: the replica pod and PVC get deleted again and again.

$ k get pods -w
NAME                                                              READY   STATUS        RESTARTS   AGE
jiva-operator-99f6fbccb-72dkt                                     1/1     Running       0          3m24s
openebs-jiva-csi-controller-0                                     5/5     Running       0          111m
openebs-jiva-csi-node-lxtn2                                       3/3     Running       0          111m
openebs-jiva-csi-node-mq7qp                                       3/3     Running       0          74m
openebs-jiva-csi-node-mz9p6                                       3/3     Running       0          111m
openebs-localpv-provisioner-85485d7b48-9vknk                      1/1     Running       0          113m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5g27rm   2/2     Running       0          71s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5h8cms   2/2     Terminating   0          78m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-0               1/1     Running       1          74m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Running       2          83m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-2               1/1     Running       2          83m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5h8cms   2/2     Terminating   0          80m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-ctrl-6c77fb5h8cms   2/2     Terminating   0          80m
openebs-jiva-csi-node-lxtn2                                       3/3     Terminating   0          113m
openebs-jiva-csi-node-lxtn2                                       3/3     Terminating   0          113m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Terminating   2          84m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Terminating   2          84m
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Terminating   0          10s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending       0          0s
openebs-jiva-csi-node-rsgvt                                       0/3     Pending       0          0s
openebs-jiva-csi-node-rsgvt                                       0/3     Pending       0          0s
openebs-jiva-csi-node-rsgvt                                       0/3     ContainerCreating   0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Pending             0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Pending             0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     ContainerCreating   0          0s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Completed           0          3s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Terminating         0          3s
init-pvc-2d4799f5-24e4-4fd9-8bfd-81aa254f51b0                     0/1     Terminating         0          3s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     Pending             0          7s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               0/1     ContainerCreating   0          7s
pvc-ea334577-d4dc-4eba-a4e3-d07bb6c33cd8-jiva-rep-1               1/1     Running             0          21s
openebs-jiva-csi-node-rsgvt                                       3/3     Running

This was happening because the new pending pod does not have the node name annotation on it. Added a check for that as well, which fixes this issue.
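The kind of guard described here might look like the sketch below. The actual check added in the PR is not shown in this conversation, so everything in it is an assumption: `replicaPod`, `shouldCleanUp`, and the `notReady` set are hypothetical names, and the node name is modeled as a plain string field.

```go
package main

import "fmt"

// replicaPod is a hypothetical, trimmed-down view of a replica pod.
type replicaPod struct {
	Name     string
	NodeName string // empty while the pod is Pending and unscheduled
}

// shouldCleanUp reports whether the replica pod (and its PVC) should be
// deleted so it can move off a failed node. A pod without a node name has
// not been scheduled yet, so it is skipped; otherwise the freshly created
// replacement would be deleted again on every reconcile.
func shouldCleanUp(p replicaPod, notReady map[string]bool) bool {
	if p.NodeName == "" {
		return false
	}
	return notReady[p.NodeName]
}

func main() {
	notReady := map[string]bool{"node-1": true}
	pending := replicaPod{Name: "jiva-rep-1"} // not scheduled yet
	fmt.Println(shouldCleanUp(pending, notReady))
}
```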

Signed-off-by: shubham <[email protected]>

prateekpandey14

lgtm

mittachaitu

LGTM


shubham14bajpai requested a review from prateekpandey14 July 12, 2021 10:18

shubham14bajpai linked an issue Jul 12, 2021 that may be closed by this pull request

jiva volume status shows unknown upon continuous restart of replica in a specific node #111

Closed

shubham14bajpai requested a review from payes July 12, 2021 10:19

shubham14bajpai force-pushed the status branch 3 times, most recently from 858c571 to be019a0 Compare July 12, 2021 11:17

fix(status): fetch volume status using controller podIP

ef3577f

Signed-off-by: shubham <[email protected]>

shubham14bajpai force-pushed the status branch from be019a0 to ef3577f Compare July 12, 2021 12:08

fix sanity test

be8a9f6

Signed-off-by: shubham <[email protected]>

shubham14bajpai force-pushed the status branch from 61035ae to be8a9f6 Compare July 12, 2021 13:03

payes requested a review from mittachaitu July 12, 2021 15:21

prateekpandey14 reviewed Jul 12, 2021

pkg/controllers/jivavolume_controller.go (resolved)

prateekpandey14 reviewed Jul 12, 2021

pkg/controllers/jivavolume_controller.go (outdated, resolved)

mittachaitu reviewed Jul 13, 2021

add check for skipping controller pod on not ready nodes

a8a8f40

add field selector to pod list for Running controller pods
fix issue with race between pending replica pvc deletion and bound
Signed-off-by: shubham <[email protected]>

use svc ip incase pod ip is missing

371320a

Signed-off-by: shubham <[email protected]>
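The fallback named in the commit title above ("use svc ip incase pod ip is missing") can be sketched as follows. This is a guess at the shape of the logic, not the committed code: `statsAddress` and the sample addresses are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// statsAddress prefers the controller pod IP and falls back to the service
// IP when the pod IP is not known yet. It returns an error only when neither
// address is available.
func statsAddress(podIP, svcIP string) (string, error) {
	if podIP != "" {
		return podIP, nil
	}
	if svcIP != "" {
		return svcIP, nil
	}
	return "", errors.New("no controller address available")
}

func main() {
	// Pod IP missing, e.g. while the controller pod is still being created.
	addr, err := statsAddress("", "10.96.0.5")
	if err != nil {
		fmt.Println("status stays Unknown:", err)
		return
	}
	fmt.Println("fetch stats from", addr)
}
```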

prateekpandey14 approved these changes Jul 13, 2021

mittachaitu approved these changes Jul 13, 2021

fix defer call in status update

86f299b

Signed-off-by: shubham <[email protected]>

payes approved these changes Jul 13, 2021

prateekpandey14 merged commit 7b569d1 into openebs-archive:master Jul 13, 2021

shubham14bajpai added a commit to shubham14bajpai/jiva-operator that referenced this pull request Jul 13, 2021

fix(status): fetch volume status using controller podIP (openebs-archive#112)

cd27593

Signed-off-by: shubham <[email protected]>

shubham14bajpai mentioned this pull request Jul 13, 2021

fix(status): fetch volume status using controller podIP #113

Merged

prateekpandey14 pushed a commit that referenced this pull request Jul 13, 2021

fix(status): fetch volume status using controller podIP (#112) (#113)

43c61c0

Signed-off-by: shubham <[email protected]>

shubham14bajpai mentioned this pull request Jul 16, 2021

Example is not working #99

Closed
