fix(cstor-pool-mgmt): fix livenessprobe in cStor pool deployment #1544

Merged: 4 commits, Dec 11, 2019


@mittachaitu mittachaitu commented Dec 10, 2019

Signed-off-by: mittachaitu [email protected]

What this PR does / why we need it:
This PR fixes the liveness probe on the cstor-pool container by wrapping the probe command in `timeout`, which runs a command with a time limit. This matters when the disks are detached from the node: the command triggered by the liveness probe (`zfs set ...`) hangs forever, and kubelet does not treat the hung command as a failure. Kubelet also retries the same command after the timeoutSeconds set in the liveness probe. With `timeout 120 zfs set io.openebs:livenesstimestamp="$(date +%s)" cstor-<pool_name>`, the process is killed if it runs longer than 120 seconds, and the command returns a non-zero exit status.
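The effect of the `timeout` wrapper can be sketched in a few lines of shell. This is only an illustration, assuming GNU coreutils `timeout`: `sleep` is a hypothetical stand-in for a `zfs set` call blocked on detached disks, and the limit is shortened from the PR's 120 seconds to 1 second so the example finishes quickly.

```shell
# Stand-in for a hung `zfs set ...`: sleep blocks far longer than the limit.
# GNU coreutils `timeout` terminates the command when the limit expires and
# exits with a non-zero status (documented as 124 for GNU coreutils).
timeout 1 sleep 5
hung_status=$?

# A command that finishes within the limit keeps its own exit status.
timeout 5 true
ok_status=$?

echo "hung command status: $hung_status"
echo "healthy command status: $ok_status"
```

An exec liveness probe counts any non-zero exit status as a failed check, so the wrapped command reports a failure to kubelet instead of blocking indefinitely.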

Note:

  1. When the disks are detached from the node, the cStor pool container restarts after 480 seconds, i.e. 8 minutes (tested on Kubernetes version 1.4.8).

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #openebs/openebs#2852

Special notes for your reviewer:

Checklist:

  • Fixes #
  • Labelled this PR & related issue with documentation tag
  • PR messages has document related information
  • Labelled this PR & related issue with breaking-changes tag
  • PR messages has breaking changes related information
  • Labelled this PR & related issue with requires-upgrade tag
  • PR messages has upgrade related information
  • Commit has unit tests
  • Commit has integration tests

@mittachaitu mittachaitu added the pr/upgrade-changes-pending label (This PR requires upgrade changes to be automated) Dec 10, 2019
@mittachaitu mittachaitu self-assigned this Dec 10, 2019
@mittachaitu mittachaitu changed the title from "fix(cstor-pool-mgmt): fix livenessprobe in cstor pool" to "fix(cstor-pool-mgmt): fix livenessprobe in cStor pool deployment" Dec 10, 2019
Contributor

@vishnuitta vishnuitta left a comment

Looks good, but how do we test this?

@vishnuitta
Contributor

@singhmeghna79 would you be able to review this change and also try it out if possible?

Contributor

@sonasingh46 sonasingh46 left a comment

lgtm

pkg/install/v1alpha1/cstor_pool.go (outdated, resolved)
@sonasingh46
Contributor

A similar fix needs to be propagated to the cspi mgmt YAML spec; that can be done in a separate PR.

@mittachaitu
Author

> A similar fix needs to be propagated to the cspi mgmt YAML spec; that can be done in a separate PR.

Handled in PR #1546

@kmova kmova merged commit 7a088a0 into openebs-archive:master Dec 11, 2019
@kmova kmova added this to the 1.5.0 milestone Dec 11, 2019
prateekpandey14 pushed a commit to prateekpandey14/maya that referenced this pull request Dec 11, 2019
…nebs-archive#1544)

This PR fixes the liveness probe on the cstor-pool container by wrapping the probe command in `timeout`, which runs a command with a time limit. This matters when the disks are detached from the node: the command triggered by the liveness probe (`zfs set ...`) hangs forever, and kubelet does not treat the hung command as a failure.

Kubelet also retries the same command after the timeoutSeconds set in the liveness probe. With `timeout 120 zfs set io.openebs:livenesstimestamp="$(date +%s)" cstor-<pool_name>`, the process is killed if it runs longer than 120 seconds, and the command returns a non-zero exit status.


Signed-off-by: mittachaitu <[email protected]>
kmova pushed a commit that referenced this pull request Dec 11, 2019
shubham14bajpai pushed a commit to shubham14bajpai/maya that referenced this pull request Dec 27, 2019
…nebs-archive#1544)

Labels: component: CStorPoolMgmt, pr/upgrade-changes-pending (This PR requires upgrade changes to be automated)
7 participants