
Data lost after reboot #251

Closed
daresheep opened this issue Feb 25, 2021 · 16 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@daresheep

Hello,

I am using csi-driver-host-path v1.5.0.

After rebooting the system, both pods crashed.

Describing the pods shows this:

Events:
  Type     Reason                  Age              From                     Message
  ----     ------                  ----             ----                     -------
  Normal   Scheduled               20s              default-scheduler        Successfully assigned default/virt-launcher-firewall-wlh69 to ceph1
  Normal   SuccessfulAttachVolume  20s              attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-df463bd3-488b-4e03-b828-5923290f6cdb"
  Warning  FailedMount             0s (x4 over 4s)  kubelet                  MountVolume.SetUp failed for volume "pvc-df463bd3-488b-4e03-b828-5923290f6cdb" : rpc error: code = NotFound desc = volume id d2ec3050-7782-11eb-b03e-46ba88f41811 does not exist in the volumes list

After a reboot the mounts are gone, but discoveryExistingVolumes() reconstructs its data from "findmnt".

This means all of the volume information is lost.
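To illustrate the failure mode, here is a minimal Go sketch. The findmnt JSON shape and the /csi-data-dir path are assumptions for the example, not the driver's exact code; the point is that any discovery based on the current mount table sees nothing after a reboot, because bind mounts do not survive one.

```go
// Sketch of findmnt-based volume discovery and why it cannot survive a reboot.
// The real discoveryExistingVolumes() shells out to `findmnt --json`; here we
// parse a canned findmnt-style document so the example is self-contained.
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// findmntOutput mirrors the shape of `findmnt -o TARGET,SOURCE --json`.
type findmntOutput struct {
	Filesystems []struct {
		Target string `json:"target"`
		Source string `json:"source"`
	} `json:"filesystems"`
}

// discoverVolumes extracts hostpath volume IDs from findmnt output by looking
// for bind mounts under the driver's data directory (an assumed path here).
func discoverVolumes(raw string) ([]string, error) {
	var out findmntOutput
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, err
	}
	var ids []string
	for _, fs := range out.Filesystems {
		const prefix = "/csi-data-dir/"
		if i := strings.Index(fs.Source, prefix); i >= 0 {
			// findmnt renders bind mounts as "device[/subdirectory]".
			ids = append(ids, strings.TrimSuffix(fs.Source[i+len(prefix):], "]"))
		}
	}
	return ids, nil
}

func main() {
	beforeReboot := `{"filesystems":[{"target":"/var/lib/kubelet/pods/x/volumes/m","source":"/dev/sda1[/csi-data-dir/d2ec3050-7782-11eb-b03e-46ba88f41811]"}]}`
	afterReboot := `{"filesystems":[]}` // bind mounts are gone after a reboot

	ids, _ := discoverVolumes(beforeReboot)
	fmt.Println("before reboot:", ids)

	ids, _ = discoverVolumes(afterReboot)
	fmt.Println("after reboot:", len(ids), "volumes") // nothing left to discover
}
```

The on-disk volume directories still exist after the reboot; it is only the mount-table view of them that disappears, which is why findmnt is the wrong source of truth.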

Does anyone have another idea?

Thank you.

@daresheep
Author

@pohly

Sir, can you give me some help? Thanks.

@pohly
Contributor

pohly commented Feb 25, 2021

I am using csi-driver-host-path v1.5.0.

After the reboot you are still using that version? There were some changes in the code in v1.6.0, but nothing that should have made things worse. Just want to be sure.

Looking at the code, I suspect it was never meant to survive a reboot. Remember, this is a demo driver. It doesn't support all use-cases of a real driver.

Having said that, a PR which enhances the tracking of local volumes and snapshots would be welcome. v1.6.0 introduced capacity simulation, and volume sizes are known to get lost when restarting the pod.

/help

@k8s-ci-robot
Contributor

@pohly:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Feb 25, 2021
@daresheep
Author

daresheep commented Feb 26, 2021

Thank you for your help!

After the reboot you are still using that version? 

Yes, I was still using v1.5.0.

I just upgraded to v1.6.0, and the issue still exists.

I think I need to set up another CSI driver to handle this.

Thanks again.

@stoneshi-yunify
Contributor

I encountered this issue too with the latest release (v1.6.2). I looked at the code and I think I've found the reason: the func discoveryExistingVolumes cannot be used to discover existing volumes after a reboot. It can only survive a pod restart, not a node reboot. I managed to get it to work by reading the existing volumes from the PersistentVolumes.

@pohly Could you please take a look at my code and give any suggestions? If you agree, I can open a PR (I will of course refine my code and add some unit tests). Thanks very much!
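As a rough sketch of that approach, the driver's volume list can be rebuilt from the cluster's PersistentVolume objects instead of from findmnt. The PV types below are simplified stand-ins for the real k8s.io/api/core/v1 structs, and the driver name is an assumption; real code would list PVs via client-go's clientset.CoreV1().PersistentVolumes().List(...).

```go
// Sketch: reconstruct a CSI driver's volumes from PersistentVolume objects.
// Types are simplified stand-ins for k8s.io/api/core/v1, not the real API.
package main

import "fmt"

type csiSource struct {
	Driver       string // name of the provisioning CSI driver
	VolumeHandle string // the driver's own volume ID
}

type persistentVolume struct {
	Name string
	CSI  *csiSource // nil for non-CSI volumes
}

// volumesForDriver returns the volume handles of all PVs provisioned by the
// given driver, skipping PVs that belong to other drivers or volume types.
func volumesForDriver(pvs []persistentVolume, driver string) []string {
	var handles []string
	for _, pv := range pvs {
		if pv.CSI != nil && pv.CSI.Driver == driver {
			handles = append(handles, pv.CSI.VolumeHandle)
		}
	}
	return handles
}

func main() {
	pvs := []persistentVolume{
		{Name: "pvc-df463bd3", CSI: &csiSource{Driver: "hostpath.csi.k8s.io", VolumeHandle: "d2ec3050-7782-11eb-b03e-46ba88f41811"}},
		{Name: "nfs-pv"}, // not a CSI volume, ignored
	}
	fmt.Println(volumesForDriver(pvs, "hostpath.csi.k8s.io"))
}
```

Because the PV objects live in the API server rather than on the node, this survives a node reboot, though it makes the driver depend on cluster access from code paths that previously ran node-locally.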

@pohly
Contributor

pohly commented Apr 7, 2021

That function is also broken in other ways. I ran into that when trying to update the driver in Kubernetes E2E testing:
#210 (comment)

Let's use this issue to track that rewrite of the state saving code.

/reopen
/cc @fengzixu

@k8s-ci-robot k8s-ci-robot reopened this Apr 7, 2021
@k8s-ci-robot
Contributor

@pohly: Reopened this issue.



@pohly
Contributor

pohly commented Apr 7, 2021

@fengzixu you said that you wanted to work on this. Can you give an estimate when you might be done? This is relatively urgent because it blocks using the 1.5 and 1.6 driver releases for testing.

@fengzixu
Contributor

fengzixu commented Apr 7, 2021

@fengzixu you said that you wanted to work on this. Can you give an estimate when you might be done? This is relatively urgent because it blocks using the 1.5 and 1.6 driver releases for testing.

@pohly I have been working on it. Is it OK for you if I submit the fix PR next Monday? If there is any change to this timeline, I will sync up with you in this issue.

@pohly
Contributor

pohly commented Apr 7, 2021

Sounds good.

@fengzixu
Contributor

fengzixu commented Apr 12, 2021

Update: I am working on it today, but my workload is a little heavy. Let me sync up on whether I can submit this PR by tonight.

@pohly
Contributor

pohly commented Apr 30, 2021

Recovering state after a driver restart was fixed in #277.

However, the original ask in this issue was to also support host reboots. That's a bit different because mounted volumes become unmounted and need to be mounted again.

I don't think the hostpath driver needs to support that. It is clearly marked as "don't use in production" and I prefer to not add code that isn't needed for its original purpose (demos, E2E testing).

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 28, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.



humblec added a commit to humblec/csi-driver-host-path that referenced this issue May 24, 2024
TerryHowe pushed a commit to TerryHowe/csi-driver-host-path that referenced this issue Oct 17, 2024