
Data lost after reboot #251

Closed
daresheep opened this issue Feb 25, 2021 · 16 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@daresheep

Hello,

I am using csi-driver-host-path v1.5.0.

After rebooting the system, both pods crashed.

Describing the pods shows this:

Events:
  Type     Reason                  Age              From                     Message
  ----     ------                  ----             ----                     -------
  Normal   Scheduled               20s              default-scheduler        Successfully assigned default/virt-launcher-firewall-wlh69 to ceph1
  Normal   SuccessfulAttachVolume  20s              attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-df463bd3-488b-4e03-b828-5923290f6cdb"
  Warning  FailedMount             0s (x4 over 4s)  kubelet                  MountVolume.SetUp failed for volume "pvc-df463bd3-488b-4e03-b828-5923290f6cdb" : rpc error: code = NotFound desc = volume id d2ec3050-7782-11eb-b03e-46ba88f41811 does not exist in the volumes list

After a reboot the mounts are gone, but discoveryExistingVolumes() reconstructs its data from "findmnt".

This means all of the volume information is lost.
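To illustrate the failure mode, here is a minimal Go sketch. The findmnt JSON shape and the /csi-data-dir path are assumptions for the example, not the driver's exact code; the point is that any discovery based on the current mount table sees nothing after a reboot, because bind mounts do not survive one.

```go
// Sketch of findmnt-based volume discovery and why it cannot survive a reboot.
// The real discoveryExistingVolumes() shells out to `findmnt --json`; here we
// parse a canned findmnt-style document so the example is self-contained.
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// findmntOutput mirrors the shape of `findmnt -o TARGET,SOURCE --json`.
type findmntOutput struct {
	Filesystems []struct {
		Target string `json:"target"`
		Source string `json:"source"`
	} `json:"filesystems"`
}

// discoverVolumes extracts hostpath volume IDs from findmnt output by looking
// for bind mounts under the driver's data directory (an assumed path here).
func discoverVolumes(raw string) ([]string, error) {
	var out findmntOutput
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, err
	}
	var ids []string
	for _, fs := range out.Filesystems {
		const prefix = "/csi-data-dir/"
		if i := strings.Index(fs.Source, prefix); i >= 0 {
			// findmnt renders bind mounts as "device[/subdirectory]".
			ids = append(ids, strings.TrimSuffix(fs.Source[i+len(prefix):], "]"))
		}
	}
	return ids, nil
}

func main() {
	beforeReboot := `{"filesystems":[{"target":"/var/lib/kubelet/pods/x/volumes/m","source":"/dev/sda1[/csi-data-dir/d2ec3050-7782-11eb-b03e-46ba88f41811]"}]}`
	afterReboot := `{"filesystems":[]}` // bind mounts are gone after a reboot

	ids, _ := discoverVolumes(beforeReboot)
	fmt.Println("before reboot:", ids)

	ids, _ = discoverVolumes(afterReboot)
	fmt.Println("after reboot:", len(ids), "volumes") // nothing left to discover
}
```

The on-disk volume directories still exist after the reboot; it is only the mount-table view of them that disappears, which is why findmnt is the wrong source of truth.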

Does anyone have another idea?

Thank you.

@daresheep
Author

@pohly

Sir, can you give me some help? Thanks.

@pohly
Contributor

pohly commented Feb 25, 2021

I am using csi-driver-host-path v1.5.0.

After the reboot you are still using that version? There were some changes in the code in v1.6.0, but nothing that should have made things worse. Just want to be sure.

Looking at the code, I suspect it was never meant to survive a reboot. Remember, this is a demo driver. It doesn't support all use-cases of a real driver.

Having said that, a PR which enhances the tracking of local volumes and snapshots would be welcome. v1.6.0 introduced capacity simulation, and volume sizes are known to get lost when restarting the pod.

/help

@k8s-ci-robot
Contributor

@pohly:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Feb 25, 2021
@daresheep
Author

daresheep commented Feb 26, 2021

Thank you for your help!

After the reboot you are still using that version? 

Yes, I was still using v1.5.0.

I just upgraded to v1.6.0, and the issue still exists.

I think I need to set up another CSI driver to handle this.

Thanks again.

@stoneshi-yunify
Contributor

I encountered this issue too with the latest release (v1.6.2). I looked at the code and I think I've found the reason: the func discoveryExistingVolumes cannot be used to discover existing volumes after a reboot. It can only survive a pod restart, not a node reboot. I managed to get it to work by reading the existing volumes from the PersistentVolumes.

@pohly Could you please take a look at my code and give any suggestions? If you agree, I can open a PR (I will of course refine my code and add some unit tests). Thanks very much!
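As a rough sketch of that approach, the driver's volume list can be rebuilt from the cluster's PersistentVolume objects instead of from findmnt. The PV types below are simplified stand-ins for the real k8s.io/api/core/v1 structs, and the driver name is an assumption; real code would list PVs via client-go's clientset.CoreV1().PersistentVolumes().List(...).

```go
// Sketch: reconstruct a CSI driver's volumes from PersistentVolume objects.
// Types are simplified stand-ins for k8s.io/api/core/v1, not the real API.
package main

import "fmt"

type csiSource struct {
	Driver       string // name of the provisioning CSI driver
	VolumeHandle string // the driver's own volume ID
}

type persistentVolume struct {
	Name string
	CSI  *csiSource // nil for non-CSI volumes
}

// volumesForDriver returns the volume handles of all PVs provisioned by the
// given driver, skipping PVs that belong to other drivers or volume types.
func volumesForDriver(pvs []persistentVolume, driver string) []string {
	var handles []string
	for _, pv := range pvs {
		if pv.CSI != nil && pv.CSI.Driver == driver {
			handles = append(handles, pv.CSI.VolumeHandle)
		}
	}
	return handles
}

func main() {
	pvs := []persistentVolume{
		{Name: "pvc-df463bd3", CSI: &csiSource{Driver: "hostpath.csi.k8s.io", VolumeHandle: "d2ec3050-7782-11eb-b03e-46ba88f41811"}},
		{Name: "nfs-pv"}, // not a CSI volume, ignored
	}
	fmt.Println(volumesForDriver(pvs, "hostpath.csi.k8s.io"))
}
```

Because the PV objects live in the API server rather than on the node, this survives a node reboot, though it makes the driver depend on cluster access from code paths that previously ran node-locally.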

@pohly
Contributor

pohly commented Apr 7, 2021

That function is also broken in other ways. I ran into that when trying to update the driver in Kubernetes E2E testing:
#210 (comment)

Let's use this issue to track that rewrite of the state saving code.

/reopen
/cc @fengzixu

@k8s-ci-robot k8s-ci-robot reopened this Apr 7, 2021
@k8s-ci-robot
Contributor

@pohly: Reopened this issue.



@pohly
Contributor

pohly commented Apr 7, 2021

@fengzixu you said that you wanted to work on this. Can you give an estimate when you might be done? This is relatively urgent because it blocks using the 1.5 and 1.6 driver releases for testing.

@fengzixu
Contributor

fengzixu commented Apr 7, 2021

@fengzixu you said that you wanted to work on this. Can you give an estimate when you might be done? This is relatively urgent because it blocks using the 1.5 and 1.6 driver releases for testing.

@pohly I have been working on it. Is it OK for you if I submit the fix PR next Monday? If there is any change to this timeline, I will sync up with you in this issue.

@pohly
Contributor

pohly commented Apr 7, 2021

Sounds good.

@fengzixu
Contributor

fengzixu commented Apr 12, 2021

Update: I am working on it today, but my workload is a little heavy. Let me sync up on whether I can submit this PR by tonight.

@pohly
Contributor

pohly commented Apr 30, 2021

Recovering state after a driver restart was fixed in #277.

However, the original ask in this issue was to also support host reboots. That's a bit different because mounted volumes become unmounted and need to be mounted again.

I don't think the hostpath driver needs to support that. It is clearly marked as "don't use in production" and I prefer to not add code that isn't needed for its original purpose (demos, E2E testing).

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 29, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 28, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.



humblec added a commit to humblec/csi-driver-host-path that referenced this issue May 24, 2024
TerryHowe pushed a commit to TerryHowe/csi-driver-host-path that referenced this issue Oct 17, 2024