UPSTREAM: 71804: Use UnmountMountPoint util to clean up subpaths #22396

Merged


jsafrane
Contributor

This fixes flakes found by the upstream gluster e2e suite: sometimes the suite terminates the gluster server before the clients finish their cleanup.

kubernetes/kubernetes#71804
Upstream issue: kubernetes/kubernetes#71584

@openshift/sig-storage

@openshift-ci-robot added the sig/storage, size/XL, and approved labels Mar 22, 2019
@bertinatto
Member

/lgtm

@openshift-ci-robot added the lgtm label Mar 25, 2019
@bertinatto
Member

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot removed the lgtm label Mar 25, 2019
@jsafrane
Contributor Author

@bertinatto, I fixed the e2e test (it was failing with "You may only call It from within a Describe or Context"), PTAL

@bertinatto
Member

/retest

@jsafrane
Contributor Author

/retest

@bertinatto
Member

/lgtm

@openshift-ci-robot added the lgtm label Mar 26, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bertinatto, jsafrane


@openshift-merge-robot merged commit 02df482 into openshift:master Mar 26, 2019
@wongma7
Contributor

wongma7 commented Mar 26, 2019

The new test is failing (or flaking?) for NFS: https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-4.0&sort-by-flakiness=&width=5&include-filter-by-regex=subPath.should.be.able.to.unmount

For some reason, we are ioutil.ReadDir'ing a container directory instead of the parent volume directory and then hitting stale file handle errors.
For example:

error reading /var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/volume-subpaths/pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a/test-container-subpath-nfs-dynamicpv-xxnv: lstat /var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/volume-subpaths/pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a/test-container-subpath-nfs-dynamicpv-xxnv/0: stale NFS file handle
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L881

subPathDir is
/var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/volume-subpaths/pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a/test-container-subpath-nfs-dynamicpv-xxnv
shouldn't it be
/var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/volume-subpaths/pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a?
subPathDir := filepath.Join(podDir, containerSubPathDirectoryName, volumeName)
podDir = /var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/
containerSubPathDirectoryName = volume-subpaths
volumeName = pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a

So where did test-container-subpath-nfs-dynamicpv-xxnv at the end of the path come from?
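
To make the question concrete, here is a minimal Go sketch that reproduces the path arithmetic with the values quoted above. The function names volumeSubPathDir and bindTarget are hypothetical, and the container-name and "0" components are inferred only from the lstat path in the error message, not taken from the kubelet source. If that inference holds, the Join shown above yields the volume directory, and the directory named in the error is one level below it.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Value quoted in the comment above.
const containerSubPathDirectoryName = "volume-subpaths"

// volumeSubPathDir mirrors the filepath.Join call quoted above.
// (Hypothetical helper name, used only for this illustration.)
func volumeSubPathDir(podDir, volumeName string) string {
	return filepath.Join(podDir, containerSubPathDirectoryName, volumeName)
}

// bindTarget appends a container name and an index to the volume directory.
// ASSUMPTION: this layout is inferred from the lstat path in the error
// message; it is not copied from the kubelet source.
func bindTarget(podDir, volumeName, containerName, index string) string {
	return filepath.Join(volumeSubPathDir(podDir, volumeName), containerName, index)
}

func main() {
	podDir := "/var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/"
	volumeName := "pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a"

	// The directory the quoted Join produces (trailing slash is cleaned).
	fmt.Println(volumeSubPathDir(podDir, volumeName))

	// The path that appears in the lstat error.
	fmt.Println(bindTarget(podDir, volumeName,
		"test-container-subpath-nfs-dynamicpv-xxnv", "0"))
}
```

Read this way, the directory in the "error reading" message would be the result of joining subPathDir with a container directory name somewhere after the quoted line, rather than subPathDir itself.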

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/144/pull-ci-openshift-cluster-version-operator-master-e2e-aws/493


openshift-tests [sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Dynamic PV (default fs)] subPath should be able to unmount after the subpath directory is deleted [Suite:openshift/conformance/parallel] [Suite:k8s] 15m0s

pod-subpath-test-nfs-dynamicpv-xxnv
ip-10-0-187-22.ec2.internal
f760fcc6-4fe6-11e9-9cee-12bab6c09e9a

Mar 26 16:48:47 ip-10-0-131-185 hyperkube[962]: E0326 16:48:47.455823 962 nestedpendingoperations.go:278] Operation for ""kubernetes.io/nfs/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a-pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a" ("f760fcc6-4fe6-11e9-9cee-12bab6c09e9a")" failed. No retries permitted until 2019-03-26 16:48:47.955768951 +0000 UTC m=+1562.231129938 (durationBeforeRetry 500ms). Error: "error cleaning subPath mounts for volume "test-volume" (UniqueName: "kubernetes.io/nfs/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a-pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a") pod "f760fcc6-4fe6-11e9-9cee-12bab6c09e9a" (UID: "f760fcc6-4fe6-11e9-9cee-12bab6c09e9a") : error reading /var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/volume-subpaths/pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a/test-container-subpath-nfs-dynamicpv-xxnv: lstat /var/lib/kubelet/pods/f760fcc6-4fe6-11e9-9cee-12bab6c09e9a/volume-subpaths/pvc-f4edd68e-4fe6-11e9-9cee-12bab6c09e9a/test-container-subpath-nfs-dynamicpv-xxnv/0: stale NFS file handle"


openshift-tests [sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Inline-volume (default fs)] subPath should be able to unmount after the subpath directory is deleted [Suite:openshift/conformance/parallel] [Suite:k8s] 15m0s

pod-subpath-test-nfs-dzwq
ip-10-0-131-185.ec2.internal
3db5d95d-4fe7-11e9-ab53-028b3e5f8e58

Mar 26 16:50:40 ip-10-0-131-185 hyperkube[962]: E0326 16:50:40.096646 962 nestedpendingoperations.go:278] Operation for ""kubernetes.io/nfs/3db5d95d-4fe7-11e9-ab53-028b3e5f8e58-test-volume" ("3db5d95d-4fe7-11e9-ab53-028b3e5f8e58")" failed. No retries permitted until 2019-03-26 16:50:40.596599019 +0000 UTC m=+1674.871960020 (durationBeforeRetry 500ms). Error: "error cleaning subPath mounts for volume "test-volume" (UniqueName: "kubernetes.io/nfs/3db5d95d-4fe7-11e9-ab53-028b3e5f8e58-test-volume") pod "3db5d95d-4fe7-11e9-ab53-028b3e5f8e58" (UID: "3db5d95d-4fe7-11e9-ab53-028b3e5f8e58") : error reading /var/lib/kubelet/pods/3db5d95d-4fe7-11e9-ab53-028b3e5f8e58/volume-subpaths/test-volume/test-container-subpath-nfs-dzwq: lstat /var/lib/kubelet/pods/3db5d95d-4fe7-11e9-ab53-028b3e5f8e58/volume-subpaths/test-volume/test-container-subpath-nfs-dzwq/0: stale NFS file handle"


openshift-tests [sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Pre-provisioned PV (default fs)] subPath should be able to unmount after the subpath directory is deleted [Suite:openshift/conformance/parallel] [Suite:k8s] 15m0s

pod-subpath-test-nfs-preprovisionedpv-qcrj
ip-10-0-131-185.ec2.internal
db7442aa-4fe4-11e9-8999-0e0b46216998

Mar 26 16:34:00 ip-10-0-131-185 hyperkube[962]: E0326 16:34:00.153375 962 nestedpendingoperations.go:278] Operation for ""kubernetes.io/nfs/db7442aa-4fe4-11e9-8999-0e0b46216998-nfs-c6jbm" ("db7442aa-4fe4-11e9-8999-0e0b46216998")" failed. No retries permitted until 2019-03-26 16:34:00.653326705 +0000 UTC m=+674.928687706 (durationBeforeRetry 500ms). Error: "error cleaning subPath mounts for volume "test-volume" (UniqueName: "kubernetes.io/nfs/db7442aa-4fe4-11e9-8999-0e0b46216998-nfs-c6jbm") pod "db7442aa-4fe4-11e9-8999-0e0b46216998" (UID: "db7442aa-4fe4-11e9-8999-0e0b46216998") : error reading /var/lib/kubelet/pods/db7442aa-4fe4-11e9-8999-0e0b46216998/volume-subpaths/nfs-c6jbm/test-container-subpath-nfs-preprovisionedpv-qcrj: lstat /var/lib/kubelet/pods/db7442aa-4fe4-11e9-8999-0e0b46216998/volume-subpaths/nfs-c6jbm/test-container-subpath-nfs-preprovisionedpv-qcrj/0: stale NFS file handle"
