
/tmp/csi-mount is sometimes not cleaned up after the "Node Service"."should work" test, which causes further tests to fail #196

Closed
alexanderKhaustov opened this issue Apr 26, 2019 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@alexanderKhaustov
Contributor

Here's the csi-sanity output:

Node Service
should work
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:625
STEP: reusing connection to CSI driver at unix:///tmp/csi.sock
STEP: reusing connection to CSI driver controller at unix:///tmp/csi.sock
STEP: creating mount and staging directories
STEP: creating a single node writer volume
STEP: getting a node id
STEP: controller publishing volume
STEP: node staging volume
STEP: publishing the volume on a node
STEP: cleaning up calling nodeunpublish
STEP: cleaning up calling nodeunstage
STEP: cleaning up calling controllerunpublishing
STEP: cleaning up deleting the volume
cleanup: deleting sanity-node-full-35BAA099-984D9C20 = ef3gu6pu151v67thndsc
cleanup: warning: NodeUnpublishVolume: rpc error: code = Internal desc = Could not unmount "/tmp/csi-mount": Unmount failed: exit status 32
Unmounting arguments: /tmp/csi-mount
Output: umount: /tmp/csi-mount: not mounted

cleanup: warning: ControllerUnpublishVolume: rpc error: code = InvalidArgument desc = Disk unpublish operation failed

• [SLOW TEST:16.717 seconds]
Node Service
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
should work
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:625

ListSnapshots [Controller Server]
should return appropriate values (no optional values added)
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1579
STEP: reusing connection to CSI driver at unix:///tmp/csi.sock
STEP: reusing connection to CSI driver controller at unix:///tmp/csi.sock
STEP: creating mount and staging directories

• Failure in Spec Setup (BeforeEach) [0.001 seconds]
ListSnapshots [Controller Server]
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
should return appropriate values (no optional values added) [BeforeEach]
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1579

failed to create target directory
Unexpected error:
<*os.PathError | 0xc0003db170>: {
Op: "mkdir",
Path: "/tmp/csi-mount",
Err: 0x11,
}
mkdir /tmp/csi-mount: file exists
occurred

/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/sanity.go:222

Here's the call order from the driver:

I0426 15:01:54.622765 214412 node.go:73] StageVolume: volume="ef3gu6pu151v67thndsc" operation finished
I0426 15:01:54.625078 214412 node.go:185] PublishVolume(volume_id:"ef3gu6pu151v67thndsc" staging_target_path:"/tmp/csi-staging" target_path:"/tmp/csi-mount/target" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > )
I0426 15:01:54.625350 214412 node.go:361] PublishVolume: creating dir /tmp/csi-mount/target
I0426 15:01:54.625459 214412 node.go:366] PublishVolume: mounting /tmp/csi-staging at /tmp/csi-mount/target
I0426 15:01:54.625498 214412 mount_linux.go:138] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /tmp/csi-mount/target --scope -- mount -o bind /tmp/csi-staging /tmp/csi-mount/target])
I0426 15:01:54.636746 214412 mount_linux.go:138] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /tmp/csi-mount/target --scope -- mount -o bind,remount /tmp/csi-staging /tmp/csi-mount/target])
I0426 15:01:54.665978 214412 node.go:236] UnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" target_path:"/tmp/csi-mount/target" )
I0426 15:01:54.666051 214412 node.go:249] UnpublishVolume: unmounting /tmp/csi-mount/target
I0426 15:01:54.666076 214412 mount_linux.go:203] Unmounting /tmp/csi-mount/target
I0426 15:01:54.694340 214412 node.go:139] UnstageVolume(volume_id:"ef3gu6pu151v67thndsc" staging_target_path:"/tmp/csi-staging" )
I0426 15:01:54.695213 214412 node.go:174] UnstageVolume: unmounting /tmp/csi-staging
I0426 15:01:54.695229 214412 mount_linux.go:203] Unmounting /tmp/csi-staging
I0426 15:01:54.716692 214412 controller.go:198] ControllerUnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" node_id:"ef3rphg9mh60js6h4tpt" )
I0426 15:02:00.075137 214412 controller.go:223] ControllerUnpublishVolume: volume ef3gu6pu151v67thndsc detached from node ef3rphg9mh60js6h4tpt
I0426 15:02:00.075743 214412 controller.go:80] DeleteVolume(VolumeId=ef3gu6pu151v67thndsc)
I0426 15:02:03.614579 214412 node.go:236] UnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" target_path:"/tmp/csi-mount" )
I0426 15:02:03.614705 214412 node.go:249] UnpublishVolume: unmounting /tmp/csi-mount
I0426 15:02:03.614753 214412 mount_linux.go:203] Unmounting /tmp/csi-mount
E0426 15:02:03.616796 214412 node.go:252] Could not unmount "/tmp/csi-mount": Unmount failed: exit status 32
Unmounting arguments: /tmp/csi-mount
Output: umount: /tmp/csi-mount: not mounted

I0426 15:02:03.617488 214412 node.go:139] UnstageVolume(volume_id:"ef3gu6pu151v67thndsc" staging_target_path:"/tmp/csi-staging" )
I0426 15:02:03.618271 214412 node.go:166] UnstageVolume: /tmp/csi-staging target not mounted
I0426 15:02:03.618700 214412 controller.go:198] ControllerUnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" node_id:"ef3rphg9mh60js6h4tpt" )
E0426 15:02:03.814387 214412 controller.go:215] Disk unpublish operation failed: request-id = 05f937e6-6e5d-453e-ad2a-e1a49724a569 rpc error: code = InvalidArgument desc = Request validation error: Cannot find disk in instance by specified disk ID.
I0426 15:02:03.815061 214412 controller.go:80] DeleteVolume(VolumeId=ef3gu6pu151v67thndsc)

Note that the failing NodeUnpublishVolume call is in fact the second one, coming after a previous successful one.

@pohly
Contributor

pohly commented Apr 29, 2019

The underlying problem is that once a test has failed, csi-sanity won't always be able to clean up. For example, if the driver leaves a mounted volume behind, then the usual os.RemoveAll will fail. So the latest code doesn't attempt anything more than os.Remove, and if that fails, any following os.Mkdir will fail as well.

IMHO the right solution is to run each test with its own mount and staging directory. Do you agree? This is already possible when using the Go API (just provide your own create/delete functions which dynamically allocate temp directories), but not when using the csi-sanity command.
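
For illustration, here is a rough sketch of that Go-API approach: give each test its own freshly created temp directories instead of the fixed /tmp/csi-mount and /tmp/csi-staging. The Config hook names (CreateTargetDir, CreateStagingDir, RemoveTargetPath, RemoveStagingPath) and the Test signature have changed between csi-test releases, so treat them as assumptions and check the version you are using:

package mydriver_test

import (
	"io/ioutil"
	"os"
	"testing"

	"github.com/kubernetes-csi/csi-test/pkg/sanity"
)

func TestSanity(t *testing.T) {
	config := &sanity.Config{
		Address: "unix:///tmp/csi.sock",

		// Assumed hooks: allocate a fresh temp directory for every test,
		// so a mount leaked by one failed test cannot make the next
		// test's mkdir fail with "file exists".
		CreateTargetDir: func(path string) (string, error) {
			return ioutil.TempDir("", "csi-mount")
		},
		CreateStagingDir: func(path string) (string, error) {
			return ioutil.TempDir("", "csi-staging")
		},
		RemoveTargetPath: func(path string) error {
			return os.RemoveAll(path)
		},
		RemoveStagingPath: func(path string) error {
			return os.RemoveAll(path)
		},
	}
	sanity.Test(t, config)
}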

Are you using the command? Which version?

@alexanderKhaustov
Contributor Author

The underlying problem is that once a test has failed, csi-sanity won't always be able to clean up. For example, if the driver leaves a mounted volume behind, then the usual os.RemoveAll will fail. So the latest code doesn't even try anything more than os.Remove and if that fails, any following os.Mkdir will fail.

It seems the problem above is that if the unmount fails (which happens because it is the second unmount attempt and the directory is already unmounted), then the directory is not deleted, even though the test itself is not failing. I'd suggest attempting to remove the directory even if unmounting fails (see the sketch below).
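
A minimal sketch of that suggestion (illustrative only, not the actual csi-sanity cleanup code; the helper name and wiring are made up): log the failed NodeUnpublishVolume as a warning, but still try to delete the target directory so the next test's mkdir doesn't hit "file exists".

package cleanup

import (
	"context"
	"log"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// cleanupTarget is a hypothetical helper: unpublish best-effort, then always
// try to remove the directory, so a stale /tmp/csi-mount cannot poison the
// BeforeEach of later tests.
func cleanupTarget(ctx context.Context, nc csi.NodeClient, volumeID, targetPath string) {
	if _, err := nc.NodeUnpublishVolume(ctx, &csi.NodeUnpublishVolumeRequest{
		VolumeId:   volumeID,
		TargetPath: targetPath,
	}); err != nil {
		log.Printf("cleanup: warning: NodeUnpublishVolume: %v", err)
	}
	// Remove the directory regardless of whether the unpublish succeeded.
	if err := os.Remove(targetPath); err != nil && !os.IsNotExist(err) {
		log.Printf("cleanup: warning: remove %s: %v", targetPath, err)
	}
}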

Are you using the command? Which version?
I've already removed the test setup, but it was a recent master build, from the day of the post or the day before.

@pohly
Contributor

pohly commented Apr 30, 2019 via email

@alexanderKhaustov
Contributor Author

Sorry, I don't follow. If unmounting succeeded, why does removing the folder fail? os.Remove is called. Also, you are saying that "unmount fails because it is the second unmount attempt and the folder is already unmounted". This sounds like the driver isn't idempotent? It's not an error to call NodeUnpublishVolume twice.

It seemed that the remove wasn't called until after the second unmount attempt, and possibly not at all. Still, your objection that the driver appears to behave non-idempotently sounds reasonable (see the sketch at the end of this comment). I'll look into it once more. Thanks!

Were you using the csi-sanity command?

Yes, I've been running the tests via the csi-sanity command.
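
On the idempotency point above, here is a minimal sketch of a driver-side NodeUnpublishVolume that tolerates repeated calls (the nodeServer type and the error handling are illustrative, not this driver's real code): a target path that is already unmounted, or already gone, is treated as success rather than returning an Internal error.

package driver

import (
	"context"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"golang.org/x/sys/unix"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

type nodeServer struct{} // placeholder for the driver's node service

func (ns *nodeServer) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
	target := req.GetTargetPath()

	if _, err := os.Stat(target); os.IsNotExist(err) {
		// A previous call already cleaned up; per the CSI spec this is success.
		return &csi.NodeUnpublishVolumeResponse{}, nil
	}

	if err := unix.Unmount(target, 0); err != nil && err != unix.EINVAL {
		// EINVAL typically means "not a mount point", i.e. already unmounted;
		// anything else is a genuine failure.
		return nil, status.Errorf(codes.Internal, "unmount %s: %v", target, err)
	}

	if err := os.Remove(target); err != nil && !os.IsNotExist(err) {
		return nil, status.Errorf(codes.Internal, "remove %s: %v", target, err)
	}
	return &csi.NodeUnpublishVolumeResponse{}, nil
}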

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 29, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Aug 28, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

xing-yang added a commit to xing-yang/csi-test that referenced this issue Aug 23, 2022
d24254f Merge pull request kubernetes-csi#202 from xing-yang/kind_0.14.0
0faa3fc Update to Kind v0.14.0 images
ef4e1b2 Merge pull request kubernetes-csi#201 from xing-yang/add_1.24_image
4ddce25 Add 1.24 Kind image
7fe5149 Merge pull request kubernetes-csi#200 from pohly/bump-kubernetes-version
70915a8 prow.sh: update snapshotter version
31a3f38 Merge pull request kubernetes-csi#199 from pohly/bump-kubernetes-version
7577454 prow.sh: bump Kubernetes to v1.22.0
d29a2e7 Merge pull request kubernetes-csi#198 from pohly/csi-test-5.0.0
41cb70d prow.sh: sanity testing with csi-test v5.0.0
c85a63f Merge pull request kubernetes-csi#197 from pohly/fix-alpha-testing
b86d8e9 support Kubernetes 1.25 + Ginkgo v2
ab0b0a3 Merge pull request kubernetes-csi#192 from andyzhangx/patch-1
7bbab24 Merge pull request kubernetes-csi#196 from humblec/non-alpha
e51ff2c introduce control variable for non alpha feature gate configuration
ca19ef5 Merge pull request kubernetes-csi#195 from pohly/fix-alpha-testing
3948331 fix testing with latest Kubernetes
9a0260c fix boilerplate header

git-subtree-dir: release-tools
git-subtree-split: d24254f
stmcginnis pushed a commit to stmcginnis/csi-test that referenced this issue Oct 9, 2024
introduce control variable for non alpha feature gate configuration