/tmp/csi-mount is sometimes not cleaned up after "Node Service"."should work" test which fails further tests #196
The underlying problem is that once a test has failed, csi-sanity won't always be able to clean up. For example, if the driver leaves a mounted volume behind, then the usual `os.RemoveAll` will fail. So the latest code doesn't even try anything more than `os.Remove`, and if that fails, any following `os.Mkdir` will fail. IMHO the right solution is to run each test with its own mount and staging directory. Do you agree? This is already possible when using the Go API (just provide your own create/delete functions which dynamically allocate temp directories), but not when using the csi-sanity command. Are you using the command? Which version?
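As a rough illustration of the Go-API approach, per-test temp directories could be wired up along these lines. This is only a sketch; the exact `sanity.TestConfig` fields and the `sanity.Test` signature depend on the csi-test version you vendor, and `TestMyDriverSanity` is a made-up test name:

```go
package sanitytest

import (
	"os"
	"testing"

	"github.com/kubernetes-csi/csi-test/v4/pkg/sanity"
)

func TestMyDriverSanity(t *testing.T) {
	config := sanity.NewTestConfig()
	config.Address = "unix:///tmp/csi.sock"

	// Give every test case its own freshly created directories so that a
	// leftover mount from a failed test cannot make the next mkdir fail
	// with "file exists".
	config.CreateTargetDir = func(string) (string, error) {
		return os.MkdirTemp("", "csi-mount")
	}
	config.CreateStagingDir = func(string) (string, error) {
		return os.MkdirTemp("", "csi-staging")
	}

	sanity.Test(t, config)
}
```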
Seems like the problem above is that if unmount fails (which happens because it is the second unmount attempt and the folder is already unmounted) then the directory is not deleted. And the test itself is not failing. I'd suggest attempting to remove the folder even if unmounting fails.
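That cleanup pattern might look roughly like the following hypothetical helper (illustrative only, not csi-sanity's actual code; `cleanupTargetPath` and its arguments are made up for the example):

```go
package cleanup

import (
	"context"
	"log"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// cleanupTargetPath tries to unpublish and then always attempts to remove
// the target directory, so a failed unmount cannot leave /tmp/csi-mount
// behind for the next test.
func cleanupTargetPath(ctx context.Context, node csi.NodeClient, volumeID, targetPath string) {
	if _, err := node.NodeUnpublishVolume(ctx, &csi.NodeUnpublishVolumeRequest{
		VolumeId:   volumeID,
		TargetPath: targetPath,
	}); err != nil {
		log.Printf("cleanup: warning: NodeUnpublishVolume: %v", err)
	}
	// Remove the directory even if the unmount above failed; only complain
	// if it still exists and cannot be removed.
	if err := os.Remove(targetPath); err != nil && !os.IsNotExist(err) {
		log.Printf("cleanup: warning: removing %s: %v", targetPath, err)
	}
}
```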
alexanderKhaustov <[email protected]> writes:
> The underlying problem is that once a test has failed, csi-sanity won't always be able to clean up. For example, if the driver leaves a mounted volume behind, then the usual `os.RemoveAll` will fail. So the latest code doesn't even try anything more than `os.Remove` and if that fails, any following `os.Mkdir` will fail.
> Seems like the problem above is that if unmount fails (which happens because it is the second unmount attempt and the folder is already unmounted) then the directory is not deleted. And the test itself is not failing. I'd suggest attempting to remove the folder even if unmounting fails.
Sorry, I don't follow. If unmounting succeeded, why does removing the folder fail? `os.Remove` is called.
Also, you are saying that "unmount fails because it is the second unmount attempt and the folder is already unmounted". This sounds like the driver isn't idempotent? It's not an error to call NodeUnpublishVolume twice.
> Are you using the command? Which version?
> I've removed the test setup but it was a recent master version, the day of the post or the previous one.
Were you using the csi-sanity command?
It seemed that remove wasn't called until after the second unmount attempt, and maybe not at all. Still, your point that the driver seems to behave non-idempotently sounds reasonable. I'll look into it once more. Thanks!
Yes, I've been running the tests via the command.
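For reference, on the idempotency point above: an idempotent NodeUnpublishVolume treats an already-unmounted or missing target path as success and still removes the directory. A minimal sketch, not this driver's actual code, assuming k8s.io/mount-utils and a hypothetical nodeServer type:

```go
package driver

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	mount "k8s.io/mount-utils"
)

type nodeServer struct{} // hypothetical; stands in for the driver's node service type

// NodeUnpublishVolume must be safe to call repeatedly for the same path.
func (ns *nodeServer) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
	target := req.GetTargetPath()
	if target == "" {
		return nil, status.Error(codes.InvalidArgument, "target path missing")
	}
	// CleanupMountPoint unmounts the path only if it is actually mounted and
	// then removes the directory; an already-unmounted or missing path is
	// not treated as an error, so a second call simply succeeds.
	if err := mount.CleanupMountPoint(target, mount.New(""), false); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	return &csi.NodeUnpublishVolumeResponse{}, nil
}
```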
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Here's csi-sanity output:
Node Service
should work
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:625
STEP: reusing connection to CSI driver at unix:///tmp/csi.sock
STEP: reusing connection to CSI driver controller at unix:///tmp/csi.sock
STEP: creating mount and staging directories
STEP: creating a single node writer volume
STEP: getting a node id
STEP: controller publishing volume
STEP: node staging volume
STEP: publishing the volume on a node
STEP: cleaning up calling nodeunpublish
STEP: cleaning up calling nodeunstage
STEP: cleaning up calling controllerunpublishing
STEP: cleaning up deleting the volume
cleanup: deleting sanity-node-full-35BAA099-984D9C20 = ef3gu6pu151v67thndsc
cleanup: warning: NodeUnpublishVolume: rpc error: code = Internal desc = Could not unmount "/tmp/csi-mount": Unmount failed: exit status 32
Unmounting arguments: /tmp/csi-mount
Output: umount: /tmp/csi-mount: not mounted
cleanup: warning: ControllerUnpublishVolume: rpc error: code = InvalidArgument desc = Disk unpublish operation failed
• [SLOW TEST:16.717 seconds]
Node Service
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
should work
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:625
ListSnapshots [Controller Server]
should return appropriate values (no optional values added)
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1579
STEP: reusing connection to CSI driver at unix:///tmp/csi.sock
STEP: reusing connection to CSI driver controller at unix:///tmp/csi.sock
STEP: creating mount and staging directories
• Failure in Spec Setup (BeforeEach) [0.001 seconds]
ListSnapshots [Controller Server]
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
should return appropriate values (no optional values added) [BeforeEach]
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1579
failed to create target directory
Unexpected error:
<*os.PathError | 0xc0003db170>: {
Op: "mkdir",
Path: "/tmp/csi-mount",
Err: 0x11,
}
mkdir /tmp/csi-mount: file exists
occurred
/home/akhaustov/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/sanity.go:222
Here's the call order from the driver:
I0426 15:01:54.622765 214412 node.go:73] StageVolume: volume="ef3gu6pu151v67thndsc" operation finished
I0426 15:01:54.625078 214412 node.go:185] PublishVolume(volume_id:"ef3gu6pu151v67thndsc" staging_target_path:"/tmp/csi-staging" target_path:"/tmp/csi-mount/target" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > )
I0426 15:01:54.625350 214412 node.go:361] PublishVolume: creating dir /tmp/csi-mount/target
I0426 15:01:54.625459 214412 node.go:366] PublishVolume: mounting /tmp/csi-staging at /tmp/csi-mount/target
I0426 15:01:54.625498 214412 mount_linux.go:138] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /tmp/csi-mount/target --scope -- mount -o bind /tmp/csi-staging /tmp/csi-mount/target])
I0426 15:01:54.636746 214412 mount_linux.go:138] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /tmp/csi-mount/target --scope -- mount -o bind,remount /tmp/csi-staging /tmp/csi-mount/target])
I0426 15:01:54.665978 214412 node.go:236] UnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" target_path:"/tmp/csi-mount/target" )
I0426 15:01:54.666051 214412 node.go:249] UnpublishVolume: unmounting /tmp/csi-mount/target
I0426 15:01:54.666076 214412 mount_linux.go:203] Unmounting /tmp/csi-mount/target
I0426 15:01:54.694340 214412 node.go:139] UnstageVolume(volume_id:"ef3gu6pu151v67thndsc" staging_target_path:"/tmp/csi-staging" )
I0426 15:01:54.695213 214412 node.go:174] UnstageVolume: unmounting /tmp/csi-staging
I0426 15:01:54.695229 214412 mount_linux.go:203] Unmounting /tmp/csi-staging
I0426 15:01:54.716692 214412 controller.go:198] ControllerUnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" node_id:"ef3rphg9mh60js6h4tpt" )
I0426 15:02:00.075137 214412 controller.go:223] ControllerUnpublishVolume: volume ef3gu6pu151v67thndsc detached from node ef3rphg9mh60js6h4tpt
I0426 15:02:00.075743 214412 controller.go:80] DeleteVolume(VolumeId=ef3gu6pu151v67thndsc)
I0426 15:02:03.614579 214412 node.go:236] UnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" target_path:"/tmp/csi-mount" )
I0426 15:02:03.614705 214412 node.go:249] UnpublishVolume: unmounting /tmp/csi-mount
I0426 15:02:03.614753 214412 mount_linux.go:203] Unmounting /tmp/csi-mount
E0426 15:02:03.616796 214412 node.go:252] Could not unmount "/tmp/csi-mount": Unmount failed: exit status 32
Unmounting arguments: /tmp/csi-mount
Output: umount: /tmp/csi-mount: not mounted
I0426 15:02:03.617488 214412 node.go:139] UnstageVolume(volume_id:"ef3gu6pu151v67thndsc" staging_target_path:"/tmp/csi-staging" )
I0426 15:02:03.618271 214412 node.go:166] UnstageVolume: /tmp/csi-staging target not mounted
I0426 15:02:03.618700 214412 controller.go:198] ControllerUnpublishVolume(volume_id:"ef3gu6pu151v67thndsc" node_id:"ef3rphg9mh60js6h4tpt" )
E0426 15:02:03.814387 214412 controller.go:215] Disk unpublish operation failed: request-id = 05f937e6-6e5d-453e-ad2a-e1a49724a569 rpc error: code = InvalidArgument desc = Request validation error: Cannot find disk in instance by specified disk ID.
I0426 15:02:03.815061 214412 controller.go:80] DeleteVolume(VolumeId=ef3gu6pu151v67thndsc)
Note that the failing NodeUnpublishVolume is in fact the second one, after the previous successful one.