RexRay for Azure failed to detach volume on container stop #1122

Closed
jmaitrehenry opened this issue Dec 7, 2017 · 5 comments
@jmaitrehenry
Contributor

Summary

After creating a container with an attached rexray volume, we stop (and remove) the container, but the volume stays attached and cannot be reused.

Bug Reports

Version

Please paste the output of rexray version. For example:

$ rexray version
REX-Ray
-------
Binary: /opt/bin/rexray
Flavor: client+agent+controller
SemVer: 0.11.0
OsArch: Linux-x86_64
Commit: 508023f4b0e4974f1844c26b39af7c4219f4c7a9
Formed: Mon, 16 Oct 2017 07:07:06 UTC

Expected Behavior

When the container is stopped and removed, the volume should be detached and ready to reuse.

Actual Behavior

The volume is not detached.

Steps To Reproduce

  • Create a volume
  • Start a container with the volume
  • Stop and remove it
  • Start a new container with the previous volume
$ docker run -ti --rm -v test123:/foo alpine sh
/ # [ctrl+d]

$ docker run -ti --rm -v test123:/foo alpine sh
/run/torcx/bin/docker: Error response from daemon: error while mounting volume '/': VolumeDriver.Mount: ControllerPublishVolume failed: 0: ControllerPublishVolume failed: 0: rpc error: code = Unknown desc = unable to find attached vol in local devices.
ERRO[0000] error waiting for container: context canceled

$ ./rexray volume ls
ID              Name        Status     Size
julien.vhd      julien      available  1
postgresql.vhd  postgresql  available  1
test123.vhd     test123     attached   1

Configuration Files

libstorage:
  logging:
    level:         debug
    httpRequests:  true
    httpResponses: true
  server:
    tasks:
      exeTimeout: 120s
  service: azureud
  integration:
    volume:
      operations:
        mount:
          preempt: true
        create:
          default:
            size: 1
azureud:
  [zip]

Logs

Service Log

https://gist.github.com/jmaitrehenry/dae6f7680c2f66d87a3dafa700103e0f

The last line is really interesting:

INFO[3816] /csi.Controller/ControllerUnpublishVolume: REP 0029: ControllerUnpublishVolume failed: unable to find attached vol in local devices  host=unix:///var/run/rexray/384768199.sock integrationDriver=linux osDriver=linux service=azureud storageDriver=libstorage time=1512670067409

Client Log

$ docker run -ti --rm -v test123:/foo alpine sh
/ # [ctrl+d]

$ docker run -ti --rm -v test123:/foo alpine sh
/run/torcx/bin/docker: Error response from daemon: error while mounting volume '/': VolumeDriver.Mount: ControllerPublishVolume failed: 0: ControllerPublishVolume failed: 0: rpc error: code = Unknown desc = unable to find attached vol in local devices.
ERRO[0000] error waiting for container: context canceled

$ ./rexray volume ls
ID              Name        Status     Size
julien.vhd      julien      available  1
postgresql.vhd  postgresql  available  1
test123.vhd     test123     attached   1
@jmaitrehenry
Contributor Author

jmaitrehenry commented Dec 7, 2017

I ran a quick benchmark, 6 runs each:

# Volume detached in 3 of 6 runs (50%)
docker run -t --rm -v test123:/foo alpine sleep 120

# Volume detached in 4 of 6 runs (67%)
docker run -t --rm -v test123:/foo alpine sleep 30

# Volume detached in 3 of 6 runs (50%)
docker run -t --rm -v test123:/foo alpine sleep 10

# Volume never detached (0 of 6 runs)
docker run -t --rm -v test123:/foo alpine sleep 1
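
A benchmark like the one above could be scripted roughly as follows. This is only a sketch: the function names `volume_status` and `benchmark` and the variables `REXRAY` and `VOLUME` are made up for illustration, the parsing assumes the four-column `rexray volume ls` table shown earlier in this issue, and it assumes the status is already final when `docker run` returns, which may not hold if the detach is asynchronous.

```shell
# Illustrative variables, not REX-Ray settings.
REXRAY="${REXRAY:-./rexray}"
VOLUME="${VOLUME:-test123}"

# Print the Status column for one volume, reading `rexray volume ls`
# output on stdin. Assumes the ID / Name / Status / Size layout.
volume_status() {
  awk -v name="$1" 'NR > 1 && $2 == name { print $3 }'
}

# Run the container $1 times with `sleep $2` and count the runs after
# which the volume is reported as "available" (that is, detached).
benchmark() {
  runs="$1"
  secs="$2"
  detached=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    docker run -t --rm -v "$VOLUME:/foo" alpine sleep "$secs"
    status="$("$REXRAY" volume ls | volume_status "$VOLUME")"
    if [ "$status" = "available" ]; then
      detached=$((detached + 1))
    fi
    i=$((i + 1))
  done
  echo "$detached/$runs runs detached (sleep $secs)"
}
```

Calling `benchmark 6 30` would correspond to the second case above. Since a volume left attached makes the next `docker run` fail with the mount error from this report, it would have to be detached by hand between runs.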

@clintkitson
Member

@jmaitrehenry It looks like we are possibly just running into timeout issues waiting for Azure to process the requests. Can you verify the attachment/detachment process manually to see how long it is actually taking to attach/detach from your instances?
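
One possible way to take the manual timings requested here is a small wrapper that reports how long a command runs. This is a sketch under assumptions: `elapsed` is a made-up helper name, it relies on `date +%s` (available on GNU and BSD `date`), and the example calls assume the `volume attach` and `volume detach` subcommands are present in this REX-Ray build.

```shell
# Run a command, report its wall-clock duration in whole seconds on
# stderr, and return the command's own exit status.
elapsed() {
  start="$(date +%s)"
  "$@"
  rc=$?
  end="$(date +%s)"
  echo "$* took $((end - start))s (exit $rc)" >&2
  return "$rc"
}

# Example calls (not executed here):
#   elapsed ./rexray volume attach test123
#   elapsed ./rexray volume detach test123
```

Comparing the reported durations with the `exeTimeout: 120s` value from the configuration above would show whether the timeout is a plausible cause.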

@jmaitrehenry
Contributor Author

@clintkitson I'm not sure, because when I check the debug log, I can't find the POST to ?detach.
It looks like Rexray doesn't try to detach the volume because it can't find it.

I will run more tests and post the logs for a run where Rexray detaches the volume and for a run where it doesn't try to.

On the Azure side, when the volume is not detached, Azure never receives a detach operation for it.
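
The log check described above could be done along these lines. This is a minimal sketch: `count_matches` is a made-up helper, the log file name in the example is hypothetical, and it only assumes that each request and each error occupies one line of the saved service log.

```shell
# Count the lines of a saved log file ($1) that contain a fixed string ($2).
count_matches() {
  grep -c -F -- "$2" "$1"
}

# Example calls (not executed here), with a hypothetical log file name:
#   count_matches rexray-service.log '?detach'
#   count_matches rexray-service.log 'unable to find attached vol in local devices'
```

A count of zero for `?detach` next to a non-zero count for the error message would be consistent with the detach request never being sent.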

@clintkitson
Member

@codenrhoden Can you take a peek at the gist here?

@codenrhoden
Member

I'll be looking into this today.
