Fix etcd3 locks to handle improper shutdown #2526
Pinging @xiang90 for a review
@jefferai This PR changes the default lock timeout from 60s to 15s. Is this consistent with other backends? If yes, then I am ok with it.
physical/etcd3.go
@@ -32,6 +32,9 @@ type EtcdBackend struct {
	etcd *clientv3.Client
}

// Etcd default for lease is 60s, set to 15s for faster recovery
etcd default lease duration is 60s. set to 15s for faster recovery.
physical/etcd3.go
@@ -32,6 +32,9 @@ type EtcdBackend struct {
	etcd *clientv3.Client
}

// Etcd default for lease is 60s, set to 15s for faster recovery
const etcd3LockTTLSeconds = 15
etcd3LockTimeoutInSeconds
LGTM after addressing the minor issues.
Solution: Write the lock item with the lease.
81a3834 to 9e705b2
@xiang90 Making it shorter makes sense to me if it's generally stable.
@jefferai OK. Then 15 seconds should be good enough.
Seems this is ready to merge?
LGTM. I cannot click the button. :P
@xiang90 Hah, just wanted to make sure that you were fine with the changes. Probably should have just requested a review :-) Thanks!
Problem: etcd3 locks are not released if an instance shuts down improperly.
Example
After an instance has shut down improperly, its lock remains in etcd.
Inspecting the lock keys individually shows that the two pending locks (the live instances) have leases (results trimmed for brevity).
$ ETCDCTL_API=3 /tmp/test-etcd/etcdctl get /vault/core/lock/49075afbd30ce3c9 --write-out json
$ ETCDCTL_API=3 /tmp/test-etcd/etcdctl get /vault/core/lock/7e055afbd3165b61 --write-out json
The third key (the dead instance) owns the lock but does not have a lease applied.
$ ETCDCTL_API=3 /tmp/test-etcd/etcdctl get /vault/core/lock/7e055af811b4bb53 --write-out json
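The manual inspection above could also be scripted. The following is a minimal sketch, assuming the JSON printed by `etcdctl ... --write-out json` carries a `kvs` array whose entries include a numeric `lease` field only when a lease is attached. The helper name `hasLease` and the sample inputs are hypothetical and are not output captured from the cluster in this report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getResponse models only the fields of the etcdctl JSON output used here.
type getResponse struct {
	Kvs []struct {
		Key   string `json:"key"`   // base64-encoded in etcdctl JSON output
		Lease int64  `json:"lease"` // stays zero when the field is absent
	} `json:"kvs"`
}

// hasLease reports whether every key in the response has a lease attached.
// It returns an error for malformed JSON or a response with no keys.
func hasLease(raw []byte) (bool, error) {
	var resp getResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return false, err
	}
	if len(resp.Kvs) == 0 {
		return false, fmt.Errorf("no keys in response")
	}
	for _, kv := range resp.Kvs {
		if kv.Lease == 0 {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Illustrative inputs only (assumed shape, made-up values).
	withLease := []byte(`{"kvs":[{"key":"a2V5","lease":1234}]}`)
	withoutLease := []byte(`{"kvs":[{"key":"a2V5"}]}`)
	for _, in := range [][]byte{withLease, withoutLease} {
		ok, err := hasLease(in)
		fmt.Println(ok, err)
	}
}
```

A key for which this check fails is one that will not be cleaned up automatically if its owner disappears.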
Solution: Write the lock item with the lease.
Looking at the internals of the Mutex.Lock method shows that all that is required is to attach the lease to the put operation.
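To illustrate why attaching the lease matters, here is a toy in-memory model of lease-bound keys. It is not the etcd API: the `store` type, its methods, and `simulateCrash` are hypothetical names invented for this sketch, and time is a logical counter rather than a real clock. With the real Go client, the equivalent step is passing a `clientv3.WithLease` option to the put.

```go
package main

import "fmt"

// leaseID identifies a lease; 0 means "no lease attached".
type leaseID int64

// store is a toy model of a key-value store with leases (hypothetical).
type store struct {
	keys   map[string]leaseID // key -> attached lease (0 if none)
	leases map[leaseID]int64  // lease -> expiry in logical seconds
	now    int64
	nextID leaseID
}

func newStore() *store {
	return &store{keys: map[string]leaseID{}, leases: map[leaseID]int64{}, nextID: 1}
}

// grant creates a lease that expires ttl seconds from now.
func (s *store) grant(ttl int64) leaseID {
	id := s.nextID
	s.nextID++
	s.leases[id] = s.now + ttl
	return id
}

// put writes a key, optionally bound to a lease (pass 0 for none).
func (s *store) put(key string, id leaseID) { s.keys[key] = id }

// advance moves the clock forward and drops expired leases together
// with every key bound to them.
func (s *store) advance(seconds int64) {
	s.now += seconds
	for id, expiry := range s.leases {
		if expiry > s.now {
			continue
		}
		delete(s.leases, id)
		for k, l := range s.keys {
			if l == id {
				delete(s.keys, k)
			}
		}
	}
}

func (s *store) exists(key string) bool {
	_, ok := s.keys[key]
	return ok
}

// simulateCrash writes a lock key, lets the owner die without refreshing
// its lease, and reports whether the lock key is still present afterwards.
func simulateCrash(attachLease bool) bool {
	const ttl = 15 // matches the 15s TTL discussed in this PR
	s := newStore()
	lease := s.grant(ttl)
	if attachLease {
		s.put("lock", lease)
	} else {
		s.put("lock", 0)
	}
	s.advance(ttl + 1) // owner is gone, so the lease is never refreshed
	return s.exists("lock")
}

func main() {
	fmt.Println("lock remains without lease:", simulateCrash(false))
	fmt.Println("lock remains with lease:", simulateCrash(true))
}
```

In this model, the key written without a lease outlives its owner, while the lease-bound key disappears once the TTL elapses, which is the behavior the fix relies on.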