
mvcc: cannot detach lease not found #6093

Closed
nekto0n opened this issue Aug 3, 2016 · 21 comments

@nekto0n

nekto0n commented Aug 3, 2016

Hi!
Just encountered a very unexpected issue. I'm using the latest etcd and had a key that was used as a lock, i.e. with a lease attached to it. For some reason the lease was still there even after I had stopped all clients for a while. I decided to remove the key using etcdctl del. After running this command, the whole cluster (5 nodes) went down with the error:

mvcc: cannot detach lease not found

I had to patch etcd to ignore this error; after that the cluster came back to life.

Not quite sure how I managed to get there (a stale lease), but I think you may be interested. Maybe there was a bug a while ago which is now fixed, but it lurked in my installation and revealed itself only now.

@xiang90
Contributor

xiang90 commented Aug 3, 2016

Is your deployment upgraded from a pre-release version of etcd3?

@nekto0n
Author

nekto0n commented Aug 3, 2016

Not sure, but that's quite possible, yes.

@xiang90
Contributor

xiang90 commented Aug 3, 2016

If you have time, could you help reproduce it? Set up the same lock logic and let the lease expire. In the meantime, I will take a look at the lease logic sometime this week.

@nekto0n
Author

nekto0n commented Aug 3, 2016

Thanks for such a prompt response!
I am quite sure I have another etcd ensemble with the same issue right now.

@xiang90
Contributor

xiang90 commented Aug 4, 2016

@nekto0n It would be really helpful if you could somehow reproduce it from a fresh cluster.

@xiang90
Contributor

xiang90 commented Aug 4, 2016

@nekto0n I think I found the bug. I will try to get a fix soon. Could you provide me with the full etcd log? Did a snapshot sending/receiving event happen?

@nekto0n
Author

nekto0n commented Aug 4, 2016

@xiang90 I tried to, but I failed to reproduce the bug on fresh/testing installations.

Could you provide me with the full etcd log? Did a snapshot sending/receiving event happen?

You'd like to look at the logs from around the etcdctl del call? It seems I'm missing them =( I only have logs from already-broken runs: http://pastebin.com/PYzy0pXn

@nekto0n
Author

nekto0n commented Aug 4, 2016

But as I said, I'm pretty sure I have at least one other cluster in a similar state, so I can issue etcdctl del and grab all the info you need :)

@xiang90
Contributor

xiang90 commented Aug 4, 2016

@nekto0n In the pre-release, there was a bug where, if one lease was attached to multiple keys, only the first key would be removed. In your case, did you attach multiple keys to one lease?

@nekto0n
Author

nekto0n commented Aug 4, 2016

@xiang90 No, I don't think so.

@xiang90
Contributor

xiang90 commented Aug 5, 2016

@nekto0n

We made a couple of fixes here: #6098.

Most of them are on the recovery path. Previously, etcd could mess up lease items during crash recovery.

If you can throw some workload onto the patched etcd, that would be great...

@nekto0n
Author

nekto0n commented Aug 5, 2016

Sure thing! I'll give it a spin over the weekend.

@xiang90
Contributor

xiang90 commented Aug 6, 2016

No, I don't think so.

Did you implement your own lock, or were you using our lock implementation? If you were using ours, we actually attach all lock keys to one lease.

@nekto0n
Author

nekto0n commented Aug 8, 2016

I used your Session implementation. I couldn't use Lock because I found no way to stop the worker when the lease expires. Either way, at the moment I use only one lock per process.

@xiang90
Contributor

xiang90 commented Aug 8, 2016

Can you please create an issue for the lock thing? We can improve it. I cannot really figure out where the bug was... We changed quite a lot of stuff between 2.3.x and 3.0. I think we can close this for now. Let us know if it happens again; it would be great if you can reproduce it.

xiang90 closed this as completed Aug 8, 2016
@nekto0n
Author

nekto0n commented Aug 8, 2016

Yeah, sure. I just thought it was meant to look like sync.Mutex, which cannot be unlocked behind your back :)

@nekto0n
Author

nekto0n commented Aug 8, 2016

BTW, I built and installed etcd from master, and something is still holding the lock file. Can I now remove it with etcdctl without shutting down the whole ensemble?

@heyitsanthony
Contributor

@nekto0n what do you mean by lock file?

If etcd is refusing to start, then an etcd process is still running on the target member directory and needs to be stopped before running the new version of etcd. It can be done one-by-one so the cluster stays up.

If the mutex is still held, you could try revoking the lease on the mutex key with the lowest create revision, which will abort the holder's session.

@nekto0n
Author

nekto0n commented Aug 8, 2016

Sorry for being vague. By "lock file" I meant a key with an attached lease. I can confirm that the issue is fixed: after removing the key with the stale lease I got an error, E | mvcc: cannot detach lease not found, not a panic.

@xiang90
Contributor

xiang90 commented Aug 8, 2016

@nekto0n If you start with a fresh cluster, even that error should never appear. Otherwise there is still a bug somewhere.

@nekto0n
Author

nekto0n commented Aug 8, 2016

@xiang90 Right, I failed to reproduce this with a fresh cluster. Thanks a lot for your help!
