
Test: TestCtlV3Lock #6464

Closed
xiang90 opened this issue Sep 18, 2016 · 16 comments

xiang90 (Contributor) commented Sep 18, 2016

--- FAIL: TestCtlV3Lock (16.15s)
    ctl_v3_test.go:138: test timed out for 7s
@xiang90 xiang90 added this to the v3.1.0-rc.0 milestone Sep 18, 2016
gyuho (Contributor) commented Sep 19, 2016

--- FAIL: TestCtlV3Lock (1.11s)
    ctl_v3_lock_test.go:91: expected different lock name, got l1="a/186d573fc1f99c04\r\n", l2="Error:  lost watcher waiting for delete\r\n"

Got another failure

justinruggles commented Sep 26, 2016

I'm getting hanging/timeouts when trying to obtain a lock with concurrency.Mutex when using master. Using release-3.0 branch works fine. The hang appears to be in waitForDelete(), so it may be related to this.

xiang90 (Contributor, Author) commented Sep 26, 2016

@justinruggles It would be great if you could help debug this if you can reproduce it. It would also be useful to share how you reproduce it.

justinruggles commented:
When I tried again using an etcd server built from master there was no problem, so I think this was just an issue with running a release-3.0 etcd server against a master-branch client. Sorry for the noise.

xiang90 (Contributor, Author) commented Sep 26, 2016

@mitake @AkihiroSuda I had a hard time reproducing this. Can you help?

mitake (Contributor) commented Sep 27, 2016

@xiang90 ok, I'll try it out

mitake (Contributor) commented Sep 27, 2016

I tried to reproduce the failures over more than 2000 iterations with the namazu process inspector (which increases the randomness of thread scheduling), but couldn't. How long did it take you to hit the failures above? @xiang90 @gyuho

xiang90 (Contributor, Author) commented Sep 27, 2016

@mitake I cannot reproduce it myself. :( I suspect it might be caused by CPU starvation, but I am not sure.

AkihiroSuda (Contributor) commented:
Didn't hit it either.

heyitsanthony (Contributor) commented:

I was able to reproduce `ctl_v3_test.go:138: test timed out for 7s` by putting `if rand.Intn(2) == 0 { time.Sleep(15 * time.Second) }` right after waitKeys in `Lock()` (the 7s is misleading; the timeout is really 15s). This simulates a broken watch cancel path where the watch channel never closes out after a cancel.

I don't think locks are broken here, because otherwise the integration lock tests would be exploding as well. The difference between those tests and e2e is that the e2e test will cancel a blocked lock (and therefore the watch) when it sends a SIGINT to the process. I suspect this is fixed by the watch refactor.

xiang90 (Contributor, Author) commented Oct 6, 2016

@heyitsanthony Sounds reasonable. Can we:

  1. fix the incorrect timeout printing
  2. panic on a timeout instead of calling fatal, so the stack trace is printed; if this happens again, it will be easier to debug.

heyitsanthony (Contributor) commented:

  1. is OK.
  2. panic() probably won't do much for e2e debugging: it will catch the stalled process Close, but the interesting traces will be in separate processes. Infrastructure around the process handling that would SIGQUIT the processes and collect the resulting stack traces would be much more useful.

xiang90 (Contributor, Author) commented Oct 6, 2016

@heyitsanthony right. let's close this for now.

@xiang90 xiang90 closed this as completed Oct 6, 2016
@heyitsanthony heyitsanthony reopened this Nov 28, 2016
@gyuho gyuho modified the milestones: v3.2.0, v3.1.0 Jan 13, 2017
heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue May 5, 2017
Meant to debug etcd-io#6464 and etcd-io#6934

Dumps the output from the etcd/etcdctl servers and SIGQUITs to get a
golang backtrace in case of a hanged process.
@heyitsanthony heyitsanthony self-assigned this May 5, 2017
heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue May 8, 2017
heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue May 9, 2017
heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue May 10, 2017
heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue May 11, 2017
heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue May 12, 2017
@heyitsanthony heyitsanthony added this to the v3.3.0 milestone Jun 9, 2017
@heyitsanthony heyitsanthony removed this from the v3.2.0 milestone Jun 9, 2017
yudai pushed a commit to yudai/etcd that referenced this issue Oct 5, 2017
xiang90 (Contributor, Author) commented Nov 9, 2017

I have not seen this for quite a while and cannot reproduce it, so I'm closing this for now.
