-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clientv3: added test for client metrics #6748
clientv3: added test for client metrics #6748
Conversation
@xiang90, @heyitsanthony ,@gyuho : I am facing an issue with this test case. The test cases are failing for a leak in go-routine due to the http.ListenAndServe i am starting as it won't close until the entire test case execution completes.
And before exiting i was doing:
But now the issue is once the listener is closing the entire test case ends and before the next test case execution it exits :-( Logs below:
Will be glad if you could help on this. |
@xiang90 I don't know why the semaphore build is failing for integration, in my local machine i have tested it almost 50 times now and its passing all the time :-(.
What do i do for this? |
We will take a look today |
@xiang90 Thank you! |
@@ -113,6 +113,7 @@ func interestingGoroutines() (gs []string) { | |||
strings.Contains(stack, "github.com/coreos/etcd/pkg/testutil.interestingGoroutines") || | |||
strings.Contains(stack, "github.com/coreos/etcd/pkg/logutil.(*MergeLogger).outputLoop") || | |||
strings.Contains(stack, "github.com/golang/glog.(*loggingT).flushDaemon") || | |||
strings.Contains(stack, "github.com/coreos/etcd/integration.TestV3ClientMetrics") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't mask goroutine leaks for specific test cases; leak checking is incredibly useful for debugging and there's usually a way to write the test so it won't leak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heyitsanthony i undertand your concern, i had initially mentioned a double check for this for net/http induced go routine as well as the current check, later i had modified this only to check if semaphore passes. But to no avail. As i had mentioned earlier as well i had written a custom code by passing listener to this, but on closing the listener the entire test was failing. Also i didn't find an alternative to close the go-routine i was setting up, But to no avail. Had asked for help from you as well for this but i guess you we busy with other work.As a last resort i had to dig deep and found an alternative this way. Would be grateful to you if you could help me out on closing the go routine which i am creating as i haven't been able to figure out an alternative to close this.
Thanks a lot for looking into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert this. see my comment on the http server setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
// listen for all prometheus metrics | ||
go func() { | ||
http.Handle("/metrics", prometheus.Handler()) | ||
http.ListenAndServe(":27989", nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to shutdown this http server after this test. or there will be go routine leak. you can create a listener first and close that listener after test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 if you see my sample code above i had already tried it. But closing the listener is also terminating the entire test case. Hence i had to come up with the current solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do not quite understand. why does it terminate the entire test case? also when we decide to close the listener, we should expect the test to finish, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 yes i expected the same but i have already tried and saw it terminating here. Please take a look at the code i had posted earlier. Let me know if i am doing anything wrong there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Fatal(srv.Serve(ln.(*net.TCPListener)))
do not fatal here. get the error, if the error is closing ...
then gracefully exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will do that and will try it out. Thank you!
http.ListenAndServe(":27989", nil) | ||
}() | ||
|
||
clus := NewClusterV3(t, &ClusterConfig{Size: 3}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use size 1? we do not really need a multi-member etcd cluster for this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok Will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 For clientv3/integration it crating one client per member hence will require cluster of size 3 now. Or shall i move away from this behavior and write my custom code for this?
defer clus.Terminate(t) | ||
|
||
ctxb := context.Background() | ||
ctx, cancel := context.WithCancel(ctxb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx, cancel := context.WithCancel(context.Background())
and remove line42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will do that.
@@ -0,0 +1,142 @@ | |||
// Copyright 2016 The etcd Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please move this test to clientv3/integration?
/integration is for server side testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will do it.
ctx, cancel := context.WithCancel(ctxb) | ||
defer cancel() | ||
|
||
wAPI := toGRPC(clus.RandClient()).Watch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should use clientv3's API not raw gRPC API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
} | ||
} | ||
|
||
func sumCountersForMetricAndLabels(metricName string, matchingLabelValues ...string) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we already have this somewhere? can we reuse that code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually i had checked it specific search at this depth is not available for members code it was available to one high level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to reconstruct the code to avoid this kind of code duplication.
can you point to me where is the code that you copied this from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 its a reference from promtheus test case written on similar terms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not from etcd code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. let's put aside this for now. i will try to give this a closer look if we make other stuff work well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 sure will do all the changes and test it. Will let you know once all the changes are done.
|
||
func getAllMetrics() []string { | ||
|
||
url := "http://localhost:27989/metrics" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not use hard coded value. pass the url into this func.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will change it.
// make an http request to fetch all prometheus metrics | ||
resp, err := http.Get(url) | ||
if err != nil { | ||
log.Fatalf("fetch: %v\n", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use t.fatal in tests, not log.fatal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will modify it.
if err == io.EOF { | ||
break | ||
} else { | ||
log.Fatalf("error reading stuff and the err is : %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use t.fatal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will change it
Let's clean this up. ping me when it is ready for the next round of review. sorry for the delay. |
@xiang90 I have done the changes. PTAL. |
) | ||
|
||
var ( | ||
host string = "localhost" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this into the func as a local var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also probably just addr := "localhost:27989"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
err error | ||
) | ||
|
||
// listen for all prometheus metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the comments please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 only this comment or all the comments?
if addr == "" { | ||
addr = ":http" | ||
} | ||
ln, err = net.Listen("tcp", addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
net.Listen("tcp", addr) (addr see https://github.com/coreos/etcd/pull/6748/files#r85951520)
}() | ||
}() | ||
|
||
clus := integration.NewClusterV3(t, &integration.ClusterConfig{Size: 3}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change size to 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 currently the integration framework is designed to have one client per member. Hence i had to make it 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just use one client? why do we need multiple clients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need to have separate clients do to watch and put? client is thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok you were right one client is sufficient. Will change it.
w := clientv3.NewWatcher(wclient) | ||
defer w.Close() | ||
|
||
// select a different client from wclient so puts succeed if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? i do not quite understand this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 on one client its watching. so not getting the same one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does not matter. it wont affect anything. one client can do multiple concurrent requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will test it out.
|
||
wc := w.Watch(ctxc, "foo1") | ||
|
||
watchResponseBefore := sumCountersForMetricAndLabels(t, "grpc_client_msg_received_total", "Watch", "bidi_stream") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wbefore (try to use short variable name for local var)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will chnge it
|
||
watchResponseBefore := sumCountersForMetricAndLabels(t, "grpc_client_msg_received_total", "Watch", "bidi_stream") | ||
|
||
putTotalBefore := sumCountersForMetricAndLabels(t, "grpc_client_started_total", "Put", "unary") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pbefore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will change it
watchResponseBefore := sumCountersForMetricAndLabels(t, "grpc_client_msg_received_total", "Watch", "bidi_stream") | ||
|
||
putTotalBefore := sumCountersForMetricAndLabels(t, "grpc_client_started_total", "Put", "unary") | ||
kv.Put(ctxc, "foo1", "bar") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will do that.
|
||
putTotalBefore := sumCountersForMetricAndLabels(t, "grpc_client_started_total", "Put", "unary") | ||
kv.Put(ctxc, "foo1", "bar") | ||
putTotalAfter := sumCountersForMetricAndLabels(t, "grpc_client_started_total", "Put", "unary") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pafter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will modify it.
putTotalAfter := sumCountersForMetricAndLabels(t, "grpc_client_started_total", "Put", "unary") | ||
|
||
if putTotalBefore+1 != putTotalAfter { | ||
t.Errorf("grpc_client_started_total should be incremented but recieved : %v\n", putTotalAfter-putTotalBefore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Errorf ("after = %d, want %d")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will chng it
} | ||
|
||
// consume watch response | ||
<-wc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select {
case <-wc:
case <-time.After(10 * time.Second):
....
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will chng it.
t.Errorf("grpc_client_msg_received_total should be incremented but recieved %v\n", watchResponseAfter-watchResponseBefore) | ||
} | ||
|
||
if ln != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove nil checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@xiang90 Do you mean using round trip? |
@sinsharat Yes. Use that directly. |
@xiang90 ok will try it out and check. Thanks for the suggestion! |
if err != nil { | ||
t.Fatalf("Error : %v occured while listening on addr : %v", err, addr) | ||
} | ||
err = srv.Serve(ln) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
srv.SetKeepAlivesEnabled(false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should go before the Serve
; it will fix the leak.
mux := http.NewServeMux() | ||
mux.Handle("/metrics", prometheus.Handler()) | ||
|
||
srv := &http.Server{Handler: mux} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mux isn't needed since only one handler:
srv := &http.Server{Handler: promehteus.Handler()}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heyitsanthony @xiang90 had asked to not use nil and use custom handler instead. Hence had changed it to mux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sinsharat we should do what @heyitsanthony suggested. we cannot put it as nil. but since this is the only handler, we do not really need a mux. We can put the handler itself here as the mux today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh? where's the nil here? I believe he was commenting about http.Handler
+ http.Server{Handler: nil}
; http.Handler
is bad because it's overriding defaults. This doesn't do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will do that.
) | ||
|
||
// listen for all prometheus metrics | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test should probably wait until this goroutine completes,
donec := make(chan struct{})
go func() {
defer close(donec)
...
t.Errorf("grpc_client_msg_received_total expected : %d, got %d", 1, wAfter-wBefore) | ||
} | ||
ln.Close() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<-donec
@heyitsanthony I have done the changes as per your suggestion, but as i had judged earlier without channels as well the server was closing, and the problem of leak still persists. Will try doing what @xiang90 had suggested on top of your suggested changes. |
@sinsharat Can you push your recent changes to github? |
@xiang90 sure will do it. |
@xiang90 @heyitsanthony I have pushed the changes. PTAL. |
t.Fatalf("Error : %v occured while listening on addr : %v", err, addr) | ||
} | ||
err = srv.Serve(ln) | ||
srv.SetKeepAlivesEnabled(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this line before serve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heyitsanthony @xiang90 You were right. it did fix the leak.
Thanks you so much for your help!
func getHTTPBodyAsLines(t *testing.T, url string) []string { | ||
cfgtls := transport.TLSInfo{} | ||
tr, err := transport.NewTransport(cfgtls, time.Second) | ||
tr.DisableKeepAlives = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also set MaxIdleConns to -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will set it.
@xiang90 @heyitsanthony PTAL. |
lines = append(lines, line) | ||
} | ||
resp.Body.Close() | ||
tr.CloseIdleConnections() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this? i think this is unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will remove it.
// listen for all prometheus metrics | ||
donec := make(chan struct{}) | ||
go func() { | ||
// defer close(donec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still use defer close(donec)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes right. i missed it.
if err != nil && !strings.Contains(err.Error(), "use of closed network connection") { | ||
t.Fatalf("Err serving http requests: %v", err) | ||
} | ||
donec <- struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
ln, err = transport.NewUnixListener(addr) | ||
|
||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always put the error checking right after the operation? no empty lines between them. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sure will keep in mind from next time.
LGTM. defer to @heyitsanthony and @gyuho |
} | ||
|
||
wAfter := sumCountersForMetricAndLabels(t, url, "grpc_client_msg_received_total", "Watch", "bidi_stream") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this empty line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
lines := []string{} | ||
for { | ||
line, err := reader.ReadString('\n') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this empty line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if err != nil { | ||
if err == io.EOF { | ||
break | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this else
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gyuho We need this else for terminating if some un-expected error occurs.
lines = append(lines, line) | ||
} | ||
resp.Body.Close() | ||
cli = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? It will be garbage collected anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gyuho had put that when i was looking for connection leak. Will remove it. Thanks!
lgtm. Defer to @heyitsanthony Thanks @sinsharat ! |
@xiang90 @heyitsanthony @gyuho Thank you so much for your support! |
@sinsharat Thanks for getting it done! |
@xiang90 To be honest i am thankful to you and @heyitsanthony for helping me out on this. |
#6711