Issue: etcd stops answering after ~250 concurrent HTTP API clients. #11826
Comments
can you provide more info, such as a metrics file? how can we reproduce it? how do you use watchers? thanks. |
reproduce: basically each client logs into etcd, using username/password, and starts watching a single prefix. it is just that there are hundreds of clients. i am sorry that i cannot provide code to reproduce this right now, because this is embedded into a c++ application. i am currently talking to our devs to see if they can provide the exact calls that are made in the code. what do you mean by "metrics" file? are you talking about the metrics endpoint? |
okay, i was able to reproduce this issue with some python code & requests:
with 500 clients, the http part of the API crashes: i am no longer able to log in to the node using curl:
this will time out at some point: the gRPC api is not affected by this: calling "./etcdctl member list" and other commands still works. the interesting part is that some of the HTTP connections initiated by this dummy client are still receiving updates, and some are not. also, as soon as i stop the python script, curl requests to the affected etcd node work again. |
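The reproduction script itself did not survive in this copy of the thread. The following is a minimal sketch of what such a client could look like, not the reporter's code: it assumes the v3.4 JSON gateway paths `/v3/auth/authenticate` and `/v3/watch`, uses only the standard library instead of `requests`, and the function names and the base URL, user, password and prefix arguments are hypothetical.

```python
import base64
import json
import threading
import urllib.request


def b64(raw: bytes) -> str:
    """The v3 JSON gateway expects keys as base64-encoded strings."""
    return base64.b64encode(raw).decode("ascii")


def prefix_range_end(prefix: bytes) -> bytes:
    """Smallest key that sorts after every key starting with `prefix`."""
    end = bytearray(prefix)
    for i in reversed(range(len(end))):
        if end[i] < 0xFF:
            end[i] += 1
            return bytes(end[: i + 1])
    # No byte could be incremented (empty prefix or all 0xff bytes).
    return b"\0"


def watch_payload(prefix: bytes) -> dict:
    """Body of a watch create request covering a whole prefix."""
    return {
        "create_request": {
            "key": b64(prefix),
            "range_end": b64(prefix_range_end(prefix)),
        }
    }


def post_json(url: str, payload: dict, token: str = ""):
    """POST a JSON body; the auth token goes into the Authorization header."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = token
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers
    )
    return urllib.request.urlopen(req)


def client(base_url: str, user: str, password: str, prefix: bytes) -> None:
    """One dummy client: log in, then hold a watch stream open."""
    auth_body = {"name": user, "password": password}
    with post_json(base_url + "/v3/auth/authenticate", auth_body) as resp:
        token = json.load(resp)["token"]
    with post_json(base_url + "/v3/watch", watch_payload(prefix), token) as stream:
        for _line in stream:  # blocks while the server keeps the stream open
            pass


def start_clients(base_url, user, password, prefix, n_clients):
    """Start n_clients watcher threads and return them (hypothetical helper)."""
    threads = [
        threading.Thread(
            target=client, args=(base_url, user, password, prefix), daemon=True
        )
        for _ in range(n_clients)
    ]
    for t in threads:
        t.start()
    return threads
```

Calling something like `start_clients("https://etcd-node:2379", "user", "secret", b"/services/", 500)` would approximate the scenario described above; the host name and credentials are placeholders.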
i checked a little more, and it seems like in etcd there is a hard connection limit. when setting the number of threads above 250, etcd stops answering requests. |
there is no connection limit. the authenticate interface has bad performance, see pr #11735. you can set --metrics=extensive to get the latency of every grpc method, then run curl http://hostip:2379/metrics > metrics. it would be better if you could provide the metrics file. |
hmm, have you tried the above python code i posted? |
also, can you please explain what a metrics file is, and how to generate it? doing the curl request will most likely not work for me, because when i run the above python script, the API will not respond to HTTP calls anymore |
you can stop the python script, then curl http://host:2379/metrics to get the metrics info. by the way, can you do a test with auth disabled?
---Original---
From: "schlitzered"
Date: Thu, Apr 30, 2020 00:16 AM
also, can you please explain what a metrics file is, and how to generate it?
doing the curl request will most likely not work for me, because when i run the above python script, the API will not respond to HTTP calls anymore
|
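For readers unsure what a "metrics file" is: the `/metrics` endpoint returns plain text in the Prometheus exposition format, and saving that text to a file is all that is being asked for. A rough sketch of fetching it and pulling out the series for one metric follows. The helper names are hypothetical, the sample values are made up for illustration, and the parser assumes lines carry no trailing timestamp.

```python
import urllib.request


def fetch_metrics(url: str) -> str:
    """Download the raw metrics text, e.g. from http://host:2379/metrics."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def metric_samples(text: str, name_prefix: str) -> dict:
    """Map each series whose name starts with `name_prefix` to its value.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        if series.startswith(name_prefix):
            samples[series] = float(value)
    return samples


# Made-up sample in the exposition format, for illustration only.
SAMPLE = """\
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="Authenticate"} 310
grpc_server_handled_total{grpc_code="OK",grpc_method="Watch"} 305
process_open_fds 412
"""

if __name__ == "__main__":
    for series, value in metric_samples(SAMPLE, "grpc_server_handled_total").items():
        print(series, value)
```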
i did not try it, i guess it is caused by auth.
---Original---
From: "schlitzered"
Date: Thu, Apr 30, 2020 00:14 AM
hmm, have you tried the above python code i posted?
for me it more or less always stops after 250 open threads. after this no new connections will be able to log into etcd, at least not using the HTTP api
|
i have just updated to the 3.4.9 release, and i am still facing the same issue. but it seems like etcd can now handle a few more connections. i can now see ~310 established connections, but this is still too few. here is the output of curl https://$(hostname):2379/metrics |
Authenticate is very expensive. What is your etcd CPU load? etcd v3.4.9 includes a PR that can improve Auth performance from 18/s to 200/s on a 16-core, 32 GB machine.
|
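To put the quoted Authenticate rates in perspective, here is a back-of-the-envelope sketch of how long a burst of simultaneous logins would take to drain. It assumes the 18/s and 200/s figures carry over to the reporter's hardware, which is not established in this thread, and the function name is hypothetical.

```python
def login_backlog_seconds(clients: int, auths_per_second: float) -> float:
    """Time to authenticate `clients` logins arriving at once, at a fixed rate."""
    return clients / auths_per_second


if __name__ == "__main__":
    for rate in (18, 200):
        seconds = login_backlog_seconds(500, rate)
        print(f"500 clients at {rate}/s: {seconds:.1f} s")
```

At the lower rate, 500 clients logging in together would keep the Authenticate path busy for roughly half a minute, which could plausibly look like an unresponsive API to a curl login attempted in that window.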
please see issue #9615. you can also configure --bcrypt-cost to improve performance. @schlitzered |
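Why `--bcrypt-cost` matters: bcrypt's cost parameter is the base-2 logarithm of its number of key-expansion rounds, so each step down halves the hashing work per password check. A small sketch of that arithmetic, assuming etcd's default cost of 10 (the function name is hypothetical):

```python
def relative_bcrypt_work(cost_from: int, cost_to: int) -> float:
    """How many times less hashing work `cost_to` needs compared to `cost_from`."""
    return 2.0 ** (cost_from - cost_to)


if __name__ == "__main__":
    # Going from the assumed default of 10 down to the minimum of 4.
    print(relative_bcrypt_work(10, 4))
```

This only reduces the CPU spent on password hashing; it says nothing about other costs on the Authenticate path.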
except when restarting etcd, where it takes 800% cpu, the cpu load on etcd is usually well below 10%. i also just set "--bcrypt-cost 4", and still we are facing issues with HTTP requests that are not answered. here is a current metrics file: |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
hey,
we are currently trying to adopt etcd for service discovery.
we are running a 3 node etcd cluster using "etcd-v3.4.7-linux-amd64"
our applications talk to etcd using the HTTP v3 api.
we noticed that the etcd cluster locks up after reaching a "high" number of watchers. we were able to reproduce this with ~250 clients per etcd node.
the only way to recover from this situation is to restart the whole cluster.
we currently guess that we are triggering some kind of bug in etcd, since the cluster only recovers when restarting.