Distributed test stopped despite workers running #1707
Comments
You may be right, but any code that blocks the worker for a significant amount of time (without doing async I/O or sleeping, and thus yielding control) must be considered out of scope for locust, as it will make response times wrong for other concurrent Users. If you can make a PR (and maybe a test case) then I would be happy to merge; just know that even then your test will be in kind of a bad place.
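For illustration, here is a minimal sketch of that point (the host and endpoint are placeholders, not from this issue): a task that busy-loops on the CPU never yields to gevent, so every other User greenlet on that worker is frozen for the duration and its measured response times absorb the blocked period, while a task that sleeps or does network I/O yields control and keeps the worker responsive.

```python
import time
import gevent
from locust import HttpUser, task, between

class ExampleUser(HttpUser):
    host = "http://localhost:8080"  # placeholder target
    wait_time = between(1, 2)

    @task
    def cpu_bound(self):
        # Busy loop for ~5 s: never yields to the gevent hub, so other
        # Users on this worker stall, their response times are skewed,
        # and heartbeats to the master may be delayed as well.
        deadline = time.monotonic() + 5
        while time.monotonic() < deadline:
            pass

    @task
    def cooperative(self):
        # Network I/O and gevent.sleep() yield control back to the hub,
        # so concurrent Users and the heartbeat greenlet keep running.
        self.client.get("/")  # placeholder endpoint
        gevent.sleep(1)
```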
Thanks for the clarification. I didn't go that deep into the per-worker concurrency strategy (yet). This issue is not about a long-running task resulting in the worker being reported as "missing" per se, but simply the fact that it only takes half of the workers to trigger a tear-down of the whole test run. In the example above, I have 30 workers, and as soon as 15 are reported as "missing", the other 15 workers, which are running just fine, are stopped by the master and the test run ends immediately. If this is intended behavior, then it may not be a bug but rather a configuration enhancement request.
Ah, I didn't read your initial issue thoroughly enough. If losing only half of the workers makes the test shut down, then that is definitely a bug. Don't have time to look into it myself though...
I have created (and tested with docker-compose) the necessary modifications in PR #1710, if you don't mind taking a look.
Cool stuff! Would it be possible to add a unit test for it or is it too hard?
Sure, I'll add a unit test. I haven't had the time to look into the test setup itself yet.
So, finally added a test to reproduce the issue (mainly by fixing and enhancing an existing test). Rebased the commits to verify failure before the fix commit. |
Using locust 1.4.4
Avg response time is around 2 sec for the task. master.conf:
After running for some requests, it gets stuck.
I have the same issue. I have tried different versions of locust from 1.5.* to 2.0.0* and I'm always getting Worker locust-worker-7f567764d7-5qtv7_0c6224c0bffe4faa89641944ddac097d failed to send heartbeat, setting state to missing. I'm running it in Kubernetes.
@roquemoyano-tc did you find the solution?
No, I didn't. I have tried in AKS and in minikube and I'm getting the same issue. I hope someone here can help.
What do the worker logs say?
Actually, you should probably open up a new ticket. This ticket was about locust shutting down despite some workers still running (and connected).
Yes, I have opened #1843.
Describe the bug
When executing a distributed load test where a worker node might not heartbeat back in time (which is not configurable anymore) due to CPU- and/or I/O-intensive tasks, it can happen that the whole test is stopped even though the workers are fine and just busy.
Expected behavior
The test continues to run.
Actual behavior
All workers are being stopped by the master after the following messages:
After logging some more internals, it became evident that the calculation is simply wrong:

...where:

workers = self.worker_count (despite actually running 30 workers in my case)
missing = len(self.clients.missing)

...however, self.worker_count doesn't even include missing clients, which makes the condition completely obsolete. So, either self.worker_count needs to include missing clients, or the condition should be changed accordingly.
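For illustration, here is a simplified model of the check described above (the names mirror the description, but this is not the actual locust source): because the worker count only includes connected clients, subtracting the missing count makes the shutdown trigger as soon as the missing workers merely equal the connected ones, rather than only when none are left.

```python
# Simplified, illustrative model of the master-side check -- not locust code.
from dataclasses import dataclass, field

@dataclass
class Clients:
    ready: set = field(default_factory=set)
    running: set = field(default_factory=set)
    missing: set = field(default_factory=set)

def worker_count(clients: Clients) -> int:
    # Only counts connected workers; missing ones are already excluded.
    return len(clients.ready) + len(clients.running)

def should_stop_buggy(clients: Clients) -> bool:
    # Faulty condition as described: fires once missing >= connected,
    # i.e. with 30 workers it stops after only 15 go missing.
    return worker_count(clients) - len(clients.missing) <= 0

def should_stop_fixed(clients: Clients) -> bool:
    # Since worker_count() already excludes missing clients, the test
    # should only stop when no connected workers remain at all.
    return worker_count(clients) <= 0

clients = Clients(running={f"w{i}" for i in range(15)},
                  missing={f"w{i}" for i in range(15, 30)})
print(should_stop_buggy(clients))  # True  -> test wrongly stopped
print(should_stop_fixed(clients))  # False -> 15 workers still running
```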
Steps to reproduce
Create a load-test that has a CPU-intensive task that runs for more than 3 seconds on each worker.
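A minimal locustfile sketch along those lines (host and timing are placeholders; this is not the reporter's actual test plan): the task burns CPU for about five seconds, longer than the roughly three-second heartbeat window mentioned above, so workers stop answering the master in time and get marked as missing.

```python
# locustfile.py -- illustrative reproduction sketch
import time
from locust import HttpUser, task, constant

class BusyUser(HttpUser):
    host = "http://localhost:8080"  # placeholder target
    wait_time = constant(1)

    @task
    def cpu_intensive(self):
        # Pure-CPU work for ~5 seconds: the worker's gevent loop cannot
        # yield, heartbeats are delayed, and the master eventually marks
        # the worker as missing even though it is still busy working.
        deadline = time.monotonic() + 5
        n = 0
        while time.monotonic() < deadline:
            n += 1
```

Run it distributed (one master plus many workers, e.g. docker-compose up --scale worker=30 as in the environment below) and watch the master stop the whole run once roughly half of the workers have been flagged as missing.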
Environment
docker-compose up --scale worker=30
(see docker-compose file below)

Docker compose file:
.env file for the specific run: