Add ACK for worker connection #2044
Comments
Are you sure this is your underlying problem? A glitch when connecting is of course possible, but it should be very rare. It sounds like a reasonable feature, though. But you'll most likely have to build it yourself.
If you approve the feature request, we will probably work on it :)
Awesome, go for it! Maybe start by writing a test case that illustrates the problem you're trying to solve.
What about something like this for success:
and a failure case:
Looks good, but does it actually test the retry code itself?
It checks the number of messages in the outbox; without the retry there would be only one.
Ah, I see. So it tests that retries are done, but not that they help? Maybe that is the best we can do.
Hi @cyberw, here is a failure case for this approach:
I guess? I think the test case should also verify that the exception is actually thrown. Other than that, it looks very reasonable.
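A minimal sketch of the two kinds of test discussed here, assuming a hypothetical `FakeMaster` that records every message it receives (the "outbox" the tests count) and a hypothetical `connect` helper that resends `client_ready` until it is acknowledged. Neither name is part of Locust, and this is not the test code posted in this thread. The success test checks that retries happened by counting messages; the failure test also checks that the exception is really raised.

```python
import unittest


class FakeMaster:
    """Hypothetical master stand-in: records every message it receives and
    only ACKs once it has received more than `ignore` messages."""

    def __init__(self, ignore):
        self.ignore = ignore
        self.outbox = []

    def handle(self, msg):
        self.outbox.append(msg)
        if len(self.outbox) > self.ignore:
            return "client_ready_ack"
        return None  # simulates a lost message / missing ACK


def connect(master, retries):
    """Hypothetical worker logic: send 'client_ready' up to retries + 1 times."""
    for _ in range(retries + 1):
        if master.handle("client_ready") == "client_ready_ack":
            return
    raise ConnectionError("master did not acknowledge client_ready")


class AckRetryTest(unittest.TestCase):
    def test_success_after_retries(self):
        master = FakeMaster(ignore=2)
        connect(master, retries=3)
        # Without retries there would be exactly one message in the outbox.
        self.assertEqual(len(master.outbox), 3)

    def test_failure_raises(self):
        master = FakeMaster(ignore=10)
        # Verify the exception is actually raised, not just that retries happened.
        with self.assertRaises(ConnectionError):
            connect(master, retries=3)
        self.assertEqual(len(master.outbox), 4)
```

This runs with `python -m unittest`. As noted in the discussion, a fake like this shows that retries are attempted, not that they fix a real network glitch.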
This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or this will be closed in 10 days.
@marcinh Can you please add flags to set custom times for these variables?
I don't understand. What exactly is your problem?
@cyberw Now, with this feature, if we start the worker before the master, or the master is not ready for any reason, the worker won't connect to the master once the connection timeout is exceeded, because the
Ah. We should probably just change CONNECT_RETRY_COUNT to default to unlimited or something. I didn't fully understand this behaviour.
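A minimal sketch of what an "unlimited by default" retry count could look like, assuming `None` is used to mean "retry forever". `attempt_numbers` and `connect` are hypothetical helpers; this is not Locust's code and not necessarily what #2125 changed.

```python
import itertools


def attempt_numbers(retry_count):
    """Yield attempt numbers 1, 2, 3, ...

    Hypothetical semantics: an integer allows 1 + retry_count attempts in
    total, while None means unlimited retries.
    """
    if retry_count is None:
        return itertools.count(1)
    return iter(range(1, retry_count + 2))


def connect(try_once, retry_count=None):
    """Call try_once() until it returns True; return the successful attempt number."""
    for attempt in attempt_numbers(retry_count):
        if try_once():
            return attempt
    # Only reachable when retry_count is an integer.
    raise ConnectionError(f"gave up after {retry_count + 1} attempts")


if __name__ == "__main__":
    # A master that only becomes reachable on the 10th attempt.
    calls = iter(range(1, 100))
    print(connect(lambda: next(calls) >= 10))  # 10
```

With the unlimited default, a worker started before its master simply keeps trying until the master comes up, while an explicit integer still lets users fail fast.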
I don't usually make PRs on my phone, but here goes :) #2125
Is your feature request related to a problem? Please describe.
Currently there is no mechanism to confirm that a worker has successfully connected to the master, and no retry mechanism in case the master doesn't acknowledge the worker's presence. So when there is an error in communication, the worker does not know about it and keeps sending reports to the master, while the master keeps discarding those reports.
Describe the solution you'd like
Implement an ACK mechanism together with retries. Once the worker sends a 'client_ready' message, the master would respond with, e.g., a 'client_ready_ack' message so the worker knows it has successfully connected. If the worker fails to receive the ACK message within a specified time, it could retry the operation n times (the number of retries is still to be decided).
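A minimal sketch of the proposed handshake, assuming the worker/master transport is modelled by two `queue.Queue` objects. `connect_with_ack`, `flaky_master` and `AckTimeoutError` are hypothetical names; only the `client_ready` and `client_ready_ack` message names come from the proposal, and this is not Locust's actual implementation.

```python
import queue
import threading


class AckTimeoutError(ConnectionError):
    """Hypothetical error: the master never acknowledged 'client_ready'."""


def connect_with_ack(to_master, from_master, timeout=5.0, retries=3):
    """Send 'client_ready' and wait up to `timeout` seconds for 'client_ready_ack'.

    Makes at most retries + 1 attempts and returns the number of the attempt
    that was acknowledged. The two queues stand in for the real transport.
    """
    for attempt in range(1, retries + 2):
        to_master.put("client_ready")
        try:
            reply = from_master.get(timeout=timeout)
        except queue.Empty:
            continue  # no ACK within the timeout: send 'client_ready' again
        if reply == "client_ready_ack":
            return attempt
        # Any other reply is ignored here and counts as a failed attempt.
    raise AckTimeoutError(f"no ACK after {retries + 1} attempts")


def flaky_master(inbox, outbox, drop=1):
    """Toy master that silently drops the first `drop` messages, then ACKs."""
    received = 0
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown sentinel
            return
        received += 1
        if msg == "client_ready" and received > drop:
            outbox.put("client_ready_ack")


if __name__ == "__main__":
    to_master, from_master = queue.Queue(), queue.Queue()
    master = threading.Thread(
        target=flaky_master, args=(to_master, from_master), daemon=True
    )
    master.start()
    print(connect_with_ack(to_master, from_master, timeout=0.5, retries=3))
    to_master.put(None)  # stop the toy master
```

The toy master drops the first message, so the worker should be acknowledged on its second attempt. Raising an exception once the retries are exhausted, rather than carrying on silently, is what lets the worker notice that it never actually joined the master.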
Describe alternatives you've considered
--
Additional context
We use Locust extensively, and we observe a number of failures in our tests due to workers not being connected (network problems or other issues). Such an ACK + retry mechanism would make our test runs more reliable.