Add ACK for worker connection #2044

marcinh · 2022-03-07T11:13:39Z

Is your feature request related to a problem? Please describe.

Currently there is no mechanism to confirm that a worker has successfully connected to the master, and no retry mechanism in case the master didn't acknowledge the worker's presence. So when there is any communication error, the worker does not know about it and keeps sending reports to the master, while the master keeps discarding those reports.

Describe the solution you'd like

Implement an ACK mechanism together with retry. Once a worker sends the 'client_ready' message, the master would respond with e.g. a 'client_ready_ack' message so the worker knows it was successfully connected. If the worker fails to receive the ACK message within a specified time, it could retry the operation (n times; the number of retries is still to be decided).
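A minimal sketch of the proposed handshake, to make the idea concrete. This is not Locust's implementation: `connect_with_ack`, the `send` callable, and the `queue.Queue` inbox are hypothetical stand-ins for the worker's RPC client. Only the message names 'client_ready' and 'client_ready_ack' come from the proposal above.

```python
import queue
import time
from typing import Callable

READY = "client_ready"
ACK = "client_ready_ack"


def connect_with_ack(
    send: Callable[[str], None],
    inbox: "queue.Queue[str]",
    timeout: float,
    retries: int,
) -> int:
    """Send READY and wait for ACK, retrying on timeout.

    Returns the number of attempts used. Raises ConnectionError if no ACK
    arrives after the first attempt plus `retries` retries.
    (Hypothetical helper; names and signature are assumptions.)
    """
    for attempt in range(1, retries + 2):  # first attempt + retries
        send(READY)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # this attempt timed out, resend READY
            try:
                msg = inbox.get(timeout=remaining)
            except queue.Empty:
                break  # nothing arrived in time, resend READY
            if msg == ACK:
                return attempt
            # Any other message is ignored while waiting for the ACK.
    raise ConnectionError(f"no {ACK} after {retries + 1} attempts")
```

With an inbox that already holds the ACK, the call returns after one send. With an empty inbox and `retries=1`, it sends 'client_ready' twice and then raises.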

Describe alternatives you've considered

--

Additional context

We are using Locust extensively and we observe a number of failures in our tests because too few workers are connected (network problems or other issues). Such an ack + retry mechanism would improve our executions.

cyberw · 2022-03-07T12:32:36Z

Are you sure this is your underlying problem? A glitch when connecting is of course possible, but it should be very rare.

Sounds like a reasonable feature though. But you’ll most likely have to build it yourself.

marcinh · 2022-03-07T12:37:51Z

If you approve the feature request, we will probably work on it :)

cyberw · 2022-03-07T14:55:59Z

Awesome, go for it! Maybe start by writing a test case that illustrates the problem that you're trying to solve.

marcinh · 2022-03-08T15:52:21Z

What about something like this for success:

    def test_worker_connect_success(self):
        class MyTestUser(User):
            @task
            def the_task(self):
                pass

        with mock.patch("locust.runners.CONNECTION_TIMEOUT", new=1):
            with mock.patch("locust.rpc.rpc.Client", mocked_rpc()) as client:
                worker = self.get_runner(environment=Environment(), user_classes=[MyTestUser])
                client.mocked_send(Message('client_ready_ack', {}, worker.client_id))
                self.assertEqual(1, len(client.outbox))
                self.assertEqual('client_ready', client.outbox[0].type)
                self.assertTrue(worker.connected)

and a failure case:

    def test_worker_connect_failure(self):
        class MyTestUser(User):
            @task
            def the_task(self):
                pass

        with mock.patch("locust.runners.CONNECTION_TIMEOUT", new=0.1):
            with mock.patch("locust.runners.CONNECTION_RETRY", new=1):
                with mock.patch("locust.rpc.rpc.Client", mocked_rpc()) as client:
                    worker = self.get_runner(environment=Environment(), user_classes=[MyTestUser])
                    sleep(0.3)
                    self.assertEqual(2, len(client.outbox))
                    self.assertFalse(worker.connected)

cyberw · 2022-03-08T18:00:35Z

Looks good, but does it actually test the retry code itself?

marcinh · 2022-03-23T08:05:28Z

It checks the number of messages in the outbox. Without the retry there would be only 1.

cyberw · 2022-03-23T08:12:41Z

Ah, I see. So it tests that retries are done, but not that they help? Maybe that is the best we can do..

Nosibb · 2022-04-15T11:29:32Z

Hi @cyberw
Is it ok to throw an exception when a worker cannot connect to the master?

Failure case for this approach:

    def test_worker_connect_failure(self):
        class MyTestUser(User):
            @task
            def the_task(self):
                pass

        with mock.patch("locust.runners.CONNECTION_TIMEOUT", new=0.01):
            with mock.patch("locust.runners.CONNECTION_RETRY_COUNT", new=1):
                with mock.patch("locust.rpc.rpc.Client", mocked_rpc()) as client:
                    try:
                        self.get_runner(environment=Environment(), user_classes=[MyTestUser])
                    except ConnectionError:
                        self.assertEqual(2, len(client.outbox))

cyberw · 2022-04-15T15:39:05Z

I guess? The test case should also verify that the exception is really thrown, I think. Other than that it looks very reasonable.
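One way to check that the exception is really thrown is `assertRaises`: with a bare try/except, the test passes silently if nothing is raised. The sketch below is self-contained and does not use Locust internals. `FakeClient` and `connect` are hypothetical stand-ins for the mocked RPC client and the worker runner's connection logic.

```python
import unittest


class FakeClient:
    """Hypothetical stand-in for the mocked RPC client; records sent messages."""

    def __init__(self):
        self.outbox = []

    def send(self, msg):
        self.outbox.append(msg)


def connect(client, retry_count):
    """Hypothetical stand-in for the worker connecting to a silent master."""
    for _ in range(retry_count + 1):  # first attempt + retries
        client.send("client_ready")
    # The master never answers in this stand-in, so connecting always fails.
    raise ConnectionError("no client_ready_ack received")


class TestWorkerConnect(unittest.TestCase):
    def test_worker_connect_failure(self):
        client = FakeClient()
        # Fails the test if ConnectionError is NOT raised.
        with self.assertRaises(ConnectionError):
            connect(client, retry_count=1)
        # This assertion runs whether or not the exception path was taken.
        self.assertEqual(2, len(client.outbox))
```

Moving the outbox assertion out of the `except` block means it is always evaluated, not only when the exception happens to occur.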

github-actions · 2022-06-15T02:17:08Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 10 days.

alvaro-ajv · 2022-06-28T20:07:30Z

@marcinh Can you please add flags to set custom values for the variables CONNECT_TIMEOUT and CONNECT_RETRY_COUNT, similar to how the master waits for the workers with the --expect-workers-max-wait flag? If the flag is present the worker would wait for that time, and if not it would work as in the past. This is because sometimes the workers can start first and wait for the master, especially if the deployment is in Kubernetes.

cyberw · 2022-06-28T20:49:13Z

I don't understand. What is your problem exactly?

alvaro-ajv · 2022-06-28T21:04:26Z

@cyberw Now with this feature, if we start the worker before the master, or the master is not ready for any reason, the worker won't connect to the master once the connection timeout is exceeded. CONNECT_TIMEOUT waits a max of 5 seconds and there are 2 retries, which means every worker will wait a max of 15 seconds. This value is completely fixed, so what I'm requesting is a command-line flag similar to --expect-workers-max-wait on the master. In Locust version 2.9 the workers wait until the master is available.
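The 15-second figure follows from one initial attempt plus two retries, each waiting up to the timeout. A small sketch of that arithmetic, using the values quoted in the comment above (the helper name `max_wait_seconds` is made up for illustration):

```python
CONNECT_TIMEOUT = 5      # seconds per attempt, as quoted in the comment above
CONNECT_RETRY_COUNT = 2  # retries after the first attempt, as quoted above


def max_wait_seconds(timeout, retry_count):
    """Worst-case wait: the first attempt plus every retry times out."""
    return timeout * (retry_count + 1)


print(max_wait_seconds(CONNECT_TIMEOUT, CONNECT_RETRY_COUNT))  # 15
```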

cyberw · 2022-06-28T21:58:14Z

Ah. We should probably just change CONNECT_RETRY_COUNT to default to unlimited or something. I didn't fully understand this behaviour.

cyberw · 2022-06-28T22:04:13Z

I don't usually make PRs on my phone but here goes :) #2125

marcinh added the feature request label Mar 7, 2022

Nosibb mentioned this issue Apr 21, 2022

Add ack for worker connection #2077

Merged

github-actions bot added the stale Issue had no activity. Might still be worth fixing, but dont expect someone else to fix it label Jun 15, 2022

cyberw removed the stale Issue had no activity. Might still be worth fixing, but dont expect someone else to fix it label Jun 15, 2022

cyberw closed this as completed Jun 15, 2022
