
test_start event triggered multiple times on workers #1986

Closed
jacksmith15 opened this issue Jan 27, 2022 · 8 comments · Fixed by #2072
jacksmith15 commented Jan 27, 2022

Describe the bug

When running Locust in distributed mode, the test_start handler gets triggered every time new users are spawned, rather than once at the start of the test.

This only seems to occur if there is a test_start handler which takes longer than the interval between spawn events (I think 1s?).

This is problematic if the test needs to load test data from an external source and share it between the users.

To give an example of a concrete use case:

  • I want the test to access load logs from an external source and replay them.
  • These logs are valid for the lifetime of a test (rather than process or user) - so they can be shared between users on a worker.
  • When I trigger the test the log data-store gets hammered by duplicated requests (load testing the wrong thing! 😅).

Expected behavior

The test_start event handlers should be triggered once when the test begins, allowing e.g. expensive test data loading operations to occur only once.

Actual behavior

The test_start event handlers are triggered multiple times, with all but the final invocation being killed prematurely.

Steps to reproduce

Given the following locustfile:

import time

from locust import HttpUser, events, runners, task


COUNT = 0

@events.test_start.add_listener
def test_start_worker(environment, **kwargs):
    if not isinstance(environment.runner, runners.WorkerRunner):
        return
    global COUNT
    COUNT += 1
    print(f"'test_start' triggered {COUNT} times.")
    time.sleep(1)  # Same result with `gevent.sleep`. Less than 1 second works fine, >=1 second causes the described behaviour.


class HelloWorldUser(HttpUser):
    @task
    def hello_world(self):
        self.client.get("/hello")
        self.client.get("/world")

And the following docker-compose.yml

version: '3'

services:
  master:
    image: locustio/locust:2.5.1
    ports:
    - "8089:8089"
    volumes:
    - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --master -H http://master:8089 --headless -u 10 -r 1 --run-time 1m --expect-workers 1

  worker:
    image: locustio/locust:2.5.1
    volumes:
    - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --worker --master-host master

Run

docker-compose up -d

And the logs for the worker will display:

worker_1  | [2022-01-27 11:56:19,677] 26ff2afeca30/INFO/locust.main: Starting Locust 2.5.1
worker_1  | 'test_start' triggered 1 times.
worker_1  | 'test_start' triggered 2 times.
worker_1  | 'test_start' triggered 3 times.
worker_1  | 'test_start' triggered 4 times.
worker_1  | 'test_start' triggered 5 times.
worker_1  | 'test_start' triggered 6 times.
worker_1  | 'test_start' triggered 7 times.
worker_1  | 'test_start' triggered 8 times.
worker_1  | 'test_start' triggered 9 times.
worker_1  | 'test_start' triggered 10 times.

Environment

  • OS: Ubuntu 21.10
  • Python version: 3.9.9
  • Locust version: 2.5.1
  • Locust command line that you ran: As described by docker-compose.yml
  • Locust file contents (anonymized if necessary): Shown above
  • docker-compose version 1.27.4
  • Docker version 20.10.7

cyberw commented Jan 27, 2022

Interesting. Maybe something for you @mboutet ?


jacksmith15 commented Jan 27, 2022

📝 My current workaround for this is to spawn a background greenlet in the event handler, so that the event handler returns quickly and is not retriggered.

In the above example this would be:

import gevent  # in addition to the imports in the original locustfile

@events.test_start.add_listener
def test_start_worker(environment, **kwargs):
    if not isinstance(environment.runner, runners.WorkerRunner):
        return
    global COUNT
    COUNT += 1
    print(f"'test_start' triggered {COUNT} times.")
    gevent.spawn_later(0, time.sleep, 1)  # return immediately; do the slow work in a background greenlet

Naturally this means any user code which depends on this completing needs to be defensive, since the user will start before this completes.

Leaving this here as it may help others!
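The "defensive user code" this workaround requires can be sketched with a readiness flag: the background loader sets an event once the data is available, and anything depending on the data waits on it first. The sketch below uses threading.Event from the standard library as a stand-in for gevent.event.Event (which exposes the same set/wait interface); names and data are illustrative.

```python
# Sketch: background loader signals readiness via an event; user code
# waits for the signal before touching the shared data. Illustrative
# names; threading.Event stands in for gevent.event.Event.

import threading

DATA_READY = threading.Event()
TEST_DATA = {}

def background_load():
    # The expensive fetch goes here.
    TEST_DATA["logs"] = ["GET /a", "GET /b"]
    DATA_READY.set()

# In the listener: spawn this instead of blocking, e.g. gevent.spawn(background_load).
t = threading.Thread(target=background_load)
t.start()

# In user code: block until the data is actually available.
DATA_READY.wait(timeout=5)
t.join()
print(sorted(TEST_DATA))  # ['logs']
```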


mboutet commented Jan 30, 2022

Sorry for the delay, I'll try to take a look, but this seems strange.


github-actions bot commented Apr 1, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the stale label Apr 1, 2022
@github-actions

This issue was closed because it has been stalled for 10 days with no activity. This does not necessarily mean that the issue is bad, but it most likely means that nobody is willing to take the time to fix it. If you have found Locust useful, then consider contributing a fix yourself!

cyberw reopened this Apr 12, 2022

cyberw commented Apr 12, 2022

@mboutet any chance you can have a look?


mboutet commented Apr 13, 2022

This is definitely an undesirable behaviour. However, having a blocking test_start hook running for 1s or more is tricky to handle with the current code:

locust/locust/runners.py

Lines 1097 to 1113 in 300b93e

def start_worker(self, user_classes_count: Dict[str, int], **kwargs):
    """
    Start running a load test as a worker
    :param user_classes_count: Users to run
    """
    self.target_user_classes_count = user_classes_count
    self.target_user_count = sum(user_classes_count.values())
    if self.worker_state != STATE_RUNNING and self.worker_state != STATE_SPAWNING:
        self.stats.clear_all()
        self.exceptions = {}
        self.cpu_warning_emitted = False
        self.worker_cpu_warning_emitted = False
        self.environment._filter_tasks_by_tags()
        self.environment.events.test_start.fire(environment=self.environment)
    self.worker_state = STATE_SPAWNING

This method is called every time a worker receives a spawn message from the master as I explained in #1856 (comment). Therefore, if the test_start hook does not finish before the next spawn message, the line self.worker_state = STATE_SPAWNING is never reached.
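The timing interplay described here can be modeled with a toy, single-threaded simulation (no gevent; all names are made up). It only mirrors the state-guard logic, not the real greenlet-kill mechanics: the SPAWNING state is reached only if the hook finishes before the next spawn message arrives, so a hook that outlives the message interval causes every message to refire the event.

```python
# Toy model of the guard in WorkerRunner.start_worker: refire unless the
# state is already SPAWNING, and only reach SPAWNING if the hook would
# finish before the next spawn message. Purely illustrative.

STATE_INIT, STATE_SPAWNING = "init", "spawning"

class ToyWorker:
    def __init__(self, hook_duration, msg_interval):
        self.state = STATE_INIT
        self.hook_duration = hook_duration  # how long test_start "runs"
        self.msg_interval = msg_interval    # time between spawn messages
        self.fires = 0
        self._hook_done_at = None           # simulated clock time

    def on_spawn_message(self, now):
        # Mirror of the real guard: refire unless state is SPAWNING.
        if self.state != STATE_SPAWNING:
            self.fires += 1
            self._hook_done_at = now + self.hook_duration
        # SPAWNING is only reached if the hook finishes before the next
        # message (in the real code, the next message kills the greenlet).
        if self._hook_done_at is not None and self._hook_done_at <= now + self.msg_interval:
            self.state = STATE_SPAWNING

slow = ToyWorker(hook_duration=1.0, msg_interval=0.5)  # hook outlives the interval
fast = ToyWorker(hook_duration=0.2, msg_interval=0.5)  # hook finishes in time
for i in range(10):
    slow.on_spawn_message(now=i * 0.5)
    fast.on_spawn_message(now=i * 0.5)

print(slow.fires)  # 10 -- refired on every message
print(fast.fires)  # 1  -- fired once, as intended
```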

We might be able to improve the behaviour by relocating the self.environment.events.test_start.fire just before the if self.spawning_greenlet: in:

locust/locust/runners.py

Lines 1194 to 1200 in 300b93e

if self.spawning_greenlet:
    # kill existing spawning greenlet before we launch new one
    self.spawning_greenlet.kill(block=True)
self.spawning_greenlet = self.greenlet.spawn(lambda: self.start_worker(job["user_classes_count"]))
self.spawning_greenlet.link_exception(greenlet_exception_handler)
last_received_spawn_timestamp = job["timestamp"]

However, a long-running test_start hook will still interfere with the ramp-up/down because the spawn messages from the master won't be processed until the hook is done running.

I'll open a draft PR with the change I'm proposing.


mboutet commented Apr 13, 2022

@jacksmith15, thanks for the detailed steps for reproducing the issue. Much appreciated!

github-actions bot removed the stale label Apr 13, 2022