mitogen 0.3.3 + ansible 2.12.8+: Broker has exitted #967

Open
philfry opened this issue Sep 27, 2022 · 6 comments
Labels
affects-0.3 (Issues related to 0.3.X Mitogen releases), bug (Code feature that hinders desired execution outcome)

Comments

@philfry
Contributor

philfry commented Sep 27, 2022

Hi,

I'm experiencing a strange issue when using ansible 2.12.8 and later with mitogen 0.3.3.
When running my (quite long-running) playbook on more than 8 hosts, mitogen exits on seemingly random tasks (hostname, systemd, service, template, make, …), on all hosts at the same time, with:

Traceback (most recent call last):
  File "/home/myuser/tmp/ansible/lib/ansible/executor/task_executor.py", line 158, in run
    res = self._execute()
  File "/home/myuser/tmp/ansible/lib/ansible/executor/task_executor.py", line 605, in _execute
    result = self._handler.run(task_vars=variables)
  File "/home/myuser/playbooks/plugins/strategy/mitogen/ansible_mitogen/mixins.py", line 146, in run
    return super(ActionModuleMixin, self).run(tmp, task_vars)
  File "/home/myuser/tmp/ansible/lib/ansible/plugins/action/normal.py", line 47, in run
    result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
  File "/home/myuser/playbooks/plugins/strategy/mitogen/ansible_mitogen/mixins.py", line 376, in _execute_module
    self._set_temp_file_args(module_args, wrap_async)
  File "/home/myuser/playbooks/plugins/strategy/mitogen/ansible_mitogen/mixins.py", line 355, in _set_temp_file_args
    self._connection.get_good_temp_dir()
  File "/home/myuser/playbooks/plugins/strategy/mitogen/ansible_mitogen/connection.py", line 832, in get_good_temp_dir
    self._connect()
  File "/home/myuser/playbooks/plugins/strategy/mitogen/ansible_mitogen/connection.py", line 854, in _connect
    self._connect_stack(stack)
  File "/home/myuser/playbooks/plugins/strategy/mitogen/ansible_mitogen/connection.py", line 801, in _connect_stack
    dct = mitogen.service.call(
  File "/home/myuser/playbooks/plugins/strategy/mitogen/mitogen/service.py", line 126, in call
    return call_context.call_service(service_name, method_name, **kwargs)
  File "/home/myuser/playbooks/plugins/strategy/mitogen/mitogen/core.py", line 2314, in call_service
    return recv.get().unpickle()
  File "/home/myuser/playbooks/plugins/strategy/mitogen/mitogen/core.py", line 1195, in get
    msg._throw_dead()
  File "/home/myuser/playbooks/plugins/strategy/mitogen/mitogen/core.py", line 935, in _throw_dead
    raise ChannelError(self.data.decode('utf-8', 'replace'))
mitogen.core.ChannelError: Broker has exitted

Running with 8 hosts or fewer, or using ansible 2.12.7 and below, works fine. Reducing ansible forks or MITOGEN_POOL_SIZE doesn't help.

I narrowed down the change in ansible that broke the playbook execution to ansible/ansible@45185b0; reverting this commit fixes the problem.

Any ideas of what could be the incompatibility here?

@philfry philfry added affects-0.3 Issues related to 0.3.X Mitogen releases bug Code feature that hinders desired execution outcome labels Sep 27, 2022
@sonnetmia

Facing the same issue.
ansible version: 2.13.4
mitogen version: V0.3.4-beta

Waiting for a fix.

@ryan-u410

Experiencing the same issue. Commenting out the line as per the comment here seemed to fix it.

@philfry
Contributor Author

philfry commented Feb 27, 2023

With 0af2ce8 this close statement was reworked but that didn't fix it.

The error is slightly different, though:

ERROR! [task 411936] 09:13:05.331167 E mitogen: broker crashed
Traceback (most recent call last):
  File "/home/myuser/projects/3rdparty/mitogen/mitogen/core.py", line 3588, in _do_broker_main
    self._loop_once()
  File "/home/myuser/projects/3rdparty/mitogen/mitogen/core.py", line 3543, in _loop_once
    for side, func in self.poller.poll(timeout):
  File "/home/myuser/projects/3rdparty/mitogen/mitogen/core.py", line 2465, in _poll
    (rfds, wfds, _), _ = io_op(select.select,
                         ^^^^^^^^^^^^^^^^^^^^
  File "/home/myuser/projects/3rdparty/mitogen/mitogen/core.py", line 567, in io_op
    return func(*args), None
           ^^^^^^^^^^^
ValueError: filedescriptor out of range in select()

Unfortunately, the only way I'm aware of to mitigate this is to downgrade to ansible 2.12.7.

@philfry
Contributor Author

philfry commented Mar 24, 2023

Found the issue. select() is limited to 1024 fds and we need to use poll() here, which is already implemented.
In https://github.com/mitogen-hq/mitogen/blob/master/ansible_mitogen/process.py#L282 the poller is reset to mitogen.core.Poller, which is counterproductive here.
Just replace the poller_class = line with pass and be happy.
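
The limit itself can be reproduced outside mitogen. Below is a minimal sketch, assuming a POSIX system and CPython: `select.select()` rejects any descriptor number at or above FD_SETSIZE (typically 1024) with the same ValueError as in the traceback above, while `select.poll()` has no such ceiling. The helper names `usable_with_select` and `poll_events` are made up for illustration and are not part of mitogen.

```python
import os
import select


def usable_with_select(fd):
    """Return True if select.select() accepts this descriptor number."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython raises this before any system call when the number is
        # at or above FD_SETSIZE (typically 1024).
        return False
    except OSError:
        # The number is in range but is not an open descriptor (EBADF).
        return True
    return True


def poll_events(fd):
    """Register fd with poll() and return the event mask reported for it."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() takes its timeout in milliseconds; 0 means "do not block".
    return dict(poller.poll(0)).get(fd, 0)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"x")
    print("select accepts low fd:", usable_with_select(r))
    print("select accepts fd 5000:", usable_with_select(5000))
    print("poll sees data on low fd:", bool(poll_events(r) & select.POLLIN))
    os.close(r)
    os.close(w)
```

The number 5000 is only a stand-in for "a descriptor above 1024"; it does not need to be open for `select.select()` to refuse it, which is why a long playbook that accumulates descriptors can crash the broker even when every individual connection is healthy.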

philfry added a commit to philfry/mitogen that referenced this issue Mar 24, 2023
@jbg-sc

jbg-sc commented Jul 19, 2023

> Found the issue. select() is limited to 1024 fds and we need to use poll() here, which is already implemented. In https://github.com/mitogen-hq/mitogen/blob/master/ansible_mitogen/process.py#L282 the poller is reset to mitogen.core.Poller, which is counterproductive here. Just replace the poller_class = line with pass and be happy.

This workaround did not work for me: when running on 50+ hosts, the playbook just freezes. Reverting to ansible 2.12.7 did not help either, so for now I have to investigate further.

@amarao

amarao commented Jul 20, 2023

I'm not sure about all the details here, but ansible-mitogen pins its processes to the first two CPUs. You can see this when you run ansible with mitogen_linear and a large number of hosts and forks: only the first two CPUs are 100% busy.

The more hosts you have to run, the more congested those CPUs become, and everything slows down. That may explain the 'freeze' behavior.
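
One way to check the pinning on a Linux controller is to read the scheduler affinity of the running ansible processes. This is a rough sketch, assuming Linux (where `os.sched_getaffinity` is available); `allowed_cpus` is a hypothetical helper name, and whether a given setup really ends up on the first two CPUs is something to verify rather than take for granted.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted CPU indices that `pid` may run on (0 = this process).

    Returns None on platforms without os.sched_getaffinity (e.g. macOS).
    """
    if not hasattr(os, "sched_getaffinity"):
        return None
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__":
    print("this process may run on CPUs:", allowed_cpus())
```

Passing the PID of a running ansible worker instead of 0 shows that worker's affinity. A much shorter list than the machine's core count would be consistent with the pinning described above.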

I've solved that problem by running the deployment in parallel from multiple hosts (GitHub Actions) with --limit, where each runner runs the playbook for a single host.
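
The per-host split can be sketched as follows. This only builds the command lines, one per host, using ansible-playbook's `--limit` option. The function name, `site.yml`, and the host names are placeholders, and dispatching each command to its own runner (for example one CI job per host) is left out.

```python
def per_host_commands(playbook, hosts):
    """Build one ansible-playbook invocation per host, each restricted with --limit."""
    return [["ansible-playbook", playbook, "--limit", host] for host in hosts]


if __name__ == "__main__":
    # Placeholder playbook and host names, for illustration only.
    for argv in per_host_commands("site.yml", ["web1", "web2"]):
        print(" ".join(argv))
```

Because each invocation handles a single host, each controller process opens far fewer descriptors and competes with fewer workers for CPU than one run covering every host.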
