Occasional failure to loginByToken produces hang/timeout #17

Open
zacwest opened this issue Feb 18, 2023 · 10 comments

zacwest commented Feb 18, 2023

Thanks for the library, it's made things a lot easier! I'm running into an issue where invocations end up being timed out by Ansible after some kind of internal failure. My setup is somewhat simple: Uptime-Kuma is running in a docker container on fly.io.

For example, running a command like:

- name: Get Uptime Kuma push monitor info
  delegate_to: 127.0.0.1
  become: false
  throttle: 1
  lucasheld.uptime_kuma.monitor_info:
    api_url: "{{ uptime_kuma_url }}"
    api_token: "{{ uptime_kuma_api_token }}"
    name: "{{ monitor_name }}"

I've traced this back to a timeout occurring in socketio (the log output here is from running the Ansible-generated Python script manually, repeatedly, to try to induce the failure) and a raised exception going uncaught:

Traceback (most recent call last):
  File "/Users/zac/Servers/ovh/./test.py", line 107, in <module>
    _ansiballz_main()
  File "/Users/zac/Servers/ovh/./test.py", line 99, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/Users/zac/Servers/ovh/./test.py", line 47, in invoke_module
    runpy.run_module(mod_name='ansible_collections.lucasheld.uptime_kuma.plugins.modules.monitor_info', init_globals=dict(_module_fqn='ansible_collections.lucasheld.uptime_kuma.plugins.modules.monitor_info', _modlib_path=modlib_path),
  File "/opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 224, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/var/folders/1y/9pbgc3zx1kb_1mgvd5m97xc40000gn/T/ansible_lucasheld.uptime_kuma.monitor_info_payload_ha8zoka_/ansible_lucasheld.uptime_kuma.monitor_info_payload.zip/ansible_collections/lucasheld/uptime_kuma/plugins/modules/monitor_info.py", line 404, in <module>
  File "/var/folders/1y/9pbgc3zx1kb_1mgvd5m97xc40000gn/T/ansible_lucasheld.uptime_kuma.monitor_info_payload_ha8zoka_/ansible_lucasheld.uptime_kuma.monitor_info_payload.zip/ansible_collections/lucasheld/uptime_kuma/plugins/modules/monitor_info.py", line 381, in main
  File "/Users/zac/Servers/ovh/.venv/lib/python3.10/site-packages/uptime_kuma_api/api.py", line 2552, in login_by_token
    return self._call('loginByToken', token)
  File "/Users/zac/Servers/ovh/.venv/lib/python3.10/site-packages/uptime_kuma_api/api.py", line 480, in _call
    r = self.sio.call(event, data)
  File "/Users/zac/Servers/ovh/.venv/lib/python3.10/site-packages/socketio/client.py", line 471, in call
    raise exceptions.TimeoutError()
socketio.exceptions.TimeoutError

I added some logging around the call site in api.py:

https://github.com/lucasheld/uptime-kuma-api/blob/master/uptime_kuma_api/api.py#L478-L484
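
Roughly the kind of logging I added, as a sketch (the _call body here is approximate, not the library's exact code):

# sketch of extra logging wrapped around uptime_kuma_api's _call
import logging
import time

logger = logging.getLogger("uptime_kuma_api")

def _call(self, event, data=None):
    start = time.monotonic()
    logger.debug("emit %s", event)
    try:
        r = self.sio.call(event, data)  # this is the call that hangs
    except Exception:
        logger.exception("emit %s raised after %.1fs", event, time.monotonic() - start)
        raise
    logger.debug("emit %s returned after %.1fs", event, time.monotonic() - start)
    return r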

What appears to be happening is that the loginByToken call is attempted but times out. Weirdly, I do see the request coming through on the Uptime Kuma side:

2023-02-18T17:27:19Z app[21342cbf] sjc [info]2023-02-18T09:27:19-08:00 [AUTH] INFO: Login by token. IP=<snip>
2023-02-18T17:27:19Z app[21342cbf] sjc [info]2023-02-18T09:27:19-08:00 [AUTH] INFO: Username from JWT: <snip>
2023-02-18T17:27:19Z app[21342cbf] sjc [info]2023-02-18T09:27:19-08:00 [AUTH] INFO: Successfully logged in user <snip>. IP=<snip>

When this occurs, I see the _send call begin, but it never returns until it raises the exception, which doesn't appear to be caught successfully. The end result is that the Python script hangs indefinitely and ends up being killed by Ansible after the timeout, rather than sending the error up the stack.

So perhaps two things here:

  1. If this error occurs, it should be caught and surfaced to Ansible as a failed task so Ansible can apply its own retry logic, rather than hanging until Ansible's own timeout kills it, which I don't think is retryable (rough sketch after this list).
  2. Something on the Uptime Kuma side, the Python API side, or the invocation by the Ansible library is failing to handle the response to the login call, but I haven't had a moment to stick another reverse proxy in front of Uptime Kuma to see whether it is actually sending an HTTP response.
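
For point 1, something along these lines on the collection side is what I had in mind. This is only a sketch, assuming the usual AnsibleModule pattern plus the UptimeKumaApi/login_by_token calls from the traceback; it is not the module's actual code:

# sketch only: convert the socket.io timeout into a normal Ansible failure
# instead of letting the process hang until Ansible kills it
from ansible.module_utils.basic import AnsibleModule
from socketio.exceptions import TimeoutError as SocketIOTimeoutError
from uptime_kuma_api import UptimeKumaApi

def main():
    module = AnsibleModule(argument_spec=dict(
        api_url=dict(type="str", required=True),
        api_token=dict(type="str", required=True, no_log=True),
        name=dict(type="str"),
    ))
    api = UptimeKumaApi(module.params["api_url"])
    try:
        api.login_by_token(module.params["api_token"])
    except SocketIOTimeoutError:
        api.disconnect()  # drop the socket so background threads don't keep the process alive
        module.fail_json(msg="Timed out waiting for the loginByToken response")
    # ... rest of the module ...

if __name__ == "__main__":
    main()
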
@lucasheld
Owner

Thank you for the research.
Retries on timeouts would be useful, and also for other temporary network errors.
Logging should also help to debug these problems.

I think I should refactor the code first so that there is less duplicate code and the two things can be implemented more easily.
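
For the retry part, I am thinking of something roughly like this inside the api. Just a sketch; it leaves out the existing response handling in _call, and the retry count and delay are placeholders:

# sketch: retry transient socket.io timeouts inside _call before giving up
import time
from socketio.exceptions import TimeoutError as SocketIOTimeoutError

def _call(self, event, data=None, retries=3, delay=1):
    for attempt in range(retries):
        try:
            return self.sio.call(event, data)
        except SocketIOTimeoutError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # back off briefly before the next attempt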

@neilbags

Thanks for this excellent Ansible integration. It works as expected, right out of the box.

I am however seeing this issue. When adding ~40 monitors, at least a few of them will fail every time. It doesn't appear to be related to server load, and it shouldn't be a network problem - the sites I am monitoring are all on the same server ~20ms away and the connection is solid.

It doesn't appear to matter whether you use a token or a username/password.

I can't see any errors in docker logs or nginx's error.log

Using throttle:1 has no effect, nor does forks = 1.

I suspect #20 may be a symptom of the same issue, as I saw this behaviour initially as well.

I can reproduce this every time, so I can do testing if you can think of anything that will help.
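
For example, I could run a standalone loop like this against the server to see how often the library itself times out outside of Ansible. A sketch only; it assumes the UptimeKumaApi/login_by_token calls shown in the traceback above, and the URL/token are placeholders:

# sketch: hammer loginByToken outside of Ansible to count timeouts
import time
from socketio.exceptions import TimeoutError as SocketIOTimeoutError
from uptime_kuma_api import UptimeKumaApi

API_URL = "https://uptime.example.com"  # placeholder
API_TOKEN = "..."                       # placeholder

failures = 0
for i in range(100):
    api = UptimeKumaApi(API_URL)
    start = time.monotonic()
    try:
        api.login_by_token(API_TOKEN)
        print(f"run {i}: ok in {time.monotonic() - start:.1f}s")
    except SocketIOTimeoutError:
        failures += 1
        print(f"run {i}: timed out after {time.monotonic() - start:.1f}s")
    finally:
        api.disconnect()

print(f"{failures}/100 runs timed out")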

@neilbags

Just one more bit of info:

Sometimes the monitor is added even when Ansible says it failed, but sometimes it isn't.

@namvan

namvan commented Jul 27, 2023

Did you find a workaround for this? I am completely stuck with frozen runs.

@namvan

namvan commented Jul 27, 2023

Just a quick note to you all: it seems to be an issue with the reverse proxy for me. I am using haproxy.
Pointing uptime_url directly at the app worked perfectly, although of course that connection is unsecured.

@exismys

exismys commented Aug 9, 2023

Just one more bit of info:

Sometimes the monitor is added even when Ansible says it failed, but sometimes it isn't.

I faced the same issue. It shows an error on the Ansible side:
[screenshot: error reported by Ansible]
But the monitor has been successfully created on the uptime-kuma side (no errors in docker logs):
[screenshot: monitor created in Uptime Kuma]

@etpedro

etpedro commented Sep 12, 2023

Hi!

I'm having the same issue. I'm currently hosting Uptime Kuma on an Azure Web App, and the Ansible playbook hangs every time while executing different tasks.

Any idea on how to overcome this?

@derekkddj

I have the same problem.

@invisibleninja06

Same issue here too. It's really annoying, and trying to retry when it occurs has so far not worked.
It makes using the module rather unstable, and I need to rerun playbooks over and over until everything is created.

@invisibleninja06

OK, one thing that should help people is to add retries to the Uptime Kuma tasks.

Something like:

register: task_results
retries: 5
until: task_results.rc | default(0) == 0
ignore_errors: true

This treats the return code as 0 if it is not defined and retries if it is anything other than 0.
When it hits those timeouts the return code (rc) is 1, so a retry is triggered. ignore_errors is set to true so that the failure doesn't stop the playbook in its tracks.

Hope this helps someone hitting the same issue
