Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pyzabbix with zabbix_utils #618

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from
Open

Replace pyzabbix with zabbix_utils #618

wants to merge 1 commit into from

Conversation

fauust
Copy link
Collaborator

@fauust fauust commented Nov 19, 2024

pyzabbix is not tested on latest Zabbix LTS (7.0)

@fauust fauust changed the title Zbx Replace pyzabbix with zabbix_utils Nov 19, 2024
@fauust
Copy link
Collaborator Author

fauust commented Nov 19, 2024

BTW, there is an async mode that may be interesting for BB:
https://github.com/zabbix/python-zabbix-utils/tree/main?tab=readme-ov-file#documentation

@fauust
Copy link
Collaborator Author

fauust commented Nov 19, 2024

Ci is red because it needs new containers...

pyzabbix is not tested on latest Zabbix LTS (7.0)
@RazvanLiviuVarzaru
Copy link
Collaborator

RazvanLiviuVarzaru commented Nov 19, 2024

BTW, there is an async mode that may be interesting for BB: https://github.com/zabbix/python-zabbix-utils/tree/main?tab=readme-ov-file#documentation

Thanks Faustin for looking for a library tested on 7.0
Even if it has only 2 contributors I like that the code is clean and documentation clear.

Async behavior would be great, unfortunately this is written on top of asyncio and is not compatible with the Twisted reactor as far as I know. Maybe @vladbogo or @cvicentiu can help me with an opinion on this.

One medium->high concern:
If we can't use async code, then all the calls to Zabbix API are blocking, meaning that a master won't do anything until the getMetric function returns. Current implementation suffers from this also. Multiply this by the number of workers for which we need load data.

Now, the implementation is isolated to the s390x machines at the moment.
I know that @vladbogo had plans to extend this to all the builders. I personally don't recommend it, because sync code that it's dependent on how fast Zabbix will return the results over the network, has the potential to freeze BuildBot.

I think the locks (maxBuilds) mechanism that BuildBot provides to restrict the number of parallel builds on a worker, if it's fine tuned, is great enough.

Let me know what are your thoughts, but my plan would be:
-> remove Zabbix dependency for BuildBot
-> get extra s390x machines
-> fine-tune the locks mechanism

@vladbogo
Copy link
Collaborator

vladbogo commented Nov 19, 2024

@RazvanLiviuVarzaru, I don’t have any specific insights regarding async operations and Twisted.

While I understand your concern, I don’t think it’s a major issue since Zabbix runs on the same machine. Handling several calls—approximately 2-3k per day by my estimates—should be fine. However, I don’t see significant benefits compared to locks for load management since both seem quite tricky to tune. But, the biggest drawback of the locks method is that builds start and just wait doing nothing, which I would say isn’t ideal.

On another note, I believe the Zabbix approach offers much more flexibility. As @fauust suggested some time ago, you can extend the Zabbix implementation to disable workers for maintenance without impacting Buildbot config at all. By setting a specific metric in Zabbix, Buildbot won’t start any builds if that metric is set, allowing to easily disable/enable machines for maintenance. Also, it could be extended to prevent any builds from starting if a machine suddenly goes down by checking the status.

Lastly, to me zabbix seemed quite stable connection wise and I don't remember having major issues for s390x.

@RazvanLiviuVarzaru
Copy link
Collaborator

@RazvanLiviuVarzaru, I don’t have any specific insights regarding async operations and Twisted.

While I understand your concern, I don’t think it’s a major issue since Zabbix runs on the same machine. Handling several calls—approximately 2-3k per day by my estimates—should be fine. However, I don’t see significant benefits compared to locks for load management since both seem quite tricky to tune. But, the biggest drawback of the locks method is that builds start and just wait doing nothing, which I would say isn’t ideal.

On another note, I believe the Zabbix approach offers much more flexibility. As @fauust suggested some time ago, you can extend the Zabbix implementation to disable workers for maintenance without impacting Buildbot config at all. By setting a specific metric in Zabbix, Buildbot won’t start any builds if that metric is set, allowing to easily disable/enable machines for maintenance. Also, it could be extended to prevent any builds from starting if a machine suddenly goes down by checking the status.

Lastly, to me zabbix seemed quite stable connection wise and I don't remember having major issues for s390x.

@vladbogo I forgot about Zabbix server running on the same server as buildbot.
I still consider that we must aim for an async behavior if we want to extend this to more arch's. To achieve this https://github.com/zabbix/python-zabbix-utils/blob/main/zabbix_utils/aioapi.py must be re-written for Twisted.

We can merge this patch just to benefit from 7.0 compatibility but I will advise to wait until we have Production in containers too, so it's easier to manage dependencies / rollback (previous container image) .

@fauust
Copy link
Collaborator Author

fauust commented Nov 25, 2024

@vladbogo I forgot about Zabbix server running on the same server as buildbot.

@RazvanLiviuVarzaru this is not entirely true, there is a proxy on the BB master which is responsible of buffering metrics from all the workers but then everything goes to the Zabbix server which is the API endpoint. That said, it's extremely stable and if we need HA one day, that's still possible and easy to implement.

But I fully agree, let's wait for BBM upgrade on prod.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants