Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor zabbix server #1193

Merged
merged 15 commits into from
May 12, 2024
Merged

Conversation

eb4x
Copy link
Collaborator

@eb4x eb4x commented Mar 18, 2024

SUMMARY

This PR consists of a lot of iterative commits, they should all be (somewhat) self-explanatory, doing a "single self-contained" change and a rationale for it. If one of the changes are not to anyones liking, comment on it and we can either improve upon it or take it out.

This is a considerably big refactor of the zabbix_server role. I'm not expecting it to break for existing installations and whatever overrides have been provided. It's not intended to "fix" any issues, just clean up the tasks and make the experience somewhat more consistent. (Though it does bring back support for deploying Zabbix 5.0 LTS.)

The most "controversial" change might be stripping out the zabbix_valid_server_versions check.

Other decisions include moving package installation out of the os_family tasks, tasked with setting up the repo. But they did much more beyond that. So the installation happens in main.yml, after repo setup, and the dependencies for db-variants are installed in the mysql/postgresql task files, (instead of checking which dependencies are needed for which db-variant for which os, during repo setup)

Arguably the repo-setup should be it's own role, as it's repeated out for every single role.

ISSUE TYPE
  • Refactor Pull Request
COMPONENT NAME

zabbix_server

ADDITIONAL INFORMATION

I'd like for this refactor to retain all its commits and messages, and not become a squash merge. One thing I sorely missed during this refactor was the ability to find a log as to why things were done in a certain way.

I couldn't quite get molecule to work by itself, and to iterate and test multiple platforms I modified it a bit to have things happen in parallel. I also got a dedicated machine with multiple gh-runners set-up, but can't really explain why wasn't a great success with the existing workflow.

Also, I couldn't get the geerlingguy images to work, but I think the thing we want are simple systemd based images (alma and rocky provide these already) with python3 installed, which turns out being very simple Dockerfiles to create.

FROM {{ item.base_image }}
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip systemd-sysv
CMD ["/sbin/init"]

So maybe rely on upstream images instead of geerlingguy in the future?

You can check out these kinds of changes here; https://github.com/eb4x/community.zabbix/blob/molecule5/molecule/zabbix_server/molecule.yml

@pyrodie18
Copy link
Collaborator

So without having seen if molecule pushes through all of this (just hit the go button) I will say this. First, you did a ton of work here, so thank you for that. It is a big refactor. I will say some of the reason why tasks are broken out the way they are (in Debian vs main for example) most likely goes back to how things were put together < 2.0 when we were trying to accommodate pretty much every version of Zabbix known to man. We cleaned up a lot of it, but kept the same basic form-factor of task distribution. I'm not opposed to doing that, but if we are, I would only support it if it was a common refactor across all of the roles.

I don't personally support bringing back 5.0. One if we're going to do it, we need to do it for all of the roles but more importantly our current stance is we only support fully supported versions of Zabbix. Yes it limits some people's use cases but the goal of the collection is not to provide 100% support for every use case, but to support the most common.

In one of your PRs you made a comment about the user should know the risks of trying to run against an unsupported version. I agree that they should but when you look at some of the tickets that get created, its obvious that they don't which is why we installed the sanity check. My own personal thinking when I introduced that was if they understood the risks, then they would also be able to comment that task out easy enough to do whatever they were wanting to do.

I'm curious to hear @D3DeFi and @BGmot thoughts.

@dj-wasabi
Copy link
Contributor

It is a huge effort made by you @eb4x , but I totally support @pyrodie18 with Zabbix 5.0 LTS, it is end of support since last year may (https://www.zabbix.com/life_cycle_and_release_policy). We can not keeping old versions like the beginning of the role when it was under my name, it takes a lot of effort to maintain them.

@eb4x
Copy link
Collaborator Author

eb4x commented Mar 21, 2024

Thank you for taking a look 😃

We cleaned up a lot of it, but kept the same basic form-factor of task distribution. I'm not opposed to doing that, but if we are, I would only support it if it was a common refactor across all of the roles.

I just wanted to get this PR out the door to see how it would be received. If this is something we would go forward with, I'll do a refactor on the rest of the roles, and this PR can just sit and wait.

I don't personally support bringing back 5.0.

There was so much code in place to bring it back already, it was like 8 lines missing. With the amount of code-reduction in this refactor it was a fun and easy thing to add.

we need to do it for all of the roles but ...

Can do 😸 !

but when you look at some of the tickets that get created, its obvious that they don't which is why we installed the sanity check.

Fair point 👍

they would also be able to comment that task out easy enough to do whatever they were wanting to do.

Perhaps supply a flag to a when condition of such a task, so we can bypass it. -e skip_version_check=true? I'll refactor in that change.

Zabbix 7 is coming soon. I've already done some work to the plugins to support the renamed mappings, but only for the ones I needed. That codebase could use a good refactor aswell, but I'd need some guidance and hear some opinions or preferences for the code.

@pyrodie18
Copy link
Collaborator

pyrodie18 commented Mar 21, 2024

I just wanted to get this PR out the door to see how it would be received. If this is something we would go forward with, I'll do a refactor on the rest of the roles, and this PR can just sit and wait.

Let me download this PR and look at it without the comparisons (its really hard to see simplification with all of the red lines but I'm sure its there and give you a better opinion later on today or this weekend.

There was so much code in place to bring it back already, it was like 8 lines missing. With the amount of code-reduction in this refactor it was a fun and easy thing to add.

I agree but then it becomes us going down this road that doesn't end. If I remember correctly there we comments made about us dropping support for 4.0 even though it was dead dead by that point and I'm sure the same argument could be made about only 9 lines (no idea but its a guess).

Perhaps supply a flag to a when condition of such a task, so we can bypass it. -e skip_version_check=true? I'll refactor in that change.

Ya I could support that

@pyrodie18
Copy link
Collaborator

One other comment I will make about bringing back 5.0 is it grows the testing matrix substantially to add another version. It already takes something like 2 hours to run through the full test matrix when we affect every role and plugin. Also we removed all of the 5.0 support from the modules which is definitely more than 9 lines.

@eb4x
Copy link
Collaborator Author

eb4x commented Mar 21, 2024

One other comment I will make about bringing back 5.0 is it grows the testing matrix substantially to add another version. It already takes something like 2 hours to run through the full test matrix when we affect every role and plugin.

Well, atleast we can strip out 6.2 now right? But yes, no-one in their right mind should be deploying 5.0, so I'm fully behind not testing for it. Since the scripts had a check for whether to use create.sql or server.sql, I opted to complete that functionaliy instead of taking it away. There is very little else required, just some config-options. (How about we not tell anyone the roles can also deploy 5.0 again?)

Also we removed all of the 5.0 support from the modules which is definitely more than 9 lines.

Ah, didn't think about those.

@eb4x
Copy link
Collaborator Author

eb4x commented Mar 21, 2024

Ok, so I took a 32nd look at this code, and there is a flaw here. It was always there, but the ansible python database dependencies are installed on the zabbix_server even though the database creation tasks could be delegated to an external database, in which case the dependencies could be missing. This was a problem before the refactor, it remains a problem after.

I'll drum up a fix for that.

@pyrodie18
Copy link
Collaborator

pyrodie18 commented Mar 21, 2024

Well, atleast we can strip out 6.2 now right?

Removing support for a already working feature/version is a breaking change. When we release 3.0 (which we plan on doing with the release of Zabbix 7) we'll drop support for 6.2 and 6.4 as well as a few operating systems.

@eb4x eb4x force-pushed the refactor_zabbix_server branch 3 times, most recently from dd71333 to b3b1c82 Compare March 25, 2024 13:47
@eb4x
Copy link
Collaborator Author

eb4x commented Mar 25, 2024

During the refactoring of zabbix_proxy I noticed more variables (zabbix_server_install_database_client) that we could take advantage of. So I've brought those features over here by updating the commits that affect those. Also renaming some tasks to better reflect what action is being taken, and trying to make the roles look as similar as possible.

@eb4x eb4x mentioned this pull request Mar 25, 2024
Copy link
Collaborator

@pyrodie18 pyrodie18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got a chance to actually look at what you did. Again, great work and I actually think you clean up some things nicely. I think I'm a fan of moving all of the install tasks out of the OS sections and putting them in the more relevant places and using the package module instead of yum/apt.

Comment on lines 10 to 28
- name: "Debian | Installing lsb-release"
ansible.builtin.apt:
pkg: lsb-release
update_cache: true
cache_valid_time: 3600
force: true
state: present
environment:
http_proxy: "{{ zabbix_http_proxy | default(None) | default(omit) }}"
https_proxy: "{{ zabbix_https_proxy | default(None) | default(omit) }}"
become: true
tags:
- install

- name: "Debian | Update ansible_lsb fact"
ansible.builtin.setup:
gather_subset:
- lsb

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably keep this in here since it ties back to the fix for #57. Again we don't officially support Raspbian (remember that slippery slope I mentioned) but removing it would make this a breaking change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't notice the raspbian in repo.zabbix.com. I'll modify so it works with raspbian 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code to work with Raspbian, and made a note of my findings in 5c6337a

roles/zabbix_server/tasks/Debian.yml Outdated Show resolved Hide resolved
Comment on lines 100 to 106
# Since this is where Zabbix 5.0 installs its database schemas, we need to allow
# files to be installed to /usr/share/doc/zabbix-server-*
- name: "Debian | Allow Zabbix dpkg installs to /usr/share/doc/zabbix-server-*"
ansible.builtin.lineinfile:
path: /etc/dpkg/dpkg.cfg.d/excludes
line: "path-include=/usr/share/doc/zabbix*"
create: true
line: 'path-include=/usr/share/doc/zabbix\-server\-*'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be completely honest I've never totally understood what this task did (and never taken the time to figure it out) but if I read your change correctly it only applies to Zabbix < 6.0 and running in a container. Because of the < 6.0 part I think this entire task can be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I discovered a similar task in molecule/zabbix_proxy/prepare.yml, and I think I questioned (in this commit) whether or not this should be done there instead. I've since moved it there. But yeah, it should get removed eventually.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3923029

Comment on lines +18 to +34
when: zabbix_server_database_creation
delegate_to: "{{ zabbix_server_real_dbhost | default(zabbix_server_dbhost_run_install | ternary(delegated_dbhost, inventory_hostname)) }}"
vars:
delegated_dbhost: "{{ (zabbix_server_dbhost == 'localhost') | ternary(inventory_hostname, zabbix_server_dbhost) }}"
tags:
- database
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will admit this is me being OCD and if there's a best practice out there that I am unware about, please point me to it but everywhere else we put the when, delegate, tags, etc. options at the bottom of the task vs the top. I understand that this is a block module and that may well be why you did it (and in all honestly it makes sense) but my OCD caught it. Even as I type this I'm unsure if I actually think this is better and need to rethink my out practices.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, until I started using ansible-lint, I also always put the when, delegate_to, vars and such at the bottom. ansible-lint had a pretty good argument for putting these things before block/rescue/always. Since then (even though it was a bit jarring at first) I've started putting when right after name for many non-block tasks aswell. It forces you to write a better description for name and you can immediately read the condition.

"Initialize database, when zabbix_server_database_creation

or

"Install packages, when item.required"

It doesn't feel right for every type of tast though. And again, I know it looks and feels weird at first, but if you give it a chance, it'll grow on you 😸

roles/zabbix_server/tasks/initialize-mysql.yml Outdated Show resolved Hide resolved
state: restore
target: "{{ is_legacy_version | ternary(legacy_path, modern_path) }}"
vars:
is_legacy_version: "{{ zabbix_server_version is version('6.0', '<') }}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never noticed that this got left in here but would remove it given its only needed for < 6.0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I meant when I said most of the code for deploying 5.0 was still there. I'd rather not break anything just yet. It's kinda fun for me refactoring, but maintaining all functionality.

There's already so much code trimmed, and the 5.0 tasks are easily identifiable.

Will eventually create a commit that strips it out though, don't worry :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the changes to support < 6.0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3923029

@eb4x
Copy link
Collaborator Author

eb4x commented Apr 13, 2024

Apologies for the hold up, been busy trying to get my own actions-runner working so I can get these results myself, as currently I'm currently just testing against a few select platforms and using images generated from Dockerfiles.

I'm currently experimenting with running a kind kubernetes cluster, and the actions-runner-controller, and had some marginal success getting it to run atleast 😅

I'll get back to this once I have gotten repeatable tests.

@eb4x eb4x force-pushed the refactor_zabbix_server branch 4 times, most recently from 1e55259 to 2fa5fc6 Compare May 2, 2024 07:43
@eb4x eb4x requested a review from pyrodie18 May 2, 2024 07:45
@eb4x
Copy link
Collaborator Author

eb4x commented May 2, 2024

Apologies for the hold-up. To borrow a sentence from @fasterthanlime

GitHub actions-runners do not spark joy.

So there are the important changes since last;

  • using pymysql for all platforms (swapping out mysqldb on Debian systems, reasoning is explained in 02102f3)
  • bringing back lsb to identify raspbian (details in 5c6337a)
  • dropping code that made zabbix 5.0 installations work (2fa5fc6)

(I'll create a separate PR to address some issues I had with scaling actions-runner)

eb4x added 5 commits May 10, 2024 21:23
When they're the same for all versions of the platform, we can
drop the lookup.
This variable doesn't make sense. It's the same for Debian and
RedHat, and it's used to set a path for mysql or postgresql
schemas, which we already know by the time we're in the respective
tasks file.
There is some code here to handle legacy ways, but wasn't working
in it's current form. This should make it work.
Some further diggig reveals that this only applies to Zabbix 5.0,
which is going out of style, and further only on Ubuntu, and only
if they are container images. This is better handled in
molecule/zabbix_server/prepare.yml
Install the zabbix-server-{{ zabbix_server_database }} from a
single common task. Sure, the repo setup is quite different based
on os_family, but the package is practically the same, though there
are some things to pay attention to.

RedHat can pin the minor version, and has a toggle to disable repo.
Debian can't pin minor version, but has install_recommends.

We use the common package module, which is just a wrapper around
apt/yum, and use this construction;

  user_supplied_var | default(_calculated_var | default(omit))

to send additional parameters to the respective modules. You can
kinda read it like this;

If the user wants to override something, use that.
If it's not defined, try a calculated variable based on os_family.
If there's no calculated var, that means the option is not valid
for the package module, so omit this parameter.

I think it's good practice to prefix variables, that are calculated
or not meant for user-overrides with '_'.
eb4x added 10 commits May 10, 2024 21:24
We don't need additional tasks for installing zabbix-sql-scripts,
we can just tack them on to zabbix-server-{{ database }}. It was
probably introduced in 5.4. And since that's not maintained, but we
still try to support 5.0, which hasn't got this package, we create
a little when condition for it.
It's cleaner, less conditional checking.

Remove PyMySQL installation via pip from molecule/prepare.yml and
switch from mysqldb to pymysql for all debian based systems. The
only reason we're installing any python/mysql dependencies at all
is to use the ansible community.mysql collection, which has a
preference for pymysql, and mysqldb as a backup [1].

Upgrade pymysql for ubuntu-18.04. Upgrading system-packages with
pip is generally a bad idea, but ubuntu-18.04 comes with
pymysql==0.8.0, which seems to have a problem with newer versions
(>=8) of mysql.

pymysql>=0.9.0,<0.10.0 fixes the issue with passwords missing
during login, but still has a problem setting
log_bin_trust_function_creators.

pymysql>=0.10.0,<0.11.0, fixes the remaining issue. We don't want
to stray to far off from the system default of 0.8.0

The RedHat playbook already had a variable for the dependencies, we
give it a slighly more accurate name, and create a similar one for
Debian.

python2 has been sunset sometime back in January 2020, so we're
gonna assume it's always python3. I'm not sure why there's a
special case for debian-buster, where we opt for mariadb-client
instead of default-mysql-client, as the latter package is available
and seems to work. So we're dropping the special buster case.

[1] https://github.com/ansible-collections/community.mysql/blob/main/plugins/module_utils/mysql.py#L21-L36
It's cleaner, less conditional checking.

Similar to the mysql dependencies cleanup.

python2 has been sunset sometime back in January 2020, so we're
gonna assume it's always python3. There's a condition that checks
for zabbix_server_database_creation before installing the python
dependencies, but it's only checking this for RedHat, which leaves
systems inconsistent, so we're gonna drop that conditional.

The tags on this task are quite useful, and should be on the mysql
dependencies aswell. Also, zabbix_server_install_database_client
exists on the postgres side of things, but only for RedHat. Bring
this feature to all. Again, for consistency.
It was difficult to see what this code was supposed to achieve. The
most important variable zabbix_server_dbhost_run_install was hidden
behind two conditions of delegated_dbhost. But here we try to bring
it back to view.

if zabbix_server_dbhost_run_install is true, we want the code to be
delegated_to the zabbix_server_dbhost, otherwise we want to run it
from the zabbix_server.

However, we have a funny situation where zabbix_server_dbhost is
by default localhost, which means we'd still be on the
zabbix_server. We store this calculation in delegated_dbhost.

  if (dbhost == localhost) then zabbix_server else dbhost

We also have an additional variable zabbix_server_real_dbhost,
which we try applying first and if not, we default to whatever
zabbix_server_dbhost_run_install wants.

We move the create db and create user into a block, so conditions
and vars can be shared across both.
It was difficult to see what this code was supposed to achieve. The
most important variable zabbix_server_dbhost_run_install was hidden
behind two conditions of delegated_dbhost. But here we try to bring
it back to view.

if zabbix_server_dbhost_run_install is true, we want the code to be
delegated_to the zabbix_server_dbhost, otherwise we want to run it
from the zabbix_server.

However, we have a funny situation where zabbix_server_dbhost is
by default localhost, which means we'd still be on the
zabbix_server. We store this calculation in delegated_dbhost.

  if (dbhost == localhost) then zabbix_server else dbhost
With a clear understanding of zabbix_server_dbhost_run_install, we
can now use it to determine whether or not we need to become the
postgres user, and rely upon the default(omit) construction for any
potentially provided or missing arguments to login_{user,pass,host}
We check for existing dbversion in a block, if the query fails, we
create the database in a rescue section. There are a few things to
note however.

The check and schema creation happens from zabbix_server. This is
to verify that our dbuser can access the dbname from zabbix_server
and has all the permissions needed (after creation) to alter tables
between zabbix version upgrades. (So we're not copying schema to
other places, running with escalated privileges anymore)

For the schema creation we actually need some extra privileges, so
we read what they currently are, set them to what we need, import
our schema, and revert the extra privileges back to their original
state. The reverting happens in the `always:` section, so if any
other task should fail in schema creation (`rescue:` section) we
always revert the privileges before failing.
We take advantage of the community.postgres modules, and do a query
for 'mandatory' in 'dbversion', as is done in the mysql tasks. If
the query fails, we're going to assume the database has not been
populated yet, and populate it in the rescue section.

The query and schema import happens from the zabbix server, it
serves as a good test of verifying permissions previously created.
It reflects more accurately what we're trying to achieve. Also
strip out zabbix_db_type_long, it serves no purpose anymore.
This simplifies the code by removing zabbix-5.0 quirks/workarounds.
@eb4x
Copy link
Collaborator Author

eb4x commented May 10, 2024

Just adding a note here as I took a look at cleaning up the selinux.yml tasks.

I think the inclusion of selinux.yml needs to move into main.yml after the zabbix packages are installed for some of the sebooleans to exist. I'll create a Vagrantfile and do some testing on vm's as the molecule containers won't test the selinux bits.

@eb4x
Copy link
Collaborator Author

eb4x commented May 12, 2024

I think the inclusion of selinux.yml needs to move into main.yml after the zabbix packages are installed for some of the sebooleans to exist.

I was completely wrong about this, the booleans are present on systems that have no zabbix software.

when: item.required | default(true)
loop:
- name: "{{ zabbix_server_package | default(_zabbix_server_package) }}"
- required: "{{ zabbix_server_version is version('6.0', '>=') }}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK maybe you can help me understand what you're doing here. What is the required parameter doing? I don't actually see it being used anywhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting used in the when condition above. (Though all this code has since been removed with the 5.0 removal)

@pyrodie18 pyrodie18 merged commit d5bfac2 into ansible-collections:main May 12, 2024
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants