-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor zabbix server #1193
Refactor zabbix server #1193
Conversation
f0a805b
to
e00564c
Compare
e00564c
to
1940d48
Compare
So without having seen if molecule pushes through all of this (just hit the go button) I will say this. First, you did a ton of work here, so thank you for that. It is a big refactor. I will say some of the reason why tasks are broken out the way they are (in Debian vs main for example) most likely goes back to how things were put together < 2.0 when we were trying to accommodate pretty much every version of Zabbix known to man. We cleaned up a lot of it, but kept the same basic form-factor of task distribution. I'm not opposed to doing that, but if we are, I would only support it if it was a common refactor across all of the roles. I don't personally support bringing back 5.0. One if we're going to do it, we need to do it for all of the roles but more importantly our current stance is we only support fully supported versions of Zabbix. Yes it limits some people's use cases but the goal of the collection is not to provide 100% support for every use case, but to support the most common. In one of your PRs you made a comment about the user should know the risks of trying to run against an unsupported version. I agree that they should but when you look at some of the tickets that get created, its obvious that they don't which is why we installed the sanity check. My own personal thinking when I introduced that was if they understood the risks, then they would also be able to comment that task out easy enough to do whatever they were wanting to do. |
It is a huge effort made by you @eb4x , but I totally support @pyrodie18 with Zabbix 5.0 LTS, it is end of support since last year may (https://www.zabbix.com/life_cycle_and_release_policy). We can not keeping old versions like the beginning of the role when it was under my name, it takes a lot of effort to maintain them. |
Thank you for taking a look 😃
I just wanted to get this PR out the door to see how it would be received. If this is something we would go forward with, I'll do a refactor on the rest of the roles, and this PR can just sit and wait.
There was so much code in place to bring it back already, it was like 8 lines missing. With the amount of code-reduction in this refactor it was a fun and easy thing to add.
Can do 😸 !
Fair point 👍
Perhaps supply a flag to a when condition of such a task, so we can bypass it. Zabbix 7 is coming soon. I've already done some work to the plugins to support the renamed mappings, but only for the ones I needed. That codebase could use a good refactor aswell, but I'd need some guidance and hear some opinions or preferences for the code. |
Let me download this PR and look at it without the comparisons (its really hard to see simplification with all of the red lines but I'm sure its there and give you a better opinion later on today or this weekend.
I agree but then it becomes us going down this road that doesn't end. If I remember correctly there we comments made about us dropping support for 4.0 even though it was dead dead by that point and I'm sure the same argument could be made about only 9 lines (no idea but its a guess).
Ya I could support that |
One other comment I will make about bringing back 5.0 is it grows the testing matrix substantially to add another version. It already takes something like 2 hours to run through the full test matrix when we affect every role and plugin. Also we removed all of the 5.0 support from the modules which is definitely more than 9 lines. |
Well, atleast we can strip out 6.2 now right? But yes, no-one in their right mind should be deploying 5.0, so I'm fully behind not testing for it. Since the scripts had a check for whether to use create.sql or server.sql, I opted to complete that functionaliy instead of taking it away. There is very little else required, just some config-options. (How about we not tell anyone the roles can also deploy 5.0 again?)
Ah, didn't think about those. |
Ok, so I took a 32nd look at this code, and there is a flaw here. It was always there, but the ansible python database dependencies are installed on the zabbix_server even though the database creation tasks could be delegated to an external database, in which case the dependencies could be missing. This was a problem before the refactor, it remains a problem after. I'll drum up a fix for that. |
Removing support for a already working feature/version is a breaking change. When we release 3.0 (which we plan on doing with the release of Zabbix 7) we'll drop support for 6.2 and 6.4 as well as a few operating systems. |
dd71333
to
b3b1c82
Compare
During the refactoring of zabbix_proxy I noticed more variables ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally got a chance to actually look at what you did. Again, great work and I actually think you clean up some things nicely. I think I'm a fan of moving all of the install tasks out of the OS sections and putting them in the more relevant places and using the package module instead of yum/apt.
roles/zabbix_server/tasks/Debian.yml
Outdated
- name: "Debian | Installing lsb-release" | ||
ansible.builtin.apt: | ||
pkg: lsb-release | ||
update_cache: true | ||
cache_valid_time: 3600 | ||
force: true | ||
state: present | ||
environment: | ||
http_proxy: "{{ zabbix_http_proxy | default(None) | default(omit) }}" | ||
https_proxy: "{{ zabbix_https_proxy | default(None) | default(omit) }}" | ||
become: true | ||
tags: | ||
- install | ||
|
||
- name: "Debian | Update ansible_lsb fact" | ||
ansible.builtin.setup: | ||
gather_subset: | ||
- lsb | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably keep this in here since it ties back to the fix for #57. Again we don't officially support Raspbian (remember that slippery slope I mentioned) but removing it would make this a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't notice the raspbian in repo.zabbix.com. I'll modify so it works with raspbian 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the code to work with Raspbian, and made a note of my findings in 5c6337a
roles/zabbix_server/tasks/Debian.yml
Outdated
# Since this is where Zabbix 5.0 installs its database schemas, we need to allow | ||
# files to be installed to /usr/share/doc/zabbix-server-* | ||
- name: "Debian | Allow Zabbix dpkg installs to /usr/share/doc/zabbix-server-*" | ||
ansible.builtin.lineinfile: | ||
path: /etc/dpkg/dpkg.cfg.d/excludes | ||
line: "path-include=/usr/share/doc/zabbix*" | ||
create: true | ||
line: 'path-include=/usr/share/doc/zabbix\-server\-*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be completely honest I've never totally understood what this task did (and never taken the time to figure it out) but if I read your change correctly it only applies to Zabbix < 6.0 and running in a container. Because of the < 6.0 part I think this entire task can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I discovered a similar task in molecule/zabbix_proxy/prepare.yml, and I think I questioned (in this commit) whether or not this should be done there instead. I've since moved it there. But yeah, it should get removed eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3923029
when: zabbix_server_database_creation | ||
delegate_to: "{{ zabbix_server_real_dbhost | default(zabbix_server_dbhost_run_install | ternary(delegated_dbhost, inventory_hostname)) }}" | ||
vars: | ||
delegated_dbhost: "{{ (zabbix_server_dbhost == 'localhost') | ternary(inventory_hostname, zabbix_server_dbhost) }}" | ||
tags: | ||
- database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will admit this is me being OCD and if there's a best practice out there that I am unware about, please point me to it but everywhere else we put the when, delegate, tags, etc. options at the bottom of the task vs the top. I understand that this is a block module and that may well be why you did it (and in all honestly it makes sense) but my OCD caught it. Even as I type this I'm unsure if I actually think this is better and need to rethink my out practices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, until I started using ansible-lint, I also always put the when
, delegate_to
, vars
and such at the bottom. ansible-lint had a pretty good argument for putting these things before block/rescue/always
. Since then (even though it was a bit jarring at first) I've started putting when
right after name
for many non-block tasks aswell. It forces you to write a better description for name
and you can immediately read the condition.
"Initialize database, when zabbix_server_database_creation
or
"Install packages, when item.required"
It doesn't feel right for every type of tast though. And again, I know it looks and feels weird at first, but if you give it a chance, it'll grow on you 😸
state: restore | ||
target: "{{ is_legacy_version | ternary(legacy_path, modern_path) }}" | ||
vars: | ||
is_legacy_version: "{{ zabbix_server_version is version('6.0', '<') }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never noticed that this got left in here but would remove it given its only needed for < 6.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I meant when I said most of the code for deploying 5.0 was still there. I'd rather not break anything just yet. It's kinda fun for me refactoring, but maintaining all functionality.
There's already so much code trimmed, and the 5.0 tasks are easily identifiable.
Will eventually create a commit that strips it out though, don't worry :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the changes to support < 6.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3923029
Apologies for the hold up, been busy trying to get my own actions-runner working so I can get these results myself, as currently I'm currently just testing against a few select platforms and using images generated from Dockerfiles. I'm currently experimenting with running a kind kubernetes cluster, and the actions-runner-controller, and had some marginal success getting it to run atleast 😅 I'll get back to this once I have gotten repeatable tests. |
1e55259
to
2fa5fc6
Compare
Apologies for the hold-up. To borrow a sentence from @fasterthanlime
So there are the important changes since last;
(I'll create a separate PR to address some issues I had with scaling actions-runner) |
When they're the same for all versions of the platform, we can drop the lookup.
This variable doesn't make sense. It's the same for Debian and RedHat, and it's used to set a path for mysql or postgresql schemas, which we already know by the time we're in the respective tasks file.
There is some code here to handle legacy ways, but wasn't working in it's current form. This should make it work.
Some further diggig reveals that this only applies to Zabbix 5.0, which is going out of style, and further only on Ubuntu, and only if they are container images. This is better handled in molecule/zabbix_server/prepare.yml
Install the zabbix-server-{{ zabbix_server_database }} from a single common task. Sure, the repo setup is quite different based on os_family, but the package is practically the same, though there are some things to pay attention to. RedHat can pin the minor version, and has a toggle to disable repo. Debian can't pin minor version, but has install_recommends. We use the common package module, which is just a wrapper around apt/yum, and use this construction; user_supplied_var | default(_calculated_var | default(omit)) to send additional parameters to the respective modules. You can kinda read it like this; If the user wants to override something, use that. If it's not defined, try a calculated variable based on os_family. If there's no calculated var, that means the option is not valid for the package module, so omit this parameter. I think it's good practice to prefix variables, that are calculated or not meant for user-overrides with '_'.
We don't need additional tasks for installing zabbix-sql-scripts, we can just tack them on to zabbix-server-{{ database }}. It was probably introduced in 5.4. And since that's not maintained, but we still try to support 5.0, which hasn't got this package, we create a little when condition for it.
It's cleaner, less conditional checking. Remove PyMySQL installation via pip from molecule/prepare.yml and switch from mysqldb to pymysql for all debian based systems. The only reason we're installing any python/mysql dependencies at all is to use the ansible community.mysql collection, which has a preference for pymysql, and mysqldb as a backup [1]. Upgrade pymysql for ubuntu-18.04. Upgrading system-packages with pip is generally a bad idea, but ubuntu-18.04 comes with pymysql==0.8.0, which seems to have a problem with newer versions (>=8) of mysql. pymysql>=0.9.0,<0.10.0 fixes the issue with passwords missing during login, but still has a problem setting log_bin_trust_function_creators. pymysql>=0.10.0,<0.11.0, fixes the remaining issue. We don't want to stray to far off from the system default of 0.8.0 The RedHat playbook already had a variable for the dependencies, we give it a slighly more accurate name, and create a similar one for Debian. python2 has been sunset sometime back in January 2020, so we're gonna assume it's always python3. I'm not sure why there's a special case for debian-buster, where we opt for mariadb-client instead of default-mysql-client, as the latter package is available and seems to work. So we're dropping the special buster case. [1] https://github.com/ansible-collections/community.mysql/blob/main/plugins/module_utils/mysql.py#L21-L36
It's cleaner, less conditional checking. Similar to the mysql dependencies cleanup. python2 has been sunset sometime back in January 2020, so we're gonna assume it's always python3. There's a condition that checks for zabbix_server_database_creation before installing the python dependencies, but it's only checking this for RedHat, which leaves systems inconsistent, so we're gonna drop that conditional. The tags on this task are quite useful, and should be on the mysql dependencies aswell. Also, zabbix_server_install_database_client exists on the postgres side of things, but only for RedHat. Bring this feature to all. Again, for consistency.
It was difficult to see what this code was supposed to achieve. The most important variable zabbix_server_dbhost_run_install was hidden behind two conditions of delegated_dbhost. But here we try to bring it back to view. if zabbix_server_dbhost_run_install is true, we want the code to be delegated_to the zabbix_server_dbhost, otherwise we want to run it from the zabbix_server. However, we have a funny situation where zabbix_server_dbhost is by default localhost, which means we'd still be on the zabbix_server. We store this calculation in delegated_dbhost. if (dbhost == localhost) then zabbix_server else dbhost We also have an additional variable zabbix_server_real_dbhost, which we try applying first and if not, we default to whatever zabbix_server_dbhost_run_install wants. We move the create db and create user into a block, so conditions and vars can be shared across both.
It was difficult to see what this code was supposed to achieve. The most important variable zabbix_server_dbhost_run_install was hidden behind two conditions of delegated_dbhost. But here we try to bring it back to view. if zabbix_server_dbhost_run_install is true, we want the code to be delegated_to the zabbix_server_dbhost, otherwise we want to run it from the zabbix_server. However, we have a funny situation where zabbix_server_dbhost is by default localhost, which means we'd still be on the zabbix_server. We store this calculation in delegated_dbhost. if (dbhost == localhost) then zabbix_server else dbhost
With a clear understanding of zabbix_server_dbhost_run_install, we can now use it to determine whether or not we need to become the postgres user, and rely upon the default(omit) construction for any potentially provided or missing arguments to login_{user,pass,host}
We check for existing dbversion in a block, if the query fails, we create the database in a rescue section. There are a few things to note however. The check and schema creation happens from zabbix_server. This is to verify that our dbuser can access the dbname from zabbix_server and has all the permissions needed (after creation) to alter tables between zabbix version upgrades. (So we're not copying schema to other places, running with escalated privileges anymore) For the schema creation we actually need some extra privileges, so we read what they currently are, set them to what we need, import our schema, and revert the extra privileges back to their original state. The reverting happens in the `always:` section, so if any other task should fail in schema creation (`rescue:` section) we always revert the privileges before failing.
We take advantage of the community.postgres modules, and do a query for 'mandatory' in 'dbversion', as is done in the mysql tasks. If the query fails, we're going to assume the database has not been populated yet, and populate it in the rescue section. The query and schema import happens from the zabbix server, it serves as a good test of verifying permissions previously created.
It reflects more accurately what we're trying to achieve. Also strip out zabbix_db_type_long, it serves no purpose anymore.
This simplifies the code by removing zabbix-5.0 quirks/workarounds.
2fa5fc6
to
494a68f
Compare
Just adding a note here as I took a look at cleaning up the selinux.yml tasks. I think the inclusion of selinux.yml needs to move into main.yml after the zabbix packages are installed for some of the sebooleans to exist. I'll create a Vagrantfile and do some testing on vm's as the molecule containers won't test the selinux bits. |
I was completely wrong about this, the booleans are present on systems that have no zabbix software. |
roles/zabbix_server/tasks/main.yml
Outdated
when: item.required | default(true) | ||
loop: | ||
- name: "{{ zabbix_server_package | default(_zabbix_server_package) }}" | ||
- required: "{{ zabbix_server_version is version('6.0', '>=') }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK maybe you can help me understand what you're doing here. What is the required parameter doing? I don't actually see it being used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's getting used in the when condition above. (Though all this code has since been removed with the 5.0 removal)
SUMMARY
This PR consists of a lot of iterative commits, they should all be (somewhat) self-explanatory, doing a "single self-contained" change and a rationale for it. If one of the changes are not to anyones liking, comment on it and we can either improve upon it or take it out.
This is a considerably big refactor of the zabbix_server role. I'm not expecting it to break for existing installations and whatever overrides have been provided. It's not intended to "fix" any issues, just clean up the tasks and make the experience somewhat more consistent. (Though it does bring back support for deploying Zabbix 5.0 LTS.)
The most "controversial" change might be stripping out the zabbix_valid_server_versions check.
Other decisions include moving package installation out of the os_family tasks, tasked with setting up the repo. But they did much more beyond that. So the installation happens in main.yml, after repo setup, and the dependencies for db-variants are installed in the mysql/postgresql task files, (instead of checking which dependencies are needed for which db-variant for which os, during repo setup)
Arguably the repo-setup should be it's own role, as it's repeated out for every single role.
ISSUE TYPE
COMPONENT NAME
zabbix_server
ADDITIONAL INFORMATION
I'd like for this refactor to retain all its commits and messages, and not become a squash merge. One thing I sorely missed during this refactor was the ability to find a log as to why things were done in a certain way.
I couldn't quite get molecule to work by itself, and to iterate and test multiple platforms I modified it a bit to have things happen in parallel. I also got a dedicated machine with multiple gh-runners set-up, but can't really explain why wasn't a great success with the existing workflow.
Also, I couldn't get the geerlingguy images to work, but I think the thing we want are simple systemd based images (alma and rocky provide these already) with python3 installed, which turns out being very simple Dockerfiles to create.
So maybe rely on upstream images instead of geerlingguy in the future?
You can check out these kinds of changes here; https://github.com/eb4x/community.zabbix/blob/molecule5/molecule/zabbix_server/molecule.yml