
SPIKE - OVA #119

Open
teddytpc1 opened this issue Nov 20, 2024 · 15 comments
Labels
level/task Task issue type/enhancement Enhancement issue

Comments

@teddytpc1
Member

teddytpc1 commented Nov 20, 2024

Objective
https://github.com/wazuh/internal-devel-requests/issues/1319

Description

We need to analyze if we can improve/simplify the OVA generation. Aspects to analyze:

  • Time to build
  • Maintenance
  • Complexity
  • Configurations
  • Tools used to build

As a requirement, the OVA must not be built with the Installation assistant.

Additionally, we need to design and implement DevOps-owned OVA testing. Currently, there are no tests for the OVA. The goal is to create a GitHub Action (GHA) workflow that serves both as a PR check and an on-demand testing tool. The GHA should validate:

  1. The successful deployment of the OVA.
  2. The status of its components (services running as expected).
  3. Main logs, scanning for errors or warnings.

Implementation restrictions

  • Testing Environment: The tests must be implemented using GitHub Actions (GHA).
  • Compatibility: The workflow should be compatible with the environments used for PR testing and manual testing.
  • Logs Validation: The logs checking must identify and report critical issues (e.g., errors, warnings) in a clear and actionable way.
  • Minimal Maintenance: The implementation should aim for low complexity and minimal maintenance overhead.
  • Allocator module: the allocator module should be used if applicable.

Plan

  1. Research & Analysis
  • Review the current OVA generation process, identifying areas for improvement in terms of time, complexity, and maintenance.
  • Analyze existing tools and configurations to determine potential simplifications.
  • Analyze the impact of installing the components without using the Installation assistant and propose a new approach.
  2. Workflow Design
  • Define the key steps and criteria for validating OVA deployment.
  • Identify the components to monitor and logs to analyze for errors or warnings.
  • Analyze where the OVA can be tested (an environment that allows Virtualization).
@teddytpc1 teddytpc1 added level/task Task issue type/enhancement Enhancement issue labels Nov 20, 2024
@wazuhci wazuhci moved this to Triage in Release 5.0.0 Nov 20, 2024
@wazuhci wazuhci moved this from Triage to Backlog in Release 5.0.0 Nov 20, 2024
@wazuhci wazuhci moved this from Backlog to In progress in Release 5.0.0 Nov 22, 2024
@CarlosALgit
Member

Update Report

I've been reviewing the plan and documenting how we currently build the OVA.

@CarlosALgit
Member

CarlosALgit commented Nov 25, 2024

Update Report

I've been taking a look at the workflow and process we currently have for creating the OVA.
The task that takes the most time is exporting the AWS instance to a VM. We could try changing the base OS to Amazon Linux 2023 instead of Amazon Linux 2 to test whether that reduces the export time.
The other task we spend the most time on is installing Wazuh; as far as I remember, AL2 had a problem with the YUM lock, and I'm not sure whether it also occurs in the OVA instance. Switching to the newer OS could help there as well.

@CarlosALgit
Member

Update Report

@Enaraque and I have been working together on the possibility of changing the base operating system for the OVA and AMI. The goal is to use Amazon Linux 2023 as the base OS, looking for an improvement thanks to its continuous maintenance by AWS and the greater security that provides.

The problem we have right now is that AWS does not allow importing or exporting instances with AL2023 as the OS, so we need to find another way to do it. We have investigated the following alternatives:

  • Create a base AMI with nested virtualization. This would allow us to create an AL2023 virtual machine inside the instance and generate an OVA from that VM.
  • Create the OVA as in the previous option, but using MacStadium to create the AL2023 VM.
  • Use a self-hosted runner in a GHA workflow in which the OVA would be created and exported directly.

These three options would allow us to reduce the export time, since we would no longer depend on the EC2 export process: once the .ova is created, it would simply be uploaded to the corresponding S3 bucket.

In addition, they would allow the tests we still have to develop to be executed in a more efficient and simple way.

With this research done, we now need to run the necessary tests to see whether any of these options is feasible and worthwhile to implement.

To create a VM, we first need to create an AL2023 Vagrant box. For this, we have been following and testing the scripts discussed in this issue.

@CarlosALgit
Member

Update Report

Research and testing have continued in order to create the AL2023 VM on macStadium.

With the generate_base_box.sh and setup.sh scripts it has been possible to create a box with the configuration needed to install Wazuh in it. This box was then sent to the Intel macStadium, the VM was created successfully, we were able to connect remotely, and the VPN was active.

Important

All these tests have been done manually; the VM has not been created through the Allocator. So an important part of the implementation will be being able to deploy the AL2023 VM in the macStadium.

Once this is achieved, we will continue with the configuration of the OVA and the export, to see whether it completes correctly.

@CarlosALgit
Member

Update Report

After getting the AL2023 VM in the macStadium, it was time to install the Wazuh core components and apply the relevant configurations. For this, I cloned the wazuh-installation-assistant and wazuh-virtual-machines repositories and launched the provision.sh script, as is done right now in the OVA creation workflow.

Once the script finished, I cleaned the machine in the same way as in the workflow and proceeded to export it as an OVA.

Once exported, I copied it to my local machine to import it into VirtualBox. When importing it and powering the machine on, we noticed several things:

  • The Wazuh logo that appears at startup in the official OVA no longer appears (this is configured in provision.sh; I have to check what happened).
  • The network interface that was failing yesterday, and that we managed to fix, seems to have been reset and is back to its previous state, which does not make much sense.

After some investigation, I think it may be caused by the cloud-init that AL2023 machines ship with by default, so the next step will be to check whether this is the reason the network configuration is being reset.

@CarlosALgit
Member

Update Report

I've been trying to understand why the network interface is not being correctly deployed.

I tried to force the assignment of the network interface manually and through the Vagrantfile, but it failed every time. I investigated whether this could be caused by cloud-init, but it does not seem to be the issue.

We also checked the VBoxGuestAdditions because we saw an error during execution. When creating the AL2023 base box, there is a problem because it takes the kernel from our local machines instead of the AL2023 kernel. We tried to fix it but could not, because it is a VirtualBox process that we cannot control or edit.

Changing perspective, I tried to launch the Vagrant box we deployed in the MacStadium on my local machine, but it failed with an unknown error. I investigated the error, but the solutions I found either did not work or did not apply to my case.

So, tomorrow I will generate the base box again and repeat the creation of the OVA on my local machine, to rule out that creating it in the MacStadium is what is causing the issues.

@CarlosALgit
Member

CarlosALgit commented Dec 3, 2024

Update Report

I generated the Amazon Linux 2023 box again on my local machine.

Then I started it, but the Vagrantfile configuration we use in the Allocator does not seem to work properly: it is not assigning the correct IP to the enp0s8 network interface. After researching and trying several Vagrantfiles, I found a way to deploy the machine with the necessary IP.
The method consists of configuring the Vagrantfile to create the machine with a static IP, which forces VirtualBox to give the machine an IPv4 address. Otherwise, if we configure it to use DHCP when creating the machine, it always gets an IPv6 address.
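
For reference, a minimal sketch of the relevant Vagrantfile line, assuming a hypothetical 192.168.56.x address; the DHCP variant is shown for contrast:

config.vm.network "private_network", ip: "192.168.56.10"    # static IPv4: forces an IPv4 address on the interface
# config.vm.network "private_network", type: "dhcp"         # DHCP variant: ended up assigning an IPv6 address in our tests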

Anyway, I continued the process by installing the Wazuh components in the machine with the static IPv4 configured, to see whether we could force the use of DHCP when exporting the OVA. I installed all the Wazuh central components and applied the configurations. When exporting the OVA, I could see that the enp0s8 network interface was down and no address was given to that interface.

So far, I can see that the process of creating our Vagrant box from the AL2023 OVA, customizing it by installing the Wazuh central components and applying some configurations, and finally exporting it as an OVA is producing many errors that take a long time to fix, and in some cases we cannot find a proper fix.

The following screenshot shows the boot of the OVA we created:

(screenshot)

As can be seen, there is an error showing a VBox timesync failure. This is supposed to be fixed using this command in the Vagrantfile:

vb.customize ["setextradata", :id, "VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled", 1]
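
For context, this call lives inside the VirtualBox provider block of the Vagrantfile; a minimal sketch along these lines:

config.vm.provider "virtualbox" do |vb|
  vb.customize ["setextradata", :id, "VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled", 1]
end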

If we run systemctl status for the different Wazuh central components, we can see that they are installed and running:

  • Wazuh Indexer

(screenshot)

  • Wazuh Manager

(screenshot)

  • Wazuh Dashboard

(screenshot)

But if we check the network interfaces, we see that eth1 is DOWN:

(screenshot)

Next workaround

I think it is worth investigating whether changing the OS and using the EC2 export tool (as we are currently doing with Amazon Linux 2) decreases the creation times or makes it easier for us to implement the tests.

@CarlosALgit
Member

Update Report

Today we tried to build the OVA using AWS metal instances.
These metal instances allow nested virtualization, so we can use VirtualBox and create a VM there, as we did in the macStadium and on my local machine.

First, we had to install VirtualBox, Vagrant and the scripts that generate and set up the Amazon Linux 2023 OVA. Then we ran the scripts and, once the OVA was generated, we created a Vagrant box as we have been doing before.
Then we deployed the VM using Vagrant. This deployment has given us some problems, as it may fail several times before the machine is created.
Finally, once deployed, we installed the Wazuh central components and applied the configurations. After exporting the VM to an OVA file and importing it into my VirtualBox, it gave the same result: it had no network interface attached.

Workaround

The workaround I have come up with is to try to add a permanent network interface to the system using systemd-networkd or a related tool. It ships by default with Amazon Linux 2023 and I have made some attempts with it, but they have not worked yet.

We have also tried using cloud-init to assign an IPv4 address to the machine when it boots. We did a small test adding a customized cloud-init configuration file and then exported the OVA without the Wazuh components installed, just to check whether that configuration works. It seems to be working, as can be seen in the following screenshot:

(screenshot)

After observing this behavior, we can assume it might also work in the macStadium, since the process does not change in that respect. The next step will be to try deploying the full OVA, with the Wazuh central components installed, using the cloud-init config in the macStadium, and to check whether it has the correct IPs and they are up and reachable.
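
For reference, a minimal sketch of the kind of cloud-init network configuration we tested (the exact file is not reproduced in this report; the interface name eth1 and the use of DHCP are assumptions based on the interface we are trying to bring up):

network:
  version: 2
  ethernets:
    eth1:
      dhcp4: true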

@CarlosALgit
Member

CarlosALgit commented Dec 5, 2024

Update Report

We've been working on the solution proposed yesterday and exported a new OVA built in the macStadium.
Unfortunately it did not work: it gives the same result and does not bring up the desired network interface.

Next steps

Yesterday, @Enaraque deployed the Wazuh 4.9.2 OVA and noticed that in the /etc/cloud/cloud.cfg.d/ directory there are two files that are created by AWS in the import process. The files are 99-disable-network-config.cfg and vmimport.99-disable-network-config.cfg

So, there are two paths we can try next:

  1. We can try to copy those two files created by AWS in the import process and try to replicate the behavior.
  2. Try to deploy a network interface with the vboxmanage tool when we create the Amazon Linux 2023 OVA and then assign it an IP using DHCP.

@CarlosALgit
Member

CarlosALgit commented Dec 11, 2024

Update Report

I've been testing the vboxmanage settings on an OVA with Wazuh that I had already exported while testing.

This machine did not boot up with a reachable network interface, as can be seen in the previous reports. So I halted the machine and added this configuration:

vboxmanage modifyvm "vm_name" --nic2 hostonly
vboxmanage modifyvm "vm_name" --hostonlyadapter2 vboxnet0
vboxmanage modifyvm "vm_name" --cableconnected2 on

Note

It is possible that the vboxnet0 adapter does not exist in the MacStadium, for example, since I ran this test on my local machine. There is another vboxmanage command to create it, but I haven't tried it yet.

This worked: when I booted the VM up again, it had the Wazuh central components installed and the IP was reachable. I could access the Wazuh Dashboard via the web browser and it worked fine.

So, this is not the final test, but it is proof that the vboxmanage commands work for creating a network interface.

Now, there are two possible ways of implementing this in the OVA creation process:

  1. Running these commands when creating the initial Amazon Linux 2023 box in our generate_base_box.sh script.
  2. Running them once the box is created and customized (Wazuh installed and so on) and then exporting the box to an OVA.

Latest research

I've been reading and thinking about how we can add this to our script so it can be executed on our MacStadium machines. The configuration I proposed uses the host-only type of network interface, and in the MacStadium we are not using that kind of interface right now. Instead, we are using the bridged method, which physically attaches the network card of the machine to the VM. I do not know whether this will be an issue, but I have been researching and there is a way of creating a VBox network adapter in the MacStadium machine via commands:

vboxmanage hostonlyif create
vboxmanage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1 --netmask 255.255.255.0
vboxmanage dhcpserver add --netname HostInterfaceNetworking-vboxnet0 \
    --ip 192.168.56.2 --netmask 255.255.255.0 \
    --lowerip 192.168.56.100 --upperip 192.168.56.200 --enable

This will create a host-only network interface with a DHCP server that can assign IPs from 192.168.56.100 to 192.168.56.200.

In conclusion, tomorrow I will test whether the proposed solutions work in the MacStadium without breaking anything that currently works, which is what I fear the most. 😟

@CarlosALgit
Member

Update Report

I started the day by testing the first of the two options proposed yesterday, that is, running these VBox commands in the script we use to generate the AL2023 OVA that later serves as the basis for installing the Wazuh core components and building the OVA from there.

This option did not work, so I opted for the second option, which is to run these commands once the VM where Wazuh is installed has been stopped, right before exporting it as an OVA.

Before trying this option I ran several tests, because the results I was getting were quite strange. Inside the VM I tried configuring dhclient (not supported in AL2023) and NetworkManager/nmcli (not supported in AL2023 either), and in the end I found that the service AL2023 uses for network configuration is systemd-networkd. I had tried this option before, but I think it did not work because those tests were not run in a clean environment and other configurations could have been interfering.

The solution is to create the file /etc/systemd/network/20-eth1.network; after adding the configuration there, I verified that it brought up the network interface that was available but down, and that it also assigned an IPv4 address via DHCP.
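
The exact contents were not captured in this report; a minimal sketch of what such a 20-eth1.network file typically looks like (interface name and DHCP setting are assumptions matching the description above):

[Match]
Name=eth1

[Network]
DHCP=ipv4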

I then tried combining the two solutions, i.e. using the VBox commands to add the network interface, adding that file inside the VM, and then exporting the VM.

Once this was done, I copied the OVA to my machine and imported it into VirtualBox. Upon testing it, it worked correctly: the network interfaces are properly up and the Wazuh Dashboard is accessible via the web browser, as you can see in the screenshots below.

OVA in VirtualBox

Running ip a in the OVA

(screenshot)

Accessing the Wazuh Dashboard via web browser

(screenshot)

OVA in VMware

For the OVA test on VMware, the first time I imported the OVA it worked correctly: it brought up the network interfaces and the Wazuh Dashboard was accessible via web browser. After that, I re-imported it and have not been able to make the network interfaces reachable again. It should be noted that between that first attempt and the later ones, my PC crashed after exceeding the RAM limit. I have also tried importing the official 4.9.2 OVA and have not been able to get it to work either, so I am starting to think it may be a problem with my version of VMware, or that something in VMware broke during the crash. I will ask someone on my team to run the same tests to corroborate this.

It is worth noting that in both cases (VirtualBox and VMware) the Wazuh Indexer took some time to come up, and in VMware specifically it hit the service timeout and I had to restart it after increasing the timeout.

In addition, the VBox error messages do not appear in VMware, which confirms that these messages come from VirtualBox and are not related to Wazuh or the OVA creation process. We have related documentation here.

Next steps

The next things to do are to try the process in the MacStadium and to test the OVA in VMware on my teammates' PCs, to check whether the problem is specific to my machine.

@CarlosALgit
Member

Update Report

This morning I have been testing with VMware again, because yesterday it worked fine the first time, then my PC crashed due to the RAM limit, and after that the OVA in VMware did not bring the network interfaces up correctly.

In this morning's tests I could verify that the OVA works fine in VMware, as can be seen in the following screenshots:

Running ip a in the OVA

(screenshot)

Accessing the Wazuh Dashboard via web browser

(screenshot)

@CarlosALgit
Member

Update Report

I have tried to follow the same steps using the Intel macStadium as the base to create the OVA, but the exported OVA did not have the Wazuh configuration or the network configuration I had applied by hand. I still have the console logs open from creating the file needed for the network configuration, so I think something happened in the meantime.
It was probably a mistake of mine while doing the steps by hand, so I will retry it next week and check.

@CarlosALgit
Member

Update Report

Today, I retried the creation of the OVA using the macStadium as the base. I followed the steps carefully and made sure that at each step I got the expected response.

Finally, I tested the exported OVA in both VirtualBox and VMware, and the results were as follows:

OVA in VirtualBox

Running ip a

(screenshot)

Accessing the Wazuh Dashboard via web browser

(screenshot)

OVA in VMware

Running ip a

(screenshot)

Accessing the Wazuh Dashboard via web browser

(screenshot)

Ending

Finally, we see that it is possible to create the OVA using Amazon Linux 2023 as the base operating system.

The next task I will be working on is to document, as clearly as possible, the steps to follow to create the OVA, considering the possible implications each of them may have and that we should take into account. I will also compare this with the option of migrating the creation process to a metal instance in AWS.

As a last note, it is worth mentioning that the option of doing it as before, using AWS EC2 services, will have to be considered because sooner or later AWS will introduce AL2023 as a supported operating system for exports.

@CarlosALgit
Member

CarlosALgit commented Dec 16, 2024

Update Report

Description of the alternatives we have

  1. Using the macStadium as base
  2. Using a metal AWS EC2 instance
  3. Keep using the EC2 export tool and think about changing the base OS to one supported

For both the first and second options, we would need to create a base Vagrant box with Amazon Linux 2023 as the base operating system. This means we would have to host and maintain the scripts that create that box. The resulting box is the one we would use to create a virtual machine and install Wazuh in it.

1. Using the macStadium as base

We can use our Intel macStadium to perform the OVA building process.

The process would be something like this:

  1. Run the generate_ova.sh script to generate the box with AL2023.
  2. Run vagrant box add to add the newly generated box.
  3. Deploy the VM using vagrant up.
  4. Enter the VM using vagrant ssh.
  5. Clone the wazuh-installation-assistant repo.
  6. Clone the wazuh-virtual-machines repo.
  7. Move to the wazuh-installation-assistant repo and checkout to the desired tag.
  8. Build the wazuh-install.sh
  9. Move the wazuh-install.sh to the /tmp folder.
  10. Move to the wazuh-virtual-machines repo and checkout to the desired tag.
  11. Run the provision.sh script using dev and yes as parameters.
  12. Create the /etc/systemd/network/20-eth1.network file.
  13. Write the desired configuration in it.
  14. Restart the systemd-networkd service.
  15. Clean the files as it was done in the existing workflow.
  16. Exit the VM.
  17. Stop the VM.
  18. Run the vboxmanage modifyvm "vm_name" --nic2 hostonly command.
  19. Run the vboxmanage modifyvm "vm_name" --cableconnected2 on command.
  20. Export the OVA using the vboxmanage export "vm_name" --output "ova_name.ova"
  21. Upload the OVA file to S3.

Note

This is the general process I have been following to do the tests. This will require taking a look at it and refining the steps as necessary.

Here we would have to discuss whether the Vagrant box needs to be created each time we build a new OVA. The only purpose of renewing the existing Vagrant box would be to update the Amazon Linux 2023 version, and skipping this step would make the process much faster. So, in my opinion, there are two options to consider here:

  1. Check the latest version of the uploaded OVA from AWS and compare it to the one we already have in the Vagrant box.
  2. Add a check in the workflow so we can choose if we want to recreate the box or use the one that we have.

I would go for option two because, considering our release process, it would be rare for a new AL2023 version to be released between two of our stages (e.g. there is little time between alpha3 and beta1). With this option, I think it would be reasonable to update the Vagrant box each time we change a minor version of our product, e.g. 4.10.1 to 4.10.2, or something similar.

Summarizing, the pros and cons of using this option are described below:

Pros | Cons
Low cost | Requires maintenance of the scripts used
Easy to define tests | Change the Allocator logic to deploy the AL2023 base instance
Time savings |

As can be seen, the pros are that we no longer depend on AWS, which reduces both costs and time. Also, defining and running the tests should be easier, because we can enter the instance and check that the configurations and the Wazuh installation were completed.

On the other hand, we would have to maintain and host the scripts used for the base Vagrant box with Amazon Linux 2023, which can cause problems if something in the process changes and stops working. We would also have to define whether we will use the Allocator to deploy the instance or connect directly to our macStadium from the workflow through a VPN, which would make it easier to run the commands. In any case, we would need a direct connection to the macStadium to run the final configurations with vboxmanage, export the OVA, and upload it to S3.

2. Using a metal AWS EC2 instance

The process of using a metal EC2 instance is the same as explained for the macStadium, except for some prior installations.

Here, we would also need to install VirtualBox and Vagrant in the instance, and copy the generate_ova.sh and setup.sh scripts as well as the Vagrantfile.
From there, the steps would be exactly the same as those described for the macStadium, and the tasks still to be defined would also be the same.

Pros | Cons
Easy to define tests | Requires maintenance of the scripts used
Time savings | Elevated costs
No logic needed for the Allocator |

The only thing we would need to change in the Allocator would be the addition of this type of instance in the os.yml file.

3. Keep using the EC2 export tool and think about changing the base OS to one supported

Finally, we can continue using the AWS EC2 tool to export an instance as an OVA. At the moment, AWS still does not support exporting instances with Amazon Linux 2023. So here we have some options:

  1. Maintain the process as we do right now, using AL2 as base OS.
  2. Change the OS to a supported one.
  3. Wait for AWS to support AL2023.

1. Maintain the process as we do right now, using AL2 as base OS

Pros | Cons
We know it works | AL2 loses support from summer 2025 onwards
No need for maintenance | Harder to define tests because the export process is not ours
No logic needed for the Allocator | Much time needed (and it varies)

This option is the safest one, as we know it works, but the process of creating the OVA is slow, the time may vary from one run to another, and we cannot do anything about it.
There is no need to maintain anything, because we delegate this to AWS. At the same time, this makes it difficult to define tests, since the process is not transparent to us.
Finally, Amazon Linux 2 loses support from summer 2025 onwards.

2. Change the OS to a supported one.

Pros | Cons
LTS-supported OS | Extra investigation work required
No need for maintenance | Harder to define tests because the export process is not ours
No logic needed for the Allocator | Much time needed (and it varies)
| We will have to align with the AMI

Changing the base OS would require extra work to check that everything works as intended for the user who downloads the Wazuh OVA.

It would also require us to align the OS with the Wazuh AMI, which would add the extra work of testing that the AMI works fine on the other OS.

3. Wait for AWS to support AL2023.

Pros | Cons
LTS-supported OS | Harder to define tests because the export process is not ours
No need for maintenance | Much time needed (and it varies)
No logic needed for the Allocator | Not available yet

This would have been the preferred option for us because the process would be the same, we would not have to change how the workflows work, we would only have to add the tests.

Unfortunately, this is still not available.

Testing

For the testing task, the tests would be similar to the ones we run for the Installation Assistant.
There, we check that the Wazuh Indexer, Server and Dashboard services are up. We also check the logs, filtering for errors or warnings, and compare whether those errors are already known or new.

We could use Python to implement these tests, running the commands in the OVA machine and processing the output to determine whether all checks passed, whether there are new errors in the logs, or whether some service is not working. Then, if errors are found, we could upload an artifact to the workflow run containing the logs with errors.
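
As a reference, a minimal sketch of what such a Python check could look like, assuming SSH access to the deployed OVA; the host address is hypothetical, the service names are the standard Wazuh unit names, and the list of logs to scan is still to be defined:

# Minimal sketch: run basic checks against the deployed OVA over SSH.
import subprocess

OVA_HOST = "wazuh-user@192.168.56.10"      # hypothetical address assigned by the host-only DHCP
SERVICES = ["wazuh-indexer", "wazuh-manager", "wazuh-dashboard"]
LOG_FILES = ["/var/ossec/logs/ossec.log"]  # main manager log; the final list is still to be defined

def run_in_ova(command: str) -> subprocess.CompletedProcess:
    """Run a command inside the OVA via SSH and capture its output."""
    return subprocess.run(
        ["ssh", "-o", "StrictHostKeyChecking=no", OVA_HOST, command],
        capture_output=True, text=True,
    )

def check_services() -> list:
    """Return the services that are not reported as active."""
    failed = []
    for service in SERVICES:
        result = run_in_ova(f"systemctl is-active {service}")
        if result.stdout.strip() != "active":
            failed.append(service)
    return failed

def check_logs() -> dict:
    """Return ERROR/WARNING lines found in the monitored logs."""
    findings = {}
    for log_file in LOG_FILES:
        result = run_in_ova(f"grep -iE 'error|warning' {log_file} || true")
        lines = [line for line in result.stdout.splitlines() if line.strip()]
        if lines:
            findings[log_file] = lines
    return findings

if __name__ == "__main__":
    failed_services = check_services()
    log_findings = check_logs()
    if failed_services or log_findings:
        print("Services not active:", failed_services)
        print("Log findings:", log_findings)
        raise SystemExit(1)
    print("All OVA checks passed.")

If the checks fail, the workflow could then collect the offending log files and upload them as an artifact, as described above.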

@wazuhci wazuhci moved this from In progress to Pending review in Release 5.0.0 Dec 16, 2024