Offline installation workflows use release packages instead of pre-release packages #161

Open
Enaraque opened this issue Dec 16, 2024 · 5 comments
Labels: level/task, type/bug

@Enaraque
Member

Description

While working on #160, it was found that the offline installation workflow uses the release packages and not the pre-release packages, making it unable to run on versions that are still under development.

function download_resources() {
check_file "${ABSOLUTE_PATH}"/wazuh-install.sh
bash "${ABSOLUTE_PATH}"/wazuh-install.sh -dw "${sys_type}"

We must add the -d option to the above command (in addition to the existing -dw option) and verify that it works properly.
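A minimal sketch of the proposed change, assuming that -d accepts an optional repository name such as pre-release or staging. The helper name build_download_args and its second parameter are hypothetical and not taken from the repository.

```shell
# Hypothetical helper (not part of the repository): builds the arguments
# passed to wazuh-install.sh so that -d is only added when a development
# repository has been requested.
build_download_args() {
    local sys_type="$1"      # "deb" or "rpm", as in the existing -dw call
    local dev_repo="${2:-}"  # assumed values: "pre-release", "staging", or empty

    if [ -n "${dev_repo}" ]; then
        printf '%s\n' "-d ${dev_repo} -dw ${sys_type}"
    else
        printf '%s\n' "-dw ${sys_type}"
    fi
}
```

Inside download_resources, the result could then be expanded unquoted into the existing call, for example: bash "${ABSOLUTE_PATH}"/wazuh-install.sh $(build_download_args "${sys_type}" pre-release)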

Enaraque added the level/task and type/bug labels on Dec 16, 2024
Enaraque self-assigned this on Dec 17, 2024
@Enaraque
Member Author

Update report

A new input has been added to the workflow dispatch so that we can select between pre-release and staging. In addition, an environment variable has been created so that the pre-release repository is selected by default when the workflow is triggered by a PR, and the value selected in the input is used when the workflow is run manually.
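The selection logic described above can be sketched as follows. This is only an illustration: the function name select_repository is hypothetical, and the event name is assumed to come from the GITHUB_EVENT_NAME variable that GitHub Actions sets for each run.

```shell
# Sketch: choose the package repository from the triggering event.
# The first argument is expected to be the value of GITHUB_EVENT_NAME;
# the second is the (hypothetically named) workflow dispatch input.
select_repository() {
    local event_name="${1:-}"
    local input_repo="${2:-pre-release}"

    case "${event_name}" in
        # Runs triggered by a PR always use the pre-release repository.
        pull_request) printf '%s\n' "pre-release" ;;
        # Manual runs use whatever was selected in the input.
        *)            printf '%s\n' "${input_repo}" ;;
    esac
}
```

In a workflow step this could be called as select_repository "${GITHUB_EVENT_NAME}" "staging", with the second argument replaced by the actual input value.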

For now this PR has been opened with the changes made so far in order to test the workflow when triggered by a PR.

Testing 🧪

Several tests are being run, but the workflow fails on different occasions. Most likely this is not due to the change made in this issue, but more testing is needed to find out where the error is coming from.

@Enaraque
Member Author

Update report

The issue with the job corresponding to Debian packages has been successfully resolved. However, regarding the CentOS job, since it runs inside a Docker container and requires Wazuh installations, it is likely that it will not function correctly. Further investigation into a solution is ongoing.

A change that should be considered is to run these tests on two separate EC2 machines so that both tests are conducted under equal conditions. This approach would eliminate the need to execute the CentOS job in a Docker container, which might yield biased results due to being run in a container instead of on a dedicated machine.

@Enaraque
Member Author

Update report

The workflow has been redefined to use the allocator to test the offline installation. It will allow selecting the machines that can be used to execute the offline installation tests.

Additionally, the various scripts used during testing have been fixed.

Important

These scripts have been tested on different operating systems. The only step remaining is to test them during the workflow execution to confirm their functionality.

This PR has been opened with the changes made so far in order to test the workflow when triggered by a PR.

@fcaffieri
Member

fcaffieri commented Dec 23, 2024

Update report

Errors found with the new implementation:

  • Cannot assume role:

Fixed 🟢: The workflow was missing the JWT permission; adding it solved the problem.

  • Cannot clone repository:

Fixed 🟢: This problem was caused by a validation that assigned a tag or branch that did not exist.

  • Problem with the allocator: aws-account was provided but is not needed.
    Fixed 🟢

  • Error executing playbooks:

Fixed 🟢: The Ansible installation was modified to fix the problem. The error occurred because Ansible was trying to format the stdout output using the YAML callback plugin but could not locate the module required to support this plugin.

  • Removed the workflow steps that generate and upload ZIP artifacts, since they are not needed.

  • Added a new workflow step that switches the instance to a Security Group with no internet access.

$ aws ec2 modify-instance-attribute --instance-id i-0e741b1b928d7a18d --groups sg-03c53339089a65829 --profile wazuh-qa
$

We need to add a permission to the role so that it can edit the Security Group (SG).

An error occurred (UnauthorizedOperation) when calling the ModifyInstanceAttribute operation: You are not authorized to perform this operation. User: arn:aws:xxx::xxxxxx:xxxxxxxxxxxxxxxx/oidc-xxxxx-xxxx-xxxxx-xxxx/GitHubActions is not authorized to perform: ec2:ModifyInstanceAttribute on resource
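The Security Group step can be sketched as a small wrapper around the command shown above. The function name set_offline_security_group and the DRY_RUN variable are hypothetical; the real call only succeeds once the role is allowed to perform ec2:ModifyInstanceAttribute.

```shell
# Hypothetical wrapper around the aws CLI call used in the new workflow step.
# With DRY_RUN=1 it only prints the command, which makes it possible to check
# the arguments before the role has the required permission.
set_offline_security_group() {
    local instance_id="${1:-}"
    local offline_sg="${2:-}"

    if [ -z "${instance_id}" ] || [ -z "${offline_sg}" ]; then
        echo "usage: set_offline_security_group <instance-id> <security-group-id>" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "aws ec2 modify-instance-attribute --instance-id ${instance_id} --groups ${offline_sg}"
        return 0
    fi

    # Replaces the instance's security groups with the offline one.
    aws ec2 modify-instance-attribute --instance-id "${instance_id}" --groups "${offline_sg}"
}
```

Because --groups replaces the whole list of security groups attached to the instance, the offline SG must still allow whatever access Ansible needs to reach the machine.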

@fcaffieri
Member

fcaffieri commented Dec 24, 2024

Update report

After granting the necessary roles to change the Security Group (SG) to offline mode, an error was detected during the installation process:

ASYNC POLL on ec2-3-80-172-229.compute-1.amazonaws.com: jid=j132987319944.11512 started=1 finished=0
ASYNC FAILED on ec2-3-80-172-229.compute-1.amazonaws.com: jid=j132987319944.11512
fatal: [ec2-3-80-172-229.compute-1.amazonaws.com]: FAILED! => changed=false 
  ansible_job_id: j132987319944.11512
  child_pid: 11517
  finished: 1
  msg: Timeout exceeded
  results_file: /root/.ansible_async/j132987319944.11512
  started: 1
  stderr: ''
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP *********************************************************************
ec2-3-80-172-229.compute-1.amazonaws.com : ok=2    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

Error: Process completed with exit code 2.

This error occurs because the instances lack internet connectivity, preventing them from executing or downloading dependencies.
Upon analyzing the offline test process, we determined that refactoring these tests is necessary to resolve the situation.

Current Process:

The Workflow executes the following steps for each instance specified in the inputs parameter:

  • Checks out the wazuh-installation-assistant repository
  • Clones the wazuh-automation repository (required for the allocator)
  • Sets COMPOSITE_NAME for the allocator
  • Installs the dependencies needed for the entire process
  • Executes the allocator to launch the instance
  • Installs and configures Ansible
  • Executes the provision playbook to install Wazuh dependencies
  • Obtains the instance_id of the generated instance and modifies the SG to offline mode (new steps added for genuine offline testing)
  • Executes the offline-installation.yaml playbook
  • Deletes the launched instances

After resolving the workflow-related issues, we discovered that the tests cannot be executed when the SG is modified to offline mode due to dependency and package download requirements.
The test executes these steps:

  • check_system: Determines if the system is RPM or DEB
  • install_dependencies: Installs dependencies (openssl, initscripts, etc.)
  • download_resources: Downloads installation resources
  • indexer_installation: Installs and tests Wazuh indexer
  • manager_installation: Installs and tests Wazuh manager
  • filebeat_installation: Installs and tests Filebeat
  • dashboard_installation: Installs and tests the Wazuh dashboard
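As an illustration of the first step, a check of this kind could look like the sketch below. It is an assumption about how such a check might be written, not the actual check_system implementation, and the function name detect_sys_type is hypothetical.

```shell
# Sketch: report "deb" or "rpm" depending on which package manager is
# available, and fail with "unknown" when neither family is detected.
detect_sys_type() {
    if command -v apt-get >/dev/null 2>&1; then
        printf '%s\n' "deb"
    elif command -v yum >/dev/null 2>&1 || command -v dnf >/dev/null 2>&1; then
        printf '%s\n' "rpm"
    else
        printf '%s\n' "unknown"
        return 1
    fi
}
```

The result corresponds to the sys_type value that download_resources passes to wazuh-install.sh with -dw.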

Two major issues have been identified:

  1. The process continues to install the remaining dependencies, requiring internet connectivity. The dependency installation is fragmented across the workflow, provision, and installation test. Centralizing these installations before removing the internet connection would require modifications to all playbooks.

  2. The Wazuh component installation resources are downloaded within the same playbook as the tests. Separating these functions would require refactoring this playbook.
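The ordering that both issues point to can be sketched as follows: everything that needs the network runs first, the SG is switched, and only then do the installation tests run. Every function here is a placeholder that just prints the step name, and the phase names are hypothetical; the sketch shows the required order, not an implementation.

```shell
# Placeholder phases: each one only prints the names of the steps it would run.
online_phase() {
    # Steps that need internet access (dependencies and resource download).
    printf '%s\n' "check_system" "install_dependencies" "download_resources"
}

offline_phase() {
    # Steps that should run with no internet access.
    printf '%s\n' "indexer_installation" "manager_installation" \
        "filebeat_installation" "dashboard_installation"
}

run_offline_test() {
    online_phase || return 1
    # Hypothetical marker for the workflow step that changes the SG.
    printf '%s\n' "switch_security_group_to_offline"
    offline_phase
}
```

In the current layout these phases are interleaved across the workflow, the provision playbook, and the test playbook, which is why the split would touch all of them.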

In conclusion, while we resolved the workflow issues and improved the steps, making the test fully functional would require extensive refactoring of all playbooks and the workflow structure. Given that this test will be deprecated in upcoming Wazuh versions, investing such significant effort in these modifications is not justified.

2 participants