Research implementation of dependency and parallel tests in the current QA framework #4588

Closed · Tracked by #4369
Rebits opened this issue Oct 6, 2023 · 12 comments

Rebits commented Oct 6, 2023

Description

In the context of the new system tests for the Vulnerability Detector module, we need to explore efficient ways to implement parallel testing within our QA framework.

This necessity arises from our plan to launch these new tests across multiple agents. Given the considerable time required for the proposed test cases, it is crucial to execute parallel tests for each agent, thereby significantly reducing both the cost and duration of the complete testing suite.

@wazuhci wazuhci moved this to Backlog in Release 4.8.0 Oct 6, 2023
@Rebits Rebits self-assigned this Oct 9, 2023
@wazuhci wazuhci moved this from Backlog to In progress in Release 4.8.0 Oct 9, 2023
Rebits commented Oct 9, 2023

On hold in favor of wazuh/wazuh#19446

@wazuhci wazuhci moved this from In progress to On hold in Release 4.8.0 Oct 9, 2023
@wazuhci wazuhci moved this from On hold to In progress in Release 4.8.0 Oct 10, 2023
Rebits commented Oct 10, 2023

On hold in favor of #4597

@wazuhci wazuhci moved this from In progress to On hold in Release 4.8.0 Oct 10, 2023
Rebits commented Oct 10, 2023

Researched pytest-xdist and pytest-dependency.
Created some pytest hooks for a PoC.

Rebits commented Oct 10, 2023

On hold in favor of wazuh/wazuh#19477

@wazuhci wazuhci moved this from On hold to In progress in Release 4.8.0 Oct 11, 2023
Rebits commented Oct 11, 2023

Description

In the context of the new system tests for the Vulnerability Detector module, it is crucial that we explore efficient methods for implementing parallel testing within our testing framework. This necessity arises from our plan to execute these new tests across multiple agents. Given the considerable time required for the proposed test cases, it is imperative to run tests for each agent in parallel, significantly reducing both the cost and duration of the entire testing suite.

Dependency & Parallelization

Dependency

Tests should ideally be independent; however, due to the product's requirements, most high-level tests cannot be fully isolated. In the case of Vulnerability Detector tests, the repetitive installation and uninstallation of packages would incur a high cost in terms of time and resources. Therefore, we propose minimizing these operations as much as possible. We have devised the following sequential workflow.

To implement this new approach in our environment, we researched the possibility of using the pytest-dependency plugin. This plugin manages test dependencies and suits our case, where we want to skip tests if any of the preceding tests fail.

import pytest

@pytest.mark.dependency()
def test_one():
    assert True

@pytest.mark.dependency(depends=["test_one"])
def test_two():
    assert True

@pytest.mark.dependency(depends=["test_one", "test_two"])
def test_three():
    assert True

However, it's important to note that test run parallelization, using pytest-xdist, is incompatible with pytest-dependency, as indicated in the documentation here.

Example of use

➜  dependency python3 -m pytest test_example.py                                          
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.10.6, pytest-7.1.2, pluggy-1.2.0
rootdir: /home/rebits/Projects/Pytest/1/dependency
plugins: html-3.1.1, variables-3.0.0, dependency-0.5.1, testinfra-5.0.0, json-report-1.5.0, metadata-2.0.1, asyncio-0.20.3, docgen-1.3.0, xdist-3.3.1
asyncio: mode=strict
collected 3 items                                                                                                                                                                            

test_example.py ...                                                                                                                                                                    [100%]

===================================================================================== 3 passed in 0.02s ======================================================================================
➜  dependency  
➜  dependency python3 -m pytest test_example.py -k test_two
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.10.6, pytest-7.1.2, pluggy-1.2.0
rootdir: /home/rebits/Projects/Pytest/1/dependency
plugins: html-3.1.1, variables-3.0.0, dependency-0.5.1, testinfra-5.0.0, json-report-1.5.0, metadata-2.0.1, asyncio-0.20.3, docgen-1.3.0, xdist-3.3.1
asyncio: mode=strict
collected 3 items / 2 deselected / 1 selected                                                                                                                                                

test_example.py s                                                                                                                                                                      [100%]

============================================================================== 1 skipped, 2 deselected in 0.01s ==============================================================================

Based on the information provided, this approach may not be suitable for our use case. We should consider one of the following alternatives:

  • Full Parallel Approach: Instead of minimizing remote operations, we can implement them fully in parallel. Test cases should be structured similarly to:

def test_example(install_package, remove_package):
    ...

def test_example_2(install_package, update_package, remove_package):
    ...

  • Install and Remove Packages as Fixtures: Implement package installation and removal as fixtures for a module or class. Test bodies should only ensure that the alert appears (see the fixture sketch after this list).

  • Combine Multiple Tests in a Single Test Case: Collapse multiple tests into a single test case following an approach similar to:

def test_example():
    check_condition1()
    check_condition2()
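
As a rough illustration of the fixture-based alternative (second bullet above), a module-scoped fixture could handle package installation and removal so that the test body only checks that the alert appears. This is a minimal sketch; environment.install_package, environment.remove_package, and environment.wait_for_alert are assumed helper methods, not part of any existing framework API:

import pytest


@pytest.fixture(scope="module")
def vulnerable_package(environment, agents):
    """Install a vulnerable package once per module and remove it afterwards."""
    package = 'custom-vulnerable-package-1.0'   # placeholder package name
    environment.install_package(agents, package)   # assumed helper
    yield package
    environment.remove_package(agents, package)    # assumed helper


def test_vulnerability_alert(vulnerable_package, environment, agents):
    # The test body only verifies that the expected alert appears.
    assert environment.wait_for_alert(agents, package=vulnerable_package)  # assumed helper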

Parallelization

Operation Parallelization

Largely already implemented in the new framework, this approach does not run multiple test cases simultaneously. Instead, it performs multiple remote operations (log monitoring, service restarts, etc.) across all hosts in the environment at the same time. It can be combined with test parallelization. An example of this approach would be:

class WazuhEnv:
    ...

    def restart_agents(self, agent_list=None, parallel=True):
        """Restart a list of agents.

        Args:
            agent_list (list, optional): Agent list. Defaults to None.
            parallel (bool, optional): Parallel execution. Defaults to True.
        """

        self.logger.info(f'Restarting agents: {agent_list}')

        if parallel:
            agent_restart_tasks = self.pool.map(self.restart_agent, agent_list)
        else:
            for agent in agent_list:
                self.restart_agent(agent)

        self.logger.info(f'Agents restarted successfully: {agent_list}')

...

env = WazuhEnv('inventory')
env.restart_agents(agent_list=['agent1', 'agent2'])
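
The self.pool used in the excerpt above is not defined; the following is a minimal, self-contained sketch of the same idea using the standard library's concurrent.futures, with restart_agent as a placeholder for the real remote operation:

from concurrent.futures import ThreadPoolExecutor


def restart_agent(agent):
    # Placeholder for the real per-agent restart (e.g. via SSH or Ansible).
    print(f'Restarting {agent}')


def restart_agents(agent_list, parallel=True, max_workers=8):
    """Restart all agents, optionally running the remote operations concurrently."""
    if parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Consuming map() makes the call block until every restart finishes.
            list(executor.map(restart_agent, agent_list))
    else:
        for agent in agent_list:
            restart_agent(agent)


restart_agents(['agent1', 'agent2', 'agent3'])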

Test Parallelization

Infrastructure Decoupling

It is desirable for tests not to be hardcoded to a single environment but to be dynamically launched based on the provided inventory. This is beneficial for our product's nature because it allows for more intensive testing on various operating systems during certain testing phases. This is viable because many test cases remain consistent across different versions of the same operating system (e.g., generating a syslog alert is the same for all versions of Windows Server). Additionally, this approach facilitates the integration of new environments into the testing process.

This can be easily implemented using the current approach, as we obtain the inventory as a parameter and can launch tests dynamically:

def pytest_addoption(parser):
    parser.addoption("--inventory", action="store", default=None, help="Inventory")

@pytest.fixture(scope="session")
def environment(pytestconfig):
    return WazuhEnvironmentHandler(pytestconfig.getoption("inventory"))

def test_e2e(environment):
    # Here we can perform operations based on the data provided by the environment fixture.
    ...

However, there is an issue with parameterization. We need to parallelize tests for each agent, which means generating multiple test cases based on the provided inventory. After some investigation, we found a way to achieve this:

def pytest_generate_tests(metafunc):
    hosts = get_hosts_from_inventory(metafunc.config.option.inventory, 'agent')
    metafunc.parametrize("agents", hosts, scope='session')

This hook will parameterize each test for each agent, resulting in test cases like:

  • test_example[agent1_macOS]
  • test_example[agent2_windows]
  • test_example[agent3_centOS]
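
The get_hosts_from_inventory helper used in the hook above is not shown; a minimal sketch of what it could look like, assuming an Ansible-style YAML inventory where agents sit under an agent group (the exact inventory layout is an assumption):

import yaml  # requires PyYAML


def get_hosts_from_inventory(inventory_path, group):
    """Return the host names belonging to `group` in an Ansible-style YAML inventory."""
    with open(inventory_path) as inventory_file:
        inventory = yaml.safe_load(inventory_file)

    # Assumed layout: {'all': {'children': {'agent': {'hosts': {'agent1': {...}, ...}}}}}
    group_data = inventory['all']['children'][group]
    return list(group_data.get('hosts', {}).keys())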

Pytest-Xdist

We can implement parallelization using the pytest-xdist plugin here. This plugin provides an easy way to parallelize multiple tests:

def test_addition():
    assert 1 + 2 == 3

def test_subtraction():
    assert 4 - 2 == 2

def test_multiplication():
    assert 3 * 5 == 15

def test_division():
    assert 8 / 4 == 2

To run these tests in parallel, you can use the following command:

python3 -m pytest -v test_para.py -n auto
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.10.6, pytest-7.1.2, pluggy-1.2.0 -- /usr/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.6', 'Platform': 'Linux-6.2.6-76060206-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '7.1.2', 'py': '1.10.0', 'pluggy': '1.2.0'}, 'Plugins': {'html': '3.1.1', 'variables': '3.0.0', 'dependency': '0.5.1', 'testinfra': '5.0.0', 'json-report': '1.5.0', 'metadata': '2.0.1', 'asyncio': '0.20.3', 'docgen': '1.3.0', 'xdist': '3.3.1'}}
rootdir: /home/rebits/Projects/Pytest/1/dependency
plugins: html-3.1.1, variables-3.0.0, dependency-0.5.1, testinfra-5.0.0, json-report-1.5.0, metadata-2.0.1, asyncio-0.20.3, docgen-1.3.0, xdist-3.3.1
asyncio: mode=strict
4 workers [4 items]     
scheduling tests via LoadScheduling

test_para.py::test_multiplication 
test_para.py::test_subtraction 
test_para.py::test_division 
test_para.py::test_addition 
[gw1] [ 25%] PASSED test_para.py::test_division 
[gw0] [ 50%] PASSED test_para.py::test_multiplication 
[gw3] [ 75%] PASSED test_para.py::test_subtraction 
[gw2] [100%] PASSED test_para.py::test_addition 

===================================================================================== 4 passed in 1.28s 

However, the default behavior of pytest-xdist is to run all tests in parallel, which might make the framework less maintainable. To address this, we propose creating two categories of tests using a marker, e.g., parallel. When running tests, all parallel tests will be executed with the recommended number of workers, followed by sequential tests. This approach will maintain test structure and manage complexity effectively.

import pytest

@pytest.mark.parallel
def test_vuln_package(configure_parallel_tests, agents, environment):
    WazuhEnv.install_package(packages_vul[agents])
    ...

@pytest.mark.parallel
def test_vuln_package2(configure_parallel_tests, agents, environment):
    WazuhEnv.install_package(packages_vul[agents])

# Sequential test
def test_vuln_package3(environment):
    ...

You can then run the parallel and sequential categories separately using the marker:

python3 -m pytest -v test_examples --inventory=../Wazuh_QA_environment305_testing_inventory.yaml -m parallel -n auto
python3 -m pytest -v test_examples --inventory=../Wazuh_QA_environment305_testing_inventory.yaml -m "not parallel" 

This approach helps maintain the structure and manage the complexity of your test suite effectively.
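
One practical detail to add to the proposal above: the custom parallel marker should be registered so that pytest does not emit unknown-marker warnings when filtering with -m. A minimal conftest.py sketch:

# conftest.py
def pytest_configure(config):
    # Register the custom marker used to split parallel and sequential tests.
    config.addinivalue_line(
        'markers', 'parallel: tests that can be executed concurrently with pytest-xdist'
    )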

Rebits commented Oct 11, 2023

Based on the analysis presented in this comment, our testing strategy will use pytest-xdist for parallelization, implementing the hooks as specified. This approach allows us to eliminate the need for dependent tests in the case of VD tests, opting instead for tests that, despite repeating operations, employ a parallel structure for more efficient execution.

@wazuhci wazuhci moved this from In progress to Pending review in Release 4.8.0 Oct 11, 2023
Rebits commented Oct 16, 2023

To move forward with the proposed approach, we must develop a launcher: a Python script that handles the separate launch of sequential and parallel tests and takes charge of environment setup and appropriate cleanup based on the specific test type being executed.
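
A minimal sketch of what such a launcher could look like, assuming the marker-based split described earlier; the test path, inventory file name, and setup/cleanup steps are placeholders:

import subprocess
import sys


def run_pytest(extra_args):
    """Run pytest with the common options plus stage-specific arguments."""
    command = [sys.executable, '-m', 'pytest', '-v', 'test_examples',
               '--inventory=inventory.yaml'] + extra_args
    return subprocess.run(command).returncode


def main():
    # Environment setup would go here (placeholder).
    parallel_rc = run_pytest(['-m', 'parallel', '-n', 'auto'])
    sequential_rc = run_pytest(['-m', 'not parallel'])
    # Environment cleanup would go here (placeholder).
    sys.exit(max(parallel_rc, sequential_rc))


if __name__ == '__main__':
    main()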

Rebits commented Oct 16, 2023

Developer branch
4588-parallel-launcher

Currently working on the cleanup and configuration functions of the launcher.

@wazuhci wazuhci moved this from Pending review to In progress in Release 4.8.0 Oct 16, 2023
Rebits commented Oct 17, 2023

I have been in a meeting with @Deblintrake09 regarding the possible parallelization options.

The current approach detailed in this comment would imply increasing the number of vulnerable packages to use. In addition, it would be difficult to handle the requirements of each test case.

For this reason, we are currently researching the possibility of launching the test cases in parallel while preserving their dependencies.

For now, we have achieved parallelization encapsulated per agent (one possible way to do this is sketched below).
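
One possible way to achieve this per-agent encapsulation (an illustration building on the pytest_generate_tests hook shown earlier, not necessarily what was implemented) is to attach pytest-xdist's xdist_group mark to each agent parameter and run with --dist loadgroup:

import pytest


def pytest_generate_tests(metafunc):
    hosts = get_hosts_from_inventory(metafunc.config.option.inventory, 'agent')
    # With `--dist loadgroup`, all tests sharing an xdist_group name run in the same
    # worker, so each agent's tests stay sequential while different agents run in parallel.
    params = [pytest.param(host, marks=pytest.mark.xdist_group(name=host)) for host in hosts]
    metafunc.parametrize('agents', params, scope='session')

Tests would then be launched with python3 -m pytest -n auto --dist loadgroup so that each agent's tests run sequentially within a worker while agents run in parallel.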

In addition, we have started to work on the launcher script, which is planned to launch parallel and sequential tests correctly. This launcher will also be responsible for the configuration and cleanup of the environment.

Rebits commented Oct 18, 2023

We have researched possible packages to follow the parallel approach.

Note
Currently, there are only packages for the CentOS endpoint.

@wazuhci wazuhci moved this from In progress to On hold in Release 4.8.0 Oct 20, 2023
@wazuhci wazuhci moved this from On hold to In progress in Release 4.8.0 Oct 23, 2023
Rebits commented Oct 23, 2023

Gave a short presentation on the current status of the development and the possible approaches.

Rebits commented Oct 24, 2023

After a meeting with @davidjiglesias, we have confirmed that the approach to be followed for VD testing will be sequential, using parallel operations across all the hosts of the environment.
Closing this issue as it no longer applies to the development.

@Rebits Rebits closed this as not planned Oct 24, 2023
@wazuhci wazuhci moved this from In progress to Done in Release 4.8.0 Oct 24, 2023