Fix unstable tests #6763

kozlovsky · 2022-02-09T15:32:55Z

I discovered the reason for unstable tests. It was caused by an interaction between one broken fixture, one broken test, and the aiohttp async plugin. The fix is simple, but the actual sequence of events that led to the error was amazing.

Initially, there was no error on my local machine, as the error is unstable and caused by a specific order of tests. I added the following pytest option to my local configuration to reproduce the error: --randomly-seed=1. It created a stable pseudo-random test sequence that failed stably on my machine, and then I could debug the reason for the error.

The failed test looked pretty innocent, and the error does not seem to be related to the test code:

async def test_get_private_key_filename(tribler_config):
    private_key_file_name = KeyComponent.get_private_key_filename(tribler_config)
    tribler_config.general.testnet = True
    testnet_private_key_file_name = KeyComponent.get_private_key_filename(tribler_config)
    assert private_key_file_name != testnet_private_key_file_name

The error was:

..\..\..\venv\lib\site-packages\aiohttp\pytest_plugin.py:138: RuntimeError
RuntimeError: 1 Runtime Warning,
c:\dev\tribler\venv\lib\site-packages\aiohttp\test_utils.py:536:coroutine 'interval_runner' was never awaited

As it turns out, the reason for the error was a completely unrelated fixture:

@pytest.fixture
def tag_rules_processor():
    return TagRulesProcessor(...)

This is what happened:

TagRulesProcessor inherits from TaskManager, which __init__ creates a task _check_tasks. We need to properly shutdown classes that inherit from the TaskManager so all their tasks are properly finished, including the _check_tasks task. So, the first part of the problem was skipping the shutdown_task_manager() call in the finalization part of the fixture.

The tests in which the fixture is used are not async, and the unhandled task remained in the memory.

Then test_get_private_key_filename was executed if the order of tests were unlucky. The function was declared as async, but @pytest.mark.asyncio decorator was not applied to the test, which caused the function to be treated as an async test by the aiohttp plugin that we use in REST API tests.

The aiohttp plugin, when executing a test function, do two additional things: first, it runs garbage collection at the end of the test to b sure that all unfinished tasks created in the function generate warnings. Second, it converts warnings to RuntimeError to ensure that they will not be missed accidentally.

The garbage collection triggered the warning from the task created in a previous test, and the warning became converted to RuntimeError, leading to the failed test.

As a fix, I:

Removed async from the test functions that does not execute any async code, so the aiohttp plugin no longer tries to handle these tests;
Added the shutdown method to TagRulesProcessor;
Converted tag_rules_processor fixture code to a generator to be able to call processor.shutdown() at the end of the fixture.

In the future, we probably need to add some safety checks to be sure a coroutine created in one test cannot cause an error in the following test during the garbage collection phase. We can add a plugin that triggers garbage collection at the end of each test as an option. It may slow down test execution, but it should make tests more stable.