[DPE-4856][MISC] Port some test_self_healing CI fixes + update check for invalid extra user credentials #546
Conversation
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #546   +/-   ##
=======================================
  Coverage   70.89%   70.89%
=======================================
  Files          11       11
  Lines        3024     3024
  Branches      535      535
=======================================
  Hits         2144     2144
  Misses        764      764
  Partials      116      116
```

☔ View full report in Codecov by Sentry.
```diff
@@ -259,7 +259,7 @@ def check_for_invalid_extra_user_roles(self, relation_id: int) -> bool:
         for data in relation.data.values():
             extra_user_roles = data.get("extra-user-roles")
             if extra_user_roles is None:
-                break
+                continue
```
Necessary to correctly set/keep the blocked status due to invalid extra user roles; see the typo fixed in the test below.
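For context, a minimal sketch of the corrected scan (a hypothetical standalone function, not the charm's exact code, which lives in a class and reads Juju relation databags): with `break`, the loop stopped at the first databag lacking `extra-user-roles`, so roles set in later databags were never validated; `continue` merely skips that databag.

```python
# Minimal sketch: validate extra-user-roles across all relation databags
# instead of stopping at the first databag that does not set the field.
def has_invalid_extra_user_roles(databags: list[dict], valid_roles: set[str]) -> bool:
    for data in databags:
        extra_user_roles = data.get("extra-user-roles")
        if extra_user_roles is None:
            continue  # skip this databag; `break` silently ignored the rest
        roles = {role.strip().lower() for role in extra_user_roles.split(",")}
        if not roles.issubset(valid_roles):
            return True
    return False

# E.g. the invalid role in the second databag is now caught:
assert has_invalid_extra_user_roles(
    [{}, {"extra-user-roles": "admin,cluster-admin"}], valid_roles={"admin"}
)
```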
```diff
@@ -563,7 +563,7 @@ async def test_invalid_extra_user_roles(ops_test: OpsTest):
         f"{DATABASE_APP_NAME}:database", f"{DATA_INTEGRATOR_APP_NAME}:postgresql"
     )
     await ops_test.model.wait_for_idle(apps=[DATABASE_APP_NAME])
-    ops_test.model.block_until(
+    await ops_test.model.block_until(
```
There was a missing `await` here, which hid a bug in the charm behavior :( Thankfully @dragomirp caught this typo while looking at the logs when debugging the Nextcloud error.
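To illustrate why the missing `await` hid the bug (a toy stand-in, not libjuju's actual implementation): calling an async function without `await` only creates a coroutine object, so the `block_until` condition was never actually checked and the test sailed past it.

```python
import asyncio
import time

async def block_until(predicate, timeout: float = 5.0) -> None:
    """Toy stand-in for libjuju's Model.block_until: poll until predicate holds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise asyncio.TimeoutError("condition never met")
        await asyncio.sleep(0.1)

async def main() -> None:
    ready_at = time.monotonic() + 0.5
    block_until(lambda: time.monotonic() > ready_at)        # bug: coroutine created, never run
    await block_until(lambda: time.monotonic() > ready_at)  # fix: actually waits ~0.5 s

asyncio.run(main())  # the un-awaited call also emits a "never awaited" RuntimeWarning
```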
```diff
@@ -583,29 +583,40 @@ async def test_invalid_extra_user_roles(ops_test: OpsTest):
     )
 
 
-@pytest.mark.group(1)
+@pytest.mark.group(2)
 @markers.amd64_only  # nextcloud charm not available for arm64
 async def test_nextcloud_db_blocked(ops_test: OpsTest, charm: str) -> None:
```
Since a recent update to the nextcloud charm, its deploy procedure takes longer, and the fast-forwarding of the update-status hook was causing the charm to never enter blocked status. To avoid making the test group run too long, this test was moved to a different group.
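Roughly what the moved test now looks like (a sketch only; the deploy parameters and wait conditions are assumptions, and `pytest.mark.group` is the repo's own marker): the deploy runs outside pytest-operator's `fast_forward()` so update-status fires at its normal cadence and the charm can settle into blocked status.

```python
import pytest
from pytest_operator.plugin import OpsTest

@pytest.mark.group(2)  # isolated group: the nextcloud deploy is now too slow for group 1
async def test_nextcloud_db_blocked(ops_test: OpsTest, charm: str) -> None:
    await ops_test.model.deploy("nextcloud", num_units=1)  # channel omitted: illustrative
    # Deliberately not wrapped in `async with ops_test.fast_forward():` --
    # the shortened update-status interval kept the charm from reaching blocked.
    await ops_test.model.wait_for_idle(
        apps=["nextcloud"], status="blocked", timeout=2000
    )
```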
```diff
@@ -261,6 +244,9 @@ async def test_full_cluster_restart(
 
     # Change the loop wait setting to make Patroni wait more time before restarting PostgreSQL.
     initial_loop_wait = await get_patroni_setting(ops_test, "loop_wait")
+    initial_ttl = await get_patroni_setting(ops_test, "ttl")
+    # loop_wait parameter is limited by ttl value, thus we should increase it first
+    await change_patroni_setting(ops_test, "ttl", 600, use_random_unit=True)
```
We need to increase the ttl configuration before increasing loop_wait due to a constraint from Patroni; see the warning in https://patroni.readthedocs.io/en/latest/dynamic_configuration.html
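Under the hood this maps to Patroni's REST API: dynamic settings are changed with a PATCH to `/config`, and Patroni constrains the values so that (roughly) `loop_wait + 2 * retry_timeout <= ttl`, which is why ttl has to be raised first. A hedged sketch of the ordering (the endpoint address and the loop_wait value are illustrative, and the test's real helper goes through the charm, not raw HTTP):

```python
import requests

PATRONI_URL = "http://10.0.0.10:8008"  # any cluster member; address is illustrative

def change_patroni_setting(name: str, value: int) -> None:
    # PATCH /config updates Patroni's dynamic configuration cluster-wide.
    response = requests.patch(f"{PATRONI_URL}/config", json={name: value}, timeout=10)
    response.raise_for_status()

# ttl bounds loop_wait, so lift the ceiling first; on cleanup, restore the
# settings in the reverse order (lower loop_wait, then lower ttl).
change_patroni_setting("ttl", 600)
change_patroni_setting("loop_wait", 300)  # illustrative value
```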
```diff
@@ -145,8 +148,9 @@ async def test_storage_re_use(ops_test, continuous_writes):
 @pytest.mark.group(1)
 @pytest.mark.abort_on_fail
 @pytest.mark.parametrize("process", DB_PROCESSES)
+@pytest.mark.parametrize("signal", ["SIGTERM", pytest.param("SIGKILL", marks=markers.juju2)])
 async def test_kill_db_process(
     ops_test: OpsTest, process: str, continuous_writes, primary_start_timeout
```
Refactor akin to what was previously done on K8s.
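Stacking the two parametrize decorators expands the test into a (process, signal) matrix, with the SIGKILL cases carrying the juju2 marker. A self-contained sketch of the mechanism (the `DB_PROCESSES` values and the skip marker standing in for `markers.juju2` are assumptions):

```python
import pytest

DB_PROCESSES = ["postgresql", "patroni"]  # illustrative values

@pytest.mark.parametrize("process", DB_PROCESSES)
@pytest.mark.parametrize(
    "signal",
    ["SIGTERM", pytest.param("SIGKILL", marks=pytest.mark.skip(reason="stand-in for markers.juju2"))],
)
def test_kill_db_process(process: str, signal: str) -> None:
    # pytest generates one test per pair: (postgresql, SIGTERM), (patroni, SIGTERM),
    # plus the SIGKILL pairs, which the marker restricts (here: skips).
    assert signal in {"SIGTERM", "SIGKILL"}
```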
```diff
@@ -642,15 +642,13 @@ async def get_primary(ops_test: OpsTest, app, down_unit: str = None) -> str:
     """
     for unit in ops_test.model.applications[app].units:
         if unit.name != down_unit:
-            break
```
Change this helper function to match the behavior of the K8s variant: run the action on each unit once, instead of retrying multiple times on the same unit.
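A hedged sketch of the reworked helper (the `get-primary` action name and `primary` result key are assumptions based on the charm's actions; the exact body may differ): iterate over the units, skip the down one, and ask each unit once until a valid answer comes back.

```python
from pytest_operator.plugin import OpsTest

async def get_primary(ops_test: OpsTest, app: str, down_unit: str = None) -> str:
    """Try each alive unit once instead of retrying a single unit repeatedly."""
    for unit in ops_test.model.applications[app].units:
        if unit.name == down_unit:
            continue  # skip the unit that is known to be down
        action = await unit.run_action("get-primary")
        action = await action.wait()
        primary = action.results.get("primary", "None")
        if primary != "None":
            return primary
    raise Exception("Primary not found.")
```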
```diff
     await send_signal_to_process(ops_test, primary_name, process, signal)
 
+    # Wait some time to elect a new primary.
+    sleep(MEDIAN_ELECTION_TIME * 6)
```
This and the other added sleeps allow time for a new primary to be elected, much like we do on K8s.
Advanced sleep() programming!
- test_self_healing: ported CI fixes previously done on K8s, improving overall test stability (but still not stabilizing it completely)
- test_new_relations: adapted the logic of checking for invalid user roles to make the test pass (credits to @dragomirp)

Issues encountered and follow-up tickets: