Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-4856][MISC] Port some test_self_healing CI fixes + update check for invalid extra user credentials #546

Merged
merged 23 commits into from
Aug 1, 2024

Conversation

lucasgameiroborges
Copy link
Member

@lucasgameiroborges lucasgameiroborges commented Jul 25, 2024

  • Port some of the CI tweaks made on test_self_healing on K8s, improving overall test stability (but still not stabilizing it completely)
  • Fixes a typo on test_new_relations + adapting the logic of checking for invalid user roles to make the test pass (credits to @dragomirp )
  • Other minor tweaks to improve test stability in general, inspired by the progress made on K8s repo

Issues encountered and follow-up tickets:

Copy link

codecov bot commented Jul 25, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.89%. Comparing base (2c0f64b) to head (10a96e5).
Report is 4 commits behind head on main.

Files Patch % Lines
src/relations/postgresql_provider.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #546   +/-   ##
=======================================
  Coverage   70.89%   70.89%           
=======================================
  Files          11       11           
  Lines        3024     3024           
  Branches      535      535           
=======================================
  Hits         2144     2144           
  Misses        764      764           
  Partials      116      116           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lucasgameiroborges lucasgameiroborges marked this pull request as ready for review July 31, 2024 16:33
@lucasgameiroborges lucasgameiroborges changed the title Port CI fixes [DPE-4856][MISC] Port some test_self_healing CI fixes + update check for invalid extra user credentials Jul 31, 2024
@@ -259,7 +259,7 @@ def check_for_invalid_extra_user_roles(self, relation_id: int) -> bool:
for data in relation.data.values():
extra_user_roles = data.get("extra-user-roles")
if extra_user_roles is None:
break
continue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

necessary to correctly set/keep the blocked status due to invalid extra user roles, see typo in test below.

@@ -563,7 +563,7 @@ async def test_invalid_extra_user_roles(ops_test: OpsTest):
f"{DATABASE_APP_NAME}:database", f"{DATA_INTEGRATOR_APP_NAME}:postgresql"
)
await ops_test.model.wait_for_idle(apps=[DATABASE_APP_NAME])
ops_test.model.block_until(
await ops_test.model.block_until(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a missing await here, hiding a bug in the charm behavior :( thankfully @dragomirp caught this typo when looking at the logs when debugging the Nextcloud error.

@@ -583,29 +583,40 @@ async def test_invalid_extra_user_roles(ops_test: OpsTest):
)


@pytest.mark.group(1)
@pytest.mark.group(2)
@markers.amd64_only # nextcloud charm not available for arm64
async def test_nextcloud_db_blocked(ops_test: OpsTest, charm: str) -> None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a new update to nextcloud charm, its deploy procedure became longer + the fast-forwarding of update status hook was causing the charm to never enter blocked status. To avoid making the charm too long, this test was moved to a different group.

@@ -261,6 +244,9 @@ async def test_full_cluster_restart(

# Change the loop wait setting to make Patroni wait more time before restarting PostgreSQL.
initial_loop_wait = await get_patroni_setting(ops_test, "loop_wait")
initial_ttl = await get_patroni_setting(ops_test, "ttl")
# loop_wait parameter is limited by ttl value, thus we should increase it first
await change_patroni_setting(ops_test, "ttl", 600, use_random_unit=True)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to increase ttl configuration before increasing loop_wait due to a constraint from Patroni, see the warning in https://patroni.readthedocs.io/en/latest/dynamic_configuration.html

@@ -145,8 +148,9 @@ async def test_storage_re_use(ops_test, continuous_writes):
@pytest.mark.group(1)
@pytest.mark.abort_on_fail
@pytest.mark.parametrize("process", DB_PROCESSES)
async def test_kill_db_process(
ops_test: OpsTest, process: str, continuous_writes, primary_start_timeout
@pytest.mark.parametrize("signal", ["SIGTERM", pytest.param("SIGKILL", marks=markers.juju2)])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor akin to what was previously done on K8s.

@@ -642,15 +642,13 @@ async def get_primary(ops_test: OpsTest, app, down_unit: str = None) -> str:
"""
for unit in ops_test.model.applications[app].units:
if unit.name != down_unit:
break
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this helper function to match the behavior on K8s part, running the action in each unit once, instead of retrying multiple times on the same unit.

await send_signal_to_process(ops_test, primary_name, process, signal)

# Wait some time to elect a new primary.
sleep(MEDIAN_ELECTION_TIME * 6)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the other sleeps added allow for time to a new primary to be elected, much like we do on K8s

Copy link
Contributor

@taurus-forever taurus-forever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Advanced sleep() programming!

@lucasgameiroborges lucasgameiroborges merged commit 57eb505 into main Aug 1, 2024
83 checks passed
@lucasgameiroborges lucasgameiroborges deleted the lucas/port-ci-fixes branch August 1, 2024 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants