[DPE-4416] URI exists while re-creating secret with modified label #170

juditnovak · 2024-05-24T13:55:22Z

Issue

Bug reported in Jira ticket https://warthogs.atlassian.net/browse/DPE-4416

The root cause was identified as an issue in the logic implemented for rolling upgrades, specifically in the case where the (peer) secret label changes (as happened in v34).

Since Juju does not support label changes, whenever we need to change the label of a secret we have to create a new secret (using the previous content) and associate this new secret with the new label.

(The underlying logic is more complicated than it sounds, as we also need to maintain backwards compatibility until the first write operation.)

This logic had a bug that surfaced in this particular use case:

1. charm upgrade from libs < v34 to libs > v34
2. peer secret changes
3. peer secret changes for a second time, within the same event handler

At this point, the internal state following the label switch was not maintained correctly. Namely, the CachedSecret object was still associated with the label that, in reality, we had already detached from.

Solution

Cleaning up the corresponding internal field.

(Also, in the mid-term: refactor and clean up the data_interfaces module.)

data_interfaces had tests to verify that any new version is compatible with a charm using databag.
However, we were missing a test module that verifies rolling upgrades from previous versions.

A new test module was added that simulates upgrades from an arbitrary version.
This test is executed in 4 different pipelines, going back from the latest version of the lib down to latest - 4, simulating an upgrade from that version for both peer and cross-charm relations.
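
As a rough illustration of how the "latest - N" source versions could be derived, here is a minimal sketch. It assumes the lib version is read from the `LIBPATCH` assignment in the library file and that the four pipelines cover latest - 1 through latest - 4; `parse_libpatch` and `upgrade_sources` are hypothetical helper names, not part of the test module.

```python
import re


def parse_libpatch(source: str) -> int:
    """Extract the LIBPATCH number from the text of a charm library file."""
    match = re.search(r"^LIBPATCH\s*=\s*(\d+)\s*$", source, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no LIBPATCH assignment found")
    return int(match.group(1))


def upgrade_sources(latest: int, depth: int = 4) -> list[int]:
    """List the versions to upgrade from: latest - 1 down to latest - depth.

    Assumption: one pipeline per returned version; non-positive versions are skipped.
    """
    return [latest - n for n in range(1, depth + 1) if latest - n > 0]


# Illustrative input only; the real value comes from the library file itself.
lib_source = "LIBAPI = 0\nLIBPATCH = 36\n"
targets = upgrade_sources(parse_libpatch(lib_source))
```

Each entry in `targets` would then correspond to one simulated upgrade from that older version to the current one.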

juditnovak · 2024-05-27T05:54:07Z

lib/charms/data_platform_libs/v0/data_interfaces.py


        # I wish we could just check if we are the owners of the secret...
        try:
            self._secret_meta = self.add_secret(content, label=self.label)
        except ModelError as err:
            if "this unit is not the leader" not in str(err):
                raise
-        old_meta.remove_all_revisions()


We need this in the long term, but I'd rather leave "garbage" around for now and bring this code back once we have verified that under no circumstances can we remove this data prematurely.

See issue on this matter #171

tox.ini

juditnovak · 2024-05-27T06:17:56Z

tests/unit/test_data_interfaces.py

@@ -632,8 +632,6 @@ def test_peer_relation_interface_backwards_compatible_legacy_label(self, interfa
        secret2 = self.harness.model.get_secret(label=f"{PEER_RELATION_NAME}.database.{scope}")
        assert secret2.id != secret_id
        assert interface.fetch_my_relation_field(relation_id, "secret-field") == "blabla"
-        with pytest.raises(SecretNotFoundError):
-            secret = self.harness.model.get_secret(label=f"database.{scope}")


Disabled for now, to be re-enabled in #171

taurus-forever

I am giving LGTM to the library changes here, as I tested them over the weekend and the original issue is no longer reproducible. Well done.

Re: tests... I simply cannot load 4*3.5K tests into my mind...
and this makes me crazy:

    cp tests/integration/data/data_interfaces.py.4 tests/integration/data/data_interfaces.py_old
    pytest -v --tb native --log-cli-level=INFO -s {posargs} {[vars]tests_path}/integration/test_rolling_upgrade_from_specific_version.py

Why you commit the files py.1-4 and copy them into py_old is unclear to me... please add the black magic logic to the PR description. Thanks!

Anyway LGTM for the library fix to unblock PG promotion to candidate for Landscape Team.

delgod

The ticket number should be part of the PR description.
The PR description should clearly explain the problem and how the new code solves it.
At least one technical reviewer should be added.

juditnovak · 2024-05-27T16:38:42Z

lib/charms/data_platform_libs/v0/data_interfaces.py

        content = self._secret_meta.get_content()
+        self._secret_uri = None


Underlying logic: normally this helper field holds the URI of a secret (in case the secret was newly created, or fetched by URI).

At this point, we are switching the self (CachedSecret) object to point to a new secret object (one that is associated with the new label).

Thus we have to "unlink" the object from the old URI. (Otherwise new secret creation is blocked, as the error in https://warthogs.atlassian.net/browse/DPE-4416 indicated.)
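
To make the failure mode concrete, here is a minimal sketch of the mechanism. It assumes a simplified stand-in for CachedSecret backed by an in-memory dict; `FakeCachedSecret`, `SecretAlreadyExists`, `switch_label`, and the labels used are hypothetical and only illustrate the idea, they are not the library's implementation.

```python
class SecretAlreadyExists(Exception):
    """Raised when the cache object is still linked to a secret URI."""


class FakeCachedSecret:
    """Hypothetical, simplified stand-in for CachedSecret."""

    def __init__(self, store, label):
        self.store = store  # in-memory dict: uri -> {"label": ..., "content": ...}
        self.label = label
        self._secret_uri = None

    def add_secret(self, content):
        # Creation is refused while the object is still linked to a URI.
        if self._secret_uri is not None:
            raise SecretAlreadyExists(self._secret_uri)
        uri = f"secret:{len(self.store) + 1}"
        self.store[uri] = {"label": self.label, "content": dict(content)}
        self._secret_uri = uri
        return uri

    def switch_label(self, new_label, unlink_uri):
        """Re-create the secret under new_label, re-using the previous content."""
        content = self.store[self._secret_uri]["content"]
        if unlink_uri:
            self._secret_uri = None  # the fix: detach from the old URI first
        self.label = new_label
        return self.add_secret(content)


store = {}

# Without the unlink step, the second creation is blocked.
buggy = FakeCachedSecret(store, "database.app")
buggy.add_secret({"secret-field": "blabla"})
try:
    buggy.switch_label("peers.database.app", unlink_uri=False)
    blocked = False
except SecretAlreadyExists:
    blocked = True

# With the unlink step, a new secret is created under the new label.
fixed = FakeCachedSecret(store, "database.app")
old_uri = fixed.add_secret({"secret-field": "blabla"})
new_uri = fixed.switch_label("peers.database.app", unlink_uri=True)
```

In this model the only difference between the two runs is whether `_secret_uri` is reset before the new secret is added, which corresponds to the one-line change above.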

juditnovak · 2024-05-27T16:52:26Z

lib/charms/data_platform_libs/v0/data_interfaces.py


        # I wish we could just check if we are the owners of the secret...
        try:
            self._secret_meta = self.add_secret(content, label=self.label)
        except ModelError as err:
            if "this unit is not the leader" not in str(err):
                raise
-        old_meta.remove_all_revisions()
+        self.current_label = None


This field is used to indicate that an upgrade was performed during which a secret label change may have happened.

We detect dynamically whether a secret with an old label may be "hanging around" from an old version of the charm. In order to ensure smooth upgrades, we leave the secret associated with the old label as long as only read operations are performed on it.
We hold the value of the outdated label in the CachedSecret.current_label field (while CachedSecret.label always points to the label that is to be used in the long term).

When the above code is executed, we are at the point of the first write operation impacting this secret,
i.e. the move to the new label (by having created a brand-new secret that is recognized by the new label).
We need to clear the current_label field, as we no longer have a link to the old (labelled) secret.
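
The lifecycle of the two label fields can be sketched as follows. This is a simplified model under stated assumptions: secrets are kept in a plain dict keyed by label, and `LabelledSecretCache`, `legacy_labels`, `get_content`, and `set_content` are hypothetical names, not the library's API. As in this PR, the secret under the old label is left in place after the switch.

```python
class LabelledSecretCache:
    """Hypothetical model of the label bookkeeping (not the library's code)."""

    def __init__(self, secrets, label, legacy_labels=()):
        self.secrets = secrets  # in-memory dict: label -> content
        self.label = label  # label to be used in the long term
        self.current_label = None  # set only while an outdated label is in use
        # Detect a secret left behind under an old label by a previous version.
        for legacy in legacy_labels:
            if label not in secrets and legacy in secrets:
                self.current_label = legacy
                break

    def get_content(self):
        # Reads are served from whichever label is currently in effect.
        return dict(self.secrets[self.current_label or self.label])

    def set_content(self, content):
        # First write after the upgrade: create the secret under the new label
        # and drop the link to the old one (the old secret itself is kept).
        self.secrets[self.label] = dict(content)
        self.current_label = None


secrets = {"database.app": {"secret-field": "blabla"}}
cache = LabelledSecretCache(secrets, "peers.database.app", legacy_labels=["database.app"])

label_before_write = cache.current_label
value_before_write = cache.get_content()["secret-field"]

cache.set_content({"secret-field": "changed"})
label_after_write = cache.current_label
```

Before the write, reads go through the outdated label; after it, `current_label` is empty and every access uses the long-term label.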

juditnovak · 2024-05-27T17:55:48Z

@taurus-forever to answer your question.

We have a new test module that performs a simulated rolling upgrade on test charms.

The initial PR (to deliver a quick yet reliable solution) took a local copy of older versions of the lib and "rotated" them to a static location for the test module to pick up.

As I got the opportunity to clean up the "brute-force" approach, the tests now dynamically fetch the corresponding latest - N versions from git on each run of the new pipelines.

tests/unit/test_upgrade.py

tox.ini

tests/integration/test_rolling_upgrade_from_specific_version.py

marceloneppel

LGTM!

juditnovak force-pushed the DPE-4416_bugfix_uri_already_exists branch 3 times, most recently from 6996942 to fd22321 Compare May 27, 2024 05:44

juditnovak commented May 27, 2024

juditnovak requested review from delgod and taurus-forever May 27, 2024 05:58

juditnovak commented May 27, 2024

juditnovak marked this pull request as ready for review May 27, 2024 06:29

taurus-forever approved these changes May 27, 2024

delgod requested changes May 27, 2024

taurus-forever mentioned this pull request May 27, 2024

[DPE-4416] Fetch charm libs to the latest LIBPATCH (update dp-libs to v36 to fix secrets issue) canonical/postgresql-operator#475

Merged

juditnovak changed the title ~~[BUGFIX] URI exists while re-creating secret with modified label~~ [DPE-4416] URI exists while re-creating secret with modified label May 27, 2024

juditnovak commented May 27, 2024

juditnovak added 5 commits May 27, 2024 19:08

Logging format change, outside of the scope of the PR

9e95539

codespell-required changes

fbad650

[BUGFIX] URI exists while re-creating secret with modified label

8f85434

Unittests

760933e

Integration tests

3430095

juditnovak force-pushed the DPE-4416_bugfix_uri_already_exists branch from 7769493 to 3430095 Compare May 27, 2024 17:26

juditnovak requested a review from marceloneppel May 27, 2024 17:40

delgod approved these changes May 27, 2024

marceloneppel reviewed May 27, 2024

Changes on PR request

9a30354

juditnovak force-pushed the DPE-4416_bugfix_uri_already_exists branch from 2eed416 to 9a30354 Compare May 27, 2024 19:46

juditnovak requested a review from marceloneppel May 28, 2024 07:08

marceloneppel approved these changes May 28, 2024

juditnovak merged commit 7c80365 into main May 28, 2024
28 checks passed

juditnovak deleted the DPE-4416_bugfix_uri_already_exists branch May 28, 2024 12:29

juditnovak mentioned this pull request Jun 5, 2024

[DPE-4530] CI testing both forward (Juju) and backwards (focal) #172

Merged
