Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-4416] URI exists while re-creating secret with modified label #170

Merged
merged 6 commits into from
May 28, 2024

Conversation

juditnovak
Copy link
Contributor

@juditnovak juditnovak commented May 24, 2024

Issue

Bug reported in Jira ticket https://warthogs.atlassian.net/browse/DPE-4416

The root cause was identified as an issue within the logic implemented for rolling upgades. In particular on an occasion where the (peer) secret label may change (as it happened in v34).

Since Juju is not supporting label changes, whenever we need to change a label for a secret, we need to create a new secret (using the previous content) and associate this new secret with the new label.

(The underlying logic is more complicated than it sounds, as we need to maintain backwards compatibility as well, until the first write operation.)

This logic had a bug, revealing on this particular use case:

  • charm upgrade from libs < v34 to libs > v34
  • peer secret changes
  • peer secret changes for a second time, within the same event handler

At this point internal logic following the label switch was not maintained correctly. Namely the CachedSecret object was still associated with the label that --in reality-- we already have deteached from.

Solution

Cleaning up the corresponding internal field.

(Also, in a mid-term: refactor and cleanup of the data_interfaces module.)

data_interfaces had tests to verify that any new version is compatible with a charm using databag.
However we were missing to have a test module that verifies rolling upgrades from previous versions.

A new test module was added, that simulating upgrades from an arbitraty version.
This test is executed in 4 different pipelines, going back from the latest version of the lib down to latest - 4 simulating an upgrade from that version both for peer and cross-charm relations.

@juditnovak juditnovak force-pushed the DPE-4416_bugfix_uri_already_exists branch 3 times, most recently from 6996942 to fd22321 Compare May 27, 2024 05:44

# I wish we could just check if we are the owners of the secret...
try:
self._secret_meta = self.add_secret(content, label=self.label)
except ModelError as err:
if "this unit is not the leader" not in str(err):
raise
old_meta.remove_all_revisions()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this in a long term, but I rather leave "garbage" around for now, and bring this code back after having verified that under no circumstances we may remove this data pre-mature.

See issue on this matter #171

tox.ini Show resolved Hide resolved
@@ -632,8 +632,6 @@ def test_peer_relation_interface_backwards_compatible_legacy_label(self, interfa
secret2 = self.harness.model.get_secret(label=f"{PEER_RELATION_NAME}.database.{scope}")
assert secret2.id != secret_id
assert interface.fetch_my_relation_field(relation_id, "secret-field") == "blabla"
with pytest.raises(SecretNotFoundError):
secret = self.harness.model.get_secret(label=f"database.{scope}")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabled for now, to be re-enabled in #171

@juditnovak juditnovak marked this pull request as ready for review May 27, 2024 06:29
Copy link
Contributor

@taurus-forever taurus-forever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am giving LGTM to library changes here, as I have tested them on weekend and the original issue is no longer reproducible. Well done.

Re: tests... I simply cannot load 4*3.5K tests into my mind...
and this makes me crazy:

    cp tests/integration/data/data_interfaces.py.4 tests/integration/data/data_interfaces.py_old
    pytest -v --tb native --log-cli-level=INFO -s {posargs} {[vars]tests_path}/integration/test_rolling_upgrade_from_specific_version.py

Why do you commit files py.1-4 and copying them into py_old is unclear for me.... please add to PR description the black magic logic. Tnx!

Anyway LGTM for the library fix to unblock PG promotion to candidate for Landscape Team.

Copy link
Member

@delgod delgod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ticket number should be part of the PR description.
The PR description should clearly explain the problem and how the new code solves it.
At least one technical reviewer should be added.

@juditnovak juditnovak changed the title [BUGFIX] URI exists while re-creating secret with modified label [DPE-4416] URI exists while re-creating secret with modified label May 27, 2024
content = self._secret_meta.get_content()
self._secret_uri = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlying logic: Normally this helper field is to hold the URI of a secret (in case the secret was newly created, or fetched by URI).

At this point, we are switching the self (CachedSecret) object to point to a new secret object (that's associated with the new label).

Thus we have to "unlink" the object from the old URI. (Otherwise new secret creation is blocked: as the error in https://warthogs.atlassian.net/browse/DPE-4416 was indicating so.)


# I wish we could just check if we are the owners of the secret...
try:
self._secret_meta = self.add_secret(content, label=self.label)
except ModelError as err:
if "this unit is not the leader" not in str(err):
raise
old_meta.remove_all_revisions()
self.current_label = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is used to indicate when an upgrade was performed, where a secret label change may have happened.

We detect dynamically if a secret with an old label may be "hanging around", from an old version of the charm. In order to ensure smooth upgrades, we are leaving the secret associated with the old label, as long as only read operations are performed on the secret.
We hold the value of the outdated label in the CachedSecret.current_label field (while CachedSecret.label is always pointing to the label that is to be used in a long term.

When the above code is executed, we are at the point of the first write operation impacting this secret.
I.e. move to the new label (by having created a brand-new secret, that's recognized by the new label).
We need to clean the current_label field, as we have no link to the old (labelled) secret anymore.

@juditnovak juditnovak force-pushed the DPE-4416_bugfix_uri_already_exists branch from 7769493 to 3430095 Compare May 27, 2024 17:26
@juditnovak juditnovak requested a review from marceloneppel May 27, 2024 17:40
@juditnovak
Copy link
Contributor Author

@taurus-forever to answer your question.

We have a new test module, that performs a simulated rolling upgrade on test charms.

The initial PR (to deliver a quick yet reliable solution) took a local copy of older versions of the lib, and "rotated" them to a static location for the the test module to pick up.

As I got the opportunity to clean up the "brute-force" approach, now the tests are dynamically fetching corresponding latest - N versions from git on each run of the corresponding, new pipelines.

@juditnovak juditnovak force-pushed the DPE-4416_bugfix_uri_already_exists branch from 2eed416 to 9a30354 Compare May 27, 2024 19:46
@juditnovak juditnovak requested a review from marceloneppel May 28, 2024 07:08
Copy link
Member

@marceloneppel marceloneppel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@juditnovak juditnovak merged commit 7c80365 into main May 28, 2024
28 checks passed
@juditnovak juditnovak deleted the DPE-4416_bugfix_uri_already_exists branch May 28, 2024 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants