ModelError: ERROR cannot ensure service account [...] Internal error occurred: resource quota evaluation timed out #415

Closed
DanielArndt opened this issue Jul 3, 2024 · 0 comments · Fixed by #417
Labels
bug Something isn't working

DanielArndt commented Jul 3, 2024

Reported by @alesstimec

Bug Description

This error appears to come from Kubernetes: a backing resource (etcd? Ceph?) seems to be taking too long to respond. That is consistent with the Ceph outage mentioned at the time.

To Reproduce

Unknown

Environment

#407 (comment)

This was in a live model on our prodstack. I believe there might have been an ongoing Ceph outage at the time, so the entire controller was a bit sluggish to respond. To resolve the issue, I had to redeploy Vault once the controller was back on track.

Relevant log output

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 1575, in <module>
    main(VaultCharm)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 548, in main
    manager.run()
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 527, in run
    self._emit()
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 516, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 408, in _configure
    self._configure_pki_secrets_engine()
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 485, in _configure_pki_secrets_engine
    vault = self._get_active_vault_client()
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 1377, in _get_active_vault_client
    role_id, secret_id = self._get_approle_auth_secret()
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 1252, in _get_approle_auth_secret
    juju_secret = self.model.get_secret(label=VAULT_CHARM_APPROLE_SECRET_LABEL)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/model.py", line 285, in get_secret
    content = self._backend.secret_get(id=id, label=label)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/model.py", line 3504, in secret_get
    result = self._run('secret-get', *args, return_output=True, use_json=True)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/model.py", line 3141, in _run
    raise ModelError(e.stderr) from e
ops.model.ModelError: ERROR cannot ensure service account "unit-vault-0": Internal error occurred: resource quota evaluation timed out

Additional context

We can catch the error here, but there are some implications. If we can't retrieve something because of an intermittent error, what do we set the status to? What if the calls are asymmetrical (we can retrieve in configure but not in collect status, or vice versa)? Even more disruptively, what do we do when this happens while we're trying to store a secret? Since we don't use defers, we lose the context in which we were attempting to add or update the secret. I think this topic deserves a bigger discussion.

For now, I'll move forward with this and catch the error; I think we may be able to remove some dependence on secrets. For example, we store the CSR for PKI in three separate places: Vault, the relation data, and Juju secrets. In the other cases, it should be fairly straightforward to write the code such that subsequent calls will update the secret as expected, although there might be some inconsistency in the meantime.
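
A minimal sketch of what "catch the error" could look like on the read path, assuming the handler is safe to skip and re-run on a later event. This is not the actual fix from #417. To keep it runnable without `ops` installed, `ModelError` here is a local stand-in for `ops.model.ModelError`, and `FlakyModel`, `get_secret_content`, the `"approle"` label, and the `role-id`/`secret-id` keys are all hypothetical names chosen for illustration.

```python
from typing import Optional, Tuple


class ModelError(Exception):
    """Stand-in for ops.model.ModelError so the sketch runs without ops."""


class FlakyModel:
    """Hypothetical fake model: fails a set number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures

    def get_secret_content(self, label: str) -> dict:
        # Simulate the intermittent error from the traceback above.
        if self.failures > 0:
            self.failures -= 1
            raise ModelError(
                'ERROR cannot ensure service account "unit-vault-0": '
                "Internal error occurred: resource quota evaluation timed out"
            )
        # Hypothetical secret layout; the real keys are not shown in the issue.
        return {"role-id": "example-role", "secret-id": "example-secret"}


def get_approle_auth_secret(model, label: str) -> Optional[Tuple[str, str]]:
    """Return (role_id, secret_id), or None if the secret is unreadable right now."""
    try:
        content = model.get_secret_content(label)
    except ModelError:
        # Assumed to be transient: report "not available" instead of crashing
        # the hook, and let the next event run the same code path again.
        return None
    return content["role-id"], content["secret-id"]


if __name__ == "__main__":
    model = FlakyModel(failures=1)
    print(get_approle_auth_secret(model, "approle"))  # first call hits the error
    print(get_approle_auth_secret(model, "approle"))  # later call succeeds
```

Returning `None` pushes the decision about status to the caller, which is exactly the open question raised above. The sketch only shows that the hook no longer dies on the transient error.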
