ModelError: ERROR cannot ensure service account [...] Internal error occurred: resource quota evaluation timed out #415

DanielArndt · 2024-07-03T12:17:37Z

Bug Description

This error seems to come from k8s, and it looks like a resource (etcd? ceph?) is just taking too long to respond. This aligns with the ceph outage mentioned at the time.

To Reproduce

Unknown

Environment

#407 (comment)

This was in a live model on our prodstack. I believe there might have been an ongoing ceph outage at the time, so the entire controller was a bit sluggish to respond. To resolve the issue I had to redeploy vault once the controller was back on track.

Relevant log output

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 1575, in <module>
    main(VaultCharm)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 548, in main
    manager.run()
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 527, in run
    self._emit()
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 516, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 408, in _configure
    self._configure_pki_secrets_engine()
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 485, in _configure_pki_secrets_engine
    vault = self._get_active_vault_client()
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 1377, in _get_active_vault_client
    role_id, secret_id = self._get_approle_auth_secret()
  File "/var/lib/juju/agents/unit-vault-0/charm/./src/charm.py", line 1252, in _get_approle_auth_secret
    juju_secret = self.model.get_secret(label=VAULT_CHARM_APPROLE_SECRET_LABEL)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/model.py", line 285, in get_secret
    content = self._backend.secret_get(id=id, label=label)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/model.py", line 3504, in secret_get
    result = self._run('secret-get', *args, return_output=True, use_json=True)
  File "/var/lib/juju/agents/unit-vault-0/charm/venv/ops/model.py", line 3141, in _run
    raise ModelError(e.stderr) from e
ops.model.ModelError: ERROR cannot ensure service account "unit-vault-0": Internal error occurred: resource quota evaluation timed out

Additional context

We can catch the error here, but there are some implications. If we can't retrieve something because of an intermittent error, what do we set the status to? What if the calls are asymmetrical (we can retrieve in configure, but not in collect status, or vice versa)? Even more disruptively, what do we do when we're trying to store a secret and this happens? Since we don't use defers, we lose the context in which we were attempting to add or update this secret. I think this topic deserves a bigger discussion.
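To make the retrieval case concrete, here is a minimal sketch of catching the error around `get_secret`. It is an illustration only, not the code that was merged: the function name, the `role-id` / `secret-id` content keys, and the choice to return `None` are assumptions, and the fallback exception classes exist only so the snippet runs without `ops` installed.

```python
import logging
from typing import Optional, Tuple

try:
    from ops.model import ModelError, SecretNotFoundError
except ImportError:  # stand-ins so this sketch runs without ops installed
    class ModelError(Exception):
        pass

    class SecretNotFoundError(ModelError):
        pass

logger = logging.getLogger(__name__)


def get_approle_auth_secret(model, label: str) -> Optional[Tuple[str, str]]:
    """Return (role_id, secret_id), or None if the secret is unavailable.

    `model` is expected to behave like `ops.Model`. The content keys
    below are hypothetical.
    """
    try:
        secret = model.get_secret(label=label)
        content = secret.get_content(refresh=True)
    except SecretNotFoundError:
        # The secret has not been created yet.
        return None
    except ModelError as e:
        # Transient backend failure, e.g. "resource quota evaluation timed out".
        logger.warning("Could not retrieve secret %r: %s", label, e)
        return None
    return content["role-id"], content["secret-id"]
```

A caller would treat `None` as "not ready yet" and return early from the handler, which leaves the status question raised above open.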

For now, I'll move forward with this and catch the error; I think we may be able to remove some dependence on secrets. For example, we store the CSR for PKI in 3 separate places -- vault, the relation data, and juju secrets. In the other cases, it should be fairly straightforward to write the code such that subsequent calls will update the secret as expected, although there might be some inconsistency in the meantime.
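One way to get the "subsequent calls will update the secret" behaviour is to make the store path idempotent: each call compares the desired content with what is stored and only writes on a difference, so a failed attempt is simply retried by the next event. The sketch below assumes `model` and `app` behave like `ops.Model` and `ops.Application`; the function name and return convention are hypothetical, and the fallback exception classes are only there so it runs without `ops` installed.

```python
import logging

try:
    from ops.model import ModelError, SecretNotFoundError
except ImportError:  # stand-ins so this sketch runs without ops installed
    class ModelError(Exception):
        pass

    class SecretNotFoundError(ModelError):
        pass

logger = logging.getLogger(__name__)


def ensure_secret_content(model, app, label: str, desired: dict) -> bool:
    """Create or update the labelled secret so it holds `desired`.

    Returns True when the secret matches `desired` after the call, and
    False when a backend error means a later event should try again.
    """
    try:
        try:
            secret = model.get_secret(label=label)
        except SecretNotFoundError:
            # First run: nothing stored yet, so create it.
            app.add_secret(desired, label=label)
            return True
        if secret.get_content(refresh=True) != desired:
            secret.set_content(desired)
        return True
    except ModelError as e:
        # No context is kept here; the next call recomputes `desired`.
        logger.warning("Could not store secret %r: %s", label, e)
        return False
```

Because the desired content is recomputed on every call, nothing has to be remembered across events. The cost is the window of inconsistency mentioned above.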

DanielArndt added the bug label Jul 3, 2024

DanielArndt mentioned this issue Jul 3, 2024

ModelError: ERROR invalid value [...] filesystem attachment [...] not provisioned #407

Closed

DanielArndt self-assigned this Jul 3, 2024

This was referenced Jul 22, 2024

refactor: Remove secret usage from Vault PKI implementation for CA CSR #434

Merged

fix: Handle secret retrieval/storage errors #417

Merged

DanielArndt closed this as completed in #417 Jul 25, 2024
