Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "Reason: Expired: too old resource version: 379140622 (380367990)" #23504

Closed
wants to merge 1 commit into from

Conversation

ecerulm
Copy link
Contributor

@ecerulm ecerulm commented May 5, 2022

The previous implementation relied on watch returning the events
sorted by resource_version which is not guaranteed at least in EKS.

So previously you could end up with KubernetesJobWatcher retrying watch from a resource_version that is not valid (too old already)

2022-05-05 08:41:40,522] {kubernetes_executor.py:126} INFO - Event: and now my watch begins starting at resource_version: 379140622
[2022-05-05 08:41:40,545] {kubernetes_executor.py:111} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
    self.resource_version = self._run(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
    for event in list_worker_pods():
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 182, in stream
    raise client.rest.ApiException(
kubernetes.client.exceptions.ApiException: (410)
Reason: Expired: too old resource version: 379140622 (380367990)

@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes provider related issues area:Scheduler including HA (high availability) scheduler labels May 5, 2022
The previous implementation relied on watch returning the events
sorted by resource_version which is not guaranteed.
@ecerulm ecerulm force-pushed the resource_too_old branch from 677164f to 66f69ca Compare May 5, 2022 21:23
@ecerulm
Copy link
Contributor Author

ecerulm commented May 5, 2022

I think this probably

Closes #21087

@ecerulm
Copy link
Contributor Author

ecerulm commented May 6, 2022

Although this PR works in EKS it's not legal to assume that the resourceVersion are numeric or that they can be sorted in any meaningful way.

From Resource Version Semantics

You must not assume resource versions are numeric or collatable. API clients may only compare two resource versions for equality (this means that you must not compare resource versions for greater-than or less-than relationships).

@ecerulm ecerulm closed this May 6, 2022
@cansjt
Copy link

cansjt commented May 6, 2022

Arf too bad! Thanks for trying though.

Ain't that weird. Because then how can kubernetes tell you that a resource is too old then? Doesn't that implies some sort of order? Maybe it's just a matter of phrasing and what it means is that the resource just no longer exists (if it ever existed). But that's very missleading.

And then that means there are no other way than deal with the bookmark events to implement this correctly, right?

@ecerulm
Copy link
Contributor Author

ecerulm commented May 6, 2022

Maybe it's just a matter of phrasing and what it means is that the resource just no longer exists (if it ever existed). But that's very missleading.

Yes, I think it's misleading too. That's why I thought I could sort by resource version, but no, resource versions are just opaque ids.

And then that means there are no other way than deal with the bookmark events to implement this correctly, right?

I don't think the bookmark thing solves it either as stated:

but you shouldn't assume bookmarks are returned at any specific interval, nor can clients assume that the API server will send any BOOKMARK event even when requested

I have another PR #23521 which just resets to a fresh watch if there is any error. It will probably get some old (already processed) events but I don't think that breaks anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler provider:cncf-kubernetes Kubernetes provider related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants