
'CeleryKubernetesExecutor' object has no attribute '_task_event_logs' #41891

Closed
1 of 2 tasks
hadarsharon opened this issue Aug 30, 2024 · 15 comments · Fixed by #41904 or #41906
Labels
area:core kind:bug This is clearly a bug

Comments

@hadarsharon

Apache Airflow version

2.10.0

If "Other Airflow 2 version" selected, which one?

No response

What happened?

When running tasks using the CeleryKubernetesExecutor on the latest Airflow version (2.10.0), a lot of these messages pop up in the Airflow Scheduler logs:

[2024-08-30T13:10:15.553+0000] {scheduler_job_runner.py:1141} ERROR - Something went wrong when trying to save task event logs.
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py", line 1139, in _run_scheduler_loop
    self._process_task_event_logs(executor._task_event_logs, session)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'CeleryKubernetesExecutor' object has no attribute '_task_event_logs'

What you think should happen instead?

Prior to upgrading Airflow from 2.9.3 to 2.10.0, this was not an issue. I suspect that the introduction of Hybrid Executors, combined with CeleryKubernetesExecutor becoming somewhat redundant, is causing this issue.

I understand that with hybrid executors, this executor is bound to be deprecated at some point, but this could be considered a violation of backwards compatibility for those who are still using it.

How to reproduce

  1. Deploy Airflow 2.10.0 with CeleryKubernetesExecutor
  2. Run a task and observe the scheduler logs
  3. This error should pop up when the scheduler tries to save task event logs

Operating System

Debian GNU/Linux 12 (bookworm)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.27.0
apache-airflow-providers-atlassian-jira==2.7.0
apache-airflow-providers-celery==3.8.1
apache-airflow-providers-cncf-kubernetes==8.3.4
apache-airflow-providers-common-compat==1.1.0
apache-airflow-providers-common-io==1.4.0
apache-airflow-providers-common-sql==1.15.0
apache-airflow-providers-docker==3.12.3
apache-airflow-providers-elasticsearch==5.4.2
apache-airflow-providers-fab==1.2.2
apache-airflow-providers-ftp==3.10.1
apache-airflow-providers-google==10.21.1
apache-airflow-providers-grpc==3.5.2
apache-airflow-providers-hashicorp==3.7.1
apache-airflow-providers-http==4.12.0
apache-airflow-providers-imap==3.6.1
apache-airflow-providers-jenkins==3.7.0
apache-airflow-providers-microsoft-azure==10.3.0
apache-airflow-providers-mysql==5.6.3
apache-airflow-providers-odbc==4.6.3
apache-airflow-providers-openlineage==1.10.0
apache-airflow-providers-postgres==5.11.3
apache-airflow-providers-redis==3.7.1
apache-airflow-providers-salesforce==5.8.0
apache-airflow-providers-sendgrid==3.5.1
apache-airflow-providers-sftp==4.10.3
apache-airflow-providers-slack==8.8.0
apache-airflow-providers-smtp==1.7.1
apache-airflow-providers-snowflake==5.6.1
apache-airflow-providers-sqlite==3.8.2
apache-airflow-providers-ssh==3.12.0
apache-airflow-providers-tableau==4.6.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Kubernetes deployment on AWS EKS using Airflow Helm Chart 1.15.0 with CeleryKubernetesExecutor

Anything else?

This happens pretty much every time a task runs, as long as you are using the CeleryKubernetesExecutor.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


@hadarsharon added the area:core, kind:bug, and needs-triage labels on Aug 30, 2024

boring-cyborg bot commented Aug 30, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@potiuk
Member

potiuk commented Aug 30, 2024

FYI: @o-nikolas - looks similar to #41525

@potiuk removed the needs-triage label on Aug 30, 2024
@o-nikolas
Contributor

@potiuk This one is not related to the new hybrid/multiple executor config changes. It's actually related to the recent changes to remove the task context logger in favour of the log table #40867 (CC @dstandish)

What is similar is that it's another issue with these old hybrid executors not using the base executor interface directly, so every time you modify it you must update them by hand, which just got missed in the context logger work. Let's deprecate these things as soon as we feel comfortable doing so.

@potiuk
Member

potiuk commented Aug 30, 2024

I see. I guess a discussion and proposal for that should happen - do you want to open it? I guess we have some arguments and evidence already that we can use. We could also potentially add some tests (though I'm not sure how) to see if there are any breaking changes introduced.

I am not even sure whether that change (@dstandish?) is generally compatible with previous versions of executors (on the phone now). I see that there were some 'potentially internal' changes, but they really impacted the executor interface and the way it is used. It's strange that some other executors in providers were modified in this change but the old hybrids weren't, so we likely have some fairly complex incompatibility scenarios here.

@o-nikolas
Contributor

I guess a discussion and proposal for that should happen - do you want to open it? I guess we have some arguments and evidence already that we can use

I'm happy to open it, but I was hoping to keep the new feature experimental until at least 2.11, so that folks using the new multiple executor configuration can find bugs and we can fix them, ensuring it's working well and is stable before removing the experimental label. We're kind of stuck in the middle as of now. Do you think we should do it sooner and stabilize the new multiple executor config as we go?

I am not even sure whether that change (@dstandish?) is generally compatible with previous versions of executors (on the phone now). I see that there were some 'potentially internal' changes, but they really impacted the executor interface and the way it is used

Compatibility is a good point; I'll leave that for @dstandish to comment on. Executors from older versions of providers will definitely not have the new _task_event_logs property, so when run with the new Airflow you'll see the above exception. Nothing will fail critically, since the attempt to read that property is within a try/catch, but it will be incredibly spammy, as it looks like it will retry over and over and log an exception each time.

It's strange that some other executors in providers were modified in this change but the old hybrids weren't, so we likely have some fairly complex incompatibility scenarios here.

The executors that were modified were the ones that were using the old task context logger; they were updated to use the new logger. Not all executors used the logger, which is why only some were updated.

@dstandish
Contributor

I think the problem is that CKE does not inherit from BaseExecutor, but that code assumes that it does.

@dstandish
Contributor

Executors from older versions of providers will definitely not have the new _task_event_logs property

They will if they inherit from BaseExecutor, which they all should. The fact that we provide executors that don't is, I think, a mistake. But anyway, this was simply a miss. The solution would be to make CKE inherit from BaseExecutor.
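To illustrate the miss being described, here is a minimal hypothetical sketch (not Airflow's actual classes): a wrapper built purely by composition never runs BaseExecutor.__init__, so any attribute created there, such as the _task_event_logs deque, simply does not exist on the wrapper.

```python
from collections import deque


class BaseExecutor:
    def __init__(self):
        # the scheduler reads this deque on every loop iteration
        self._task_event_logs = deque()


class OldHybridExecutor:
    """Composition only, no inheritance -- mirrors the reported bug."""

    def __init__(self):
        self.celery = BaseExecutor()
        self.kube = BaseExecutor()


class FixedHybridExecutor(BaseExecutor):
    """Inherits from the base class, so the attribute exists."""

    def __init__(self):
        super().__init__()
        self.celery = BaseExecutor()
        self.kube = BaseExecutor()


assert not hasattr(OldHybridExecutor(), "_task_event_logs")
assert hasattr(FixedHybridExecutor(), "_task_event_logs")
```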

@dstandish
Contributor

I will PR the inheritance fix

@o-nikolas
Contributor

Yeah, the hybrid executors have never inherited from the base executor, since what would it mean for the wrapper class to have a _task_event_logs property, for example? What you really care about is that property of both of the contained executors/classes combined. So the old hybrids often have implementations of the base executor functions that combine the values from the two contained classes.
Anyhow, the whole thing is a mess and we should deprecate them anyway, so I'd be happy with any level of fix for this 👍
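A rough sketch of the merging burden described above, with hypothetical class names (not Airflow's real code): the wrapper must hand-build a combined view of each per-executor attribute the scheduler expects to find.

```python
import itertools
from collections import deque


class ChildExecutor:
    def __init__(self):
        self._task_event_logs = deque()


class HybridWrapper:
    """Hypothetical hybrid: merges the two children's attributes by hand."""

    def __init__(self):
        self.celery = ChildExecutor()
        self.kube = ChildExecutor()

    @property
    def _task_event_logs(self):
        # merge both children's queues into one iterable for the scheduler
        return itertools.chain(
            self.celery._task_event_logs,
            self.kube._task_event_logs,
        )


h = HybridWrapper()
h.celery._task_event_logs.append("a")
h.kube._task_event_logs.append("b")
assert list(h._task_event_logs) == ["a", "b"]
```

Every new attribute added to the base interface needs another merging property like this, which is exactly the by-hand maintenance burden being complained about.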

@dstandish
Contributor

Yeah, _task_event_logs is just a deque that's there so that the executor can pass messages for the scheduler to consume. It exists because the executor does not have access to the session object (somewhat surprisingly!), so it cannot write log events itself. When the executor notices something that has to be logged, it dumps the message to the queue, and the scheduler consumes the queue as part of executor processing. So it's not really a problem for hybrid executors. Anyway, I am making progress on making them inherit. The nice thing about doing it this way is that it can be fixed for customers with a provider release, but we could also look at patching the scheduler in 2.10.x 🤷
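The producer/consumer pattern described here can be sketched like this (hypothetical function names; the real scheduler writes Log rows through a DB session):

```python
from collections import deque

# executor side has no DB session, so it only queues messages
event_logs = deque()


def executor_notices(event: str) -> None:
    # called by the executor when it spots something worth logging
    event_logs.append(event)


def scheduler_drain(write_row) -> list:
    # called by the scheduler loop: consume everything queued so far
    written = []
    while event_logs:
        written.append(write_row(event_logs.popleft()))
    return written


executor_notices("task killed externally")
executor_notices("task stuck in queued")
# stand-in for the session write; the real code persists Log rows
rows = scheduler_drain(lambda e: e.upper())
assert rows == ["TASK KILLED EXTERNALLY", "TASK STUCK IN QUEUED"]
assert not event_logs  # queue is empty after the scheduler pass
```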

@dstandish
Contributor

Alright, let's see how #41904 does in CI

@dstandish
Contributor

so it's not really a problem for hybrid executors

Well, I guess it kind of is a problem, since each of the executors would write to the queue on their individual instances. Oy. Yeah, let's 🪓 these hybrid executors... sigh...

@o-nikolas
Contributor

Well, I guess it kind of is a problem, since each of the executors would write to the queue on their individual instances. Oy. Yeah, let's 🪓 these hybrid executors... sigh...

Yeah, this is what I was trying to get at, with these old hybrids, you're constantly trying to let the individual executors do their individual thing, then you have to merge the results, state, logs, or whatever you're working with in the parent hybrid exec. It's awful

@potiuk
Member

potiuk commented Sep 3, 2024

Yeah, this is what I was trying to get at, with these old hybrids, you're constantly trying to let the individual executors do their individual thing, then you have to merge the results, state, logs, or whatever you're working with in the parent hybrid exec. It's awful

It is :). Unfortunately, as with all the things we release, once it's out it will take quite some time to 🪓 it...

@hadarsharon
Author

I just checked and can confirm that this issue is resolved in v2.10.1.rc1

Cheers
