Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit celery by excluding 5.3.2 and 5.3.3 #34031

Merged
merged 1 commit into from
Sep 2, 2023
Merged

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Sep 2, 2023

There is a new database field introduced by Celery in 5.3.2 and repeated in 5.3.3 wihch is not included in automated migrations, so users upgrading celery might have failing celery installation.

The issue is already reported celery/celery#8470
and acknowledged, so it is lilely to be fixed in 5.3.4 - so excluding 5.3.2 and 5.3.3 is the best approach.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

There is a new database field introduced by Celery in 5.3.2 and
repeated in 5.3.3 wihch is not included in automated migrations,
so users upgrading celery might have failing celery installation.

The issue is already reported and acknowledged, so it is lilely
to be fixed in 5.3.4 - so excluding 5.3.2 and 5.3.4 is the best
approach.
@hussein-awala
Copy link
Member

Could you check the last sentence in the PR description?

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

Of course :)

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

Also - question - I understand that this is purely celery thing ? When celery uses the DB, they manage their own migrations? I looked in our migrations and I have not found any celery migrations we run :)?

@ephraimbuddy
Copy link
Contributor

ephraimbuddy commented Sep 2, 2023

Also - question - I understand that this is purely celery thing ? When celery uses the DB, they manage their own migrations? I looked in our migrations and I have not found any celery migrations we run :)?

In airflow, we run a query that somehow calls the new field and fails. It was reported in our internal testing of main yesterday.

scheduler Traceback (most recent call last):
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/cli/commands/scheduler_command.py", line 47, in _run_scheduler_job
scheduler     run_job(job=job_runner.job, execute_callable=job_runner._execute)
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/utils/session.py", line 79, in wrapper
scheduler     return func(*args, session=session, **kwargs)
scheduler            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/jobs/job.py", line 292, in run_job
scheduler     return execute_job(job, execute_callable=execute_callable)
scheduler            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/jobs/job.py", line 321, in execute_job
scheduler     ret = execute_callable()
scheduler           ^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 851, in _execute
scheduler     self._run_scheduler_loop()
scheduler   File "/usr/local/lib/python3.11/site-packages/astronomer/airflow/version_check/plugin.py", line 30, in run_before
scheduler     fn(*args, **kwargs)
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/jobs/scheduler_job_runner.py", line 987, in _run_scheduler_loop
scheduler     self.job.executor.heartbeat()
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/executors/base_executor.py", line 235, in heartbeat
scheduler     self.trigger_tasks(open_slots)
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/executors/base_executor.py", line 293, in trigger_tasks
scheduler     self._process_tasks(task_tuples)
scheduler   File "/usr/local/lib/python3.11/site-packages/airflow/providers/celery/executors/celery_executor.py", line 316, in _process_tasks
scheduler     self.update_task_state(key, result.state, getattr(result, "info", None))
scheduler                                 ^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/celery/result.py", line 502, in state
scheduler     return self._get_task_meta()['status']
scheduler            ^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/celery/result.py", line 441, in _get_task_meta
scheduler     return self._maybe_set_cache(self.backend.get_task_meta(self.id))
scheduler                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/celery/backends/base.py", line 608, in get_task_meta
scheduler     meta = self._get_task_meta_for(task_id)
scheduler            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/celery/backends/database/__init__.py", line 47, in _inner
scheduler     return fun(*args, **kwargs)
scheduler            ^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/celery/backends/database/__init__.py", line 152, in _get_task_meta_for
scheduler     task = list(session.query(self.task_cls).filter(self.task_cls.task_id == task_id))
scheduler            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2901, in __iter__
scheduler     result = self._iter()
scheduler              ^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
scheduler     result = self.session.execute(
scheduler              ^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
scheduler     result = conn._execute_20(statement, params or {}, execution_options)
scheduler              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
scheduler     return meth(self, args_10style, kwargs_10style, execution_options)
scheduler            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
scheduler     return connection._execute_clauseelement(
scheduler            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
scheduler     ret = self._execute_context(
scheduler           ^^^^^^^^^^^^^^^^^^^^^^
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
scheduler     self._handle_dbapi_exception(
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
scheduler     util.raise_(
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
scheduler     raise exception
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
scheduler     self.dialect.do_execute(
scheduler   File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
scheduler     cursor.execute(statement, parameters)
scheduler sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column celery_taskmeta.children does not exist
scheduler LINE 1: ..._taskmeta.traceback AS celery_taskmeta_traceback, celery_tas...

@hussein-awala
Copy link
Member

Also - question - I understand that this is purely celery thing ? When celery uses the DB, they manage their own migrations? I looked in our migrations and I have not found any celery migrations we run :)?

Yes, when user configures Celery to use SQLAlchemy as results backend, Celery will take care of table migration and schema. I think the bug is in their classes which try to use a field not created because of problem in the migration script.

However Celery supports other backends (Redis, RabbitMQ, custom result backend, ...), meaning users using Airflow with Celery with a different backend than SQLAlchemy will not be impacted by this bug.

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

Yeah. It's all the "celery" internal code. And yeah - not everyone is affected.

So I think the approach where we exclude only those two versions is a sound approach. I also asked if we can expect that 5.3.4 will have a fix - but I guess they have no choice but to fix it.

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

Yep. Looks like the exclusion works as expected:

Screenshot 2023-09-02 at 10 37 29

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

However Celery supports other backends (Redis, RabbitMQ, custom result backend, ...), meaning users using Airflow with Celery with a different backend than SQLAlchemy will not be impacted by this bug.

BTW. This is why our canary builds did not detect it - we use Redis as the only backend we test for integration. But maybe this is a good idea to split them to "celery-redis", "celery-sqlalchemy", "celery-rabbitmq". Then we would detect it a bit earlier.

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

Added task for it #34032

@potiuk potiuk merged commit b6318ff into apache:main Sep 2, 2023
64 checks passed
@potiuk potiuk deleted the limit-celery branch September 2, 2023 13:19
@Nusnus
Copy link

Nusnus commented Sep 2, 2023

Celery v5.3.2 & v5.3.3 were yanked and the breaking changes reverted in main.
A new stable patch version will be released soon.

Keep this in mind so you won't exclude a valid version with important bug fixes @potiuk

@hussein-awala
Copy link
Member

Celery v5.3.2 & v5.3.3 were yanked and the breaking changes reverted in main. A new stable patch version will be released soon.

Keep this in mind so you won't exclude a valid version with important bug fixes @potiuk

@Nusnus This new stable patch you're talking about will be released in v5.3.4, right?

@Nusnus
Copy link

Nusnus commented Sep 2, 2023

Celery v5.3.2 & v5.3.3 were yanked and the breaking changes reverted in main. A new stable patch version will be released soon.
Keep this in mind so you won't exclude a valid version with important bug fixes @potiuk

@Nusnus This new stable patch you're talking about will be released in v5.3.4, right?

We've completely removed the 5.3.2+ releases. We'll have a single 5.3.2 new release. Not 5.3.4.
That's why I commented on this issue to let you know guys that 5.3.2 does not exist anymore and once it does it will be without the reported breaking changes.

My attention right now is unfortunately going one by one on all of the changes from v5.3.1 and do my best to find out if there is anything else we missed. If we didn't miss anything else, the new release (5.3.2) should be out in the next 24-48h.

@hussein-awala
Copy link
Member

Celery v5.3.2 & v5.3.3 were yanked and the breaking changes reverted in main. A new stable patch version will be released soon.
Keep this in mind so you won't exclude a valid version with important bug fixes @potiuk

@Nusnus This new stable patch you're talking about will be released in v5.3.4, right?

We've completely removed the 5.3.2+ releases. We'll have a single 5.3.2 new release. Not 5.3.4. That's why I commented on this issue to let you know guys that 5.3.2 does not exist anymore and once it does it will be without the reported breaking changes.

My attention right now is unfortunately going one by one on all of the changes from v5.3.1 and do my best to find out if there is anything else we missed. If we didn't miss anything else, the new release (5.3.2) should be out in the next 24-48h.

Thank you @Nusnus for this information!

However, deleting a release is completely different from yanking it, and IMHO it's a bad practice. As far as I know, to yank a python release we just need to yank it in PyPI, and update the github release by adding (YANKED) to the description. I really don't know how you manage the releases in Celery, I will check if this procedure is described somewhere.

Again thank you for this comment!

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

I think you really can't event have another 5.3.2 release. PyPi will not let you upload another 5.3.2 even if you delete the previous one

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

@Nusnus look here pypa/packaging-problems#74

@Nusnus
Copy link

Nusnus commented Sep 2, 2023

@hussein-awala @potiuk

With every failure, we learn new lessons.

As the most recent Owner at Celery, I am still discovering what's under the hood and I've been investigating the release flow for some time.

I really don't know how you manage the releases in Celery, I will check if this procedure is described somewhere.

Unfortunately, the release flow is very lacking as I have discovered these past few days. As far as I am aware, there isn't an official updated procedure, which is why this latest release was messy.

I am discussing this exact subject with the rest of the Celery Owners team to improve on this flow as part of a general effort from 5.3.0+ to increase the frequency of new releases, mainly with patch/bugfixes releases (which 5.3.2 was supposed to be, fixing important bugs), so I am totally onboard with improving this procedure.

However, deleting a release is completely different from yanking it, and IMHO it's a bad practice. As far as I know, to yank a python release we just need to yank it in PyPI, and update the github release by adding (YANKED) to the description.

This is good feedback, I'll pass it on to our team to avoid such practices in the future. Thank you!

I think you really can't event have another 5.3.2 release. PyPi will not let you upload another 5.3.2 even if you delete the previous one

Are you sure? I need to confirm that. If correct, then 5.3.4 is inevitable.
This will cause a gap between 5.3.1 -> 5.3.4 but I guess that's the consequence of the removal instead of labeling of the versions.

@Nusnus
Copy link

Nusnus commented Sep 2, 2023

@Nusnus look here pypa/packaging-problems#74

Seems like the “solutions" there are even messier.

v5.3.4 then.
Lesson learned.

Thank you!

@potiuk
Copy link
Member Author

potiuk commented Sep 2, 2023

Lesson learned.

Been there, done that :)

ephraimbuddy pushed a commit that referenced this pull request Sep 3, 2023
There is a new database field introduced by Celery in 5.3.2 and
repeated in 5.3.3 wihch is not included in automated migrations,
so users upgrading celery might have failing celery installation.

The issue is already reported and acknowledged, so it is lilely
to be fixed in 5.3.4 - so excluding 5.3.2 and 5.3.4 is the best
approach.

(cherry picked from commit b6318ff)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants