-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/prevent concurrency with cached venv #35258
Bugfix/prevent concurrency with cached venv #35258
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Looks good. Let me also rebase my #35160 - the new improved test harness runs all the Venv tests using caching so it has a chance to detect similar problems.
UPDATE: https://github.com/apache/airflow/actions/runs/6686368999 is the run of the new test harness rebased on top of this one :).
Yep. Not needed any more, agree. |
BTW. @jens-scheffler-bosch - first test suites are getting green in #35160, so I confirm it fixes the original problem too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jens-scheffler-bosch Just curious, what other mechanish we have in place to debug the failed run?
Same here. I assume we have a general contratiction / conflict between "ensure that all mess is cleaned up so that no temp traces are left behind so that worker is not running out of disk space after some time" and "I want to debug, keep all the logs please". I think the best is first hand to clean up so that worker can run stable over a long time, otherwise if we miss details of traces we need to add (e.g. by cat to stdout) to put all problems into airflow task stdout so that the (whateverr logging facility of the worker) archives the logs. |
Yeah - the log is the way to pass it to the place it will be visible (airflow log). And it's not much different from what we have for any other operator - we basically only see the logs + exception thrown. |
This PR reverts PR #35252 as the fix made in this PR just fixed one side effect.
Unfortunately when initially the PR #33355 was created I did not see that all temporary scripts and in/out serialized values are written into the tmp folder of the venv and thus pollute the venv. As the vench in the cached case might be used by multiple task slots in parallel, temporary files in a cached venc really generate a concurrency issue - more than just a termination log.
This PR:
If there is a general demand to keep the termination log after a failed run, we need to find a mechanism on top of the tempfolder so that it is not deleted as cleanup.
Note, the DIFF is much smaller if you ignore the whitespace.