-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move setgid as the first command executed in forked task runner #20040
Merged
potiuk
merged 1 commit into
apache:main
from
potiuk:fix-failing-test-start-and-terminate-test
Dec 4, 2021
Merged
Move setgid as the first command executed in forked task runner #20040
potiuk
merged 1 commit into
apache:main
from
potiuk:fix-failing-test-start-and-terminate-test
Dec 4, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
boring-cyborg
bot
added
the
area:Scheduler
including HA (high availability) scheduler
label
Dec 4, 2021
potiuk
requested review from
ashb,
kaxil,
uranusjr,
jmcarp,
dstandish and
jedcunningham
December 4, 2021 20:28
potiuk
force-pushed
the
fix-failing-test-start-and-terminate-test
branch
3 times, most recently
from
December 4, 2021 20:54
5ef4449
to
b7b8735
Compare
The runner setgid command was executed after importing several airflow imports, which - when executed for the first time could take quite some time (possibly even few seconds). The setgid command should be done as soon as possible, in case of any errors in the import, it would fail and the setgid could be never set. Also this caused the test_start_and_terminate test to fail in CI because the imports could take arbitrary long time (depending on parallel tests and whether the imported modules were already loaded in the process so setting the gid could be set after more than 0.5 seconds. This change fixes it twofold: * setgid is moved to be first instruction to be executed (also signal handling was moved to before the potentially long imports) * the test was fixed to wait actively and only fail after the timeout of 1s (which should not happen before of the fix above) Additionally the test was using `task test` command rather than task run, and in some circumstances when you tried to run it locally, when FORK was disabled (MacOS) the same test could fail with a different error because --error-file flag is not defined for `task test` command but it is automatically added by the runner. The task command has been changed to `run' Fixing this tests caused occasional test_on_kill failure which suffered from similar problem and had similar sleep implemented. Thanks to that the test will be usually faster as no significant delays will be introduced.
potiuk
force-pushed
the
fix-failing-test-start-and-terminate-test
branch
from
December 4, 2021 21:18
b7b8735
to
49096bb
Compare
mik-laj
approved these changes
Dec 4, 2021
github-actions
bot
added
the
full tests needed
We need to run full set of tests for this PR to merge
label
Dec 4, 2021
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
potiuk
added a commit
to potiuk/airflow
that referenced
this pull request
Dec 5, 2021
The previous fix in apache#20040 improved forked tests but also caused instability in the "on_kill" test for standard task runner. This PR fixes the instability by signalling when the task started rather than waiting for fixed amount of time and it adds better diagnostics for the test.
Merged
potiuk
added a commit
that referenced
this pull request
Dec 5, 2021
The previous fix in #20040 improved forked tests but also caused instability in the "on_kill" test for standard task runner. This PR fixes the instability by signalling when the task started rather than waiting for fixed amount of time and it adds better diagnostics for the test.
potiuk
added a commit
that referenced
this pull request
Dec 11, 2021
The runner setgid command was executed after importing several airflow imports, which - when executed for the first time could take quite some time (possibly even few seconds). The setgid command should be done as soon as possible, in case of any errors in the import, it would fail and the setgid could be never set. Also this caused the test_start_and_terminate test to fail in CI because the imports could take arbitrary long time (depending on parallel tests and whether the imported modules were already loaded in the process so setting the gid could be set after more than 0.5 seconds. This change fixes it twofold: * setgid is moved to be first instruction to be executed (also signal handling was moved to before the potentially long imports) * the test was fixed to wait actively and only fail after the timeout of 1s (which should not happen before of the fix above) Additionally the test was using `task test` command rather than task run, and in some circumstances when you tried to run it locally, when FORK was disabled (MacOS) the same test could fail with a different error because --error-file flag is not defined for `task test` command but it is automatically added by the runner. The task command has been changed to `run' Fixing this tests caused occasional test_on_kill failure which suffered from similar problem and had similar sleep implemented. Thanks to that the test will be usually faster as no significant delays will be introduced. (cherry picked from commit abe01fa)
potiuk
added a commit
that referenced
this pull request
Dec 11, 2021
The previous fix in #20040 improved forked tests but also caused instability in the "on_kill" test for standard task runner. This PR fixes the instability by signalling when the task started rather than waiting for fixed amount of time and it adds better diagnostics for the test. (cherry picked from commit e2345ff)
38 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area:Scheduler
including HA (high availability) scheduler
full tests needed
We need to run full set of tests for this PR to merge
type:bug-fix
Changelog: Bug Fixes
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The runner setgid command was executed after importing several airflow
imports, which - when executed for the first time could take quite
some time (possibly even few seconds). The setgid command should be
done as soon as possible, in case of any errors in the import, it
would fail and the setgid could be never set.
Also this caused the test_start_and_terminate test to fail in CI
because the imports could take arbitrary long time (depending on
parallel tests and whether the imported modules were already
loaded in the process so setting the gid could be set after more
than 0.5 seconds.
This change fixes it twofold:
signal handling was moved to before the potentially long
imports)
timeout of 1s (which should not happen before of the fix above)
Additionally the test was using
task test
command rather than task run,and in some circumstances when you tried to run it locally,
when FORK was disabled (MacOS) the same test could fail with
a different error because --error-file flag is not defined for
task test
command but it is automatically added by the runner.The task command has been changed to `run'
Fixing this tests caused occasional test_on_kill failure
which suffered from similar problem and had similar sleep
implemented.
Thanks to that the test will be usually faster as no significant delays
will be introduced.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.