
Move setgid as the first command executed in forked task runner #20040

Merged

Conversation

potiuk
Member

@potiuk potiuk commented Dec 4, 2021

The runner's setgid call was executed after several Airflow imports,
which can take quite some time when executed for the first time
(possibly even a few seconds). The setgid call should happen as soon
as possible: if any of the imports failed, the runner would fail
and the gid might never be set.

This also caused the test_start_and_terminate test to fail in CI,
because the imports could take an arbitrarily long time (depending on
parallel tests and whether the imported modules were already
loaded in the process), so the gid could be set only after more
than 0.5 seconds.

This change fixes it in two ways:

  • setgid is now the first instruction executed (signal handling
    was also moved before the potentially long imports)
  • the test now waits actively and fails only after a timeout
    of 1s (which should not happen because of the fix above)
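
A minimal sketch of the two points above, not the actual Airflow code. It assumes that "setgid" here means putting the forked child into its own process group with os.setpgid(0, 0); the names start_child and wait_until are hypothetical, and the sleep stands in for the slow imports. POSIX only.

```python
import os
import signal
import time


def wait_until(predicate, timeout=1.0, interval=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition met right at the deadline still counts.
    return bool(predicate())


def start_child(slow_import_seconds=0.5):
    """Fork a child that sets its process group before doing anything slow."""
    pid = os.fork()
    if pid:
        # Parent: hand the child's pid back to the caller.
        return pid
    try:
        # Child: create the new process group first (assumed meaning of
        # "setgid"), so it is in place even if later steps are slow or fail.
        os.setpgid(0, 0)
        # Reset signal handling before the slow part as well.
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        # Stand-in for the potentially long imports.
        time.sleep(slow_import_seconds)
    finally:
        # Never return into the parent's code path from the child.
        os._exit(0)
```

A test can then call pid = start_child() and check wait_until(lambda: os.getpgid(pid) == pid, timeout=1.0) instead of sleeping for a fixed 0.5 seconds, followed by os.waitpid(pid, 0) to reap the child. The check passes as soon as the group is set and only fails once the full timeout has elapsed.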

Additionally, the test was using the task test command rather than task run.
In some circumstances, when run locally with FORK disabled (MacOS),
the same test could fail with a different error, because the
--error-file flag is not defined for the task test command but is
automatically added by the runner.

The task command has been changed to `run`.

Fixing this test caused an occasional test_on_kill failure; that test
suffered from a similar problem and had a similar sleep
implemented.

Thanks to that, the test will usually be faster, as no significant delays
will be introduced.



@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Dec 4, 2021
@potiuk potiuk force-pushed the fix-failing-test-start-and-terminate-test branch 3 times, most recently from 5ef4449 to b7b8735 Compare December 4, 2021 20:54
@potiuk potiuk force-pushed the fix-failing-test-start-and-terminate-test branch from b7b8735 to 49096bb Compare December 4, 2021 21:18
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Dec 4, 2021
@github-actions

github-actions bot commented Dec 4, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit abe01fa into apache:main Dec 4, 2021
@potiuk potiuk deleted the fix-failing-test-start-and-terminate-test branch December 4, 2021 22:38
potiuk added a commit to potiuk/airflow that referenced this pull request Dec 5, 2021
The previous fix in apache#20040 improved the forked tests but also caused
instability in the "on_kill" test for the standard task runner.

This PR fixes the instability by signalling when the task has started,
rather than waiting for a fixed amount of time, and it adds better
diagnostics for the test.
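
A rough sketch of the "signal when started" idea, using a thread and threading.Event purely for illustration. The real test works with a task process, and the names run_task and start_then_stop are hypothetical.

```python
import threading


def run_task(started, stop):
    """Stand-in for the task: announce that it has started, then run until stopped."""
    started.set()
    stop.wait()


def start_then_stop(timeout=5.0):
    """Wait for the start signal rather than sleeping a fixed time, then stop the task."""
    started = threading.Event()
    stop = threading.Event()
    worker = threading.Thread(target=run_task, args=(started, stop), daemon=True)
    worker.start()
    # Returns as soon as the task signals; `timeout` is only an upper bound.
    signalled = started.wait(timeout=timeout)
    # The "kill" step happens only once the task is known to be running.
    stop.set()
    worker.join(timeout=timeout)
    return signalled and not worker.is_alive()
```

With a fixed sleep the test either waits longer than needed or proceeds before the task is up. Waiting on an explicit signal avoids both cases.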
@potiuk potiuk mentioned this pull request Dec 5, 2021
potiuk added a commit that referenced this pull request Dec 5, 2021
@jedcunningham jedcunningham added this to the Airflow 2.2.3 milestone Dec 11, 2021
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Dec 11, 2021
potiuk added a commit that referenced this pull request Dec 11, 2021

(cherry picked from commit abe01fa)
potiuk added a commit that referenced this pull request Dec 11, 2021

(cherry picked from commit e2345ff)