Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Allow set Dag Run resource into Dag Level permission #40703

Merged
merged 269 commits into from
Aug 8, 2024

Conversation

joaopamaral
Copy link
Contributor

@joaopamaral joaopamaral commented Jul 10, 2024

This PR extends Dag's access_control feature to allow Dag Run resource permissions. For example, it can allow only some selected Dag with trigger permission for a specific role.

closes: #37329

Validation:

  • Testing removing a can_create DAG Runs and can_delete DAGs from User role.
$ airflow roles del-perms User -a "can_create" -r "DAG Runs"
$ airflow roles del-perms User -a "can_delete" -r "DAGs"
  • Including the can_create DAG Runs in dag tutorial and can_delete DAG in dag tutorial2.
with DAG(
    'tutorial',
    ...
    access_control={
        'User': {
            'DAG Runs': {'can_create'}
        }
    }
) as dag:

with DAG(
    'tutorial2',
    ....
    access_control={
        'User': {'can_delete'}  # keeping support to old style, it can be also {'DAGs': {'can_delete'}}
    }
) as dag:
  • UI with disable buttons with delete and trigger function working when allowed:
image image image

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

potiuk and others added 30 commits March 31, 2024 17:22
During the last week refactor input on mypy checks was moved to
another workflow and mypy checks were not running for a day.

This change brings it back.

(cherry picked from commit a2f5307)
The config names all use "object storage", but the class is still using
"object store". Since we're not renaming the configs, let's rename the
class for consistency.

(cherry picked from commit 0371ea8)
Co-authored-by: Tom Rutter <[email protected]>
(cherry picked from commit 7613795)
The `total_entries` count should reflect how many log entries match the
filters provided, not simply how many rows are in the table total.

(cherry picked from commit 90e7b3f)
These are the attrs we allow sorting on, not filtering on.

(cherry picked from commit e700f41)
* Prohibit to use unsupported DB backends in tests

* Move invalid backends tests into the integration

* Add files to missing unit tests

(cherry picked from commit 279d1f7)
…ath` (apache#38624)

* Disable support of a legacy `LocalPath` in favor of stdlib `pathlib.Path`

* Fixup tests which use tmpdir instead of tmp_path

(cherry picked from commit 2660188)
(cherry picked from commit 8455eb2)
…38659)

Cleaning up docker is a single bash script to execute - it is defined
as a composite action but since it's just one bash commenad to execute,
it is better to just keep it as a bash script instead - which wil make
it easier to add other scripts.

(cherry picked from commit bc7b68b)
(cherry picked from commit 9c61147)
* Improve stability of remove_task_decorator function

* fix statics

* test

* remove test

---------

Co-authored-by: Sam Wheating <[email protected]>
(cherry picked from commit f1301da)
…get_sorted_triggers queries (apache#38664)

(cherry picked from commit 8246a89)
* Add a task instance dependency for mapped dependencies (apache#37091)

---------

Co-authored-by: Tzu-ping Chung <[email protected]>
(cherry picked from commit e2df442)
…he#38678)

We have now a separate job that prepares and uploads artifacts
when PROD image is being prepared for K8S testing. This saves a bit
of time to prepare PROD image. However this step had a bug - it
checked out the `main` version of code when preparing airflow and
provider packages, so it was really testing the "previous" merge
not the "current" PR.

This PR switches the job to prepare artifacts to use the same
checkout strategy as the other build-image workflow jobs.

(cherry picked from commit eb541ae)
The image tag makes no difference. While we are using the image to
run Helm tests (for now) the fact that we are using specific image
shoudl not be part of the job name (it's printed in the steps
when the image is pulled).

(cherry picked from commit 7526440)
…che#38601)

Depending on selective checks, but also on the job executed, we
choose whether to run job on public runners or self-hosted runners.
So far the set of labels to select the runners were passed in a bit
inconsistent way. Outputs of selective checks can only be strings
and the `run-as` accepts array of strings (labels) - so we were
using fromJSON to convert between the two. And we used runs-on
inputs on a number of our workflows to pass the selection.

However this meant that runs-on could be either string or array and
that sometimes we passed public/self-hosted labels as strings
directly and some of those were hard-coded.

This PR changes it consistently across the board to introduce
consistent approach:

* build info have no selective checks results yet, so for them
  runs-on is hardcoded
* similarly for "windows" and release jobs that are manually run
  without running selective checks
* selective checks will produce three outuputs - JSON stringiified
  array of labels:
   * default (one that is selected depending on who runs the build)
   * public (for cases where we want to force the builds to use public
     runners
   * self-hosted (for cases where we want to force the builds to use
     self-hosted runners
* all the outputs are named `<type>-runs-on-as-string` to make it clear
  they are all strings
* all inputs of workflows expectings strings are named the same (with
  as-string suffix and <type> prefix)
* whenever a job is run, we pass "runs-on" parameter to be `fromJSON`
  with appropriate type we want to use passed as input

This will make it easier to reason on which job is using which type
of runner and it will make it easier in the future to make it more
flexible when we add ASF self-hosted runners and possibly our own
K8S runners, or when we would want to change labels for public runners
or self-hosted runners.

(cherry picked from commit 9da08a5)
The "UPGRADE_TO_NEWER_DEPENDENCIES" build arg was a bit misleading
as it was not a true/false value but rather "random hash"/"false".

This PR makes it a bit more explicit:

* the arg is named UPGRADE_INVALIDATION_STRING now
* it's default is "" not false and all conditions are set to check
  for != or == "".

The inputs/outputs in CI job remain as they were - "true/false" but
this PR makes it clearer.

(cherry picked from commit 1934c8b)
…pache#38617)

The apache#37818 bumped requirement for google-cloud-aiplatform to a
version that requires Pydantic so we need to remove it when
we are running "no-pydantic" tests.

The tests that import it should be skipped
when the dependency is removed.

(cherry picked from commit 69df34e)
@joaopamaral
Copy link
Contributor Author

There are some tests failing because there was an increase in the number of queries. I'll check it out this week.

@shahar1 shahar1 self-requested a review July 25, 2024 19:27
@joaopamaral
Copy link
Contributor Author

@shahar1, fyi I've included this change to skip the Dag Run permissions (from dag level) in admin role f182af9 . I think it will fix the db test issue.

@shahar1
Copy link
Contributor

shahar1 commented Aug 2, 2024

@shahar1, fyi I've included this change to skip the Dag Run permissions (from dag level) in admin role f182af9 . I think it will fix the db test issue.

Apologies for the late response - couple of comments before merging:

  1. Thanks for fixing the DB test!
  2. Could you please merge again from main so we could ensure all checks pass?
  3. Only now I realized that there's no documentation nor newsfragment includede in this PR. I think that both are essential for other people being aware of this feature, and using it properly. Could you please update the docs. in docs/apache-airflow-providers-fab/auth-manager/access-control.rst and create newsframent/40703.feature.rst?

Please let me know when you're done, and I'll re-approve and hopefully merge :)

@shahar1 shahar1 self-requested a review August 2, 2024 20:55
@joaopamaral
Copy link
Contributor Author

joaopamaral commented Aug 5, 2024

Hey @shahar1, I've included the docs and I've dug a little more about the query count test and the difference is during the _sync_perm_for_dag:


Recorded query locations:
        dagbag.py:_sync_perm_for_dag:748 > override.py:sync_perm_for_dag:1111 > override.py:create_permission:1848 > override.py:get_permission:1819 > override.py:get_action:1688:     15
        dagbag.py:_sync_perm_for_dag:748 > override.py:sync_perm_for_dag:1111 > override.py:create_permission:1848 > override.py:get_permission:1820 > override.py:get_resource:1749:   15
        dagbag.py:_serialize_dag_capturing_errors:692 > dagbag.py:_sync_perm_for_dag:748 > override.py:sync_perm_for_dag:1111 > override.py:create_permission:1848 > override.py:get_permission:1823:   10


E           AssertionError: The expected number of db queries is 94 with extra margin: 10. The current number is 154.
E           Recorded query locations:
E               dagbag.py:_sync_perm_for_dag:748 > override.py:sync_perm_for_dag:1123 > override.py:create_permission:1890 > override.py:get_permission:1861 > override.py:get_action:1730:     35
E               dagbag.py:_sync_perm_for_dag:748 > override.py:sync_perm_for_dag:1123 > override.py:create_permission:1890 > override.py:get_permission:1862 > override.py:get_resource:1791:   35
E               dagbag.py:_serialize_dag_capturing_errors:692 > dagbag.py:_sync_perm_for_dag:748 > override.py:sync_perm_for_dag:1123 > override.py:create_permission:1890 > override.py:get_permission:1865:   30

As we included more permissions at the DAG level, we expect an increase in the number of queries during the sync:

for resource_name, resource_values in self.RESOURCE_DETAILS_MAP.items():
dag_resource_name = self._resource_name(dag_id, resource_name)
for dag_action_name in resource_values["actions"]:
self.create_permission(dag_action_name, dag_resource_name)

Do you think we can increase the query count for this test?

… in dag level and it will increase the calls in _sync_perm_for_dag apache#40703 (comment)
@shahar1 shahar1 added this to the Airflow 2.10.0 milestone Aug 8, 2024
@shahar1 shahar1 added the type:new-feature Changelog: New Features label Aug 8, 2024
@shahar1 shahar1 merged commit 090607d into apache:main Aug 8, 2024
54 checks passed
ephraimbuddy added a commit to astronomer/airflow that referenced this pull request Aug 11, 2024
@potiuk
Copy link
Member

potiuk commented Aug 16, 2024

I think that one caused problem #41540 (comment) - it does not seem backwards-compatible with the previous fab provider.

@kosteev
Copy link
Contributor

kosteev commented Aug 26, 2024

It looks like this change implies dependencies between Airflow core and fab provider and new version of fab provider requires changes also in Airflow core, thus fab provider should have constraint apache-airflow>=2.9.X?

It looks like for Airflow 2.9.1 this version of fab provider is not backward compatible

@potiuk
Copy link
Member

potiuk commented Aug 27, 2024

It looks like for Airflow 2.9.1 this version of fab provider is not backward compatible

@ekosteev - do you see some evidence of this ? The issue here: #41540 (comment) - made the earlier FAB providers not compatible for 2.10.0 -> which shoudl be fixed in #41540, but I have not seen/ compatibility issues with earlier airflow versions (It passed all the compatibility tests at the very least)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:API Airflow's REST/HTTP API area:dev-tools area:production-image Production image improvements and fixes type:new-feature Changelog: New Features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support configuring 'can_create' on DAG level permissions