Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT : Add command to get DAG Details via CLI #30432

Merged
merged 14 commits into from
Apr 14, 2023

Conversation

maahir22
Copy link
Contributor

@maahir22 maahir22 commented Apr 2, 2023

This PR closes #30365
Problem Statement: At the moment, there is no support to view DAG Details via the Airflow CLI. The only support for viewing DAG Details is via the REST API at the endpoint: https://airflow.apache.org/api/v1/dags/{dag_id}/details

This new command in Airflow has the proposed format:
airflow dags details <dag-id> and returns all the details that are returned from the REST API as well. The terminal output currently looks like this: terminal

@maahir22 maahir22 changed the title Add command to get DAG Details via CLI FEAT : Add command to get DAG Details via CLI Apr 2, 2023
Comment on lines 356 to 362
for key, value in dag_detail.items():
if isinstance(value, dict):
print(f"\t{key}:")
for subkey, subvalue in value.items():
print(f"\t\t{subkey}: {subvalue}")
else:
print(f"\t{key}: {value}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are trying to implement a new logic for yaml output, and this doesn't support other types (json, table, plain).
Instead you can use AirflowConsole.print_as method, something like:

Suggested change
for key, value in dag_detail.items():
if isinstance(value, dict):
print(f"\t{key}:")
for subkey, subvalue in value.items():
print(f"\t\t{subkey}: {subvalue}")
else:
print(f"\t{key}: {value}")
AirflowConsole().print_as(
data=[dag_detail],
output=args.output,
)

Copy link
Member

@potiuk potiuk Apr 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. All the CLI commands should use the same formatting options.

Copy link
Contributor Author

@maahir22 maahir22 Apr 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to do that @potiuk, @hussein-awala ? The default fallback when no --output option is provided is table -> so args.output becomes table by default, the SimpleTable isn't able to render such a large table properly due to the number of parameters and the length of these fields - most devices won't be able to display this output in a comprehensible way.
To standardise the approach, I've switched to AirflowConsole(), with a caveat that "table" isn't allowed as an output option - if it's received "yaml" output is displayed instead.

  1. Table Format

image

  1. JSON Format
    image

  2. YAML Format
    image

  3. Plain Format
    image

Copy link
Member

@potiuk potiuk Apr 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a table support with swapped columns and rows ? I think that one will be much better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo so essentially a transpose of the current table?
Let me know if I understand it currently, if initially the columns were scheduler_lock, is_active with Row1 being [], False -> now the rows will be scheduler_lock, is_active with Column1 having [], False?
Also, do we want this to be a new output type like --ttable? (Extra t for transposed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of the transposed table, it's simple with no need to update the print_as method:

if args.output == "table":
    data = [
        {
            "property_name": key,
            "property_value": value
        }
        for key, value in dag_detail.items()
    ]
else:
    data = [dag_detail]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precisely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes done, and everything looks fine now for table (all other outputs in my previous comment) -
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see in my previous comments, table & plain both formats were not rendered properly for the large number of attributes, so I've added checks for both of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. That's what I thought will be nicer :)

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

airflow/cli/commands/dag_command.py Outdated Show resolved Hide resolved
airflow/cli/cli_config.py Outdated Show resolved Hide resolved
out = temp_stdout.getvalue()

# Check if DAG Details field are present
dag_details_fields = [
Copy link
Member

@potiuk potiuk Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right way. You should check attributes of the dag rather than list the fields by hand. If we ad a new field we wll 100% forget to add it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I'll check all the attributes - don't get the "they should also likely be sorted part", do we want to display the command attributes in sorted order & test if the same order is maintained? Is there a reason behind doing this as it follows the same order as the REST Client response.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs non-hardcoded attributes, they should be retrieved from dag_details.
There are a number of ways you can do it: https://stackoverflow.com/questions/2675028/list-attributes-of-an-object
Possibly you should filter some (those starting with _ and other common object attributes. They should also likely be sorted.

@maahir22
Copy link
Contributor Author

maahir22 commented Apr 11, 2023

Needs non-hardcoded attributes, they should be retrieved from dag_details. There are a number of ways you can do it: https://stackoverflow.com/questions/2675028/list-attributes-of-an-object Possibly you should filter some (those starting with _ and other common object attributes. They should also likely be sorted.

@potiuk On closer inspection, I found that listing attributes of the class directly isn't going to be a scalable solution either - we have to omit function names / other variable names in such a case as well - highly probable that someone adds a new function to the schema and doesn't add it to the test.
So, I've simply mimicked the process -> we get the DAG by querying and then dump to obtain the schema fields, finally we just check whether all fields present in this dumped schema are present in the response obtained from the CLI Command. This should be a scalable solution since we don't have to worry about hardcoding variables/function names anywhere anymore.
OR
If this approach seems to be too duplicate, I can check if the intersection of CLI Response Fields and the dir(dag_schema) >= FIXED_THRESHOLD, this will ensure that incase new fields are added in future our check don't fail.
@josh-fell @hussein-awala Please let me know if your opinion on this aswell.
PS: In our case as per the StackOverflow link only dir worked - vars & dict threw unreliable empty results.

@maahir22 maahir22 requested a review from hussein-awala April 12, 2023 07:04
@maahir22
Copy link
Contributor Author

LGTM Merge

@potiuk potiuk merged commit 7074167 into apache:main Apr 14, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Apr 14, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@potiuk
Copy link
Member

potiuk commented Apr 14, 2023

JUST in time for 2.6

@potiuk potiuk added this to the Airflow 2.6.0 milestone Apr 14, 2023
@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Apr 14, 2023
ephraimbuddy pushed a commit that referenced this pull request Apr 14, 2023
---------

Co-authored-by: Hussein Awala <[email protected]>
Co-authored-by: Hussein Awala <[email protected]>
(cherry picked from commit 7074167)
wookiist pushed a commit to wookiist/airflow that referenced this pull request Apr 19, 2023

---------

Co-authored-by: Hussein Awala <[email protected]>
Co-authored-by: Hussein Awala <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CLI type:new-feature Changelog: New Features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need an REST API or/and Airflow CLI to fetch last parsed time of a given DAG
5 participants