Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring EmrClusterLink and add it for other AWS EMR Operators #24294

Merged
merged 1 commit into from
Jun 12, 2022

Conversation

Taragolis
Copy link
Contributor

Amazon provider at that moment has only one External link for EmrClusterLink.

The idea was taken from Google Cloud Provider.

Extend:

  • Hook: Return region and aws partition from current connection/session
  • Console link based on aws partition now
  • Link contains region_name now

Add:

  • AwsBaseLink for further Links (Batch, Glue, Cloudwatch, etc.)
  • EmrClusterLink links to EmrAddStepsOperator, EmrModifyClusterOperator, EmrTerminateJobFlowOperator

@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Jun 7, 2022
@Taragolis Taragolis force-pushed the aws-init-external-links branch from 8a31230 to 6c8f896 Compare June 7, 2022 17:31
@Taragolis
Copy link
Contributor Author

cc: @ferruzzi if you have a time could you have a look on this

@Taragolis Taragolis changed the title Refactoring EmrClusterLink and add for other AWS EMR Operators Refactoring EmrClusterLink and add it for other AWS EMR Operators Jun 8, 2022
Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a minute to figure out what you were getting at, but it's clever and I like it. ((For anyone else who hasn't had a coffee yet, it's to add a website link back to the AWS Console in the Extra Links section of the Operators in the UI))

I wonder if we can figure out a way to work the operator_extra_links assignment into the AwsBaseOperator that you are working on as well, so we don't assign it each time. I'd need to think on it, but BaseOperator accepts a hook like EmrClusterHook. I wonder if we can work some shenanigans to use that classname to magically add the link if the matching link class exists. I'm not proposing that this gets held up for that, just a thought.

airflow/providers/amazon/aws/links/base.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/links/base.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/links/base.py Outdated Show resolved Hide resolved
@ferruzzi
Copy link
Contributor

ferruzzi commented Jun 8, 2022

@shubham22 Check this out! We were just talking about getting the Extra Links sections filled out for the operators when there was time!

@vincbeck
Copy link
Contributor

vincbeck commented Jun 8, 2022

Really nice work! Besides @ferruzzi comments, I dont have anything to say other than I like it :)

@shubham22
Copy link

@Taragolis - this is very nice and great design as well! We had in our backlog, but couldn't get to it as we are prioritizing System Test work. Thanks for taking the initiative.

@Taragolis Taragolis force-pushed the aws-init-external-links branch from 6c8f896 to 354f496 Compare June 8, 2022 20:25
@Taragolis
Copy link
Contributor Author

I wonder if we can figure out a way to work the operator_extra_links assignment into the AwsBaseOperator that you are working on as well, so we don't assign it each time.

I think it is hardly possible because each operator could required each set of Links and not all of them related to Hook

Example:

  • For EMR at that moment only link to cluster could be provided because there is no direct link to steps exists.
  • For AWS Batch it could be set of links: Batch Job Detail, Latest Attempt logs in CloudWatch, Batch Job Definition, Batch Queue
  • For Glue Job it could be set of links: Glue Job Detail (Glue ETL Studio), Glue Job Detail (Legacy), Glue Job, Filter to CloudWatch prefixes

@Taragolis
Copy link
Contributor Author

We had in our backlog, but couldn't get to it as we are prioritizing System Test work.

no problem, anyway there is plenty amount of time to next provider release after 4.0.0

Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

airflow/providers/amazon/aws/links/emr.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/links/emr.py Show resolved Hide resolved
@ferruzzi
Copy link
Contributor

ferruzzi commented Jun 8, 2022

I think it is hardly possible because each operator could required each set of Links and not all of them related to Hook

Maybe. But also, if BaseOperator created a Link set containing the link to the service's main console page, then that list could be extended in the operator if desired. That would save assigning it in every Operator and only need the declaration in Operators where more than the basic link are desired.

Either way, it's not a blocking suggestion, just a thought.

I'm also not sure what this current implementation is going to look like when you want to add a second/third/nth link.

@Taragolis Taragolis force-pushed the aws-init-external-links branch from 354f496 to c8b5fa2 Compare June 10, 2022 16:09
@Taragolis
Copy link
Contributor Author

Maybe. But also, if BaseOperator created a Link set containing the link to the service's main console page

The implementation of Aws Service link it straightforward

import attr
from airflow.providers.amazon.aws.links.base_aws import BaseAwsLink, BASE_AWS_CONSOLE_LINK


@attr.s(auto_attribs=True)
class AwsServiceLink(BaseAwsLink):
    name: str
    format_str: str
    key = "aws_service"

AwsServiceLink(
    name="AWS EMR Console Link",
    format_str=BASE_AWS_CONSOLE_LINK + "/elasticmapreduce/home?region={region_name}#"
)

The is two things which need to be check and solved

  1. Each operator should call AwsServiceLink.persist in execute method of operator/sensor
  2. Need to check is it fine to change operator_extra_links in runtime

I'm also not sure what this current implementation is going to look like when you want to add a second/third/nth link.

All operator links predefined in operator in that moment

In execute stage call persist method of link with appropriate keywords, which stored in XCom

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good as it is :). Any refactoring like that IMHO is never "complete" until you have 3-4 derived classes :). And we can always iterate if we find that we need to. I think you should merge that one and add few more links based on that and we will find out if more changes are needed WDYT @ferruzzi @Taragolis

@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jun 11, 2022
@ferruzzi
Copy link
Contributor

Yeah, I'm good. Like I said, it's just something to consider, I didn't want to block the merge over that. Thanks for checking.

@potiuk potiuk merged commit 19dd9f5 into apache:main Jun 12, 2022
@Taragolis Taragolis deleted the aws-init-external-links branch June 12, 2022 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants