-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add two operators in AWS Providers: RedshiftResumeClusterOperator and RedshiftPauseClusterOperator #19665
Add two operators in AWS Providers: RedshiftResumeClusterOperator and RedshiftPauseClusterOperator #19665
Changes from 8 commits
580dd93
7a921c5
0edb98d
9911c72
a68cbbf
b20a38e
81f2f75
61b2935
3b3f1e0
b151337
ec92273
3ce29da
456cb73
4d6e402
57c4c35
e03031f
92cfcf2
06a7e3d
d20662a
e72bec0
e659253
b303dc7
6c63fed
3ac8f33
50becc6
653d6b2
f012c29
a7ff6ca
f4ac7a4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
from airflow.models import BaseOperator | ||
from airflow.providers.amazon.aws.hooks.redshift import RedshiftClusterStates, RedshiftHook | ||
|
||
|
||
class RedshiftPauseClusterOperator(BaseOperator): | ||
""" | ||
Pause an AWS Redshift Cluster using boto3. | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
:param cluster_identifier: id of the AWS Redshift Cluster | ||
:type cluster_identifier: str | ||
:param aws_conn_id: aws connection to use | ||
:type aws_conn_id: str | ||
:param check_interval: time in seconds that the job should wait in | ||
between each instance state checks until operation is completed | ||
:type check_interval: float | ||
""" | ||
|
||
template_fields = ("cluster_identifier",) | ||
ui_color = "#eeaa11" | ||
ui_fgcolor = "#ffffff" | ||
|
||
def __init__( | ||
self, | ||
*, | ||
cluster_identifier: str, | ||
aws_conn_id: str = "aws_default", | ||
check_interval: float = 15, | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
**kwargs, | ||
): | ||
super().__init__(**kwargs) | ||
self.cluster_identifier = cluster_identifier | ||
self.aws_conn_id = aws_conn_id | ||
self.check_interval = check_interval | ||
|
||
def execute(self, context): | ||
redshift_hook = RedshiftHook(aws_conn_id=self.aws_conn_id) | ||
self.log.info("Pausing Redshift cluster %s", self.cluster_identifier) | ||
cluster_state = redshift_hook.cluster_status(cluster_identifier=self.cluster_identifier) | ||
if cluster_state == RedshiftClusterStates.AVAILABLE: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if not met i would log info, or perhaps even warning because it seems that it is unexpected. |
||
redshift_hook.get_conn().pause_cluster(ClusterIdentifier=self.cluster_identifier) | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
from airflow.models import BaseOperator | ||
from airflow.providers.amazon.aws.hooks.redshift import RedshiftClusterStates, RedshiftHook | ||
|
||
|
||
class RedshiftResumeClusterOperator(BaseOperator): | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" | ||
Resume an AWS Redshift Cluster using boto3. | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
:param cluster_identifier: id of the AWS Redshift Cluster | ||
:type cluster_identifier: str | ||
:param aws_conn_id: aws connection to use | ||
:type aws_conn_id: str | ||
:param check_interval: time in seconds that the job should wait in | ||
between each instance state checks until operation is completed | ||
:type check_interval: float | ||
""" | ||
|
||
template_fields = ("cluster_identifier",) | ||
ui_color = "#eeaa11" | ||
ui_fgcolor = "#ffffff" | ||
|
||
def __init__( | ||
self, | ||
*, | ||
cluster_identifier: str, | ||
aws_conn_id: str = "aws_default", | ||
check_interval: float = 15, | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
**kwargs, | ||
): | ||
super().__init__(**kwargs) | ||
self.cluster_identifier = cluster_identifier | ||
self.aws_conn_id = aws_conn_id | ||
self.check_interval = check_interval | ||
|
||
def execute(self, context): | ||
redshift_hook = RedshiftHook(aws_conn_id=self.aws_conn_id) | ||
self.log.info("Starting Redshift cluster %s", self.cluster_identifier) | ||
cluster_state = redshift_hook.cluster_status(cluster_identifier=self.cluster_identifier) | ||
if cluster_state == RedshiftClusterStates.PAUSED: | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
redshift_hook.get_conn().resume_cluster(ClusterIdentifier=self.cluster_identifier) |
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -17,7 +17,7 @@ | |||||||||
# under the License. | ||||||||||
from typing import Optional | ||||||||||
|
||||||||||
from airflow.providers.amazon.aws.hooks.redshift import RedshiftHook | ||||||||||
from airflow.providers.amazon.aws.hooks.redshift import RedshiftClusterStates, RedshiftHook | ||||||||||
from airflow.sensors.base import BaseSensorOperator | ||||||||||
|
||||||||||
|
||||||||||
|
@@ -28,7 +28,7 @@ class AwsRedshiftClusterSensor(BaseSensorOperator): | |||||||||
:param cluster_identifier: The identifier for the cluster being pinged. | ||||||||||
:type cluster_identifier: str | ||||||||||
:param target_status: The cluster status desired. | ||||||||||
:type target_status: str | ||||||||||
:type target_status: RedshiftClusterStates | ||||||||||
""" | ||||||||||
|
||||||||||
template_fields = ('cluster_identifier', 'target_status') | ||||||||||
|
@@ -37,18 +37,25 @@ def __init__( | |||||||||
self, | ||||||||||
*, | ||||||||||
cluster_identifier: str, | ||||||||||
target_status: str = 'available', | ||||||||||
target_status: RedshiftClusterStates = RedshiftClusterStates.AVAILABLE, | ||||||||||
aws_conn_id: str = 'aws_default', | ||||||||||
**kwargs, | ||||||||||
): | ||||||||||
super().__init__(**kwargs) | ||||||||||
self.cluster_identifier = cluster_identifier | ||||||||||
self.target_status = target_status | ||||||||||
self.target_status = ( | ||||||||||
target_status | ||||||||||
if isinstance(target_status, RedshiftClusterStates) | ||||||||||
else RedshiftClusterStates(str(target_status)) | ||||||||||
) | ||||||||||
|
||||||||||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
self.aws_conn_id = aws_conn_id | ||||||||||
self.hook: Optional[RedshiftHook] = None | ||||||||||
|
||||||||||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
def poke(self, context): | ||||||||||
self.log.info('Poking for status : %s\nfor cluster %s', self.target_status, self.cluster_identifier) | ||||||||||
self.log.info( | ||||||||||
'Poking for status : %s\nfor cluster %s', self.target_status.value, self.cluster_identifier | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
or
Suggested change
because |
||||||||||
) | ||||||||||
return self.get_hook().cluster_status(self.cluster_identifier) == self.target_status | ||||||||||
|
||||||||||
def get_hook(self) -> RedshiftHook: | ||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
|
||
import unittest | ||
|
||
import boto3 | ||
|
||
from airflow.providers.amazon.aws.operators.redshift_pause_cluster import RedshiftPauseClusterOperator | ||
|
||
try: | ||
from moto import mock_redshift | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
except ImportError: | ||
mock_redshift = None | ||
|
||
|
||
class TestPauseClusterOperator(unittest.TestCase): | ||
dbarrundiag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
@staticmethod | ||
def _create_clusters(): | ||
client = boto3.client('redshift', region_name='us-east-1') | ||
client.create_cluster( | ||
ClusterIdentifier='test_cluster_to_pause', | ||
NodeType='dc1.large', | ||
MasterUsername='admin', | ||
MasterUserPassword='mock_password', | ||
) | ||
client.create_cluster( | ||
ClusterIdentifier='test_cluster_to_resume', | ||
NodeType='dc1.large', | ||
MasterUsername='admin', | ||
MasterUserPassword='mock_password', | ||
) | ||
if not client.describe_clusters()['Clusters']: | ||
raise ValueError('AWS not properly mocked') | ||
|
||
def test_init(self): | ||
redshift_operator = RedshiftPauseClusterOperator( | ||
task_id="task_test", | ||
cluster_identifier="test_cluster", | ||
aws_conn_id="aws_conn_test", | ||
check_interval=3, | ||
) | ||
assert redshift_operator.task_id == "task_test" | ||
assert redshift_operator.cluster_identifier == "test_cluster" | ||
assert redshift_operator.aws_conn_id == "aws_conn_test" | ||
assert redshift_operator.check_interval == 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm a little unsure about converting to enum here. You don't control the API so you can't guarantee that you will have the value in your enum. If an unexpected value appears here, you'll get an unexpected failure. I think you can use the enum for evaluating whether the returned value is the one you are looking for, but I don't think it's a good idea to immediately convert it without error handling. I suppose you could catch
ValueError
and return an uncategorized enum value or something.... but probably better to just leave it raw. i bet @uranusjr would have some wisdom to share here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@o-nikolas had some good thoughts here on using the
Enum
:#19665 (comment)
essentially because that's what was used on the EKS operators similar to this
airflow/airflow/providers/amazon/aws/hooks/eks.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i read the comment and yeah in the 'not found' case that is the focus of the example, sure returning an enum would be fine -- it's completely in your control and a "made-up" status --- but forcing the raw api response into enum is what i feel more iffy about.
and i'd easily say keep the enum there and use it when evaluating the response (e.g. is it paused or available -- use the enum to evaluate the string against rather than hardcode). just am unsure about converting the raw response in the return value and asking for second opinion :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good discussion folks!
IMO if there are breaking API changes and the API starts returning different states than you expect it's going to break your code whether you're using strings or an enum, so it's a moot point. I still vote for using Enums from the start here, add error handling, and if APIs change then this code and other code in the related operators will require attention like any other breaking API changes cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. This hook method is just returning status. If new statuses appear (or if you missed one in your research), it won't break your code, unless you force it into an enum that doesn't know the new status. And it's far from certain that a new status would break your operator. If you are waiting for your cluster to become
available
and there's a new statuskindof-almost-available
well that's still not available so your logic is unaffected -- unless of course you are forcing it into an enum and then your code breaks unnecessarily. That's the scenario that makes me wary of this approach.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hooks are necessarily tied to the API/service they wrap, and I agree with @dstandish -- we don't want our airflow Hook to suddley not work at all when AWS add a new status code to their API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks so much @dstandish and @ashb, should we go back to using the strings and leave
cluster_not_found
to avoid the breaking change?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please let's avoid breaking
in my view you are still welcome to use the enum for the purpose of evaluating the response from the API but just don't force the raw response into Enum
thank you 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @dstandish,
Yes, you would need to correctly handle the case of an unknown/new status, but there's no reason you can't correctly handle this as well as use Eunms (E.g. coerce anything we haven't seen yet to an UNKOWN Enum state, or just continue to return None like other errors cases can do in this existing code). Certainly each approach has pros and cons, but I see no breaking reason if you chose to use Enums.
Though I'm happy to disagree and commit to using strings if that's what others think as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup it's true this is an approach that would be more tolerant of the unexpected, it's just not what was done here.
But I would say there is little value in implementing such error handling in an effort to still convert in this return value. If your value conforms to the enum, then your comparisons against the enum would work even without converting in the return. And if it doesn't conform, then you either fail (current code state), or throw away information (replacing with None or UNKNOWN). So what's the point of converting it to an enum at all? I would just leave the raw values alone and use the enums for evaluation.
I don't know that we need to make a universal proclaimation but in this case I think not converting to enum makes sense :)