-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS DataSync cancel task on exception (#11011) #16589
AWS DataSync cancel task on exception (#11011) #16589
Conversation
CI timed out waiting for images, please rerun |
@@ -710,7 +710,7 @@ def test_killed_task(self, mock_wait, mock_get_conn): | |||
# ### Begin tests: | |||
|
|||
# Kill the task when doing wait_for_task_execution | |||
def kill_task(*args): | |||
def kill_task(*args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this change for? It doesn’t seem to be related without others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, yes it is needed.
Without it, the test fails:
E TypeError: kill_task() got an unexpected keyword argument 'max_iterations'
We are mocking AWSDataSyncHook::wait_for_task_execution
which takes the kwarg max_iterations
(unchanged)
On AWSDataSyncOperator
line 364 we are passing this kwarg for the first time, so the mock needs to be able to receive it also. (changed)
@@ -52,7 +52,7 @@ class AWSDataSyncHook(AwsBaseHook): | |||
TASK_EXECUTION_FAILURE_STATES = ("ERROR",) | |||
TASK_EXECUTION_SUCCESS_STATES = ("SUCCESS",) | |||
|
|||
def __init__(self, wait_interval_seconds: int = 5, *args, **kwargs) -> None: | |||
def __init__(self, wait_interval_seconds: int = 30, *args, **kwargs) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change in defaults should be documented in the changelog:
https://github.com/apache/airflow/blob/main/airflow/providers/amazon/CHANGELOG.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback.
I've added it there, hopefully in the right place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you will need to add 2.0.1 / 2.1.0 and put it under it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaxil thank you. I think it's right now, please check :)
Small improvements to DataSync operator.
Most notable is the ability of the operator to cancel an in progress task execution, eg if the Airflow task times out or is killed. This avoids a zombie issue when the AWS DataSync service can have a zombie task running even if Airflow's task has failed.
Also made some small changes to polling values. DataSync is a batch-based uploading service, it takes several minutes to operate so I changed the polling intervals from 5 seconds to 30 seconds and adjusted max_iterations to what I think is a more reasonable default.
closes: #11011