ECSOperator returns last logs when ECS task fails #17038
Comments
Would it be better if we include the |
Sorry I don’t understand what you mean. Now |
Could you describe what exactly you would like to see (and probably how that can be implemented)? The value of |
Yes I was thinking of doing something like that as I'm not sure there is a consistent way to identify the rows corresponding to the Exception message from the code running in ECS from Cloudwatch logs: something like |
I tried locally and my proposal would be like that:
|
We can probably use p.s. I don’t think |
Assigned you @pmalafosse |
closes: #17038

This PR changes the message in the AirflowException when the ECS task launched by ECSOperator is stopped.

**Before:** The message when it failed was:
`This task is not in success state {<huge JSON from AWS containing all the ECS task details>}`

**Now:** The message is:
```
This task is not in success state - last logs from Cloudwatch: <last_logs_from_cloudwatch>
```
which makes it much easier to understand what failed in the underlying code directly from the alert.

The number of logs can be customized with the parameter `number_logs_exception`.
closes: apache/airflow#17038 This PR changes the message in the AirflowException when the ECS task launched by ECSOperator is stopped. **Before:** The message when it failed was: `This task is not in success state {<huge JSON from AWS containing all the ECS task details>}` **Now:** The message is: ``` This task is not in success state - last logs from Cloudwatch: <last_logs_from_cloudwatch> ``` which makes it much more useful to understand what failed in the underlying code directly from the alert. The number of logs can be customized with the parameter `number_logs_exception`. GitOrigin-RevId: e6cb2f7beb4c6ea4ad4a965f9c0f2b8f6978129c
Description
Currently, when the ECSOperator fails because the ECS task is not in the 'success' state, the Airflow alert contains only a generic message like the one below, which is of little use when we want to debug things quickly.
This task is not in success state {<huge JSON from AWS containing all the ECS task details>}
Use case / motivation
This is to make it faster for people to fix an issue when a task running ECSOperator fails.
Proposal
The idea would be to return the last lines of logs from Cloudwatch instead (they are already printed above in the Airflow logs), so that when we receive the alert we know what failed in the ECS task without having to go to the Airflow logs to find it. I think this feature would involve changes here:
airflow/airflow/providers/amazon/aws/operators/ecs.py
Line 354 in 2ce6e8d
airflow/airflow/providers/amazon/aws/operators/ecs.py
Line 375 in 2ce6e8d
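A minimal sketch of the proposal, to make the idea concrete. It is not the operator's actual code: the helper names `last_log_messages` and `build_failure_message` are hypothetical, the events are assumed to be dicts with a `"message"` key (the shape of Cloudwatch log events), and joining the log lines with newlines is an assumption. Only the message prefix and the `number_logs_exception` name come from this thread.

```python
from collections import deque
from typing import Iterable, List


def last_log_messages(events: Iterable[dict], number_messages: int) -> List[str]:
    """Return the messages of the last `number_messages` events, oldest first."""
    # A bounded deque drops older entries as newer ones arrive,
    # so only the tail of the log stream is kept in memory.
    tail = deque(events, maxlen=number_messages)
    return [event["message"] for event in tail]


def build_failure_message(events: Iterable[dict], number_logs_exception: int = 10) -> str:
    """Build the exception text from the tail of the logs (hypothetical helper)."""
    last_logs = "\n".join(last_log_messages(events, number_logs_exception))
    return f"This task is not in success state - last logs from Cloudwatch:\n{last_logs}"


# Made-up events standing in for what would be read from Cloudwatch.
sample_events = [{"message": f"line {i}"} for i in range(1, 6)]
print(build_failure_message(sample_events, number_logs_exception=2))
```

In the operator, the resulting string would be passed to `AirflowException` in place of the full ECS task JSON, so the alert carries the last lines the container printed.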