ECSOperator returns last logs when ECS task fails #17038

Closed
pmalafosse opened this issue Jul 15, 2021 · 7 comments · Fixed by #17209
Assignees
Labels
good first issue kind:feature Feature Requests provider:amazon-aws AWS/Amazon - related issues

Comments

@pmalafosse
Contributor

pmalafosse commented Jul 15, 2021

Description

Currently, when the ECSOperator fails because the ECS task is not in the 'success' state, the Airflow alert shows a generic message like the one below, which is of little use when we want to debug things quickly:

This task is not in success state {<huge JSON from AWS containing all the ECS task details>}

Use case / motivation

This is to make it faster for people to fix an issue when a task running ECSOperator fails.

Proposal

The idea would be to return the last lines of the CloudWatch logs instead (they are already printed further up in the Airflow logs), so that when we receive the alert we know what failed in the ECS task without having to go to the Airflow logs to find it. I think this feature would involve changes there:

@pmalafosse pmalafosse added the kind:feature Feature Requests label Jul 15, 2021
@uranusjr
Member

Would it be better if we include the task object in the exception? If we’re to return anything more specific, we can subclass AirflowException, which allows putting a lot more context on the exception object.
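
To make the subclassing suggestion concrete, here is a minimal sketch of an exception that carries extra context. It is only an illustration: `EcsTaskFailedError` and its `task` and `last_logs` attributes are hypothetical names, not part of Airflow, and the local `AirflowException` class is a stand-in so the snippet runs without Airflow installed.

```python
from typing import Any, Dict, List


class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException (assumption: used here only
    so that the sketch runs without Airflow installed)."""


class EcsTaskFailedError(AirflowException):
    """Hypothetical subclass that keeps the ECS task description and recent logs."""

    def __init__(self, task: Dict[str, Any], last_logs: List[str]) -> None:
        # Keep the structured context on the exception object for programmatic use.
        self.task = task
        self.last_logs = last_logs
        # Keep the human-readable message short: only the recent log lines.
        super().__init__(
            "ECS task is not in success state; last logs:\n" + "\n".join(last_logs)
        )


try:
    raise EcsTaskFailedError(
        task={"taskArn": "arn:example", "lastStatus": "STOPPED"},  # made-up values
        last_logs=["Traceback ...", "ValueError: boom"],
    )
except AirflowException as err:
    summary = str(err)               # what an alert would display
    status = err.task["lastStatus"]  # full task details remain available
```

With this shape, the alert text stays readable while callers that need the full task JSON can still reach it through the exception object.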

@pmalafosse
Contributor Author

Sorry, I don’t understand what you mean. Right now the task object is in the exception, but what I would like to see in Airflow alerts is the exception from the code that ran in ECS, not the exception defined in the operator, which just says “ECS task failed” followed by a big JSON that I think is irrelevant noise.

@uranusjr
Member

Could you describe what exactly you would like to see (and probably how that can be implemented)? The value of self._last_log_message()?

@pmalafosse
Contributor Author

pmalafosse commented Jul 15, 2021

Yes, I was thinking of doing something like that, as I'm not sure there is a consistent way to identify, in the CloudWatch logs, the rows that correspond to the exception message from the code running in ECS.

Something like raise AirflowException(self._last_log_messages(5)), which would return the last 5 lines from CloudWatch, for example (_last_log_message() would be equivalent to _last_log_messages(1)).

@pmalafosse
Contributor Author

I tried this locally, and my proposal would look like this:

    def _last_log_messages(self, number_messages):
        try:
            logs = [log["message"] for log in self._cloudwatch_log_events()]
            return "\n".join(logs[-number_messages:])
        except IndexError:
            return None

    def _last_log_message(self):
        return self._last_log_messages(1)

...

raise AirflowException(f"This task is not in success state - last logs from Cloudwatch: \n{self._last_log_messages(10)}")

@uranusjr
Member

uranusjr commented Jul 19, 2021

We can probably use collections.deque for better performance, but aside from that, sounds good to me! Would you be interested in opening a pull request for this?

p.s. I don’t think IndexError can ever be raised?
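
For illustration, the `collections.deque` suggestion could be sketched roughly as follows. This is a standalone sketch, not the merged implementation: `last_log_messages` is written as a free function, and `fake_cloudwatch_log_events` is a hypothetical stand-in for the operator's `_cloudwatch_log_events()` generator. There is no `try/except IndexError`, since neither the original slice nor a bounded deque raises it.

```python
from collections import deque
from typing import Dict, Iterable, Iterator


def last_log_messages(events: Iterable[Dict[str, str]], number_messages: int) -> str:
    """Return the last `number_messages` log messages joined by newlines."""
    # A deque with maxlen discards the oldest item on each append once it is full,
    # so at most `number_messages` messages are held in memory at any time.
    tail = deque((event["message"] for event in events), maxlen=number_messages)
    return "\n".join(tail)


def fake_cloudwatch_log_events() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for the operator's CloudWatch event generator."""
    for i in range(1, 8):
        yield {"message": f"line {i}"}


print(last_log_messages(fake_cloudwatch_log_events(), 3))
# prints:
# line 5
# line 6
# line 7
```

Compared with building the full list and slicing it, this keeps memory bounded by the number of lines requested, which matters when the log stream is long. An empty stream simply yields an empty string.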

@potiuk
Member

potiuk commented Jul 19, 2021

Assigned you @pmalafosse

@eladkal eladkal added good first issue provider:amazon-aws AWS/Amazon - related issues labels Jul 20, 2021
kaxil pushed a commit that referenced this issue Sep 9, 2021
closes: #17038

This PR changes the message in the AirflowException when the ECS task launched by ECSOperator is stopped. 

**Before:**
The message when it failed was:
`This task is not in success state {<huge JSON from AWS containing all the ECS task details>}`

**Now:**
The message is:
```
This task is not in success state - last logs from Cloudwatch:
<last_logs_from_cloudwatch>
```
which makes it much more useful to understand what failed in the underlying code directly from the alert.

The number of logs can be customized with the parameter `number_logs_exception`.
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Mar 10, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Jun 4, 2022
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Jul 10, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Aug 27, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Oct 4, 2022
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Oct 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Dec 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Jan 27, 2023
kosteev pushed a commit to kosteev/composer-airflow-test-copybara that referenced this issue Sep 12, 2024
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Sep 17, 2024
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Nov 7, 2024

4 participants