AWS logs. Exit fast when 3 consecutive responses are returned from AWS Cloudwatch logs #30756

Merged (7 commits) on Apr 21, 2023


Contributor

@vincbeck commented Apr 19, 2023

#20814 introduced a regression: loading task logs from past executions can now take 20 seconds or more. As a fix, we decided to stop fetching as soon as AWS CloudWatch Logs returns 3 consecutive empty responses.



Comment on lines 30 to 31
# The issue with this approach is, it can take a huge amount of time (e.g. 20 seconds) to retrieve logs
# As an intermediate solution, we decided to stop fetching logs if 3 consecutive responses are empty
Contributor

If that happens, what is the user expected to do? How would the user know that this is the issue?

Contributor Author

They would not be able to know. Before #20814, we stopped fetching logs as soon as one response from CloudWatch was empty. Since #20814, we wait until the next token is the same in 2 consecutive responses. This change aims for a middle ground between the two behaviors. The fix is empirical and based on user reports: users find it very frustrating that they sometimes need to wait 20 seconds to retrieve logs.
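For readers following the thread, here is a minimal sketch of the stopping rule described above. It is not the code from this PR: `iter_log_events`, `EMPTY_RESPONSE_THRESHOLD`, and `FakeLogsClient` are hypothetical names used for illustration, and the fake client only mimics the `events` and `nextForwardToken` fields of a CloudWatch Logs `get_log_events` response.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical constant name; the value 3 comes from the PR description.
EMPTY_RESPONSE_THRESHOLD = 3


def iter_log_events(
    client: Any,
    log_group: str,
    log_stream: str,
    threshold: int = EMPTY_RESPONSE_THRESHOLD,
) -> Iterator[Dict[str, Any]]:
    """Yield log events, giving up after `threshold` consecutive empty pages."""
    next_token = None
    consecutive_empty = 0
    while True:
        token_arg = {"nextToken": next_token} if next_token is not None else {}
        response = client.get_log_events(
            logGroupName=log_group,
            logStreamName=log_stream,
            startFromHead=True,
            **token_arg,
        )
        events = response["events"]
        yield from events

        if events:
            # Any non-empty page resets the counter.
            consecutive_empty = 0
        else:
            consecutive_empty += 1
            if consecutive_empty >= threshold:
                return  # exit fast instead of polling further

        if response["nextForwardToken"] == next_token:
            return  # same token twice in a row: end of stream reached
        next_token = response["nextForwardToken"]


class FakeLogsClient:
    """Illustrative stand-in for a CloudWatch Logs client (not boto3)."""

    def __init__(self, pages: List[List[Dict[str, Any]]]) -> None:
        self.pages = pages
        self.calls = 0

    def get_log_events(self, **kwargs: Any) -> Dict[str, Any]:
        index = self.calls
        self.calls += 1
        events = self.pages[index] if index < len(self.pages) else []
        # A fresh token on every call, so only the empty-page rule can stop the loop.
        return {"events": events, "nextForwardToken": f"token-{index}"}


pages = [[{"message": "a"}], [], [{"message": "b"}], [], [], [], [{"message": "late"}]]
client = FakeLogsClient(pages)
print([event["message"] for event in iter_log_events(client, "group", "stream")])
# ['a', 'b']
```

Note the trade-off raised in the question above: the event on the last page is never fetched, and the caller has no signal that the loop stopped early.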

airflow/providers/amazon/aws/hooks/logs.py
  else:
      token_arg = {}

- response = self.get_conn().get_log_events(
+ response = self.conn.get_log_events(
Contributor

Thanks for including this update ❤️

Contributor

@ferruzzi left a comment

I guess we could wrap it in a flag to let the user decide whether to wait or quit early, defaulting to quit early, but this seems like a decent compromise to me.
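A rough sketch of what such a flag could look like, purely as an illustration: `should_stop_fetching` and its `exit_early` parameter are hypothetical and are not part of this PR or of the Amazon provider.

```python
def should_stop_fetching(
    consecutive_empty: int,
    token_repeated: bool,
    exit_early: bool = True,
    threshold: int = 3,
) -> bool:
    """Decide whether a log-fetching loop should stop.

    token_repeated: the next token matched the previous one (end of stream).
    exit_early: hypothetical user-facing flag; when False, only a repeated
    token stops the loop, however many empty responses were seen.
    """
    if token_repeated:
        return True
    return exit_early and consecutive_empty >= threshold
```

With the default `exit_early=True` the loop quits after 3 consecutive empty responses; passing `exit_early=False` would keep waiting for a repeated token, at the cost of the slow log loading this PR is trying to avoid.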

Labels
area:providers provider:amazon-aws AWS/Amazon - related issues