Fetch served logs also when task attempt is up for retry and no remote logs available #39496
Force-pushed from fc45184 to c29563e.
Left one non-blocking suggestion, otherwise LGTM
Force-pushed from 2883512 to 33c830e.
@kahlstrm can you fix the failing tests?
Sure, didn't realize the failing tests were related, one sec.
@eladkal fixed the tests, seems like the odd behavior with
I tried to get this into 2.9.2 but the test was failing https://github.com/apache/airflow/actions/runs/9383343196/job/25837180680?pr=40050#step:7:3547
Hmm yeah, I'd guess this has something to do with the flakiness with the
Filed a PR that simply removes the inconsistent test case for
@kahlstrm there are some issues with this code now that I'm looking into. But here's a question for you. Why should we check served logs when the task is up for retry? I think it's reasonable to assume that when the task is in a running state, the logs are where they are going to be -- either shared volume, or remote storage.
The issue we encountered is that when the task instance's state was up for retry, i.e. the previous attempt had failed, the logs for the previous attempt weren't available for the period when the task instance was in the up-for-retry state. Example: Attempt 1 fails for some reason which is most easily visible in the logs. The task instance is marked up for retry. While the task instance is in the up-for-retry state, the logs for the previous task attempts are not being fetched. The only way to access the logs for that period of time is to ssh into the worker and find the log file. Does this use case make sense? We only started observing these problems after upgrading to 2.7.0 and higher.
This doesn't sound right. When a task fails (whether there are remaining retries or not), the logs should be wherever they are going to be. Does your webserver have access to the shared volume?
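The fallback being debated above can be sketched as a small decision function. This is a minimal illustration, not Airflow's actual implementation: the enum is a stand-in for `airflow.utils.state.TaskInstanceState` (reproducing only the relevant members), and `should_fetch_served_logs` and `SERVED_LOG_STATES` are hypothetical names for the check this PR extends.

```python
from enum import Enum


# Stand-in for airflow.utils.state.TaskInstanceState; only the members
# relevant to this discussion are reproduced here.
class TaskInstanceState(str, Enum):
    RUNNING = "running"
    DEFERRED = "deferred"
    UP_FOR_RETRY = "up_for_retry"
    SUCCESS = "success"
    FAILED = "failed"


# States in which the worker may still hold logs that have not been
# shipped to remote storage, so the served-logs endpoint is worth trying.
SERVED_LOG_STATES = {
    TaskInstanceState.RUNNING,
    TaskInstanceState.DEFERRED,
    TaskInstanceState.UP_FOR_RETRY,  # the state this PR adds
}


def should_fetch_served_logs(state: TaskInstanceState, remote_logs_found: bool) -> bool:
    """Fall back to the worker's served logs when no remote logs were
    found and the task is in a state where logs may still be worker-local."""
    return not remote_logs_found and state in SERVED_LOG_STATES
```

Under this sketch, an `UP_FOR_RETRY` task with no remote logs would hit the served-logs endpoint instead of showing nothing, which is the gap described in the scenario above.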
This PR is a continuation of #39177; while testing it, I noticed that when the task's state is `TaskInstanceState.UP_FOR_RETRY`, the logs for all attempts are unavailable when the source is served logs. I also noticed that the tests previously only covered the `TaskInstanceState.RUNNING` situation, whereas there was also `TaskInstanceState.DEFERRED` (and now `TaskInstanceState.UP_FOR_RETRY`). Changed the test to cover all three cases.
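The test-coverage change described above can be illustrated by checking all three "live" states rather than just `RUNNING`. This is a self-contained sketch, not the real test suite: `read_log_source` is a hypothetical stand-in for the log-source resolution in Airflow's `FileTaskHandler`, and the enum mimics `airflow.utils.state.TaskInstanceState`.

```python
from enum import Enum


# Minimal stand-in for airflow.utils.state.TaskInstanceState.
class TaskInstanceState(str, Enum):
    RUNNING = "running"
    DEFERRED = "deferred"
    UP_FOR_RETRY = "up_for_retry"


def read_log_source(state: TaskInstanceState, remote_available: bool) -> str:
    """Illustrative log-source resolution: prefer remote logs, and fall
    back to the worker's served logs for the three 'live' states."""
    if remote_available:
        return "remote"
    if state in (
        TaskInstanceState.RUNNING,
        TaskInstanceState.DEFERRED,
        TaskInstanceState.UP_FOR_RETRY,
    ):
        return "served"
    return "local_file"


# The coverage change: exercise all three states, not only RUNNING.
for state in TaskInstanceState:
    assert read_log_source(state, remote_available=False) == "served"
```

Parametrizing over the full set of states (rather than hard-coding `RUNNING`) is what would have caught the missing `UP_FOR_RETRY` fallback in the first place.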
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in newsfragments.