Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECSOperator realtime logging #17626

Merged
merged 2 commits into from
Sep 18, 2021

Conversation

pavelhlushchanka
Copy link
Contributor

Closes #16753.

The idea was to start a thread that would fetch Cloudwatch logs using a fetch interval once the ECS task is started.

@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Aug 15, 2021
@pavelhlushchanka pavelhlushchanka force-pushed the ecs_operator_realtime_logging branch from 3cb075a to a8879f4 Compare September 16, 2021 15:43
try:
yield from self.hook.get_log_events(self.log_group, self.log_stream_name, skip=skip)
except ClientError as error:
if error.response['Error']['Code'] != 'ResourceNotFoundException':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceNotFoundException is usually the flag when ECS fails to provision tasks due to its own reasons (#15000 (comment)). If it's getting ignored here, how does Airflow react to such fail-to-start ECS tasks? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachliu that's a good question. Now i realise that this case is not handled at all. And looks like handling of fail-to-start was implicitly relied the availability of the log stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachliu i can only guess what describe_tasks returns in case of the edge case (#15000), but i guess if arn is presented and task is not started, then there should be either empty list of tasks in response and it's handled and the proper error is thrown or there is a non empty list of tasks and their status is handled by current logic. The missing part of handling empty list of tasks can be added and i think it's more clear than relying on cloudwatch logs.

Copy link
Contributor

@zachliu zachliu Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, describe_tasks polling was also recommended by aws support engineers. But it may get a bit tricky when a task is supposed to be short-living.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, i can prepare a small pr with a check for existence of the log group of the task. Just to restore previous behaviour with better error message. Like if the task is stopped, logging is enabled and there is no log group, then we throw the error. But then we need to exclude the provider from release @potiuk.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you sire! this thing has been under my radar for a while but i never got time to actually do it 😛

@zachliu
Copy link
Contributor

zachliu commented Oct 6, 2021

@codenamestif i'm testing out this feature in our integration env right now. IT IS AWESOME!
i'll soon move it to production so that we can really see #18733 in action!
thank you! 👍

@potiuk
Copy link
Member

potiuk commented Oct 7, 2021

New provider's release is coming for amazon - waiting for 2 more PRs :)

@zachliu
Copy link
Contributor

zachliu commented Oct 7, 2021

@potiuk out of curiosity, which 2 PRs?

@potiuk
Copy link
Member

potiuk commented Oct 7, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Realtime ECS logging
4 participants