Logging from all containers in KubernetesOperatorPod #31663

Merged: 15 commits merged on Jul 6, 2023


amoghrajesh
Contributor

closes: #27282

This PR adds support for fetching logs from all the containers in a pod and publishing them in the Airflow task logs, instead of always fetching only the base container's logs.
A new option, container_logs, has been added. It works in conjunction with the get_logs option, i.e., if get_logs isn't set, container_logs is not checked.

The option takes the following values:

list[str]: A list of container names
str: A single container name
Literal[True]: Log all containers

The argument defaults to BASE_CONTAINER_NAME.
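As a minimal sketch of how these values could be interpreted, the gating on get_logs and the three accepted shapes map to container names as below. The helper name `containers_to_log` is hypothetical (not the operator's code), and BASE_CONTAINER_NAME is assumed to be "base". Rejection of invalid values such as False is left out here.

```python
from __future__ import annotations

from typing import Literal

# Assumption: stands in for KubernetesPodOperator.BASE_CONTAINER_NAME.
BASE_CONTAINER_NAME = "base"


def containers_to_log(
    get_logs: bool,
    container_logs: list[str] | str | Literal[True],
    all_containers: list[str],
) -> list[str]:
    """Hypothetical helper: pick the containers whose logs should be streamed."""
    if not get_logs:
        # container_logs is only consulted when get_logs is enabled.
        return []
    if container_logs is True:
        # Literal[True]: log every container in the pod.
        return list(all_containers)
    if isinstance(container_logs, str):
        # A single container name.
        return [container_logs]
    # Otherwise a list of container names.
    return list(container_logs)


pod_containers = ["base", "sidecar"]
print(containers_to_log(True, BASE_CONTAINER_NAME, pod_containers))  # ['base']
print(containers_to_log(True, True, pod_containers))  # ['base', 'sidecar']
print(containers_to_log(False, True, pod_containers))  # []
```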

This was a community suggestion here: https://github.com/apache/airflow/pull/28981/files#r1071928325



boring-cyborg bot added the provider:cncf-kubernetes (Kubernetes provider related issues) and area:providers labels on Jun 1, 2023
@amoghrajesh
Contributor Author

@uranusjr @potiuk @pulquero @dstandish looping you in for review on this PR, since you reviewed the earlier one that went stale.

@@ -270,6 +274,7 @@ def __init__(
        reattach_on_restart: bool = True,
        startup_timeout_seconds: int = 120,
        get_logs: bool = True,
        container_logs: list[str] | str | bool = BASE_CONTAINER_NAME,
Member

Does this accept False? If not, this should be Literal[True] instead of bool.

Contributor Author

Good point. It doesn't accept False, only True.
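Following this exchange, the annotation would use Literal[True] in place of bool. A minimal sketch of such a signature, using a hypothetical stand-in class rather than the operator itself, and assuming BASE_CONTAINER_NAME is "base":

```python
from __future__ import annotations

from typing import Literal

# Assumption: stands in for KubernetesPodOperator.BASE_CONTAINER_NAME.
BASE_CONTAINER_NAME = "base"


class PodOperatorSignatureSketch:
    """Hypothetical stand-in showing only the container_logs annotation."""

    def __init__(
        self,
        get_logs: bool = True,
        # Literal[True] says True is accepted but False is not, so a static
        # type checker such as mypy flags container_logs=False at the call site.
        container_logs: list[str] | str | Literal[True] = BASE_CONTAINER_NAME,
    ) -> None:
        self.get_logs = get_logs
        self.container_logs = container_logs
```

The annotation is not enforced at runtime, which is why the operator still needs an explicit check that rejects False.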

amoghrajesh requested a review from uranusjr on June 1, 2023 09:49
                    container_logs,
                    pod.metadata.name,
                )
        elif isinstance(container_logs, bool):
Contributor Author

I had to keep this check as bool so that invalid or unsupported types can also be rejected/filtered out here: https://github.com/apache/airflow/pull/31663/files#diff-6900da9281d8404b14da0815f2b37350f3148b1b63449928b952744e6711e7e7R475-R478
@uranusjr

@amoghrajesh
Contributor Author

@uranusjr @pulquero The PR is in a reviewable state. Requesting a review when you have some time

@potiuk
Member

potiuk commented Jun 4, 2023

docs need fixing still

@pulquero

pulquero commented Jun 5, 2023

From my perspective, it looks like it will do the job, thanks.

Comment on lines +456 to +470
            else:
                self.log.error(
                    "False is not a valid value for container_logs",
                )
Contributor

Is this branch needed?

Contributor Author

Yes, we need it, since True (Literal[True]) is the only boolean value we support. See my unit tests: they include False as an invalid test case.
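A minimal sketch of why matching on bool (rather than only `is True`) is useful: False gets its own error message, while any other unsupported type falls through to a final branch. The function name `is_valid_container_logs` and the second log message are hypothetical; only the "False is not a valid value" message comes from the diff above.

```python
from __future__ import annotations

import logging

log = logging.getLogger(__name__)


def is_valid_container_logs(container_logs: object) -> bool:
    """Hypothetical check following the str / list / bool dispatch discussed above."""
    if isinstance(container_logs, (str, list)):
        # A single container name, or a list of names.
        return True
    if isinstance(container_logs, bool):
        # Matching on bool lets False be reported separately from other types.
        if container_logs:
            return True
        log.error("False is not a valid value for container_logs")
        return False
    # Any other type (int, dict, None, ...) is unsupported.
    log.error(
        "Invalid type %s specified for container_logs",
        type(container_logs).__name__,
    )
    return False
```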

amoghrajesh requested a review from vincbeck on June 6, 2023 04:51
@JeremieDoctrine

I was looking for this feature and stumbled upon this pull request.

Will this also work with init containers?

@amoghrajesh
Contributor Author

Hi @JeremieDoctrine, yes, in my opinion it should also capture the initContainers' logs.
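In the Kubernetes API, a pod's init containers are listed in `spec.init_containers` (`initContainers` in the manifest), separately from `spec.containers`. Whether they are covered therefore depends on which list the container-listing helper reads. A minimal sketch that reads both, using `SimpleNamespace` objects as stand-ins for the Kubernetes client's V1PodSpec and V1Container, a hypothetical helper name, and a constant assumed to match PodDefaults.SIDECAR_CONTAINER_NAME:

```python
from __future__ import annotations

from types import SimpleNamespace

# Assumption: stands in for PodDefaults.SIDECAR_CONTAINER_NAME.
SIDECAR_CONTAINER_NAME = "airflow-xcom-sidecar"


def container_names_including_init(pod_spec) -> list[str]:
    """Hypothetical helper: init containers first, then regular ones, minus the XCom sidecar."""
    # init_containers is None when the pod spec defines none.
    init_containers = pod_spec.init_containers or []
    return [
        container.name
        for container in [*init_containers, *pod_spec.containers]
        if container.name != SIDECAR_CONTAINER_NAME
    ]


spec = SimpleNamespace(
    init_containers=[SimpleNamespace(name="init-db")],
    containers=[SimpleNamespace(name="base"), SimpleNamespace(name=SIDECAR_CONTAINER_NAME)],
)
print(container_names_including_init(spec))  # ['init-db', 'base']
```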

@amoghrajesh
Contributor Author

@uranusjr @jedcunningham may I request a review on this pull request when you have some time?

@amoghrajesh
Contributor Author

@hussein-awala @potiuk @jedcunningham @uranusjr following up for a review on this PR. Can you have a look when you have some time?

            time.sleep(1)
        return terminated
Member

Is there any case where terminated will be false? I fail to see it.

Contributor Author

It is not supposed to reach the false state. The method awaits completion of the container, i.e., it loops until the container has completed. Let me know if that isn't clear enough.

Member

I mean: why do we need to return the bool?

Member

It will always be True, right?

Contributor Author

Yeah, good point. It makes sense to revert it to returning None.
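A minimal sketch of the polling pattern under discussion: because the loop can only exit once the container has terminated, a boolean return value would always be True, so returning None carries the same information. The `is_terminated` callable is a hypothetical stand-in for re-reading the pod and checking the container's status; it is not the PodManager signature.

```python
from __future__ import annotations

import time
from typing import Callable


def await_container_completion(
    is_terminated: Callable[[], bool],
    poll_interval: float = 1.0,
) -> None:
    """Block until is_terminated() reports True, sleeping between polls."""
    while not is_terminated():
        time.sleep(poll_interval)
    # Reaching this point already means "terminated"; nothing to return.
```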

Member

potiuk left a comment

A few nits/questions.

amoghrajesh requested a review from potiuk on June 26, 2023 04:17
@amoghrajesh
Contributor Author

@potiuk just pushed a patch to fix the comments. Have a look when you have some time :)

Comment on lines 587 to 593
        containers = []
        pod_info = self.read_pod(pod)
        for container_spec in pod_info.spec.containers:
            if container_spec.name != ContainerNames.XCOM_CONTAINER:
                containers.append(container_spec.name)

        return containers
Member

This seems to be a simpler equivalent

Suggested change
-        containers = []
-        pod_info = self.read_pod(pod)
-        for container_spec in pod_info.spec.containers:
-            if container_spec.name != ContainerNames.XCOM_CONTAINER:
-                containers.append(container_spec.name)
-        return containers
+        pod_info = self.read_pod(pod)
+        return [
+            container_spec.name
+            for container_spec in pod_info.spec.containers
+            if container_spec.name != ContainerNames.XCOM_CONTAINER
+        ]

Contributor Author

Sure. Made the changes here

@@ -398,14 +417,78 @@ def consume_logs(
                )
                time.sleep(1)

    def fetch_requested_container_logs(
Member

It seems this entire function could be turned into a generator function that uses yield, getting rid of all the append calls.

Contributor Author

IMO, it is simpler to control the flow with these append calls. I would like to keep it this way, if that is OK with you.
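For comparison, a minimal sketch of the generator style the reviewer suggested: each result is yielded instead of appended to a list. The names `iter_container_logs` and `fetch_one` are hypothetical, and `fetch_one` stands in for whatever per-container fetch the real method performs.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def iter_container_logs(
    container_names: Iterable[str],
    fetch_one: Callable[[str], str],
) -> Iterator[tuple[str, str]]:
    """Hypothetical generator form: yield one (name, result) pair per container."""
    for name in container_names:
        yield name, fetch_one(name)


# Callers that need a list can still materialize it.
results = list(iter_container_logs(["base", "sidecar"], lambda name: f"logs for {name}"))
print(results)  # [('base', 'logs for base'), ('sidecar', 'logs for sidecar')]
```

The generator avoids building an intermediate list and defers work until the caller iterates; the append style keeps every branch's effect visible in one place, which is the trade-off the author cites.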

@uranusjr
Member

A couple of comments to improve the code, but they don’t change the overall logic and should be non-blocking.

@amoghrajesh
Contributor Author

@potiuk @uranusjr can I get another pass at this PR? I think it is ready for review from my end.

airflow/providers/cncf/kubernetes/operators/pod.py (outdated)
@@ -412,14 +431,78 @@ def consume_logs(
                )
                time.sleep(1)

    def fetch_requested_container_logs(
Member

I think that we need to use the provided post_termination_timeout in this method as we did with fetch_container_logs

Contributor Author

I'm not sure I quite understand this comment. We call fetch_container_logs from fetch_requested_container_logs, and fetch_container_logs takes care of post_termination_timeout, right?
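The delegation argument holds only if the outer method forwards the timeout to each per-container call; otherwise each call falls back to its own default. A minimal sketch of that forwarding, with a simplified hypothetical signature (`fetch_requested_logs_sketch`, a keyword-based `fetch_container_logs` callable, and a 120-second default are all assumptions, not the provider's API):

```python
from __future__ import annotations

from typing import Any, Callable


def fetch_requested_logs_sketch(
    container_names: list[str],
    fetch_container_logs: Callable[..., Any],
    post_termination_timeout: int = 120,
) -> list[Any]:
    """Hypothetical outer loop that delegates to a per-container fetch function."""
    results = []
    for name in container_names:
        # The same timeout is forwarded to every per-container call, so the
        # outer function only passes it through rather than enforcing it.
        results.append(
            fetch_container_logs(
                container_name=name,
                post_termination_timeout=post_termination_timeout,
            )
        )
    return results
```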

Comment on lines 79 to 84
class ContainerNames:
    """Possible container names for airflow."""

    XCOM_CONTAINER = "airflow-xcom-sidecar"


Member

I don't understand the need for this class: this value is already available as PodDefaults.SIDECAR_CONTAINER_NAME, which we use in the other methods. It's better to reuse it in case we change the name in the future.

Contributor Author

Good point, I had missed this. I will undo this change and reuse PodDefaults.

@amoghrajesh
Contributor Author

@hussein-awala can you take a look at this PR again when you have some time?

@amoghrajesh
Contributor Author

@hussein-awala @uranusjr following up for a final round of review on this PR 🙂

I'd like to land it before the branch cut today, if possible.

@amoghrajesh
Contributor Author

@uranusjr do we need more approvals on this one or can we proceed and merge it?

@uranusjr
Member

uranusjr commented Jul 6, 2023

At least one more (or @vincbeck could take another look to make sure the changes made after their approval are good).

Linked issue: KubernetesPodOperator: Option to show logs from all containers in a pod
7 participants