Fix duplicated logs when running a pipeline #26

Merged
2 commits merged on Jul 6, 2023

Conversation

rachel-datatonic
Member

@rachel-datatonic rachel-datatonic commented Jun 26, 2023

Description

1. Bug description

When running a pipeline, for example with make run pipeline=training, we noticed duplicated log lines in the output:

image

2. Root cause analysis

In the google.cloud.aiplatform modules:

base.py: lib/python3.7/site-packages/google/cloud/aiplatform/base.py

class Logger:
    ...
        self._logger.setLevel(logging.INFO)

pipeline_jobs.py: lib/python3.7/site-packages/google/cloud/aiplatform/pipeline_jobs.py

#line 48
_LOGGER = base.Logger(__name__)

#line 430
_LOGGER.info("View Pipeline Job:\n%s" % self._dashboard_uri())

In pipelines modules:

main.py: pipelines/trigger/main.py

#line 194
logging.basicConfig(level=logging.DEBUG)

The aiplatform Logger class sets its own logger to logging.INFO, and pipelines/trigger/main.py later calls logging.basicConfig(level=logging.DEBUG). basicConfig configures only the root logger: it attaches a handler to the root and sets the root level, and it does not change loggers that were already configured by name. Records from the aiplatform logger therefore still pass at INFO level and, because named loggers propagate to the root by default, they are also handled by the root handler that basicConfig added. If the library logger also emits through a handler of its own (likely here, though the excerpt above only shows the setLevel call), each record is written twice, which is the duplication we see.
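The mechanism can be reproduced in isolation. This is a minimal sketch, assuming the library logger owns a stream handler; the demo.library name and the in-memory stream are illustrative and not part of aiplatform. It attaches a handler to the root logger directly, which is what logging.basicConfig does when the root has no handlers yet:

```python
import io
import logging
from typing import Tuple


def count_emissions() -> Tuple[int, int]:
    """Return how many lines one INFO record produces before and after
    a second handler is attached to the root logger."""
    stream = io.StringIO()

    # Hypothetical stand-in for a library logger that owns a handler.
    lib_logger = logging.getLogger("demo.library")
    lib_logger.setLevel(logging.INFO)
    lib_handler = logging.StreamHandler(stream)
    lib_logger.addHandler(lib_handler)

    root = logging.getLogger()
    root_handler = logging.StreamHandler(stream)
    try:
        lib_logger.info("View Pipeline Job")
        before = len(stream.getvalue().splitlines())

        # Same effect as basicConfig on an unconfigured root logger:
        # the root gains a handler, and propagated records reach it.
        root.addHandler(root_handler)
        lib_logger.info("View Pipeline Job")
        after = len(stream.getvalue().splitlines()) - before
    finally:
        # Leave global logging state as we found it.
        root.removeHandler(root_handler)
        lib_logger.removeHandler(lib_handler)
    return before, after
```

The first call is written once by the logger's own handler; the second is written by both handlers, so the same message appears twice.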

To avoid this issue, configure logging only once, at the start of the program, and avoid calling basicConfig in child functions. This ensures that all loggers share the same configuration and prevents duplicated logs. It is also worth choosing an appropriate logging level for each module or function so that the logs stay relevant and not too verbose.
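One way to keep configuration in a single place is a small helper that the entry point calls once. This is a sketch only: configure_logging and the "pipeline-root" handler name are hypothetical and not part of this repository.

```python
import logging
import sys

# Hypothetical name used to recognise the handler installed by this helper.
HANDLER_NAME = "pipeline-root"


def configure_logging(level: int = logging.INFO) -> None:
    """Attach a single named stream handler to the root logger.

    Calling this more than once only updates the level; it never adds
    a second handler, so records are not written twice by this helper.
    """
    root = logging.getLogger()
    root.setLevel(level)
    if any(h.get_name() == HANDLER_NAME for h in root.handlers):
        return
    handler = logging.StreamHandler(sys.stderr)
    handler.set_name(HANDLER_NAME)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    root.addHandler(handler)


if __name__ == "__main__":
    configure_logging()
    logging.getLogger(__name__).info("logging configured once")
```

Modules other than the entry point would then only call logging.getLogger(__name__) and never configure handlers themselves.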

In order to set the logging level to DEBUG in the child function (sandbox_run) without causing duplicated logs with INFO level from the base function, we can create a logger instance in the child function and set its logging level to DEBUG. For example:

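import logging
from typing import List, Optional

from google.cloud import aiplatform


def sandbox_run(args: Optional[List[str]] = None) -> aiplatform.PipelineJob:
    # Configure only this module's logger; the root logger is left untouched.
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    logger.debug('This is a debug message')
    ...  # rest of the function unchanged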
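Setting the level on a named logger changes which records that logger lets through; it does not add a handler. The sketch below illustrates this with a hypothetical demo_app hierarchy standing in for the root logger, so that no global state is touched: one handler on the parent, DEBUG enabled on a single child.

```python
import io
import logging
from typing import List


def run_demo() -> List[str]:
    """Emit records from two child loggers through one parent handler."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

    # Hypothetical parent logger playing the role of the root logger.
    parent = logging.getLogger("demo_app")
    parent.setLevel(logging.INFO)
    parent.propagate = False  # keep the demo isolated from the real root
    parent.addHandler(handler)

    trigger = logging.getLogger("demo_app.trigger")
    trigger.setLevel(logging.DEBUG)  # DEBUG only for this module
    other = logging.getLogger("demo_app.other")  # inherits INFO from parent

    try:
        trigger.debug("debug from trigger")
        other.debug("debug from other")  # dropped: effective level is INFO
        other.info("info from other")
    finally:
        parent.removeHandler(handler)
        parent.propagate = True
    return stream.getvalue().splitlines()
```

Each record that passes its logger's level is written exactly once, because only one handler exists in the hierarchy. Note that with no handler anywhere in the hierarchy, Python falls back to a last-resort handler that only shows WARNING and above, so DEBUG messages would not be displayed.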

How has this been tested?

  • make test-all-components
  • make run pipeline=training
    Logs are no longer duplicated
image
  • make run pipeline=prediction
image

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have successfully run the E2E tests, and have included the links to the pipeline runs below
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated any relevant documentation to reflect my changes
  • I have assigned a reviewer and messaged them

Pipeline run links:

@rachel-datatonic
Member Author

/gcbrun

@rachel-datatonic rachel-datatonic requested review from felix-datatonic and a user June 27, 2023 10:02
@rachel-datatonic rachel-datatonic marked this pull request as ready for review June 27, 2023 10:04
Comment on lines -194 to -195
logging.basicConfig(level=logging.DEBUG)

Collaborator

@felix-datatonic felix-datatonic Jun 27, 2023

could also do:

logging.getLogger(__name__).setLevel(logging.DEBUG)

Copy link

Debug is overkill I think, probably best to just remove the logging config from here

@ghost ghost merged commit b9e3134 into develop Jul 6, 2023
@felix-datatonic felix-datatonic deleted the fix/duplicated-logs branch November 13, 2023 17:10
This pull request was closed.