Fix duplicated logs when running a pipeline #26
Merged
Description
1. Bug description
When we run a pipeline, for example `make run pipeline=training`, duplicated log lines show up.
2. Root cause analysis
In google.cloud/aiplatform modules:
base.py: lib/python3.7/site-packages/google/cloud/aiplatform/base.py
pipeline_jobs.py: lib/python3.7/site-packages/google/cloud/aiplatform/pipeline_jobs.py
In pipelines modules:
main.py: pipelines/trigger/main.py
When a base function has already set the logging level to `logging.INFO` and a child function calls `logging.basicConfig(level=logging.DEBUG)`, log entries are duplicated, with both `INFO` and `DEBUG` levels showing up. This is because `basicConfig` configures the root logger, so the child function's call to `basicConfig` does not affect the logger that the base function has already configured. As a result, any log entries generated in the base function with `INFO` level are still logged even though the child function has set the level to `DEBUG`. When that logger also has its own handler, each entry is written once by that handler and again by the handler that `basicConfig` attaches to the root logger.
To avoid this issue, configure logging only once at the start of the program and avoid calling `basicConfig` in child functions. This ensures that all loggers share the same configuration and prevents duplicated logs with different levels. In addition, choose an appropriate logging level for each module or function so that the logs are relevant and not too verbose.
To set the logging level to `DEBUG` in the child function (`sandbox_run`) without causing duplicated logs with `INFO` level from the base function, we can create a logger instance in the child function and set its logging level to `DEBUG`.
How has this been tested?
make test-all-components
make run pipeline=training
Logs are no longer duplicated
make run pipeline=prediction
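Independently of the pipeline runs above, the root cause and the fix can be reproduced with the standard library alone. The following is a minimal sketch, not the code from this PR: the logger names `demo.library` and `demo.sandbox_run`, the in-memory stream, and the handler attached inside `sandbox_run` are assumptions made for illustration, and the stand-in library logger is assumed to carry its own handler.

```python
import io
import logging

# Everything is written to one in-memory stream so emitted lines can be counted.
stream = io.StringIO()

# Hypothetical stand-in for a library logger that is already set to INFO
# and carries its own handler (an assumption about the base module).
library_logger = logging.getLogger("demo.library")
library_logger.setLevel(logging.INFO)
library_logger.addHandler(logging.StreamHandler(stream))

root = logging.getLogger()
saved_handlers, saved_level = root.handlers[:], root.level
root.handlers.clear()  # basicConfig only acts on a root logger without handlers

# Problem: basicConfig adds a second handler, this time on the root logger.
logging.basicConfig(level=logging.DEBUG, stream=stream)
library_logger.info("pipeline submitted")
duplicated_lines = stream.getvalue().splitlines()

# Undo the root configuration and empty the stream before trying the fix.
root.handlers[:] = saved_handlers
root.setLevel(saved_level)
stream.seek(0)
stream.truncate()


def sandbox_run():
    """Hypothetical child function: uses its own DEBUG logger, no basicConfig."""
    logger = logging.getLogger("demo.sandbox_run")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        logger.addHandler(logging.StreamHandler(stream))
    logger.debug("debug detail")
    library_logger.info("pipeline submitted")


sandbox_run()
fixed_lines = stream.getvalue().splitlines()

print(len(duplicated_lines), duplicated_lines)
print(len(fixed_lines), fixed_lines)
```

In the first half a single `info` call produces two lines, one from the library logger's own handler and one from the root handler. In the second half each message is written once, and the `DEBUG` message from `sandbox_run` is still visible.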
Checklist
Pipeline run links: