Change the default_logging.yml and logging.yml to have more sensible default #3687

Open
Tracked by #3591
noklam opened this issue Mar 7, 2024 · 2 comments

Comments

@noklam
Contributor

noklam commented Mar 7, 2024

Split out from #3591

Context

I did a demo a while ago showing how frustrating it is to try to change the logging level.
Together with #3446, this ticket will make customising logging easier for our users.

Problem

https://github.com/kedro-org/kedro/blob/da709d4316c141c5a7d6f676a87a5752807b33f4/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/conf/logging.yml

There are many level: INFO settings in the template, and one may expect that changing one of them gives more verbose logging. In practice, you need to change several INFO values to DEBUG before you see any DEBUG-level messages.
So we basically provide a knob that doesn't change anything (technically it does, but it's most likely not what our users need, and advanced users can figure out how to do advanced filtering).
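
To illustrate the problem, here is a minimal sketch using only the standard library (not Kedro's actual template). The logger name demo_pkg and the helper captured_output are made up for the example. It shows that a handler with level: INFO still drops DEBUG records even after the logger itself is set to DEBUG, so both settings have to change.

```python
import io
import logging
import logging.config


def captured_output(handler_level: str, logger_level: str) -> str:
    """Configure one handler and one logger, emit two records, return what was written."""
    buf = io.StringIO()  # stands in for sys.stdout so the output can be inspected
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "formatters": {"simple": {"format": "%(levelname)s %(message)s"}},
            "handlers": {
                "console": {
                    "class": "logging.StreamHandler",
                    "level": handler_level,  # the handler-level knob
                    "formatter": "simple",
                    "stream": buf,
                }
            },
            "loggers": {
                # "demo_pkg" is a hypothetical package name
                "demo_pkg": {
                    "level": logger_level,  # the logger-level knob
                    "handlers": ["console"],
                    "propagate": False,
                }
            },
        }
    )
    log = logging.getLogger("demo_pkg")
    log.debug("debug message")
    log.info("info message")
    return buf.getvalue()


# Logger at DEBUG, handler still at INFO: the DEBUG record is filtered by the handler.
only_logger_changed = captured_output(handler_level="INFO", logger_level="DEBUG")
# Both changed: the DEBUG record finally shows up.
both_changed = captured_output(handler_level="DEBUG", logger_level="DEBUG")
```

If the handler carried no level at all, it would default to NOTSET and the logger level alone would decide what gets through.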

Proposal

handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout

  1. Remove line 14, which is unnecessary and makes logging.yml harder to use

#3446 (comment)

+1 on setting the default level of the Kedro logger to INFO (if just for backwards compatibility), and then having -q set it to WARNING and -qq to ERROR
I don't think the current logging.yml logic is the only way to achieve that though
I'm also ambivalent on whether we should change the global logging level to INFO
-1 on keeping the current logging.yml logic - whoever wants fine grained control of logs, file logging, rotation etc should be using journald, supervisor, Datadog, or whatever other solution. this is not Kedro's responsibility @astrojuanlu
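
The -q / -qq idea quoted above could look roughly like the following. This is only a sketch under assumptions: it uses argparse for brevity, and the parser, the flag wiring and the function level_for_quiet are hypothetical, not Kedro's actual CLI.

```python
import argparse
import logging


def level_for_quiet(quiet_count: int) -> int:
    """Map the number of -q flags to a logging level (hypothetical helper)."""
    if quiet_count <= 0:
        return logging.INFO  # default, kept for backwards compatibility
    if quiet_count == 1:
        return logging.WARNING  # -q
    return logging.ERROR  # -qq or more


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo-run")
    # action="count" makes both "-q -q" and "-qq" yield 2
    parser.add_argument("-q", "--quiet", action="count", default=0)
    return parser


args = build_parser().parse_args(["-qq"])
chosen_level = level_for_quiet(args.quiet)
```

The resulting level would then be applied to whichever logger the default is attached to, for example logging.getLogger("kedro").setLevel(chosen_level).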

  2. How to customise the logging level of Kedro or other packages
    Use case: As a plugin developer, I want to see my plugin's logging in a Kedro project during kedro run

If we do 1., this basically comes down to adding an additional logger in the loggers section, but there is still the question of how plugins can do this easily, or whether it should be done at the package level. This can actually be solved by #3591. The advanced setup will remain the same: either add a new logger to the loggers section or set the level with package-level logging.
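
As a rough sketch of what "adding an additional logger in the loggers section" means, the snippet below adds one entry to a logging config dict and applies it. The config is a simplified stand-in, not Kedro's template, and my_kedro_plugin and with_logger_level are hypothetical names.

```python
import copy
import logging
import logging.config


def with_logger_level(config: dict, package: str, level: str) -> dict:
    """Return a copy of config with a loggers entry setting the level for package."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    new_config.setdefault("loggers", {})[package] = {"level": level}
    return new_config


# Simplified stand-in for a project's logging config. The handler has no level,
# so it defaults to NOTSET and only the logger levels decide what is emitted.
base_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "root": {"handlers": ["console"], "level": "INFO"},
}

# "my_kedro_plugin" is a hypothetical plugin package name.
plugin_config = with_logger_level(base_config, "my_kedro_plugin", "DEBUG")
logging.config.dictConfig(plugin_config)
```

The package-level alternative mentioned above would be the plugin (or the user) calling logging.getLogger("my_kedro_plugin").setLevel(logging.DEBUG) directly.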

I don't have a better solution than the current one yet. Here are things that we know:

@noklam
Contributor Author

noklam commented Mar 7, 2024

-1 on keeping the current logging.yml logic - whoever wants fine grained control of logs, file logging, rotation etc should be using journald, supervisor, Datadog, or whatever other solution. this is not Kedro's responsibility

@astrojuanlu While I agree it's probably not what Kedro should do, it does help the development experience. Otherwise we would need some kind of progress bar, since that is roughly what Kedro's INFO logs are doing. Plus, I don't see a big problem with keeping logging.yml. Is there any major benefit to moving away from logging.yml? Changing logging.yml is easier, and we can do it in a non-breaking way in 0.19.x.

@astrojuanlu
Member

astrojuanlu commented Jun 25, 2024

Adding some color to my earlier statements on OpenTelemetry, logging etc:

OpenTelemetry seems to be quite mature for traces (as pioneered by OpenTracing), metrics (Prometheus, the former OpenCensus) but not so much for logs. In fact, the client APIs for logging in Python are in development and seemingly unstable:


While signals are in development, breaking changes and performance issues MAY occur. Components SHOULD NOT be expected to be feature-complete. In some cases, the signal in Development MAY be discarded and removed entirely. Long-term dependencies SHOULD NOT be taken against signals in Development.

In fact, there seem to be some inconsistencies still.

Looks like good practice nowadays involves having a log collector (Promtail, Fluentd, Logstash, Grafana Agent Alloy) that then sends logs to a service (Loki, Elasticsearch).

The dream of having apps just log JSON to stdout is actually spelled out in the structlog docs:

Colorful and pretty printed log messages are nice during development when you locally run your code.

However, in production you should emit structured output (like JSON) which is a lot easier to parse by log aggregators.

A simple but powerful approach is to log to unbuffered standard out and let other tools take care of the rest.

That can be your terminal window while developing; it can be systemd redirecting your log entries to syslogd and rotating them using logrotate; or it can be your cluster manager forwarding them to an obscenely expensive log aggregator service.
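
For reference, the "log JSON to stdout" approach can be sketched with the standard library alone. This is not structlog's API, just an illustration; JsonFormatter and make_json_logger are hypothetical names and the set of emitted fields is an arbitrary choice.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),  # applies %-style args
            }
        )


def make_json_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stream (stdout by default)."""
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example's output in one place
    return logger


# Usage: capture into a buffer here; in a real run the default stdout is used.
buffer = io.StringIO()
log = make_json_logger("demo_pipeline", buffer)
log.info("node %s finished", "preprocess")
parsed = json.loads(buffer.getvalue())
```

Anything downstream (a terminal, systemd, a log collector) then only has to read lines from stdout.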

So I still think that we shouldn't take too heavy-handed an approach to logging, but I now have more context on how this is actually achieved, and what to expect from the current ecosystem.
