Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rich logging configuration #31205

Closed

Conversation

peytondmurray
Copy link
Contributor

@peytondmurray peytondmurray commented Dec 19, 2022

Why are these changes needed?

This PR adds a new default logging configuration based on rich for Ray's primary logging needs.

Changes

  • Attempts to consolidate logging configuration by introducing reasonable defaults in ray/log.py.
  • This new logging configuration is done once in ray/__init__.py at the top of the module. Subsequent calls to the configuration are ignored.
  • 3 new deprecated arguments to ray.init(): configure_logging, logging_level, and logging_format. All loggers configured in this PR propagate up to the root logger, so whatever settings the user decides to use for the root logger are what the logs generated by ray will be subject to. If the user doesn't like the configurations introduced here, they can change the configuration through the usual python logging mechanisms rather than through ray-specific paths; and, if they want to know more about logging, they only need to look in one place: ray/log.py. For example, if the user wants to suppress everything but CRITICAL messages, they can do
import logging
import ray

logging.getLogger('ray').setLevel(logging.CRITICAL)

to do this.

  • If rich is available, the new logging configuration produces nicely formatted log messages which appear in an organized table, improving readability. This is an example of how the new color and formatting can dramatically improve readability of ray output:

image

Here, log messages are easily distinguished from status information printed to the console; log levels are clearly identified, and appear in the same place in the console no matter the width of the log message; and additional information is now shown, including the ray subpackage (shown in yellow here) as well as the file and line number where the message is emitted.

  • If rich is not available, a plain logging format is used instead, and a new log message is displayed suggesting that the user install rich for nicer output. The user can set the FORCE_PLAIN_LOGGING environment variable to avoid using the rich logger; this is useful for a lot of tests which rely on the content of stdout to determine the outcome of a test.
  • A logger for ray.rllib is configured at the WARN level, to address Revert "Simplify logging configuration. (#30863)" #31858. With this change, Revert "Simplify logging configuration. (#30863)" #31858 can be reverted, again simplifying and consolidating logging configuration.
  • Modified test_output.py::test_logger_config to test only the logger config, not launch a ray cluster. The test was failing intermittently, I think due to a race condition between the launch of the cluster and the reading of the subprocess's stdout, and anyway it wasn't necessary to call ray.init here to check that logging was configured correctly.
  • Modified python/ray/tune/tests/test_commands.py::test_ls_with_cfg to test the underlying data, not what gets printed to stdout (which has changed with the new logging system).

Related issue number

Partially addresses #30005.

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@stale
Copy link

stale bot commented Jan 20, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 20, 2023
@stale stale bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 25, 2023
@peytondmurray peytondmurray force-pushed the global-logging-config branch 6 times, most recently from b9408b4 to 46262ff Compare January 28, 2023 23:14
@peytondmurray
Copy link
Contributor Author

peytondmurray commented Jan 28, 2023

Been working on this for a few days. Some notes about the work so far:

  1. Certain places in the code base log colored output to the terminal. This conflicts with rich output in Jupyter notebooks and clogs up log files. For this reason I added an AnsiStripFormatter which strips ANSI escape codes from formatted log messages when the rich-based ConsoleHandler is used. Probably we should strip these escape codes from the plain formatter is used as well, but I'll wait for input before doing so.
  2. While rich.logging.RichHandler produces lovely formatting for log messages by default, adding one column to the log output for the Ray component name required including a lot of repeated code. Understanding why this code is here is therefore more difficult for someone less familiar with rich. It would be nice if we could make some upstream improvements to rich that make it a bit more modular so we wouldn't need to repeat so much here.
  3. Adding the necessary information about what Ray subpackage a LogRecord is emitted from was really easy - this is what ContextFilter does here. On top of that, we can easily change what subpackage names are displayed with this approach 🚀
  4. Adding a different log level for ray.rllib was very simple; you just make a new entry in the dictConfig and set the appropriate level, no other changes needed.
  5. Logs generated by ray workers are hand to deal with. For example, ray.data logging is particularly tricky because output from ray workers is captured and piped to stdout/stderr, meaning that the user doesn't have the control afforded by python's standard logging mechanisms. If you configure a ray.data logger, then any time a worker runs code that logs a message to that logger the message will be reformatted to add process ID information and then printed to the console - which cramps the log table and bypasses python's logging mechanisms entirely. I think if I were to rework this, I'd suggest logging the output as you normally would rather than using print, using the extra kwarg to supply the extra information about process ID etc, and adding another field in the output formatter to hold the additional info. I think that has the potential to really simplify the code in ray/_private/worker.py, and would give control back to the users about what appears in the logs.
  6. I think we can revert Revert "Simplify logging configuration. (#30863)" #31858 to once again remove a lot of the logging configuration that was happening in various places. As I understand it, that revert was done because the ray.rllib default logging level changed from WARN to INFO when the rllib-specific logging configuration was remove. This PR solves that issue by explicitly setting the log level for the ray.rllib logger to WARN.

I think for now I'm going to stop work on this PR, and introduce additional changes in subsequent PRs, otherwise I'll run the risk of this getting bigger than it already is.

@peytondmurray peytondmurray force-pushed the global-logging-config branch 2 times, most recently from f6baa45 to a1ca333 Compare January 30, 2023 02:32
@xwjiang2010
Copy link
Contributor

I will defer to @rkooo567 for ray logging in general. But I really like the direction this is going.

A few comments on surfacing this to users better:

  1. Can we remove all reference to configure_logging, logging_level, and logging_format in the code since they are going away?
  2. Can we add a section about this new consolidation on driver logs in ray log documentation page? We may want to mention how users are supposed to configure this, as this is mostly interested to them. Your could also add the screenshot to give a little more visuals.

@peytondmurray peytondmurray force-pushed the global-logging-config branch 4 times, most recently from 2360353 to 642e467 Compare February 10, 2023 22:34
@peytondmurray
Copy link
Contributor Author

I will defer to @rkooo567 for ray logging in general. But I really like the direction this is going.

A few comments on surfacing this to users better:

1. Can we remove all reference to `configure_logging`, `logging_level`, and `logging_format` in the code since they are going away?

2. Can we add a section about this new consolidation on driver logs in ray log documentation page? We may want to mention how users are supposed to configure this, as this is mostly interested to them. Your could also add the screenshot to give a little more visuals.

💯 Agree, done for both.

@peytondmurray peytondmurray changed the title [WIP] Add rich logging configuration Add rich logging configuration Feb 10, 2023
@peytondmurray peytondmurray marked this pull request as ready for review February 10, 2023 22:36
@peytondmurray peytondmurray requested a review from a team as a code owner February 10, 2023 22:36
@peytondmurray
Copy link
Contributor Author

At the moment, this logging configuration specifically avoids touching worker logs - any messages produced internally to Ray that pass through a worker will be handled with a StreamHandler before being redirected to the worker log file as usual; then later on, the driver will read this file and print anything there to stdout, just like what Ray currently does.

In the next PR, I'd like to implement a system where logs are passed directly to the driver over the network, so we can do away with this intermediate log file approach that bypasses Python's existing logging machinery.

@peytondmurray peytondmurray changed the title Add rich logging configuration [WIP] Add rich logging configuration Feb 11, 2023
@peytondmurray peytondmurray marked this pull request as draft February 11, 2023 00:38
@peytondmurray peytondmurray changed the title [WIP] Add rich logging configuration Add rich logging configuration Feb 15, 2023
@peytondmurray peytondmurray marked this pull request as ready for review February 15, 2023 00:27
@peytondmurray
Copy link
Contributor Author

@angelinalg
Copy link
Contributor

Made some editing suggestions, hoping to be helpful. Pardon any mangling of the intent.

@peytondmurray
Copy link
Contributor Author

Made some editing suggestions, hoping to be helpful. Pardon any mangling of the intent.

Not at all, this is very helpful - thank you!!

Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have several comments regarding doc. But I think we need some alignment before proceeding the PR. (Let's do that in person next week)

  1. We should not deprecate logging config option since they are still needed to configure component logs. Or we should have an alternative (e.g., use env var). We also need API review to deprecate this option.
  2. Actually, I am not sure if we need [Ray Core] (this seems a bit weird to me). I will bring this up internally to get more feedback.
  3. For user logs from workers, I don't think we need to follow ray's logger at all. All the config from this RFC should be applied to the "library" level logging only (e.g., ray, rllib, tune, train, etc.).
  4. It looks like we only have support for rllib. Are we handling other libraries in separate PRs?

doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
python/ray/_private/worker.py Show resolved Hide resolved
doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
doc/source/ray-observability/ray-logging.rst Show resolved Hide resolved
python/ray/log.py Show resolved Hide resolved
@peytondmurray
Copy link
Contributor Author

Wanted to address a few comments before the discussion on Monday:

Have several comments regarding doc. But I think we need some alignment before proceeding the PR. (Let's do that in person next week)

1. We should not deprecate logging config option since they are still needed to configure component logs. Or we should have an alternative (e.g., use env var). We also need API review to deprecate this option.

They are not used to configure component logs here. The logging configuration variables used here call ray._private.ray_logging.setup_logger which does some general logging configuration. Component logs do not use these variables.

2. Actually, I am not sure if we need [Ray Core] (this seems a bit weird to me). I will bring this up internally to get more feedback.

Thanks, I'd appreciate feedback. I like the idea of having each message provide its associated ray component, but others may disagree.

3. For user logs from workers, I don't think we need to follow ray's logger at all. All the config from this RFC should be applied to the "library" level logging only (e.g., ray, rllib, tune, train, etc.).

Yep, that's true. I guess the only thing this PR does related to that is recommend to the users to follow standard python logging practices. I know there are some examples in the docs which run contrary to these practices, but maybe this isn't the place to encourage users in the right direction.

4. It looks like we only have support for rllib. Are we handling other libraries in separate PRs?

We have support for every component of Ray, because any child loggers of ray that don't have handlers, filters, or formatters will silently pass their logs to their parent. So a message logged in ray.data.preprocessors.batch_mapper, for instance, will be passed up to ray.data.preprocessors because no handlers are configured for that logger, and propagate = True by default. The message will similarly be passed to ray.data, and then ray, where it will be handled by the loggers configured in ray.log.py.

So in short we are handling every library of Ray - rllib just gets special treatment because of #31858.

@rkooo567 rkooo567 added @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Feb 22, 2023
@rkooo567
Copy link
Contributor

Please remove a tag when it is ready to review!

@peytondmurray
Copy link
Contributor Author

After a discussion with @rkooo567, we decided to split this PR into two: one which implements the initial global logging configuration, and one that introduces rich as a soft dependency. Closing this PR in favor of #32741.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants