Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-172] --quiet flag for run-operations #3451

Closed
jaypeedevlin opened this issue Jun 9, 2021 · 18 comments · Fixed by #4701
Closed

[CT-172] --quiet flag for run-operations #3451

jaypeedevlin opened this issue Jun 9, 2021 · 18 comments · Fixed by #4701
Labels
cli enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors jira

Comments

@jaypeedevlin
Copy link
Contributor

Describe the feature

Currently, when you run run-operation you get the output of your macro after a few lines of text, for example:

Running with dbt=0.19.1
[WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- seeds
- snapshots

{macro output here}

I'm proposing a --quiet flag for run-operation that would remove the Running with dbt and any other warnings, leaving just the output of the macro.

Describe alternatives you've considered

I've previously wrapped calls to run-operation in bash script which trims the first line from the output, however when warnings are triggered (like the example above) it's hard to tell in advance how many lines to trim.

Who will this benefit?

Anyone who uses the codegen macro (and similar macros) outside of dbt cloud

Are you interested in contributing this feature?

Yes — I've poked around the logger code but couldn't see a straightforward way of doing this, but I'm open to trying with some input!

@jaypeedevlin jaypeedevlin added enhancement New feature or request triage labels Jun 9, 2021
@jtcohen6
Copy link
Contributor

@jaypeedevlin This makes sense to me. I know this has been a challenge when using the codegen package in particular, and I feel more comfortable with this answer than with Jinja-based file generation (#3428).

It sounds like the ask is: Everything info-level that would normally log to stdout, should be ignored in stdout and log to logs/dbt.log instead, except for user-defined Jinja macros that call {{ log(..., info = true) }}. Does that sound right to you?

I don't know exactly how this ought to work, beyond that it will require some changes to core/dbt/logger.py. Here's one idea, which may be over-engineering it:

  • We define a custom logging level, LOUD (35), that's between WARNING and ERROR. (I understand it can be tricky to define custom logging levels, so that may be easier said than done.)
  • When the --quiet flag is passed, the output log level is set to LOUD (35) instead of INFO (10). Any errors or critical messages (>35) will still print to stdout, but any warnings or info messages (including logger.info) will not.
  • When the --quiet flag is passed, the Jinja log() context method (defined below) should use logger.loud() instead of logger.info(), so that its messages still land in stdout:

https://github.com/fishtown-analytics/dbt/blob/14507a283e6acd710363aab9fc8fbcac1658002d/core/dbt/context/base.py#L430-L449

That's just one idea. I'd curious to hear from other members of the Core team, who likely have better ones.

@jtcohen6 jtcohen6 added cli and removed triage labels Jun 15, 2021
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 1, 2021

This is significantly easier than previously I thought, since I realized we already do it for list:

https://github.com/dbt-labs/dbt/blob/b9d5123aa37786425f7429c91f4513839eea6a65/core/dbt/logger.py#L506-L510

If the --quiet flag is passed, the run-operation task need only include code like:

https://github.com/dbt-labs/dbt/blob/b9d5123aa37786425f7429c91f4513839eea6a65/core/dbt/task/list.py#L59-L63

This will still require setting the Jinja context log.info to a higher level, but it feels like the right thread to pull on.

Going to call this one a good first issue!

@jtcohen6 jtcohen6 added the good_first_issue Straightforward + self-contained changes, good for new contributors! label Jul 1, 2021
@jaypeedevlin
Copy link
Contributor Author

If it's done by default for list, is there an argument to do it by default for run-operation?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 1, 2021

@jaypeedevlin I definitely see merit in that argument. Some run-operations are more like quasi-tasks (thinking about redshift.vacuum or dbt_external_tables.stage_external_sources), where it might be nice to include prefatory matter like the dbt version, but I don't think there's any functional reason it needs to be there—whereas piping stdout to jq/file/etc represents a real functional gain.

All in all, the code in list can serve as a model here, though it's not going to be an exact replica of what we're after. Deprecations and other warnings still show up if _output_handler.level = logbook.WARNING . So I think something like my thought above, defining a LOUD level between WARNING and ERROR, will still be necessary. The most important thing will be finding a way to treat user-supplied, Jinja-defined info logging as "loud." This won't accomplish its goal if it swallows all log output.

@vergenzt
Copy link

vergenzt commented Jul 2, 2021

Is there a reason why dbt log messages go to stdout by default instead of always to stderr?

The convention I'd expect from a shell utility command is for "log" messages to always go to stderr and for there to be a separate call used to explicitly write to stdout (maybe print?). You could even limit the print function to only be in the Jinja context during a run-operation command, since it seems like it'd be a bit weird to be allowed to "print" anything during a dbt model or test run.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 2, 2021

@vergenzt That's effectively the function of log.info—things that dbt wants to print to stdout. By routing everything through a centralized logger, , something like this (and #3440) is more possible.

One complication to my proposal above: logbook doesn't support custom log levels, but it does support custom log processors. That's a decidedly better way to go about this, by tagging user-supplied Jinja log input with a tag that allows it to go through, while suppressing all other logging.

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Dec 30, 2021
@jaypeedevlin
Copy link
Contributor Author

Commenting to leave this open, I'd like to take a pass at this in the new year.

@github-actions github-actions bot removed the stale Issues that have gone stale label Jan 1, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 3, 2022

@jaypeedevlin Very cool :)

Since we reworked logging in v1, most of my comments above are no longer relevant! In fact, I'm hopeful it will be much easier to plumb through the logic that sets log levels

@jtcohen6 jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed good_first_issue Straightforward + self-contained changes, good for new contributors! labels Feb 2, 2022
@jaypeedevlin
Copy link
Contributor Author

Hey there @jtcohen6 — I've started working on this one with @ehmartens, and we could use your help (or someone elses if there's someone better placed to assist!). Here's what we discovered today:

There are three 'portions' of potential output from a run-operation that doesn't error:

  1. The line Running with dbt=1.0.1: INFO Level
  2. Any warnings (eg unused configuration paths): WARN Level
  3. The output of the macro: INFO Level

Our aim is to display only 3 and any additional errors.

Our first approach was to borrow this classmethod from the list command which overrides the logging level. Unfortunately that is designed to suppress info, so we got just the warning. We want a way to suppress the INFO in 1, any WARNINGs but include the INFO from 3.

We thought if there was some way to set the initial log level to ERROR and then raise it to INFO just before the macro runs it would give us what we wanted (and then we could work on making it conditional on an arg/flag), but we're not sure how to go about that.

Any guidance you (or anyone else) could provide to steer us in the right direction would be welcome!

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 4, 2022

@jaypeedevlin @ehmartens Hooray!

I do think this will be significantly easier in the new logging system, which has a README :)

It should give us much more flexibility than with the old system, such that we should also be able to find a better way than how the list task has done this in the past. I think we can piggy-back off the approach @emmyoop took for --log-cache-events over in #4369 — and if Emily's down to help, she could probably advise on an even better approach.

Here's what I'm thinking:

  • A new flag in main.py + flags, --quiet (following the example of --debug or --log-cache-events)
  • A new base event type (= a new class) named Loud()
  • The event type MacroEventInfo adds a subclass of Loud
  • When flags.QUIET, the fire_event function should eat all stdout logs, unless they're isinstance(e, Loud) or e.level_tag() == 'error' — I wonder if that can happen in the same spot where it checks the --debug flag and eats debug-level logs?

I took a quick crack at that, and ended up with:

$ dbt run-operation let_the_macro_do_the_talking
13:28:07  Running with dbt=1.0.1
13:28:07  [WARNING]: Deprecated functionality
The `source-paths` config has been renamed to `model-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
13:28:07  [WARNING]: Deprecated functionality
The `data-paths` config has been renamed to `seed-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
13:28:07  This is my custom log message! Wahoo!
$ dbt --quiet run-operation let_the_macro_do_the_talking
13:28:29  This is my custom log message! Wahoo!

Note this will include all info-level log events fired via the dbt-Jinja {{ log() }} macro — including any fired from materializations, global macros, etc. We struck up an issue to think about ways to offer more power + precision to user-space logging: #4184

@emmyoop
Copy link
Member

emmyoop commented Feb 4, 2022

It's exciting just how much easier this is with the new logging system!

This approach seems spot on but happy to help out if there are any more questions.

Small note: I don't love the class name of Loud. It seems counterintuitive that when you flag for quiet the Loud class is all that's written out. But I could be convinced otherwise.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 4, 2022

Small note: I don't love the class name of Loud. It seems counterintuitive that when you flag for quiet the Loud class is all that's written out. But I could be convinced otherwise.

Heh, super legit note - I was having a bit too much fun on a Friday. (NotQuiet, CannotBeSilenced, ...?)

@emmyoop
Copy link
Member

emmyoop commented Feb 4, 2022

After giving this a bit more thought I'm thinking we may want to go down the path of a new log level. I'd like to think through the implications of it a bit more before saying we should definitely go that route. But wanted to put a note here so @jaypeedevlin and @ehmartens don't start down the Loud path unnecessarily (yet).

@emmyoop
Copy link
Member

emmyoop commented Feb 7, 2022

@jaypeedevlin and @ehmartens after giving this a lot more thought a new log level is not the right way to go. Please correct me if there's something I'm missing but the heart of this issue is that you want to be able to see just the output generated via dbt run-operation. The logs are just noise here.

With that in mind, the basic approach that @jtcohen6 laid out with a flag and new base event class is spot on with one minor tweak. Since a user doesn't know that the output of run-operation is generated with logging (and generally doesn't really need to) let's use a different flag and class name. Also this flag should only apply to --run-operation.

So pulling from the basic outline above:

  1. A new flag in main.py. With this approach of only adding it for the run-operation command, you will need to add it as a command in main.py instead of flags.py. The flag can be called --output.

The new flag will need to be defined as run_operation_sub and moved up with the other subparser definitions on line 1111. So this:

_build_run_operation_subparser(subs, base_subparser)

should become

run_operation_sub = _build_run_operation_subparser(subs, base_subparser) 

Then there needs to be a new function following the pattern of _add_selection_arguments so that the new --output flag can be added to the run_operation_sub. The ensures we only add it to the command we want and not globally.

  1. A new base event type (= a new class) named MacroOutput()
  2. The event type MacroEventInfo adds a subclass of MacroOutput
  3. When flags.OUTPUT, the fire_event function should eat all stdout logs, unless they're isinstance(e, MacroOutput) or e.level_tag() == 'error'

Please let me know if you have any questions about this approach! If you don't think the name of the flag is intuitive I'm happy to start a conversation about it as well.

@emmyoop
Copy link
Member

emmyoop commented Feb 7, 2022

@jaypeedevlin and @ehmartens the team and I have come up with an even more robust solution that will fix a few different situations. While my previous solution would work, this is better long term. #4687 recently requested the ability to silence logs to stdout as well.

The general solution is two fold.

First, add a flag --quiet whose sole purpose is to silence logs to stdout. This will be a global flag that is added to flags.py and main.py (following the example of --debug or --log-cache-events). In this new context where it's only purpose is to not log anything to stdout, --quiet makes more sense as the flag name.

No new event base classes need to be created with this route either. We can leverage the already existing NoStdOut.

You will need to modify the below code to swallow the log for flags.QUIET similar to what we do for flags.DEBUG on line 352.

if not isinstance(e, NoStdOut):
# explicitly checking the debug flag here so that potentially expensive-to-construct
# log messages are not constructed if debug messages are never shown.
if e.level_tag() == 'debug' and not flags.DEBUG:
return # eat the message in case it was one of the expensive ones
log_line = create_log_line(e)
if log_line:
if not isinstance(e, ShowException):
send_to_logger(STDOUT_LOG, level_tag=e.level_tag(), log_line=log_line)
else:
send_exc_to_logger(
STDOUT_LOG,
level_tag=e.level_tag(),
log_line=log_line,
exc_info=e.exc_info,
stack_info=e.stack_info,
extra=e.extra
)

Second, let's add a {{ print() }} function to macros! It will be very similar to {{ log() }}. The difference being that print is not a log statement, it will just print to stdout.

To do this, you can copy the approach we have for log. It should always just print to stdout.

@contextmember
@staticmethod
def log(msg: str, info: bool = False) -> str:
"""Logs a line to either the log file or stdout.
:param msg: The message to log
:param info: If `False`, write to the log file. If `True`, write to
both the log file and stdout.
> macros/my_log_macro.sql
{% macro some_macro(arg1, arg2) %}
{{ log("Running some_macro: " ~ arg1 ~ ", " ~ arg2) }}
{% endmacro %}"
"""
if info:
fire_event(MacroEventInfo(msg=msg))
else:
fire_event(MacroEventDebug(msg=msg))
return ''

These two changes (and a change to codegen to use this new print statement) would result in dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "database_name": "raw"}' to print to stdout any print statements in generate_source but no logs. Logs would however still be written to the file.

@emmyoop
Copy link
Member

emmyoop commented Feb 7, 2022

Flagging @joellabes so he's aware we're going to add a print statement to macros that will allow printing to stdout, unrelated to logging.

@emmyoop emmyoop closed this as completed Feb 7, 2022
@emmyoop emmyoop reopened this Feb 7, 2022
@emmyoop emmyoop added the jira label Feb 7, 2022
@github-actions github-actions bot changed the title --quiet flag for run-operations [CT-172] --quiet flag for run-operations Feb 7, 2022
@joellabes
Copy link
Contributor

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors jira
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants