Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve query error logging #11519

Merged
merged 5 commits into from
Aug 5, 2021
Merged

Conversation

maytasm
Copy link
Contributor

@maytasm maytasm commented Jul 30, 2021

Improve query error logging

Description

When adding and developing new query functionality (such as adding a new Aggregator), it may be hard to debug due to limited logging information. This PR change the log level of logging query exception (in QueryLifecycle) from WARN to ERROR to make the Exception easier to find and identify. This PR also adds a query context enableQueryDebugging that can make the above error logging (in QueryLifecycle) to include the stack trace of the Exception. The new context, enableQueryDebugging, would not only add the Exception stracktrace but can include other current and feature functionality that would help user debug when query doesn’t work. It would include things like additional metrics and logs (not part of this PR but will create followup PRs). This one config would let us avoid add many many other configs keys by rolling up all related query debugging functionalities into one easy to remember and use queryContext.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@abhishekagarwal87
Copy link
Contributor

@maytasm can you add a more generic flag such as enableDebugLog or enableVerboseLogging etc.? this way we can use the same flag for similar situations. we could end up with too many flags otherwise. what do you think?

@maytasm
Copy link
Contributor Author

maytasm commented Jul 30, 2021

@maytasm can you add a more generic flag such as enableDebugLog or enableVerboseLogging etc.? this way we can use the same flag for similar situations. we could end up with too many flags otherwise. what do you think?

That might be more confusing. A generic flag might not be obvious what the flag is for. For example, normally if you think about wanting debug log or more verbose logging you would change your log level (in your log file) to DEBUG level or TRACE level. So now the user might be confuse on what is the difference between flag such as enableDebugLog or enableVerboseLogging vs. changing log level. Also, you may require different detail logs for different scenario. For example, as someone developing a custom extension with new aggregator vs as a druid operator. A generic flag such as enableDebugLog or enableVerboseLogging that emits details log for one may not be useful at all for the other.

@abhishekagarwal87
Copy link
Contributor

we can also name the flag as queryLogLevel (this could have values such as TRACE, DEBUG, INFO) etc. This will give you as a developer or as a user to log more details in the future without introducing another flag for each code branch.

@maytasm
Copy link
Contributor Author

maytasm commented Jul 30, 2021

we can also name the flag as queryLogLevel (this could have values such as TRACE, DEBUG, INFO) etc. This will give you as a developer or as a user to log more details in the future without introducing another flag for each code branch.

How does that differs from the current use of log.warn(), log.debug(), log.trace() etc? You can change the log level for specific class in the log4j file already.

Hmm.. actually I can just add an additional log.debug call to log the stack exception and remove the flag

@abhishekagarwal87
Copy link
Contributor

How does that differs from the current use of log.warn(), log.debug(), log.trace() etc? You can change the log level for specific class in the log4j file already.

Hmm.. actually I can just add an additional log.debug call to log the stack exception and remove the flag

are you trying to set log level for a particular query run or for all all the queries? Former may be useful when you don't want the logs to be too noisy but still get the ability to troubleshoot an individual query. a change in log4j setting will apply to all the queries and require a restart of the service.

@maytasm
Copy link
Contributor Author

maytasm commented Jul 30, 2021

How does that differs from the current use of log.warn(), log.debug(), log.trace() etc? You can change the log level for specific class in the log4j file already.
Hmm.. actually I can just add an additional log.debug call to log the stack exception and remove the flag

are you trying to set log level for a particular query run or for all all the queries? Former may be useful when you don't want the logs to be too noisy but still get the ability to troubleshoot an individual query. a change in log4j setting will apply to all the queries and require a restart of the service.

That's a good point. I imagine this would be used for debugging during development / testing. Maybe it is not something that will change often and does not needs a query context?

@abhishekagarwal87
Copy link
Contributor

@maytasm In that case, I too will recommend logging error inside log.debug call.

@maytasm
Copy link
Contributor Author

maytasm commented Aug 4, 2021

@abhishekagarwal87
After some more thoughts, I think having this as a queryContext is a better idea. It is much easier to use as the user does not have to restart the cluster. Errors from query may come up unexpectedly (i.e. in a non-dev environment such as in production). Making this a queryContext will allow user to get more debugging information from a failing query faster and easier without having to restart the cluster. I have also follow your suggestion and rename the new queryContext to something more generic. The new context is now enableQueryDebugging, this context would not only add the Exception stracktrace but can include other current and feature functionality that would help user debug when query doesn’t work. It would include things like additional metrics and logs (not part of this PR but will create followup PRs). This one config would let us avoid add many many other configs keys by rolling up all related query debugging functionalities into one easy to remember and use queryContext.

Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maytasm The approach sounds good to me. I do have some reservations around the name of the flag itself. here are some suggestions from my side.
name | possible values
verbose | true,false
debug | true,false
debugMode | true,false
enableDebugging | true,false

the second option (debug) is quite common as a flag in CLI tools. so that is what I personally would prefer the most.

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag name LGTM and test coverage seems reasonable. I can't think of a suggestion on how to test the logs were added appropriately, so +1 with the current test coverage

@maytasm maytasm merged commit 3257913 into apache:master Aug 5, 2021
@maytasm maytasm deleted the improve_query_error_logging branch August 5, 2021 15:51
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants