Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SelfDiagnostics] Add new public logging APIs for self diagnostics #1529

Closed
wants to merge 11 commits into from

Conversation

xiang17
Copy link
Contributor

@xiang17 xiang17 commented Nov 12, 2020

Asking community review on a new set of logging API signatures as described in key feature#1 in issue #1258

My intention is to migrate all existing self-diagnostics-related logging methods (such as OpenTelemetrySdkEventSource.Log.SpanProcessorException, OpenTelemetryProtocolExporterEventSource.Log.FailedToReachCollector) to this new set of logging APIs once finalized and approved. And all future need for self diagnostics hopefully could use this set of logging APIs, rather than creating yet another custom subclass of EventSource.

For example, right now OpenTelemetry.Exporter.Zipkin.ZipkinExporter class calls methods in a custom ZipkinExporterEventSource class:

ZipkinExporterEventSource.Log.FailedExport(ex);

With the new APIs, no new custom EventSource class are needed, it calls:

OpenTelemetry.SelfDiagnostics.Logger.LogError(typeof(ZipkinExporter), ex, "Failed to export activities");

Changes

Added a new class SelfDiagnostics in OpenTelemetry.Api.csproj, which includes a list of internal logging APIs: LogCritical, LogError, LogWarning, LogInformation and LogVerbose.

These methods call OpenTelemetryApiEventSource.Log to emit EventSource events.

For significant contributions please make sure you have completed the following items:

@xiang17 xiang17 requested a review from a team November 12, 2020 23:38
ERROR: Exception in Command Processing for EventSource OpenTelemetry-Api: Multiple definitions for string "event_EmitCriticalEvent".
@codecov
Copy link

codecov bot commented Nov 13, 2020

Codecov Report

Merging #1529 (3cb8c3d) into main (efe2d96) will decrease coverage by 0.67%.
The diff coverage is 28.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1529      +/-   ##
==========================================
- Coverage   83.61%   82.94%   -0.68%     
==========================================
  Files         192      193       +1     
  Lines        6172     6245      +73     
==========================================
+ Hits         5161     5180      +19     
- Misses       1011     1065      +54     
Impacted Files Coverage Δ
src/OpenTelemetry.Api/SelfDiagnostics.cs 0.00% <0.00%> (ø)
...emetry.Api/Internal/OpenTelemetryApiEventSource.cs 55.95% <40.00%> (-26.41%) ⬇️
src/OpenTelemetry/Sdk.cs 100.00% <100.00%> (ø)

@reyang
Copy link
Member

reyang commented Nov 13, 2020

@xiang17 would you post some perf measurements here?

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@cijothomas
Copy link
Member

The new simplified API wrapper on top on EventSource sounds like a good thing. I have one question:
Currently, each project uses own EventSource. When using tools like perfview, its easy to turn on each component individually. In this proposed API, we only have a global ON/OFF as opposed to having the ability to On/Off individual components?
Also, any way to avoid passing TypeOf(caller) as 1st argument, and automatically gather it?

@xiang17
Copy link
Contributor Author

xiang17 commented Nov 18, 2020

The new simplified API wrapper on top on EventSource sounds like a good thing. I have one question:
Currently, each project uses own EventSource. When using tools like perfview, its easy to turn on each component individually. In this proposed API, we only have a global ON/OFF as opposed to having the ability to On/Off individual components?

What's the scenario of turning on/off for individual components? From my experience, when there's an issue, it's easier to ask the customer to reproduce the issue and collect all data than to ask the the same steps done repeatedly for different combinations of components.

Also, any way to avoid passing TypeOf(caller) as 1st argument, and automatically gather it?

I had the same hesitant on this. Another approach I had was having a LoggerFactory and asking each class to initialize a logger object like SLF4J. The downside is we'd have one logger instance for each class where logger is needed, compared to only one logger instance globally in the current approach.

Considering the goal is to make it a bare-bones logger infrastructure that works in most cases, I used the current approach. What do you think?

@cijothomas
Copy link
Member

What's the scenario of turning on/off for individual components? From my experience, when there's an issue, it's easier to ask the customer to reproduce the issue and collect all data than to ask the the same steps done repeatedly for different combinations of components.

One scenario is "finding the needle from the hay" - If every component is using separate eventsourcename, and separate event itself, then its easy to spot in tools like PerfView. But this is not a hard requirement :)

@cijothomas
Copy link
Member

@xiang17 While I generally approve of this approach, I'd be a bit cautious about introducing a set of new self-logging API with less than a week to GA.
The self-diagnostics feature is now capable of being used as is.
We need to postpone the new self-logging API until after 1.0.0 is released. This gives us more time to experiment with the exact shape/params of the API.

I'll mark this is post-ga and won't merge. Thanks a lot of making this extremely valuable feature.

# Conflicts:
#	src/OpenTelemetry.Api/.publicApi/net452/PublicAPI.Unshipped.txt
#	src/OpenTelemetry.Api/.publicApi/netstandard2.0/PublicAPI.Unshipped.txt
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 28, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Feb 5, 2022

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Feb 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Issues and pull requests which have been flagged for closing due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants