Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to create public Logging APIs and log writer for easier Self Diagnostics experience #1258

Closed
xiang17 opened this issue Sep 10, 2020 · 15 comments

Comments

@xiang17
Copy link
Contributor

xiang17 commented Sep 10, 2020

Current Problems

Every project needs to create its own EventSource implementation, most of which is repeated boilerplate code to achieve the same purpose. Currently there are 14 EventSource implementations for emitting errors, yet it's not straight-forward to capture and log these EventSource events in case of need for troubleshooting.

Proposal

To solve these problems, I'm trying to provide a cross-platform logging API and self diagnostics module, which provides two key features:

  1. easy-to-use logging APIs for all OpenTelemetry modules to emit log messages/errors for troubleshooting/self diagnostics.
  2. a self diagnostics module that write the log message to disk, when enabled. When disabled (default), the logging APIs would not write anything to disk.

Logging APIs

The Logging APIs are some generic methods like LogVerbose or LogError, located in a public namespace/class for all OpenTelemetry to call for self diagnostics or troubleshooting. Once it's done, the use of other EventSources can be migrated to this. These Logging APIs still use EventSource as the underlying mechanism.

A sample use of the API I want to make is like this:

OpenTelemetrySelfDiagnostics.Log.LogInfo("Null pointer in exporter.");
OpenTelemetrySelfDiagnostics.Log.LogError("Null pointer in exporter.", exceptionObject);

(The exact namespace is to be figured out later.)

When the Self Diagnostics Module is enabled, the log message including timestamp, error level, class name, error message (and optionally exception stack) would be written to a local file as such:

2020-08-14T20:33:24.4788109Z ERROR OpenTelemetry.Exporter.Zipkin.ZipkinExporter: Null pointer in exporter. ExceptionStack:
...stack details...

Self Diagnostics Module

  1. The self diagnostics module captures and write log messages to disk, when enabled.
  2. It should have minimal overhead to the current process so the user can feel confident to enable it.
  3. It should be accessible to use in most environments/cases. For example, we want to use the least dependencies and/or features of the OS. It doesn't require administrator permissions. It can be enabled/disabled without restarting the process.
  4. Bounded resource usage. For example, it should not consume ever-growing memory or generate an ever-growing file.

Implementation Plan

  1. The logger object sits in a public location as a static variable for everyone to call.
  2. The logger provide generic log methods, which internally emits EventSource events.
  3. Self Diagnostics Module listens to the events with EventListener.
  4. Self Diagnostics Module writes the message in the event to a fix-sized local file at a certain location in a circular way. (Overwrite from beginning once write position reaches the end.)
  5. Self Diagnostics Module checks a configuration file for log file location and file size periodically (for example, every 3 seconds).
  6. By default, the configuration doesn't exist, Self Diagnostics Module will not write to any file. Once the file is created with valid configuration format/values, Self Diagnostics Module will start writing log file according to configuration. After the file is deleted or modified, the log file handle will be updated, either closed (if deleted or modified with invalid configuration) or closed and reopened according to the configuration file modification (if modified with valid configuration).
  7. To optimize write performance, Self Diagnostics Module will utilize Memory-Mapped Files feature instead of directly writing to file.
@TimothyMothra
Copy link
Contributor

Can you elaborate more on # 3 of your plan...

It sounds like the order of operations is this:

  1. Someone calls OpenTelemetrySelfDiagnostics.Log.LogError()
  2. Internally, you emit to ETW
  3. You also subscribe to ETW
  4. Then you log to a file.

I think it would be more efficient to log straight to file when your LogError method is called. (skipping 3).

I think emitting to ETW is still valid.

I think subscribing to ETW is still valid. Some users may want to configure additional event sources to subscribe to.

@ejsmith
Copy link
Contributor

ejsmith commented Sep 18, 2020

I have a big problem with the proposal to create a new logging implementation. This library already takes a bleeding edge dependency on System.Diagnostics.DiagnosticSource and in general is a very modern library. IMO, there is no reason that this library shouldn't also take a dependency on Microsoft.Extensions.Logging.Abstractions and use that for logging which would give you easy integration with .NET Core logging and the generic host as well as ensure best performance since the same logging library is used throughout .NET Core.

@cijothomas
Copy link
Member

cijothomas commented Sep 18, 2020

Myunderstanding is this:
We are not introducing a new logging for end users.
We can simply ask every Otel components to simply log into EventSource. (already doing this)
The self-diagnostic module - will subscribe to all EventSource with a name matching "OpenTelemetry.**" or something like this.
And the self-diag module will simply print it to a local file, based on config.

The reason why I prefer to use EventSource, as opposed to Ms.Ext.Logging is because of this:
EventSource based logging should be available at the earliest stage of application startup. But Ilogger based ones, requires someone to inject Ilogger, which typically requires DI to be ready, etc. EventSource requires nothing else (except something really basic), so we should be able to capture logs from the very start of the application itself. So technically, one can listen to EventSource (using perview like tools), almost always. But if we use Ilogger, then we may miss some critical events from the app startup. Also, we are vulnerable, if ILogger config is messed up.

@alanwest and I discussed the option of EventSource vs ILogger for Otel itself. Other Otel libraries (eg: NewRelic Otel exporter) is using ILogger.

Happy to have more discussions on this. We can seek expert suggestions from .NET diagnostic team as well.

@cijothomas
Copy link
Member

In Azure ApplicationInsight, we did log to both Ilogger and eventsource here:
https://github.com/microsoft/ApplicationInsights-dotnet/blob/develop/NETCORE/src/Microsoft.ApplicationInsights.AspNetCore/Implementation/ApplicationInsightsStartupFilter.cs#L42-L43

We need to evaluate this further.

@ejsmith
Copy link
Contributor

ejsmith commented Sep 18, 2020

It seems like you are arguing that the primary reason for not wanting to use Ms.Ext.Logging is because it can't capture issues during bootstrap. Well, 1) this library isn't typically used that early in the application and 2) you can capture bootstrapping issues using Ms.Ext.Logging just fine since it's just a logging abstraction and you could hook it up as early as you want in the app. Look at the Serilog example from this page: https://github.com/serilog/serilog-aspnetcore

For me though, my main argument is that EventSource / PerfView is not easy to use. I want to be able to just say AddOpenTelemetryTracing and immediately see issues. I could also turn my logging level up and be able to see detailed information about what is happening in the OTel client.

@cijothomas
Copy link
Member

It seems like you are arguing that the primary reason for not wanting to use Ms.Ext.Logging is because it can't capture issues during bootstrap

-- not exactly. That's just one reason. There are many I think, but I will get the official recommendation from .NET Diagnostics team.
Also, I don't think ILogger is available in net452. EventSource is available every .NET version we care about. This was one of the original reason (I think), why this repo chose to use EventSource.

@ejsmith
Copy link
Contributor

ejsmith commented Sep 18, 2020

@cijothomas
Copy link
Member

  1. this library isn't typically used that early in the application

This is exactly what me and @alanwest concluded.

@cijothomas
Copy link
Member

It supports 4.6.1 in latest release: https://www.nuget.org/packages/Microsoft.Extensions.Logging.Abstractions/5.0.0-rc.1.20451.14

mm.. We need net452 as well, as we support net452, and net452 is officially Microsoft supported.
https://github.com/open-telemetry/opentelemetry-dotnet/blob/master/src/OpenTelemetry/OpenTelemetry.csproj#L3

@ejsmith
Copy link
Contributor

ejsmith commented Sep 18, 2020

Any particular reason you decided to go back that far? To me it seems highly unlikely that someone would use this library in an old application that isn't even upgraded to the latest version of full framework which has full backwards compatibility with all previous versions. This library is still brand new and not even out of alpha yet. I can't see someone who won't upgrade their .NET framework version adding new code to their apps to integrate Open Telemetry.

@cijothomas
Copy link
Member

.NET452 is an officially supported version by Microsoft. When Microsoft drops support for it, we can drop support for it as well. But until then we have to support this. (There were discussions in early SIG meeting on this, and I believe the consensus was to support all officially supported .NET versions)

@cijothomas
Copy link
Member

Btw - I am not opposed to using ILogger :) As long as it can meet all the requirements, happy to use it.
(In fact I am working on Ilogger this very moment :) )

@cijothomas
Copy link
Member

@shirhatti What are your thoughts on using EventSource vs ILogger as the mechanism to log from the OpenTelemetry SDK itself?

@julealgon
Copy link

EventSource vs ILogger

I don't understand why this is even a statement, considering one of the main adapters for ILogger is exactly one that logs into EventSource.

Why would you ever want to take a static dependency with EventSource and use that hardcoded dependency all over the place (like OpenTelemetry here does), instead of just taking ILogger instead but configuring it to log into EventSource as a "sane default"? Even without a DI container, it just feels like such a terrible idea to hardcode the logger in this way.

I'll be honest in saying it surprised me to see all those hardcoded EventSource direct calls in this codebase after investigating a few issues for several hours. I literally facepalmed when I realized there was no way to push these logs into my own logger in the application.

This proposal itself feels like we are trying to reinvent a wheel: the logging mechanism already exists and is already stable, just use it.

I mentioned this in a discussion and I'll mention it here again: as a consumer of the library, I personally think the current logging/diagnostics solution is abysmal. It needs excessive setup and uses weird nonstandard patterns like "add this weirdly named json file and a txt log will be created per execution". I've shown that mechanism to a few developers on my team and the immediate reaction I get is a frown: as expected. That entire mechanism should literally not exist IMHO, and even if it did, it should be based on appsettings/system.configuration flags which at least are the expected configuration points for a thing such as that.

@reyang
Copy link
Member

reyang commented Aug 31, 2022

Why would you ever want to take a static dependency with EventSource and use that hardcoded dependency all over the place (like OpenTelemetry here does), instead of just taking ILogger instead but configuring it to log into EventSource as a "sane default"? Even without a DI container, it just feels like such a terrible idea to hardcode the logger in this way.

#1258 (comment) explained why - when this was originally designed, ILogger wasn't even available for the runtime versions that OpenTelemetry .NET was targeting...

@reyang reyang closed this as completed Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants