
[RFC] C++ Example Logging Statements #356

Closed
kxyr opened this issue Oct 7, 2020 · 9 comments

Labels
question Further information is requested

Comments

@kxyr
Contributor

kxyr commented Oct 7, 2020

This RFC aims to collect a variety of use cases for logging statements, which will be used to establish the convention and design of the C++ Logging API/SDK in #337.

The goal is to obtain a collection of observability-specific logging statements that:

  • use different data types (custom or built-in)
  • use different formats (template or string), and
  • are simple and intuitive to learn and write.

Example Submission
Simple structured logging of a map:

#include <string>
#include <unordered_map>

std::unordered_map<std::string, int> mymap = {{"First", 1},
                                              {"Second", 2},
                                              {"Third", 3}};

Write the map to the logger, along with the severity and message:

logger.DEBUG("Outputting a map ", mymap);

This code snippet creates a log with a map attached to it. This is a type of structured logging, since the data holds a relationship with itself through key/value pairs. If this log were converted to JSON, it may have the following structure:

{
    "SeverityLevel": "DEBUG",
    "Timestamp": "Oct 07 10:59AM",
    "Message": "Outputting a map",
    "map1": {
        "First": 1,
        "Second": 2,
        "Third": 3
    }
}

Request for Comments

Please comment below to add examples of log formats and the different ways logging can be performed in C++, with the relevant data definitions (if applicable).

cc: @MarkSeufert @alolita @reyang @maxgolov @ThomsonTan

@wsargent

wsargent commented Oct 8, 2020

The main thing that stands out is that structured logging should cover semi-structured data, i.e. JSON/XML/YAML type structures that contain lists, maps, and sub-structures.

For example, things like https://github.com/nlohmann/json may be represented:

json j2 = {
  {"pi", 3.141},
  {"happy", true},
  {"name", "Niels"},
  {"nothing", nullptr},
  {"answer", {
    {"everything", 42}
  }},
  {"list", {1, 0, 2}},
  {"object", {
    {"currency", "USD"},
    {"value", 42.99}
  }}
};
logger.DEBUG("Outputting a json document", j2);

but this could likewise be an XML or YAML document. I don't think references need to be taken into account, but something like the XML infoset or a JsonGenerator builder may be helpful. https://docs.oracle.com/javaee/7/api/javax/json/stream/JsonGenerator.html

When you talk about formats, are you talking about things like logfmt https://www.brandur.org/logfmt? There are frameworks that render the same logging statements to different outputs, so the same logging event gets rendered as logfmt on console, but written as JSON on the backend.
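For illustration, a minimal sketch of that idea (no real OpenTelemetry API involved; the event type and Render* helpers are made up) rendering one logging event as logfmt for the console and as JSON for a backend:

#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical event: a message plus a flat set of string key/value pairs.
struct LogEvent {
  std::string message;
  std::map<std::string, std::string> fields;
};

// Render the event as logfmt, e.g. msg="user login" status=ok user=alice
std::string RenderLogfmt(const LogEvent &e) {
  std::ostringstream out;
  out << "msg=\"" << e.message << "\"";
  for (const auto &kv : e.fields) out << " " << kv.first << "=" << kv.second;
  return out.str();
}

// Render the same event as a JSON object (naive, no escaping).
std::string RenderJson(const LogEvent &e) {
  std::ostringstream out;
  out << "{\"msg\":\"" << e.message << "\"";
  for (const auto &kv : e.fields)
    out << ",\"" << kv.first << "\":\"" << kv.second << "\"";
  out << "}";
  return out.str();
}

int main() {
  LogEvent e{"user login", {{"user", "alice"}, {"status", "ok"}}};
  std::cout << RenderLogfmt(e) << "\n";  // console formatter
  std::cout << RenderJson(e) << "\n";    // backend exporter
}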

@maxgolov
Contributor

maxgolov commented Oct 9, 2020

Thoughts:

  • re. JSON - we are already using nlohmann/json for zPages, and we might as well allow a LogEvent(T) template, where T is any object - for example, a nlohmann::json object - as a header-only add-on, a higher-order module on top of the SDK, NOT in the SDK. In most cases the systems would require a flat list rather than a deep structured object, since most DB storage systems (e.g. Microsoft Kusto, Cosmos, Azure Data Lake, SQL DB, Aurora, MySQL, SQLite) are 2D time-series of records. It should be possible to flatten the incoming nlohmann::json into a property list (flat key-value pairs) using example code like this:
    https://github.com/open-telemetry/opentelemetry-cpp/blob/cf1cd4c75a5bd9a232426cd256f0e9af1d09e801/docs/JSON-Pointer-Properties.md - marshal it in an ABI-safe manner (as shown below), then unflatten it either in the exporter or in the collector, depending on whether the final destination supports nested structures.

  • at source - you can express your structures as follows:

  // lp - LoggerProvider
  auto logger = lp.GetLogger(loggerName); // with optional 2nd logger parameters
  Event event = { {"uint32Key", (uint32_t)123456}, {"uint64Key", (uint64_t)123456}, {"strKey", "someValue"} };
  logger->LogEvent(event);

With overloads, like:

  logger->LogEvent(name); // simple event by name

and

  logger->LogEvent(name, event); // name and event property list

and

  logger->LogEvent(T); // templated method, where T is anything - could be a `nlohmann::json` object; the user provides the conversion

When passing objects across the ABI boundary, we use a collection of AttributeValue and KeyValueIterable, which we also use for Span attributes.

I believe we need to move KeyValueIterable from trace::KeyValueIterable to common::KeyValueIterable and reuse it for the structured event logging API. You'd pass a map or unordered_map or any other iterable container, which is transformed header-only (compiled in customer code) into a common::KeyValueIterable object that can then be safely passed across the ABI boundary to the OpenTelemetry SDK shared library. ABI stability is one of our fundamental requirements. The flattening algorithm described above can be used to flatten any deeply nested structure (e.g. a JSON object) into a flat list, pass it across as a KeyValueIterable (a collection of variants), and unflatten it in the exporter inside the SDK shared library - only in cases where the destination system really needs / supports deeply nested structures on the wire and in the storage system.
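As a sketch of that flattening step, nlohmann::json's own flatten()/unflatten() produce exactly this kind of JSON Pointer keyed flat list (the hand-off to a KeyValueIterable is omitted here):

#include <iostream>
#include <nlohmann/json.hpp>

using nlohmann::json;

int main() {
  json j = {{"answer", {{"everything", 42}}},
            {"object", {{"currency", "USD"}, {"value", 42.99}}}};

  // flatten() turns the nested document into a single-level object whose
  // keys are JSON Pointer paths, e.g. "/answer/everything": 42.
  json flat = j.flatten();
  for (auto &item : flat.items())
    std::cout << item.key() << " = " << item.value() << "\n";

  // Each (path, primitive) pair could then travel as a flat key/value list
  // (e.g. a collection of variants) across the ABI boundary, and the
  // exporter can rebuild the nested shape when the destination needs it.
  json nested = flat.unflatten();
  std::cout << nested.dump(2) << "\n";
}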

I would also suggest adding higher-order primitives, in addition to the foundation types supported by the OTel protocol. This is an example of what may come in handy, in addition to the basic types covered by the AttributeValue type:
https://github.com/maxgolov/opentelemetry-cpp/tree/master/api/include/opentelemetry/event

  • Time point (64-bit)
  • possibly Time duration
  • GUID (or UUID)
  • byte array. Spec issue for that: https://github.com/open-telemetry/opentelemetry-proto/issues/169 - It is extremely limiting that we do not natively support an array of good old uint8_t bytes. Most eventing / logging systems, as well as protocols (ETW, MessagePack), do support byte arrays, which are instrumental for logging blobs (e.g. in crash analytics scenarios).

For the unstructured logging, I believe we have to make sure we support C-style debug macros that can be turned off at compile time. The usual suspects, like:

#define LOG_DEBUG(fmt_, ...)    MY_LOG_(MyLog::Debug,   getMyLogComponent(), fmt_, ##__VA_ARGS__)
#define LOG_TRACE(fmt_, ...)    MY_LOG_(MyLog::Trace,   getMyLogComponent(), fmt_, ##__VA_ARGS__)
#define LOG_INFO(fmt_, ...)     MY_LOG_(MyLog::Info,    getMyLogComponent(), fmt_, ##__VA_ARGS__)
#define LOG_WARN(fmt_, ...)     MY_LOG_(MyLog::Warning, getMyLogComponent(), fmt_, ##__VA_ARGS__)
#define LOG_ERROR(fmt_, ...)    MY_LOG_(MyLog::Error,   getMyLogComponent(), fmt_, ##__VA_ARGS__)
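A minimal sketch of how such macros could be compiled out; MY_LOG_ and getMyLogComponent() are the placeholders from the snippet above, and OTEL_MIN_LOG_LEVEL is a hypothetical compile-time knob, not an agreed name:

#define MY_LOG_LEVEL_DEBUG 0
#define MY_LOG_LEVEL_INFO  1

#ifndef OTEL_MIN_LOG_LEVEL
#define OTEL_MIN_LOG_LEVEL MY_LOG_LEVEL_DEBUG   // default: keep everything
#endif

#if OTEL_MIN_LOG_LEVEL <= MY_LOG_LEVEL_DEBUG
#define LOG_DEBUG(fmt_, ...)    MY_LOG_(MyLog::Debug, getMyLogComponent(), fmt_, ##__VA_ARGS__)
#else
#define LOG_DEBUG(fmt_, ...)    ((void)0)       // compiled out, zero runtime cost
#endif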

With no strong preference on how we name them (ALOG_, OTLOG_, or MYLOG_), as long as we agree on the set of common levels we support. Maybe:

ASSERT
DEBUG
ERROR
INFO
VERBOSE
WARN

And possibly with:

TRACE
NOTICE

levels too.

@ThomsonTan
Contributor

Would log initialization/configuration also be part of the logging API instead of log statements, or are we going to put that into a different RFC?

Regarding the thoughts from @maxgolov, I suggest that the API not depend on any other library like nlohmann::json - just C++ primitive types and the types defined in our API.

Also, I think we want to avoid referencing any STL types in our API; as @maxgolov mentioned, they are not ABI-compliant. Adding an implicit cast from STL types to KeyValueIterable looks convenient, but it may also cause unexpected overhead. Perhaps we want to avoid such implicit conversions.
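To make the implicit-vs-explicit point concrete, here is a minimal sketch (the types and names are illustrative, not the actual opentelemetry-cpp classes) where the STL-to-iterable wrapping is explicit at the call site:

#include <map>
#include <string>

// Illustrative ABI-stable base the API could accept.
class KeyValueIterable {
 public:
  virtual ~KeyValueIterable() = default;
};

// Header-only, non-owning view over a caller-owned STL container.
template <class T>
class KeyValueIterableView : public KeyValueIterable {
 public:
  // 'explicit' keeps the STL-to-view conversion visible at the call site,
  // instead of an implicit conversion that hides its cost.
  explicit KeyValueIterableView(const T &container) noexcept : container_(&container) {}

 private:
  const T *container_;
};

// API entry point (stubbed out here).
void LogEvent(const KeyValueIterable & /*attributes*/) {}

int main() {
  std::map<std::string, int> attrs = {{"http.status_code", 200}};
  LogEvent(KeyValueIterableView<std::map<std::string, int>>(attrs));  // explicit wrap, no copy
}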

If we wrap all the APIs in a log class, are they going to be virtual functions that user code is allowed to override?

@wsargent

wsargent commented Oct 9, 2020

The other thing that stands out (but may be optional) is the use of markers.

For example, a logging statement may have

logger.info(SECURITY_MARKER, eventData);

and be given a specialized HTTP header without changing the body of the logging statement.

The concept was added by Logback to indicate when additional filtering / alerting needs to take place "out of band" of the logging event data, but it's useful as a general purpose escape hatch.

For example, you can associate markers with audio clips or build up context through a series of markers.

@maxgolov
Contributor

maxgolov commented Oct 9, 2020

@ThomsonTan - yeah, just to clarify - I'm not actually saying we should support JSON in the API. I'm saying we can have a template method and a higher-level abstraction that transforms the JSON into the "property list of variants" (KeyValueIterable) we have today - as a header-only add-on, a higher-order module on top of the SDK, NOT in the SDK, and basically not even in the API, other than the API potentially allowing the customer to provide their own implementation of a templated method to transform from their own custom type, be that JSON, XML, or some other object. Eventually it might even be a helper method that just transforms any arbitrary struct or class into a flat property list that we can marshal across the ABI boundary.
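A minimal sketch of such a customer-provided conversion hook (all names here are hypothetical, not a proposed API):

#include <iostream>
#include <map>
#include <string>

// Hypothetical flat property list handed to the core logging API.
using PropertyList = std::map<std::string, std::string>;

// Core API overload (stubbed): takes an event name and a flat list.
void LogEvent(const std::string &name, const PropertyList &props) {
  std::cout << name << ": " << props.size() << " properties\n";
}

// Customization point: users specialize ToProperties for their own type
// (a JSON document, an XML node, a domain struct, ...).
template <class T>
PropertyList ToProperties(const T &value);

// Header-only helper on top of the core API: converts T, then forwards.
template <class T>
void LogEvent(const std::string &name, const T &value) {
  LogEvent(name, ToProperties(value));
}

// Example user type and its conversion.
struct CheckoutEvent {
  std::string user;
  int items;
};

template <>
PropertyList ToProperties<CheckoutEvent>(const CheckoutEvent &e) {
  return {{"user", e.user}, {"items", std::to_string(e.items)}};
}

int main() {
  LogEvent("checkout", CheckoutEvent{"alice", 3});
}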

@maxgolov
Contributor

maxgolov commented Oct 9, 2020

Would log initialization/configuration also be part of the logging API instead of log statements, or are we going to put that into a different RFC?

I think we need to have something like auto logger = provider.GetLogger("...", configOptions) that may ask the provider to configure the logger by wiring it to specific destination(s). Eventually maybe even an auth token / destination, such as an API key, could be part of the configOptions. But we can omit that for now, leaving it totally opaque to the rest of the logging API RFC. It's the implementation of the logging provider in the SDK that would have to respect the config options.

@maxgolov
Contributor

maxgolov commented Oct 9, 2020

Re. configOptions -- TL;DR -- https://stackoverflow.com/questions/23322602/log4j-config-different-logs-to-different-files

I think we need to have logger aliases that may be rewired by the LoggingProvider to one or multiple destinations at runtime (late binding), without necessarily providing any configuration options on the API surface. That'd be much cleaner, and it'd make it much easier to migrate apps between clouds / logging destinations.
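A rough sketch of the alias idea (purely illustrative types, not a proposed API): call sites only ever name an alias, and the provider decides at runtime which exporter(s) that alias is wired to:

#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Exporter {
  virtual ~Exporter() = default;
  virtual void Export(const std::string &line) = 0;
};

struct ConsoleExporter : Exporter {
  void Export(const std::string &line) override { std::cout << line << "\n"; }
};

class Logger {
 public:
  explicit Logger(std::vector<std::shared_ptr<Exporter>> sinks) : sinks_(std::move(sinks)) {}
  void Log(const std::string &msg) {
    for (auto &s : sinks_) s->Export(msg);
  }

 private:
  std::vector<std::shared_ptr<Exporter>> sinks_;
};

class LoggerProvider {
 public:
  // Rewire an alias ("audit", "telemetry", ...) without touching call sites.
  void Wire(const std::string &alias, std::shared_ptr<Exporter> sink) {
    wiring_[alias].push_back(std::move(sink));
  }
  std::shared_ptr<Logger> GetLogger(const std::string &alias) {
    return std::make_shared<Logger>(wiring_[alias]);
  }

 private:
  std::map<std::string, std::vector<std::shared_ptr<Exporter>>> wiring_;
};

int main() {
  LoggerProvider provider;
  provider.Wire("audit", std::make_shared<ConsoleExporter>());
  auto logger = provider.GetLogger("audit");  // call sites only know the alias
  logger->Log("user alice signed in");
}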

@lalitb
Member

lalitb commented Oct 12, 2020

I support the idea of having logging methods as below.

  • Unstructured logs: (say) logger->log_error(nostd::string_view);
  • Key-value pairs: (say) logger->log_debug(nostd::KeyValueIterable &);
  • A templatized function which customers can use to bring their own custom types (JSON, XML, Apache log format, etc.).

Also, if we can support std::format for unstructured logs, something like:

      logger->log_error("Some error message with args: {} {} {}", arg1, arg2, arg3);

Of course, this may need backporting std::format from C++20, or else using fmtlib.
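For example, a minimal sketch using a recent fmtlib (the Logger class here is illustrative, not the proposed API):

#include <fmt/core.h>  // fmtlib: https://github.com/fmtlib/fmt
#include <cstdio>
#include <string>
#include <utility>

class Logger {
 public:
  // Format lazily with fmtlib until std::format (C++20) is broadly available.
  template <typename... Args>
  void log_error(fmt::format_string<Args...> format, Args &&...args) {
    Emit(fmt::format(format, std::forward<Args>(args)...));
  }

 private:
  void Emit(const std::string &message) {
    std::fputs(message.c_str(), stderr);  // stand-in for handing off to the SDK
    std::fputc('\n', stderr);
  }
};

int main() {
  Logger logger;
  logger.log_error("Some error message with args: {} {} {}", 1, "two", 3.5);
}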

Re. configOptions -- TL;DR -- https://stackoverflow.com/questions/23322602/log4j-config-different-logs-to-different-files

I like log4j's concept of one logger associated with multiple destinations, each destination accepting data at a certain importance level, independent of the level associated with the logger.

@lalitb
Member

lalitb commented Apr 19, 2021

Closing, as this was a design discussion.
