
feat: Create Logs SDK LoggerProvider #1517

Merged

19 commits merged from sdk-logger-provider into open-telemetry:main on Jan 22, 2024

Conversation

Contributor

@kaylareopelle commented Aug 25, 2023

Fulfills the Logs SDK specs for Logger Provider

The Logger Provider implementation is drawn mainly from the Tracer Provider implementation, as they have very similar specs.

If merged, this PR would fulfill the following elements of the spec-compliance-matrix for Logs:

  • LoggerProvider.GetLogger
  • LoggerProvider.GetLogger accepts attributes
  • LoggerProvider.Shutdown
  • LoggerProvider.ForceFlush

Here are some other implementations of LoggerProvider for comparison:

Resolves #1483
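For orientation, here is a rough usage sketch of the surface this PR adds. It is illustrative only; the exact constructor keywords and method signatures are defined in the diff and settled in the discussion below, and MyLogRecordProcessor is a hypothetical processor.

provider = OpenTelemetry::SDK::Logs::LoggerProvider.new

logger = provider.logger('my-instrumentation', version: '0.1.0')
provider.add_log_record_processor(MyLogRecordProcessor.new) # hypothetical processor

provider.force_flush # => Export::SUCCESS, Export::FAILURE, or Export::TIMEOUT
provider.shutdown    # => Export::SUCCESS, Export::FAILURE, or Export::TIMEOUT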

Comment on lines 122 to 123
rescue StandardError => e
OpenTelemetry.handle_error(exception: e, message: FORCE_FLUSH_ERROR)
Export::FAILURE
end
Contributor Author

There is not a rescue in the TracerProvider#force_flush method, but the method documentation states "Export::FAILURE should be returned if a non-specific failure occurred." I'm assuming a non-specific failure would be any StandardError instance.

  • Is there a condition to return Export::FAILURE on non-specific errors in TracerProvider#force_flush that I'm missing?
  • Similar documentation exists for the LoggerProvider/TracerProvider#shutdown methods. If we choose to add a rescue here, should the #shutdown methods also have a similar rescue?

Member

We might be assuming that processors will handle their own errors, but that might be optimistic. I would be fine with rescuing from StandardError in force_flush and shutdown across the board. I'll let others chime in if they have opinions.

# {LogRecordProcessor} to add to this LoggerProvider.
def add_log_record_processor(log_record_processor)
@mutex.synchronize do
@log_record_processors = log_record_processors.dup.push(log_record_processor)
Contributor Author

This is copied from TracerProvider#add_span_processor. @fallwith and I were chatting about this LOC. He was asking about the use of .dup.push here.

Does dup.push provide atomicity in a way that the push method alone couldn't deliver? Or is there a different reason dup.push was used?

Member

I'm uncertain, but @fbogsany probably knows. I think it has to do with mutating the processors array: looking at TracerProvider, a span gets a reference to the processors on start. If a span processor is added after a span has started and the processors array is mutated in place, that span will see a processor that wasn't there when it started. I'm not sure whether this happens in the real world, or whether that's what we're protecting against, but it is a reason to prefer dup.push over push.

Contributor

Yes, that's exactly the reason - each SDK Span instance has a reference to the list of span processors, and that list should not change between the span calling on_start and on_finish. IDK if that applies in your case - I need to read the rest of the PR.
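A minimal sketch (outside the SDK) of the copy-on-write behavior being described: anything holding a reference to the old array keeps an unchanged snapshot, while new readers pick up the replacement.

class ProcessorRegistry
  def initialize
    @mutex = Mutex.new
    @processors = []
  end

  # Replace the array instead of mutating it in place.
  def add_processor(processor)
    @mutex.synchronize { @processors = @processors.dup.push(processor) }
  end

  def processors
    @processors
  end
end

registry = ProcessorRegistry.new
registry.add_processor(:simple)
snapshot = registry.processors     # e.g. what a span captures at on_start
registry.add_processor(:batch)

snapshot            # => [:simple] (the snapshot is unaffected)
registry.processors # => [:simple, :batch]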

Contributor

It's worth considering whether it should be valid to add log record processors after @stopped. In the TracerProvider, we warn and don't allow that.
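A hedged sketch of what that guard could look like here, following the TracerProvider's approach (the warning text is illustrative, not the PR's actual wording):

def add_log_record_processor(log_record_processor)
  @mutex.synchronize do
    if @stopped
      OpenTelemetry.logger.warn('calling LoggerProvider#add_log_record_processor after shutdown.')
      return
    end

    @log_record_processors = @log_record_processors.dup.push(log_record_processor)
  end
end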

# Copyright The OpenTelemetry Authors
#
# SPDX-License-Identifier: Apache-2.0

Contributor Author

This matches the pattern in OpenTelemetry::SDK::Trace::Export.

The ExportError class is not present because it was not needed for the LoggerProvider. It will be added once it's used in subsequent PRs.

@@ -0,0 +1,22 @@
# frozen_string_literal: true

Contributor Author

This file is a boilerplate/placeholder so the LogRecordProcessor methods can be called. They will be implemented in a later PR.

@@ -0,0 +1,33 @@
# frozen_string_literal: true

Contributor Author

The Logger SDK will be fully written in a separate PR. The code here represents what's necessary to fulfill the LoggerProvider SDK spec.

@@ -24,15 +24,18 @@ Gem::Specification.new do |spec|
spec.require_paths = ['lib']
spec.required_ruby_version = '>= 3.0'

Contributor Author

Updating dependencies to more recent versions. This came about when I realized the SimpleCov version that was installed didn't include a feature I wanted to use.

# Copyright The OpenTelemetry Authors
#
# SPDX-License-Identifier: Apache-2.0

Contributor Author

The specs for the SDK LoggerProvider and the SDK TracerProvider are extremely similar. Given that, this code closely mirrors what's already implemented for the TracerProvider in the SDK gem.

Comment on lines 44 to 47
name ||= ''
version ||= ''

OpenTelemetry.logger.warn(EMPTY_NAME_ERROR) if name.empty?
Contributor Author

In the case where an invalid name (null or empty string) is specified, a working Logger MUST be returned as a fallback rather than returning null or throwing an exception, its name SHOULD keep the original invalid value, and a message reporting that the specified value is invalid SHOULD be logged.
source

This logic is drawn from TracerProvider#tracer. The null or empty string will not stop InstrumentationScope from being generated during Logger initialization.
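For illustration, a hedged usage sketch of the fallback behavior described above (the warning text comes from the SDK's EMPTY_NAME_ERROR constant; the exact signature is settled later in this review):

provider = OpenTelemetry::SDK::Logs::LoggerProvider.new

# A nil or empty name is invalid per the spec, but a working Logger is
# still returned; a warning is logged instead of raising or returning nil.
logger = provider.logger(nil) # logs EMPTY_NAME_ERROR
logger.class                  # => OpenTelemetry::SDK::Logs::Logger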

OpenTelemetry.logger.warn(EMPTY_NAME_ERROR) if name.empty?

@mutex.synchronize do
OpenTelemetry::SDK::Logs::Logger.new(name, version, self)
Contributor Author

The input provided by the user MUST be used to create an InstrumentationScope instance which is stored on the created Logger.

source

The name and version here are passed to the Logger to create the InstrumentationScope.

#
# @param [LogRecordProcessor] log_record_processor The
# {LogRecordProcessor} to add to this LoggerProvider.
def add_log_record_processor(log_record_processor)
Contributor Author

The LoggerProvider MAY provide methods to update the configuration. If configuration is updated (e.g., adding a LogRecordProcessor), the updated configuration MUST also apply to all already returned Loggers (i.e. it MUST NOT matter whether a Logger was obtained from the LoggerProvider before or after the configuration change). Note: Implementation-wise, this could mean that Logger instances have a reference to their LoggerProvider and access configuration only via this reference.

source

Every logger instance has a reference to the logger provider that created it (see L50). This gives loggers access to the log record processors.
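A hedged illustration of that requirement (MyProcessor is hypothetical, and the emit path itself lands in a later PR):

provider = OpenTelemetry::SDK::Logs::LoggerProvider.new
logger = provider.logger('my-gem', version: '1.0.0')

# Added after the Logger was obtained. Because the Logger reads configuration
# through its LoggerProvider reference rather than caching a copy, records
# emitted through `logger` will still reach this processor.
provider.add_log_record_processor(MyProcessor.new)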

# @param [optional Numeric] timeout An optional timeout in seconds.
# @return [Integer] Export::SUCCESS if no error occurred, Export::FAILURE if
# a non-specific failure occurred, Export::TIMEOUT if a timeout occurred.
def shutdown(timeout: nil)
Contributor Author

See: https://opentelemetry.io/docs/specs/otel/logs/sdk/#shutdown

  • Is only called once; subsequent calls are guarded by setting @stopped to true (L92)
  • Lets the caller know whether it succeeded, failed, or timed out by returning OpenTelemetry::SDK::Logs::Export constants (L81, L87, L93)
  • Implements a timeout through calls to OpenTelemetry::Common::Utilities.timeout_timestamp and OpenTelemetry::Common::Utilities.maybe_timeout (L84, L86)
  • Invokes shutdown on all the registered processors (L85-89)
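A hedged sketch of that flow (line references are to the PR diff; the early-return value and error handling may differ from the merged code):

def shutdown(timeout: nil)
  @mutex.synchronize do
    return Export::SUCCESS if @stopped # only runs once (L92 sets @stopped)

    start_time = OpenTelemetry::Common::Utilities.timeout_timestamp
    results = @log_record_processors.map do |processor|
      remaining_timeout = OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time)
      return Export::TIMEOUT if remaining_timeout&.zero?

      processor.shutdown(timeout: remaining_timeout)
    end

    @stopped = true
    results.max || Export::SUCCESS
  end
end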

Comment on lines 97 to 123
# Immediately export all {LogRecord}s that have not yet been exported
# for all the registered {LogRecordProcessor}s.
#
# This method should only be called in cases where it is absolutely
# necessary, such as when using some FaaS providers that may suspend
# the process after an invocation, but before the {LogRecordProcessor}
# exports the completed {LogRecord}s.
#
# @param [optional Numeric] timeout An optional timeout in seconds.
# @return [Integer] Export::SUCCESS if no error occurred, Export::FAILURE if
# a non-specific failure occurred, Export::TIMEOUT if a timeout occurred.
def force_flush(timeout: nil)
@mutex.synchronize do
return Export::SUCCESS if @stopped

start_time = OpenTelemetry::Common::Utilities.timeout_timestamp
results = log_record_processors.map do |processor|
remaining_timeout = OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time)
return Export::TIMEOUT if remaining_timeout&.zero?

processor.force_flush(timeout: remaining_timeout)
end

results.max || Export::SUCCESS
end
rescue StandardError => e
OpenTelemetry.handle_error(exception: e, message: FORCE_FLUSH_ERROR)
Export::FAILURE
end
Contributor Author

Spec reference: https://opentelemetry.io/docs/specs/otel/logs/sdk/#forceflush

Out of time atm, but I'll circle back with more 1:1 details.

Member

@mwear left a comment

Looks good. I have a few minor comments.


@kaylareopelle
Contributor Author

@mwear - Re: this comment

I've updated #shutdown to include error handling to match #force_flush in: 3c0dd57

@kaylareopelle kaylareopelle requested a review from mwear October 4, 2023 23:48
Member

@mwear left a comment

LGTM, thanks @kaylareopelle!

* Add OpenTelemetry::SDK::Logs::Export constants
* Create OpenTelemetry::SDK::Logs::LoggerProvider
* Implement LoggerProvider#logger
* Implement LoggerProvider#shutdown
* Implement LoggerProvider#force_flush
* Create no-op LogRecordProcessor
@kaylareopelle
Contributor Author

Thanks, @mwear! I've rebased the branch into one commit.

Member

@robbkidd left a comment

👍🏻

I'm a big fan.

Contributor

@arielvalentin left a comment

Assuming we gain a better understanding of duping the log processors...

module Logs
# The SDK implementation of OpenTelemetry::Logs::LoggerProvider.
class LoggerProvider < OpenTelemetry::Logs::LoggerProvider
attr_reader :resource, :log_record_processors
Contributor

log_record_processors needs safe concurrent access. Exposing it as a reader lets external callers reach the array in an unsafe way.

This attribute should not be exposed publicly.

Contributor

It is also not exposed in the spec.

Suggested change
attr_reader :resource, :log_record_processors
attr_reader :resource

Contributor

@fbogsany left a comment

No changes yet - I just want to block so I have a chance to read it closely.

module OpenTelemetry
module SDK
module Logs
# The export module contains result codes for LoggerProvider#force_flush
Contributor

These should also be result codes for exporters - their use shouldn't just relate to these two methods.

Contributor Author

Great point. Updated to be more general in: d63a4e9
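For context, a sketch of the generalized result-code module (the constant values mirror the trace SDK's Export module; the comments are illustrative). Because the codes are ordered from best to worst, results.max || Export::SUCCESS in #force_flush and #shutdown reports the worst outcome across processors.

module OpenTelemetry
  module SDK
    module Logs
      # Result codes shared by LoggerProvider#shutdown, #force_flush,
      # the LogRecordProcessors, and eventually the exporters.
      module Export
        SUCCESS = 0 # the operation finished successfully
        FAILURE = 1 # a non-specific failure occurred
        TIMEOUT = 2 # the operation timed out
      end
    end
  end
end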

# Called when a {LogRecord} is emitted. Subsequent calls are not
# permitted after shutdown is called.
# @param [LogRecord] log_record The emitted {LogRecord}
# @param [Context] context The resolved Context
Contributor

What is the "resolved Context" here? Who does the resolution? (Note that I haven't read this part of the spec or all of this PR, or the logs API, so maybe this is answered clearly somewhere else - just asking as a naive reviewer.)

Contributor Author

The "resolved Context" here is coming from the Logs SDK spec description for the method arguments.

https://opentelemetry.io/docs/specs/otel/logs/sdk/#onemit

The spec's description has a bit more detail that might clear things up. I've updated the doc comment to include the rest of the text in: 5528277

Contributor Author

I'm not sure who does the resolution. I'm also not quite sure how we'll use this argument.

I'll consult some of the implementations in other languages and come back with more info.

Contributor Author

@fbogsany - I can't find any examples where the context argument is used in other languages.

  • Python doesn't include it anywhere.
  • PHP and Java have it as an argument but do not use it in the onEmit functions.
  • JavaScript doesn't include it as an argument in its Simple or Batch processors, but the duck type includes it as an argument.

So I don't have a great answer about who does the resolution. I didn't end up needing this argument in the Simple or Batch processors because I used the SpanContext to pull trace_id, span_id, and trace_flags. Those are the only context-related attributes in the Logs Data Model.

Do you have a sense of what approach you'd prefer Ruby take?
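A hedged sketch of the approach described: pulling the correlation fields from the span in the supplied context. The LogRecord setters shown are illustrative, since the LogRecord class lands in a later PR.

# Inside a processor's on_emit(log_record, context):
span_context = OpenTelemetry::Trace.current_span(context).context

log_record.trace_id    = span_context.trace_id
log_record.span_id     = span_context.span_id
log_record.trace_flags = span_context.trace_flags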

module Logs
# The SDK implementation of OpenTelemetry::Logs::Logger
class Logger < OpenTelemetry::Logs::Logger
attr_reader :instrumentation_scope, :logger_provider
Contributor

I'm curious why we expose these attributes. In the Tracer and Meter, they're private, so the inconsistency is a little weird.

Contributor Author

This attr_reader call has been removed. I'm not sure why I initially broke from Tracer and Meter.

Comment on lines +14 to +29
# @api private
#
# Returns a new {OpenTelemetry::SDK::Logs::Logger} instance. This should
# not be called directly. New loggers should be created using
# {LoggerProvider#logger}.
#
# @param [String] name Instrumentation package name
# @param [String] version Instrumentation package version
# @param [LoggerProvider] logger_provider The {LoggerProvider} that
# initialized the logger
#
# @return [OpenTelemetry::SDK::Logs::Logger]
def initialize(name, version, logger_provider)
@instrumentation_scope = InstrumentationScope.new(name, version)
@logger_provider = logger_provider
end
Contributor

The @api private comment is our standard way of marking "internal" methods. There is no practical way to enforce it.

# @return [OpenTelemetry::SDK::Logs::LoggerProvider]
def initialize(
resource: OpenTelemetry::SDK::Resources::Resource.create,
log_record_processors: []
Contributor

Providing the log_record_processors at creation time is MAY in the spec. We don't do that in the MeterProvider (for metric_readers) or the TracerProvider (for span_processors). Instead, we rely on the add_... methods to add them individually. I would start with not exposing it during creation. We can always add it later if that's a common pattern (and then consider adding to TracerProvider and MeterProvider as well).
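A brief usage sketch of the suggested pattern, mirroring how span processors are attached to the TracerProvider (MyLogRecordProcessor is hypothetical; the resource: keyword is the one shown in the diff):

provider = OpenTelemetry::SDK::Logs::LoggerProvider.new(
  resource: OpenTelemetry::SDK::Resources::Resource.create
)

# Processors are registered after construction rather than passed to the
# initializer.
provider.add_log_record_processor(MyLogRecordProcessor.new)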

Contributor Author

I'm happy to remove the arg from the initialize method. I'll double-check about its commonality in other languages. I've also removed the public reader. See: 37befe6

# @param [optional String] version Instrumentation package version
#
# @return [OpenTelemetry::SDK::Logs::Logger]
def logger(name = nil, version = nil)
Contributor

This signature is inconsistent between TracerProvider and MeterProvider. In the latter, we used:

def meter(name, version: nil)

so name is required and version is a keyword arg. IIRC we're going to have to add one or two more args at some point 🔜 , so keywords for optional args are vastly preferred.

We should go back and fix this for TracerProvider. I think that can be done in a backwards-compatible fashion.

For MeterProvider, we opted to require name. If a user explicitly passes nil or "", we can handle it, but those are both considered invalid - I'd rather have an explicitly invalid required arg than an invalid default arg.
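A sketch of the preferred shape, matching MeterProvider#meter and leaving room for future keyword arguments:

# name is a required positional argument; optional settings are keywords.
def logger(name, version: nil)
  OpenTelemetry.logger.warn(EMPTY_NAME_ERROR) if name.nil? || name.empty?
  # ... create or return the Logger ...
end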

Contributor Author

Sounds great. I prefer the keyword args, I just defaulted to what was in TracerProvider since that was already stable.

Updated in: ca7c15f


Comment on lines 90 to 92
rescue StandardError => e
OpenTelemetry.handle_error(exception: e, message: UNEXPECTED_ERROR_MESSAGE % __method__)
Export::FAILURE
Contributor

In the TracerProvider and MeterProvider we expect the processors/readers to not raise from #shutdown or #force_flush and instead only return one of the 3 supported result codes.

Contributor Author

Understood. Fixed in: a619d36
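A hedged sketch of that contract from a processor author's point of view (MyLogRecordProcessor is hypothetical; the message text is illustrative):

class MyLogRecordProcessor
  # Report problems via result codes rather than raising out of shutdown.
  def shutdown(timeout: nil)
    # ... release resources, flush buffers ...
    OpenTelemetry::SDK::Logs::Export::SUCCESS
  rescue StandardError => e
    OpenTelemetry.handle_error(exception: e, message: 'error in MyLogRecordProcessor#shutdown')
    OpenTelemetry::SDK::Logs::Export::FAILURE
  end
end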

Co-authored-by: Francis Bogsanyi <[email protected]>
@kaylareopelle
Contributor Author

@fbogsany, checking back in here! Have your concerns been addressed?

@fbogsany fbogsany merged commit cff311c into open-telemetry:main Jan 22, 2024
55 checks passed
@kaylareopelle kaylareopelle deleted the sdk-logger-provider branch July 25, 2024 17:54