
[telemetrySetting] Create sampled Logger #8134

Merged: 35 commits, Sep 29, 2023

Commits
4394496
Capability to configure sampledLogger in the exporterHelper
antonjim-te Jul 26, 2023
3192d88
Capability to configure sampledLogger in the exporterHelper
antonjim-te Jul 26, 2023
39f95e4
Improve documentation
antonjim-te Jul 26, 2023
5295e11
Improve documentation
antonjim-te Jul 26, 2023
9534fbb
Apply PR feedback
antonjim-te Jul 27, 2023
56b71f3
Apply PR feedback
antonjim-te Jul 27, 2023
5aec726
Apply PR feedback
antonjim-te Jul 27, 2023
5d2cdeb
Merge main
antonjim-te Aug 21, 2023
7ab0227
Fix linter
antonjim-te Aug 28, 2023
9778c88
Add chloggen
antonjim-te Aug 28, 2023
b6ac5ac
Apply feedback. Describe sampled_logger config
antonjim-te Aug 28, 2023
6c2051a
Merge master
antonjim-te Sep 6, 2023
f53561a
Merge branch 'open-telemetry:main' into develop/sampledLoggedCfg
antonjim-te Sep 6, 2023
310b747
Merge master
antonjim-te Sep 6, 2023
9540c55
Create a extra sampled logger as part of the telemetry settings
antonjim-te Sep 6, 2023
20536a1
Review PR
antonjim-te Sep 6, 2023
0b60ef8
Apply feedback from @mx-psi . Create the sampledLogger the first time…
antonjim-te Sep 7, 2023
0add644
Add telemetry test
antonjim-te Sep 7, 2023
37fcef7
Update variable name
antonjim-te Sep 7, 2023
70f05d8
Update variable name
antonjim-te Sep 7, 2023
a5d054b
Merge branch 'main' into develop/sampledLoggedCfg
antonjim-te Sep 11, 2023
868041f
Fix merge commit
antonjim-te Sep 11, 2023
8a321ac
Apply @dmitryax feedback. Keep Sampling config in general logger
antonjim-te Sep 11, 2023
0cb309c
review PR
antonjim-te Sep 11, 2023
bc250fb
Review PR
antonjim-te Sep 12, 2023
7180926
Update default sampling logger configuration in the telemetry configu…
antonjim-te Sep 19, 2023
15b3b42
Merge branch 'main' into develop/sampledLoggedCfg
antonjim-te Sep 19, 2023
409f6ff
Fix merge
antonjim-te Sep 19, 2023
66fa58e
Fix linter
antonjim-te Sep 19, 2023
4b84a00
Apply @dmitryax feedback
antonjim-te Sep 22, 2023
fcd2b83
Apply @dmitryax feedback
antonjim-te Sep 22, 2023
ce3776a
Apply @dmitryax feedback
antonjim-te Sep 25, 2023
bce885d
apply @jpkrohling feedback
antonjim-te Sep 26, 2023
f9ca8a7
apply @jpkrohling feedback
antonjim-te Sep 26, 2023
8c3cddd
apply feedback
antonjim-te Sep 27, 2023
2 changes: 2 additions & 0 deletions exporter/exporterhelper/README.md
@@ -24,6 +24,8 @@ The following configuration options can be modified:
[the batch processor](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor)
is used, the metric `batch_send_size` can be used for estimation)
- `timeout` (default = 5s): Time to wait per individual attempt to send data to a backend
- `sampled_logger`: Samples logging messages, which caps the CPU and I/O load of logging while keeping a representative subset of your logs.
- `enabled` (default = true)

The `initial_interval`, `max_interval`, `max_elapsed_time`, and `timeout` options accept
[duration strings](https://pkg.go.dev/time#ParseDuration),
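For illustration, the option documented above would be set per exporter; a config fragment like the following (the `otlp` exporter block and endpoint are hypothetical, mirroring the testdata added in this PR) disables sampling so every queued-retry log message is emitted:

```yaml
exporters:
  otlp:
    endpoint: otelcol:4317
    sampled_logger:
      enabled: false  # default is true
```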
21 changes: 19 additions & 2 deletions exporter/exporterhelper/common.go
@@ -27,6 +27,21 @@ func NewDefaultTimeoutSettings() TimeoutSettings {
}
}

// SampledLoggerSettings samples logging messages, which caps the CPU and I/O load of logging while keeping a
// representative subset of your logs.
// Its purpose is to balance the need for comprehensive logging against the performance impact of logging too much data.
type SampledLoggerSettings struct {
	// Enabled turns the sampled logger on (true, the default) or off (false).
	Enabled bool `mapstructure:"enabled"`
}

// NewDefaultSampledLoggerSettings returns the default settings for SampledLoggerSettings.
func NewDefaultSampledLoggerSettings() SampledLoggerSettings {
return SampledLoggerSettings{
Enabled: true,
}
}

// requestSender is an abstraction of a sender for a request independent of the type of the data (traces, metrics, logs).
type requestSender interface {
send(req internal.Request) error
@@ -64,6 +79,7 @@ type baseSettings struct {
TimeoutSettings
QueueSettings
RetrySettings
SampledLoggerSettings
}

// fromOptions returns the internal options starting from the default and applying all configured options.
@@ -74,7 +90,8 @@ func fromOptions(options ...Option) *baseSettings {
// TODO: Enable queuing by default (call DefaultQueueSettings)
QueueSettings: QueueSettings{Enabled: false},
// TODO: Enable retry by default (call DefaultRetrySettings)
RetrySettings: RetrySettings{Enabled: false},
RetrySettings: RetrySettings{Enabled: false},
SampledLoggerSettings: NewDefaultSampledLoggerSettings(),
}

for _, op := range options {
@@ -154,7 +171,7 @@ func newBaseExporter(set exporter.CreateSettings, bs *baseSettings, signal compo
return nil, err
}

be.qrSender = newQueuedRetrySender(set.ID, signal, bs.QueueSettings, bs.RetrySettings, reqUnmarshaler, &timeoutSender{cfg: bs.TimeoutSettings}, set.Logger)
be.qrSender = newQueuedRetrySender(set.ID, signal, bs.QueueSettings, bs.RetrySettings, bs.SampledLoggerSettings, reqUnmarshaler, &timeoutSender{cfg: bs.TimeoutSettings}, set.Logger)
be.sender = be.qrSender
be.StartFunc = func(ctx context.Context, host component.Host) error {
// First start the wrapped exporter.
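The `fromOptions` change above follows the functional-options pattern: start from `NewDefault*` values, then let options override them. A standalone sketch of that pattern (the `WithSampledLogger` option is hypothetical; the real helper fills this field from the decoded exporter configuration instead):

```go
package main

import "fmt"

// SampledLoggerSettings mirrors the struct added in this PR.
type SampledLoggerSettings struct {
	Enabled bool `mapstructure:"enabled"`
}

// NewDefaultSampledLoggerSettings enables sampling by default.
func NewDefaultSampledLoggerSettings() SampledLoggerSettings {
	return SampledLoggerSettings{Enabled: true}
}

type baseSettings struct {
	SampledLoggerSettings
}

// Option mutates baseSettings after defaults are applied.
type Option func(*baseSettings)

// WithSampledLogger is a hypothetical option used only for this sketch.
func WithSampledLogger(s SampledLoggerSettings) Option {
	return func(b *baseSettings) { b.SampledLoggerSettings = s }
}

// fromOptions starts from defaults and applies each option in order.
func fromOptions(options ...Option) *baseSettings {
	bs := &baseSettings{SampledLoggerSettings: NewDefaultSampledLoggerSettings()}
	for _, op := range options {
		op(bs)
	}
	return bs
}

func main() {
	fmt.Println(fromOptions().Enabled)                                                         // true
	fmt.Println(fromOptions(WithSampledLogger(SampledLoggerSettings{Enabled: false})).Enabled) // false
}
```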
14 changes: 9 additions & 5 deletions exporter/exporterhelper/queued_retry.go
@@ -83,9 +83,9 @@ type queuedRetrySender struct {
requestUnmarshaler internal.RequestUnmarshaler
}

func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg QueueSettings, rCfg RetrySettings, reqUnmarshaler internal.RequestUnmarshaler, nextSender requestSender, logger *zap.Logger) *queuedRetrySender {
func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg QueueSettings, rCfg RetrySettings, lCfg SampledLoggerSettings, reqUnmarshaler internal.RequestUnmarshaler, nextSender requestSender, logger *zap.Logger) *queuedRetrySender {
retryStopCh := make(chan struct{})
sampledLogger := createSampledLogger(logger)
newLogger := createSampledLogger(logger, lCfg)
traceAttr := attribute.String(obsmetrics.ExporterKey, id.String())

qrs := &queuedRetrySender{
@@ -95,7 +95,7 @@ func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg Queue
cfg: qCfg,
retryStopCh: retryStopCh,
traceAttribute: traceAttr,
logger: sampledLogger,
logger: newLogger,
requestUnmarshaler: reqUnmarshaler,
}

@@ -104,7 +104,7 @@ func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg Queue
cfg: rCfg,
nextSender: nextSender,
stopCh: retryStopCh,
logger: sampledLogger,
logger: newLogger,
// Following three functions actually depend on queuedRetrySender
onTemporaryFailure: qrs.onTemporaryFailure,
}
@@ -266,12 +266,16 @@ func NewDefaultRetrySettings() RetrySettings {
}
}

func createSampledLogger(logger *zap.Logger) *zap.Logger {
func createSampledLogger(logger *zap.Logger, lCfg SampledLoggerSettings) *zap.Logger {
Member: @codeboten, do you know why we create a sampled logger here? Shouldn't loggers be created based on the telemetry configuration alone? Instead of this PR, I would consider removing this sampled logger altogether.

@mx-psi (Member), Aug 28, 2023: This was added in #2020 by @tigrannajaryan. The PR description says:

> All logging is heavily sampled to avoid flooding the logs

Member: Right, and shouldn't this be a collector-wide setting, instead of something exclusive to this helper?

Member: I think this was meant to be specific to this helper, to avoid flooding the logs in the case where all requests fail and you log once per request. The contributing guidelines say:

> Use logging carefully for events that can happen frequently to avoid flooding the logs. Avoid outputting logs per a received or processed data item since this can amount to very large number of log entries (Collector is designed to process many thousands of spans and metrics per second). For such high-frequency events instead of logging consider adding an internal metric and increment it when the event happens.

I guess at the time this was written metrics were not really available, and thus we went with sampled logs.

(Not sure if this is a good reason today to keep this; just trying to reconstruct the history to make a better decision.)

Member: Same here: I'm sure this was a good decision back then; I'm not so sure the next iteration should be to expose a setting for this, instead of using a metric and/or exposing log sampling as part of the collector's telemetry config.

Member: I think a possible future option is to supply two loggers to every component: one that has no sampling (outputs 100% of logs) and another that is sampled. Components would use the non-sampled logger for everything that is critical to print and known to be non-repeating, so it can't flood the logs (e.g. startup and shutdown messages). Messages that can repeat for every data item or request would use the sampled logger. The sampled logger can derive its sampling ratio from a central setting (somewhere in the telemetry config section).

I do not think merely passing a single sampled logger to every component, to be used for every message, is good enough, since it can result in critical one-off messages being lost.

On a side note: the memory usage by the sampled logger that this PR attempts to address is puzzling. Maybe try to figure out why and reduce that, instead of disabling sampling?

@antonjim-te (Contributor, Author): Hi @tigrannajaryan, answering the side note: as you can see in the graph at https://user-images.githubusercontent.com/5960625/255850931-f9b3e891-2695-4aac-b8d6-9d60cb0dfb9d.png, the memory problem comes from the logger sampling. We also tried modifying the configuration of NewSamplerWithOptions:

```go
zapcore.NewSamplerWithOptions(
	core,
	10*time.Second, // tick: the sampling window
	1,              // first: log the first message per window
	100,            // thereafter: then log every 100th message
)
```

But memory consumption was still high. So it would be great to be able to disable the logger sampler, either from the telemetry config (long-term solution) or from the exporterHelper config (short-term solution).

Thank you.

Member: I agree with @tigrannajaryan. This doesn't seem like a good option to expose in the exporters' configuration interface.

@antonjim-te, can you please try initializing one sampled-logger instance per service and see if that helps with memory utilization? It can be created in the service.New function and passed as another field of TelemetrySettings to all components. If it doesn't help with memory utilization, we can still expose a similar configuration for the sampled logger in service::telemetry.

@antonjim-te (Contributor, Author): Hi @dmitryax, thank you for your suggestion. I will work on it in a separate branch, test it, and if it solves the problem I will bring those changes into this PR.

I will create a ticket in our internal process and add it to our next sprint.

I have a good feeling about this suggestion, because it creates just one sampled logger for all the exporters instead of one per exporterHelper.

Thank you, team. I will keep you updated.

@antonjim-te (Contributor, Author), Sep 6, 2023: Hi team, I have followed the suggestion from @dmitryax and created an extra sampled logger in the TelemetrySettings, used where necessary (in our case, in the queuedRetrySender).

The new approach also solved the memory incident.

Please take a look at the new PR changes.

Thank you in advance.
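The approach agreed in this thread, creating the sampled logger once per service and sharing it through TelemetrySettings, can be sketched as follows (type and field names here are illustrative, not the collector's actual API):

```go
package main

import "fmt"

// logger is a stand-in for *zap.Logger in this sketch.
type logger struct{ name string }

// telemetrySettings carries both loggers to every component, so the
// sampler state is built once per service instead of once per exporter.
type telemetrySettings struct {
	Logger        *logger // unsampled, for one-off messages
	SampledLogger *logger // shared sampled logger for high-frequency messages
}

// newService constructs the settings exactly once, as service.New would.
func newService() telemetrySettings {
	return telemetrySettings{
		Logger:        &logger{name: "base"},
		SampledLogger: &logger{name: "sampled"}, // created a single time
	}
}

func main() {
	ts := newService()
	// Two components receive the same sampled-logger instance.
	exporterA, exporterB := ts.SampledLogger, ts.SampledLogger
	fmt.Println(exporterA == exporterB) // true: one shared instance
}
```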

if logger.Core().Enabled(zapcore.DebugLevel) {
// Debugging is enabled. Don't do any sampling.
return logger
}

if !lCfg.Enabled {
return logger
}

// Create a logger that samples all messages to 1 per 10 seconds initially,
// and 1/100 of messages after that.
opts := zap.WrapCore(func(core zapcore.Core) zapcore.Core {
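The sampling policy referenced in the comment above (1 message per 10-second tick initially, then 1 in 100) can be sketched without zap. This illustrates only the counting rule; zap additionally resets the counter every tick, which this single-window sketch ignores:

```go
package main

import "fmt"

// sampler mimics zap's per-window counting rule: the first `first`
// messages are kept, then only every `thereafter`-th one.
type sampler struct {
	first, thereafter int
	seen              int
}

// allow reports whether the current message should be logged.
func (s *sampler) allow() bool {
	s.seen++
	if s.seen <= s.first {
		return true
	}
	return (s.seen-s.first)%s.thereafter == 0
}

func main() {
	s := &sampler{first: 1, thereafter: 100}
	kept := 0
	for i := 0; i < 301; i++ {
		if s.allow() {
			kept++
		}
	}
	// 1 initial message plus every 100th of the following 300.
	fmt.Println(kept) // 4
}
```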
52 changes: 52 additions & 0 deletions exporter/exporterhelper/queued_retry_test.go
@@ -7,6 +7,8 @@ import (
"context"
"errors"
"fmt"
"go.uber.org/zap"
"go.uber.org/zap/zaptest"
"strconv"
"sync"
"sync/atomic"
@@ -617,6 +619,56 @@ func TestQueuedRetryPersistentEnabled_shutdown_dataIsRequeued(t *testing.T) {
}, time.Second, 1*time.Millisecond)
}

func TestCreateSampledLogger(t *testing.T) {
	testCases := []struct {
		desc                 string
		logger               *zap.Logger
		lCfg                 SampledLoggerSettings
		sampledLoggerCreated bool
	}{
		{
			desc:                 "default configuration - sampledLogger enabled",
			logger:               zaptest.NewLogger(t, zaptest.Level(zap.WarnLevel)),
			lCfg:                 NewDefaultSampledLoggerSettings(),
			sampledLoggerCreated: true,
		},
		{
			desc:                 "sampledLogger disabled",
			logger:               zaptest.NewLogger(t, zaptest.Level(zap.WarnLevel)),
			lCfg:                 SampledLoggerSettings{Enabled: false},
			sampledLoggerCreated: false,
		},
		{
			desc:                 "debug logger level and sampledLogger enabled - sampledLogger not created",
			logger:               zaptest.NewLogger(t, zaptest.Level(zap.DebugLevel)),
			lCfg:                 NewDefaultSampledLoggerSettings(),
			sampledLoggerCreated: false,
		},
		{
			desc:                 "debug logger level and sampledLogger disabled - sampledLogger not created",
			logger:               zaptest.NewLogger(t, zaptest.Level(zap.DebugLevel)),
			lCfg:                 SampledLoggerSettings{Enabled: false},
			sampledLoggerCreated: false,
		},
	}

	for _, tC := range testCases {
		t.Run(tC.desc, func(t *testing.T) {
			newLogger := createSampledLogger(tC.logger, tC.lCfg)
			require.NotNil(t, newLogger)
			if tC.sampledLoggerCreated {
				require.NotEqual(t, tC.logger, newLogger)
			} else {
				require.Equal(t, tC.logger, newLogger)
			}
		})
	}
}

type mockErrorRequest struct {
baseRequest
}
1 change: 1 addition & 0 deletions exporter/go.mod
@@ -19,6 +19,7 @@ require (

require (
contrib.go.opencensus.io/exporter/prometheus v0.4.2 // indirect
github.com/benbjohnson/clock v1.3.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
1 change: 1 addition & 0 deletions exporter/go.sum
@@ -57,6 +57,7 @@ github.com/aws/aws-sdk-go-v2/service/sso v1.4.2/go.mod h1:NBvT9R1MEF+Ud6ApJKM0G+
github.com/aws/aws-sdk-go-v2/service/sts v1.7.2/go.mod h1:8EzeIqfWt2wWT4rJVu3f21TfrhJ8AEMzVybRNSb/b4g=
github.com/aws/smithy-go v1.8.0/go.mod h1:SObp3lf9smib00L/v3U2eAKG8FyQ7iLrJnQiAmR5n+E=
github.com/benbjohnson/clock v1.3.0 h1:ip6w0uFQkncKQ979AypyG0ER7mqUSBdKLOgAle/AT8A=
github.com/benbjohnson/clock v1.3.0/go.mod h1:J11/hYXuz8f4ySSvYwY0FKfm+ezbsZBKZxNJlLklBHA=
github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q=
github.com/beorn7/perks v1.0.0/go.mod h1:KWe93zE9D1o94FZ5RNwFwVgaQK1VOXiVxmqh+CedLV8=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
2 changes: 1 addition & 1 deletion exporter/otlpexporter/README.md
@@ -52,7 +52,7 @@ Several helper files are leveraged to provide additional capabilities automatically

- [gRPC settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configgrpc/README.md)
- [TLS and mTLS settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configtls/README.md)
- [Queuing, retry and timeout settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md)
- [Queuing, retry, sampled logger and timeout settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md)

[beta]: https://github.com/open-telemetry/opentelemetry-collector#beta
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib
3 changes: 2 additions & 1 deletion exporter/otlpexporter/config.go
@@ -17,7 +17,8 @@ type Config struct {
exporterhelper.QueueSettings `mapstructure:"sending_queue"`
exporterhelper.RetrySettings `mapstructure:"retry_on_failure"`

configgrpc.GRPCClientSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
configgrpc.GRPCClientSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
exporterhelper.SampledLoggerSettings `mapstructure:"sampled_logger"`
}

var _ component.Config = (*Config)(nil)
3 changes: 3 additions & 0 deletions exporter/otlpexporter/config_test.go
@@ -75,5 +75,8 @@ func TestUnmarshalConfig(t *testing.T) {
BalancerName: "round_robin",
Auth: &configauth.Authentication{AuthenticatorID: component.NewID("nop")},
},
SampledLoggerSettings: exporterhelper.SampledLoggerSettings{
Enabled: false,
},
}, cfg)
}
1 change: 1 addition & 0 deletions exporter/otlpexporter/factory.go
@@ -43,6 +43,7 @@ func createDefaultConfig() component.Config {
// We almost read 0 bytes, so no need to tune ReadBufferSize.
WriteBufferSize: 512 * 1024,
},
SampledLoggerSettings: exporterhelper.NewDefaultSampledLoggerSettings(),
}
}

2 changes: 2 additions & 0 deletions exporter/otlpexporter/testdata/config.yaml
@@ -25,3 +25,5 @@ keepalive:
timeout: 30s
permit_without_stream: true
balancer_name: "round_robin"
sampled_logger:
enabled: false
2 changes: 2 additions & 0 deletions exporter/otlphttpexporter/README.md
@@ -31,6 +31,8 @@ The following settings can be optionally configured:
- `timeout` (default = 30s): HTTP request time limit. For details see https://golang.org/pkg/net/http/#Client
- `read_buffer_size` (default = 0): ReadBufferSize for HTTP client.
- `write_buffer_size` (default = 512 * 1024): WriteBufferSize for HTTP client.
- `sampled_logger`: Samples logging messages, which caps the CPU and I/O load of logging while keeping a representative subset of your logs.
- `enabled` (default = true)

Example:

2 changes: 2 additions & 0 deletions exporter/otlphttpexporter/config.go
@@ -25,6 +25,8 @@ type Config struct {

// The URL to send logs to. If omitted the Endpoint + "/v1/logs" will be used.
LogsEndpoint string `mapstructure:"logs_endpoint"`

exporterhelper.SampledLoggerSettings `mapstructure:"sampled_logger"`
}

var _ component.Config = (*Config)(nil)
3 changes: 3 additions & 0 deletions exporter/otlphttpexporter/config_test.go
@@ -70,5 +70,8 @@ func TestUnmarshalConfig(t *testing.T) {
Timeout: time.Second * 10,
Compression: "gzip",
},
SampledLoggerSettings: exporterhelper.SampledLoggerSettings{
Enabled: false,
},
}, cfg)
}
1 change: 1 addition & 0 deletions exporter/otlphttpexporter/factory.go
@@ -48,6 +48,7 @@ func createDefaultConfig() component.Config {
// We almost read 0 bytes, so no need to tune ReadBufferSize.
WriteBufferSize: 512 * 1024,
},
SampledLoggerSettings: exporterhelper.NewDefaultSampledLoggerSettings(),
}
}

2 changes: 2 additions & 0 deletions exporter/otlphttpexporter/testdata/config.yaml
@@ -23,3 +23,5 @@ headers:
header1: 234
another: "somevalue"
compression: gzip
sampled_logger:
enabled: false