Export internal metrics using OTEL metrics #1425

Open
wants to merge 5 commits into
base: main
2 changes: 1 addition & 1 deletion charts/beyla/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
name: beyla
-version: 1.5.0
+version: 1.5.1
appVersion: 1.9.0
description: eBPF-based autoinstrumentation HTTP, HTTP2 and gRPC services, as well as network metrics.
home: https://grafana.com/oss/beyla-ebpf/
2 changes: 1 addition & 1 deletion charts/beyla/README.md
@@ -1,6 +1,6 @@
# beyla

-![Version: 1.5.0](https://img.shields.io/badge/Version-1.5.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.9.0](https://img.shields.io/badge/AppVersion-1.9.0-informational?style=flat-square)
+![Version: 1.5.1](https://img.shields.io/badge/Version-1.5.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.9.0](https://img.shields.io/badge/AppVersion-1.9.0-informational?style=flat-square)

eBPF-based autoinstrumentation HTTP, HTTP2 and gRPC services, as well as network metrics.

2 changes: 1 addition & 1 deletion charts/beyla/templates/daemon-set.yaml
@@ -74,7 +74,7 @@ spec:
containerPort: {{ .Values.service.targetPort | default .Values.config.data.prometheus_export.port }}
protocol: TCP
{{- end }}
-{{- if (and (or (.Values.service.internalMetrics.targetPort) (.Values.config.data.internal_metrics)) (not (eq .Values.config.data.prometheus_export.port .Values.config.data.internal_metrics.prometheus.port))) }}
+{{- if (and (or (.Values.service.internalMetrics.targetPort) ((and .Values.config.data.internal_metrics .Values.config.data.internal_metrics.prometheus))) (not (eq .Values.config.data.prometheus_export.port .Values.config.data.internal_metrics.prometheus.port))) }}
- name: {{ .Values.service.internalMetrics.portName }}
containerPort: {{ .Values.service.internalMetrics.targetPort | default .Values.config.data.internal_metrics.prometheus.port }}
protocol: TCP
10 changes: 7 additions & 3 deletions docs/sources/configure/options.md
@@ -1364,9 +1364,7 @@ gRPC application metrics, while the rest of the **instrumentations** are be disa
YAML section `internal_metrics`.

This component reports certain internal metrics about the behavior
-of the auto-instrumentation tool. Currently, only [Prometheus](https://prometheus.io/) export
-is supported. It is enabled if the `internal_metrics` section
-contains a `prometheus` subsection with the `port` property set.
+of the auto-instrumentation tool. Currently, both [Prometheus](https://prometheus.io/) and [OTEL](https://opentelemetry.io/) metrics export are supported. Prometheus export is enabled if the `internal_metrics` section contains a `prometheus` subsection with the `port` property set. OTEL metrics export is enabled if the `internal_metrics` section contains an `otel_metrics` property set to `true`.

Example:

@@ -1398,6 +1396,12 @@ same values, this `internal_metrics.prometheus.path` value can be
different from `prometheus_export.path`, to keep both metric families separated,
or the same (both metric families are listed in the same scrape endpoint).

+| YAML           | Environment variable                  | Type    | Default |
+| -------------- | ------------------------------------- | ------- | ------- |
+| `otel_metrics` | `BEYLA_INTERNAL_METRICS_OTEL_METRICS` | boolean | `false` |
+
+Specifies whether to export the internal metrics through OpenTelemetry. If set to `true`, the internal metrics are sent to the OpenTelemetry endpoint configured in the `otel_metrics_export` section or the `grafana.otlp` section. It cannot be combined with `internal_metrics.prometheus.port`.
+
## YAML file example

```yaml
8 changes: 8 additions & 0 deletions pkg/beyla/config.go
@@ -97,6 +97,7 @@ var DefaultConfig = Config{
Printer: false, // Deprecated: use TracePrinter instead
TracePrinter: debug.TracePrinterDisabled,
InternalMetrics: imetrics.Config{
+OTELMetrics: false, // disabled by default
Prometheus: imetrics.PrometheusConfig{
Port: 0, // disabled by default
Path: "/internal/metrics",
@@ -273,6 +274,13 @@ func (c *Config) Validate() error {
" grafana, otel_metrics_export, otel_traces_export or prometheus_export")
}

+if c.InternalMetrics.OTELMetrics && c.InternalMetrics.Prometheus.Port != 0 {
+return ConfigError("you can't enable both OTEL and Prometheus internal metrics")
+}
+if c.InternalMetrics.OTELMetrics && !c.Metrics.Enabled() && !c.Grafana.OTLP.MetricsEnabled() {
+return ConfigError("you can't enable OTEL internal metrics without enabling OTEL metrics")
+}
+
return nil
}
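
The two rules added to `Validate` can be sketched in isolation as follows. This is a minimal illustration, not the PR's code: `internalMetricsConfig` and `validateInternalMetrics` are hypothetical names, and `OTELMetricsEnabled` stands in for `c.Metrics.Enabled() || c.Grafana.OTLP.MetricsEnabled()`.

```go
package main

import (
	"errors"
	"fmt"
)

// internalMetricsConfig is a hypothetical, trimmed-down stand-in for the
// fields that the new checks in Config.Validate inspect.
type internalMetricsConfig struct {
	OTELMetrics        bool // internal_metrics.otel_metrics
	PrometheusPort     int  // internal_metrics.prometheus.port (0 means disabled)
	OTELMetricsEnabled bool // assumed: some OTEL metrics endpoint is configured
}

var (
	errBothExporters = errors.New("you can't enable both OTEL and Prometheus internal metrics")
	errNoOTELMetrics = errors.New("you can't enable OTEL internal metrics without enabling OTEL metrics")
)

// validateInternalMetrics mirrors the order of the two checks in the diff:
// mutual exclusion first, then the dependency on an OTEL metrics endpoint.
func validateInternalMetrics(c internalMetricsConfig) error {
	if c.OTELMetrics && c.PrometheusPort != 0 {
		return errBothExporters
	}
	if c.OTELMetrics && !c.OTELMetricsEnabled {
		return errNoOTELMetrics
	}
	return nil
}

func main() {
	cases := []internalMetricsConfig{
		{OTELMetrics: true, PrometheusPort: 8999, OTELMetricsEnabled: true},
		{OTELMetrics: true},
		{OTELMetrics: true, OTELMetricsEnabled: true},
		{PrometheusPort: 8999},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, validateInternalMetrics(c))
	}
}
```

Only the third and fourth cases pass: OTEL internal metrics with an OTEL endpoint, and Prometheus internal metrics on their own.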

1 change: 1 addition & 0 deletions pkg/beyla/config_test.go
@@ -229,6 +229,7 @@ func TestConfigValidate(t *testing.T) {
{"BEYLA_TRACE_PRINTER": "json_indent", "BEYLA_EXECUTABLE_NAME": "foo"},
{"BEYLA_TRACE_PRINTER": "counter", "BEYLA_EXECUTABLE_NAME": "foo"},
{"BEYLA_PROMETHEUS_PORT": "8080", "BEYLA_EXECUTABLE_NAME": "foo", "INSTRUMENT_FUNC_NAME": "bar"},
+{"BEYLA_INTERNAL_METRICS_OTEL_METRICS": "true", "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "localhost:1234", "BEYLA_EXECUTABLE_NAME": "foo"},
}
for n, tc := range testCases {
t.Run(fmt.Sprint("case", n), func(t *testing.T) {
29 changes: 20 additions & 9 deletions pkg/components/beyla.go
@@ -8,6 +8,7 @@ import (

"github.com/grafana/beyla/pkg/beyla"
"github.com/grafana/beyla/pkg/export/attributes"
+"github.com/grafana/beyla/pkg/export/otel"
"github.com/grafana/beyla/pkg/internal/appolly"
"github.com/grafana/beyla/pkg/internal/connector"
"github.com/grafana/beyla/pkg/internal/imetrics"
@@ -20,7 +21,10 @@ import (
// RunBeyla in the foreground process. This is a blocking function and won't exit
// until both the AppO11y and NetO11y components end
func RunBeyla(ctx context.Context, cfg *beyla.Config) error {
-ctxInfo := buildCommonContextInfo(ctx, cfg)
+ctxInfo, err := buildCommonContextInfo(ctx, cfg)
+if err != nil {
+return fmt.Errorf("can't build common context info: %w", err)
+}

wg := sync.WaitGroup{}
app := cfg.Enabled(beyla.FeatureAppO11y)
@@ -102,7 +106,7 @@ func mustSkip(cfg *beyla.Config) string {
// from the user-provided configuration
func buildCommonContextInfo(
ctx context.Context, config *beyla.Config,
-) *global.ContextInfo {
+) (*global.ContextInfo, error) {
promMgr := &connector.PrometheusManager{}
ctxInfo := &global.ContextInfo{
Prometheus: promMgr,
@@ -116,7 +120,20 @@ func buildCommonContextInfo(
MetaSourceLabels: config.Attributes.Kubernetes.MetaSourceLabels,
}),
}
+if config.Attributes.HostID.Override == "" {
+ctxInfo.FetchHostID(ctx, config.Attributes.HostID.FetchTimeout)
+} else {
+ctxInfo.HostID = config.Attributes.HostID.Override
+}
switch {
+case config.InternalMetrics.OTELMetrics:
+var err error
+config.Metrics.Grafana = &config.Grafana.OTLP
+slog.Debug("reporting internal metrics as OpenTelemetry")
+ctxInfo.Metrics, err = otel.NewInternalMetricsReporter(ctx, ctxInfo, &config.Metrics)
+if err != nil {
+return nil, fmt.Errorf("can't start OpenTelemetry metrics: %w", err)
+}
case config.InternalMetrics.Prometheus.Port != 0:
slog.Debug("reporting internal metrics as Prometheus")
ctxInfo.Metrics = imetrics.NewPrometheusReporter(&config.InternalMetrics.Prometheus, promMgr, nil)
@@ -133,13 +150,7 @@ func buildCommonContextInfo(

attributeGroups(config, ctxInfo)

-if config.Attributes.HostID.Override == "" {
-ctxInfo.FetchHostID(ctx, config.Attributes.HostID.FetchTimeout)
-} else {
-ctxInfo.HostID = config.Attributes.HostID.Override
-}
-
-return ctxInfo
+return ctxInfo, nil
}
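
The host ID lookup moves ahead of the `switch` presumably because the OTEL reporter reads `ctxInfo.HostID` once, at construction time (see `newResourceInternal(ctxInfo.HostID)` in `metrics_internal.go`). A minimal sketch of that ordering dependency, using hypothetical stand-in types rather than Beyla's own:

```go
package main

import "fmt"

// contextInfo and reporter are hypothetical stand-ins for global.ContextInfo
// and the OTEL internal metrics reporter.
type contextInfo struct {
	HostID string
}

type reporter struct {
	hostID string // copied once, when the reporter is built
}

// newReporter snapshots the host ID, similar in spirit to how the resource
// attributes are fixed when the meter provider is created.
func newReporter(ci *contextInfo) *reporter {
	return &reporter{hostID: ci.HostID}
}

// buildLate mimics the old ordering: reporter first, host ID afterwards.
func buildLate(hostID string) *reporter {
	ci := &contextInfo{}
	r := newReporter(ci)
	ci.HostID = hostID // too late: the reporter already copied an empty value
	return r
}

// buildEarly mimics the new ordering: host ID first, reporter afterwards.
func buildEarly(hostID string) *reporter {
	ci := &contextInfo{}
	ci.HostID = hostID
	return newReporter(ci)
}

func main() {
	fmt.Printf("late:  %q\n", buildLate("host-1").hostID)
	fmt.Printf("early: %q\n", buildEarly("host-1").hostID)
}
```

With the old ordering the reporter would be built with an empty host ID; with the new ordering it sees the resolved value.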

// attributeGroups specifies, based in the provided configuration, which groups of attributes
1 change: 0 additions & 1 deletion pkg/export/otel/metrics.go
@@ -612,7 +612,6 @@ func isExponentialAggregation(mc *MetricsConfig, mlog *slog.Logger) bool {
return false
}

-// TODO: restore as private
func InstantiateMetricsExporter(ctx context.Context, cfg *MetricsConfig, log *slog.Logger) (metric.Exporter, error) {
var err error
var exporter metric.Exporter
175 changes: 175 additions & 0 deletions pkg/export/otel/metrics_internal.go
@@ -0,0 +1,175 @@
package otel

import (
"context"
"log/slog"
"runtime"
"time"

"github.com/google/uuid"
"go.opentelemetry.io/otel/attribute"
instrument "go.opentelemetry.io/otel/metric"
"go.opentelemetry.io/otel/sdk/metric"
"go.opentelemetry.io/otel/sdk/resource"
semconv "go.opentelemetry.io/otel/semconv/v1.19.0"

"github.com/grafana/beyla/pkg/buildinfo"
"github.com/grafana/beyla/pkg/internal/pipe/global"
)

// InternalMetricsReporter is an internal metrics Reporter that exports to OTEL
type InternalMetricsReporter struct {
ctx context.Context
tracerFlushes instrument.Float64Histogram
otelMetricExports instrument.Float64Counter
otelMetricExportErrs instrument.Float64Counter
otelTraceExports instrument.Float64Counter
otelTraceExportErrs instrument.Float64Counter
instrumentedProcesses instrument.Int64UpDownCounter
beylaInfo instrument.Int64Gauge
}

func imlog() *slog.Logger {
return slog.With("component", "otel.InternalMetricsReporter")
}

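// NewInternalMetricsReporter creates an InternalMetricsReporter that exports Beyla's
// internal metrics through the OTEL metrics exporter described by the provided MetricsConfig.
func NewInternalMetricsReporter(ctx context.Context, ctxInfo *global.ContextInfo, metrics *MetricsConfig) (*InternalMetricsReporter, error) {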
log := imlog()
log.Debug("instantiating internal metrics exporter provider")
exporter, err := InstantiateMetricsExporter(context.Background(), metrics, log)
if err != nil {
return nil, err
}

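res := newResourceInternal(ctxInfo.HostID)
provider, err := newInternalMeterProvider(res, &exporter, metrics.Interval)
// check the error before touching the provider, which might be nil on failure
if err != nil {
log.Error("can't instantiate internal meter provider", "error", err)
return nil, err
}
meter := provider.Meter("beyla_internal")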
tracerFlushes, err := meter.Float64Histogram(
"beyla.ebpf.tracer.flushes",
instrument.WithDescription("Length of the groups of traces flushed from the eBPF tracer to the next pipeline stage"),
instrument.WithUnit("1"),
)
if err != nil {
return nil, err
}

otelMetricExports, err := meter.Float64Counter(
"beyla.otel.metric.exports",
instrument.WithDescription("Length of the metric batches submitted to the remote OTEL collector"),
)
if err != nil {
return nil, err
}

otelMetricExportErrs, err := meter.Float64Counter(
"beyla.otel.metric.export.errors",
instrument.WithDescription("Error count on each failed OTEL metric export"),
)
if err != nil {
return nil, err
}

otelTraceExports, err := meter.Float64Counter(
"beyla.otel.trace.exports",
instrument.WithDescription("Length of the trace batches submitted to the remote OTEL collector"),
)
if err != nil {
return nil, err
}

otelTraceExportErrs, err := meter.Float64Counter(
"beyla.otel.trace.export.errors",
instrument.WithDescription("Error count on each failed OTEL trace export"),
)
if err != nil {
return nil, err
}

instrumentedProcesses, err := meter.Int64UpDownCounter(
"beyla.instrumented.processes",
instrument.WithDescription("Instrumented processes by Beyla"),
)
if err != nil {
return nil, err
}

beylaInfo, err := meter.Int64Gauge(
"beyla.internal.build.info",
instrument.WithDescription("A metric with a constant '1' value labeled by version, revision, goversion from which Beyla was built, and the goos and goarch for the build."),
)
if err != nil {
return nil, err
}

return &InternalMetricsReporter{
ctx: ctx,
tracerFlushes: tracerFlushes,
otelMetricExports: otelMetricExports,
otelMetricExportErrs: otelMetricExportErrs,
otelTraceExports: otelTraceExports,
otelTraceExportErrs: otelTraceExportErrs,
instrumentedProcesses: instrumentedProcesses,
beylaInfo: beylaInfo,
}, nil
}

func newResourceInternal(hostID string) *resource.Resource {
attrs := []attribute.KeyValue{
semconv.ServiceName("beyla-internal"),
semconv.ServiceInstanceID(uuid.New().String()),
semconv.TelemetrySDKLanguageKey.String(semconv.TelemetrySDKLanguageGo.Value.AsString()),
// We set the SDK name as Beyla, so we can distinguish beyla generated metrics from other SDKs
semconv.TelemetrySDKNameKey.String("beyla"),
semconv.HostID(hostID),
}

return resource.NewWithAttributes(semconv.SchemaURL, attrs...)
}

func newInternalMeterProvider(res *resource.Resource, exporter *metric.Exporter, interval time.Duration) (*metric.MeterProvider, error) {
meterProvider := metric.NewMeterProvider(
metric.WithResource(res),
metric.WithReader(metric.NewPeriodicReader(*exporter, metric.WithInterval(interval))),
)
return meterProvider, nil
}

func (p *InternalMetricsReporter) Start(ctx context.Context) {
p.beylaInfo.Record(ctx, 1, instrument.WithAttributes(
attribute.String("goarch", runtime.GOARCH),
attribute.String("goos", runtime.GOOS),
attribute.String("goversion", runtime.Version()),
attribute.String("version", buildinfo.Version),
attribute.String("revision", buildinfo.Revision),
))
}

func (p *InternalMetricsReporter) TracerFlush(len int) {
p.tracerFlushes.Record(p.ctx, float64(len))
}

func (p *InternalMetricsReporter) OTELMetricExport(len int) {
p.otelMetricExports.Add(p.ctx, float64(len))
}

func (p *InternalMetricsReporter) OTELMetricExportError(err error) {
p.otelMetricExportErrs.Add(p.ctx, 1, instrument.WithAttributes(attribute.String("error", err.Error())))
}

func (p *InternalMetricsReporter) OTELTraceExport(len int) {
p.otelTraceExports.Add(p.ctx, float64(len))
}

func (p *InternalMetricsReporter) OTELTraceExportError(err error) {
p.otelTraceExportErrs.Add(p.ctx, 1, instrument.WithAttributes(attribute.String("error", err.Error())))
}

func (p *InternalMetricsReporter) PrometheusRequest(_, _ string) {
}

func (p *InternalMetricsReporter) InstrumentProcess(processName string) {
p.instrumentedProcesses.Add(p.ctx, 1, instrument.WithAttributes(attribute.String("process_name", processName)))
}

func (p *InternalMetricsReporter) UninstrumentProcess(processName string) {
p.instrumentedProcesses.Add(p.ctx, -1, instrument.WithAttributes(attribute.String("process_name", processName)))
}
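
`InstrumentProcess` and `UninstrumentProcess` add `+1` and `-1` to an up/down counter keyed by `process_name`, so the exported value per process name is the running sum of those deltas. A minimal in-memory sketch of that bookkeeping, which could also serve as a test double: `processReporter` and `memReporter` are hypothetical names, and the interface is only the subset of the reporter's method set needed here.

```go
package main

import (
	"fmt"
	"sync"
)

// processReporter is a hypothetical subset of the internal metrics reporter's
// method set, limited to the two process-tracking calls.
type processReporter interface {
	InstrumentProcess(processName string)
	UninstrumentProcess(processName string)
}

// memReporter keeps one running sum per process name, which is the value an
// up/down counter would report for each distinct process_name attribute.
type memReporter struct {
	mu     sync.Mutex
	counts map[string]int64
}

func newMemReporter() *memReporter {
	return &memReporter{counts: map[string]int64{}}
}

func (m *memReporter) add(name string, delta int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts[name] += delta
}

func (m *memReporter) InstrumentProcess(name string)   { m.add(name, 1) }
func (m *memReporter) UninstrumentProcess(name string) { m.add(name, -1) }

// current returns the number of currently instrumented processes for a name.
func (m *memReporter) current(name string) int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.counts[name]
}

// Compile-time check that memReporter satisfies the interface.
var _ processReporter = (*memReporter)(nil)

func main() {
	r := newMemReporter()
	r.InstrumentProcess("nginx")
	r.InstrumentProcess("nginx")
	r.InstrumentProcess("redis")
	r.UninstrumentProcess("nginx")
	fmt.Println("nginx:", r.current("nginx"), "redis:", r.current("redis"))
}
```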
3 changes: 2 additions & 1 deletion pkg/internal/imetrics/imetrics.go
@@ -7,7 +7,8 @@ import (

// Config options for the different metrics exporters
type Config struct {
-Prometheus PrometheusConfig `yaml:"prometheus,omitempty"`
+Prometheus PrometheusConfig `yaml:"prometheus,omitempty"`
+OTELMetrics bool `yaml:"otel_metrics,omitempty" env:"BEYLA_INTERNAL_METRICS_OTEL_METRICS"`
Contributor

Maybe, to be less redundant, rename the YAML property to `otel` or `use_otel`, and the env var to `BEYLA_INTERNAL_METRICS_OTEL` or `BEYLA_INTERNAL_METRICS_USE_OTEL`?

Or even, to rely less on booleans for config, replace this property with something like `protocol` or `exporter`, which can be `none`, `prometheus` or `otel`.

Contributor Author

> Or even, to rely less on booleans for config, replace this property with something like `protocol` or `exporter`, which can be `none`, `prometheus` or `otel`.

It's not a bad idea, but wouldn't that imply breaking changes? Basically, we would have to force everyone to set both the exporter and the Prometheus config for internal metrics.

Contributor

I guess we can document that breaking change in Beyla 2.0.

If we want to keep backwards compatibility, maybe we could default it to `prometheus`, and explain that it only takes effect if the `prometheus` subsection is set.

But anyway, that was just a suggestion. I'm fine with the current implementation.

}
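
The `exporter` alternative raised in the review thread could look roughly like the sketch below. It is not part of this PR: the `Exporter` type, its constants and the `effective` helper are hypothetical, and the defaulting rule follows the backwards-compatible variant suggested in the thread (an unset value behaves like `prometheus`, which only takes effect when a Prometheus port is set).

```go
package main

import "fmt"

// Exporter is a hypothetical enum replacing the otel_metrics boolean.
type Exporter string

const (
	ExporterNone       Exporter = "none"
	ExporterPrometheus Exporter = "prometheus"
	ExporterOTEL       Exporter = "otel"
)

// UnmarshalText rejects unknown values. Decoders that honor
// encoding.TextUnmarshaler would call it when parsing the property.
func (e *Exporter) UnmarshalText(text []byte) error {
	switch v := Exporter(text); v {
	case ExporterNone, ExporterPrometheus, ExporterOTEL:
		*e = v
		return nil
	default:
		return fmt.Errorf("invalid internal metrics exporter %q", string(text))
	}
}

// effective resolves the exporter that would actually run. An unset value is
// treated as "prometheus", which stays inactive unless a port is configured.
func effective(e Exporter, prometheusPort int) Exporter {
	switch e {
	case "", ExporterPrometheus:
		if prometheusPort != 0 {
			return ExporterPrometheus
		}
		return ExporterNone
	default:
		return e
	}
}

func main() {
	var e Exporter
	fmt.Println(e.UnmarshalText([]byte("otel")), e)
	fmt.Println(e.UnmarshalText([]byte("statsd")))
	fmt.Println(effective("", 0), effective("", 8999), effective(ExporterOTEL, 0))
}
```

Under this rule, existing configurations that only set `internal_metrics.prometheus.port` would keep working without naming an exporter.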

// Reporter of internal metrics