Logging OTLP support #556

Open
8 of 23 tasks
a-thaler opened this issue Dec 9, 2022 · 2 comments
Labels
area/logs LogPipeline kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments


a-thaler commented Dec 9, 2022

Motivation
The telemetry module was initially designed to be based fully on the OpenTelemetry project. Because the logs domain of the project was not yet stable and adoption was still low, the module was released with application logs based on Fluent Bit using the proprietary HTTP output. The SAP Cloud Logging backend did not support OTLP either.

These conditions have changed, and adoption is slowly picking up:

  • On the backends:
    • SAP Cloud Logging supports OTLP-based ingestion with much better performance and much better correlation capabilities with traces and metrics.
    • Most third-party service offerings support it, such as Dynatrace, SignalFx, or Honeycomb.
    • More and more backends support it; Loki is also actively implementing support.
  • On the clients:
    • The otel-sdk is increasingly used for instrumenting traces and metrics and supports logs as well, so people will start trying it out.
    • Istio access logging supports OTLP, which avoids mixing the access logs with the Envoy application logs on stdout.
    • SAP CAP is starting to use the otel-sdk.

Client adoption brings the benefit of streamlining the setup by introducing a gateway, as already done for traces and metrics. Clients can then push logs directly without the indirection via stdout, which opens up further opportunities such as easy collection of Kubernetes event logs.

Basing logs on OTLP allows a consistent approach to telemetry data across logs, traces, and metrics. All data can carry the same attributes following the same semantics.

Also, the proprietary protocol used with Fluent Bit is closely aligned with the SAP Cloud Logging API and is not commonly used for ingestion into other systems, so its usefulness is very limited.

Fluent Bit as a technology differs from the otel-collector framework and is less flexible. The telemetry-manager already has many synergies between the traces and metrics domains that could be applied to logging as well, simplifying the code base and maintenance.

Goal and requirements

The goal is to base logs on the OTLP protocol (for backend ingestion, to support more providers; for client ingestion, to avoid the indirection via stdout where it is not desired), leveraging the otel-collector framework for a streamlined technology stack while still supporting collection of logs via stdout.

The requirements can be split into three parts:

  • backend OTLP ingestion support
    • support of OTLP outputs similar to traces and metrics
    • data will be enriched in OTLP fashion, following the same enrichment logic as already used for traces and metrics, mainly covering the cluster name, the k8s resource attributes, and the service name
  • tail based log collection enabled by dedicated input
    • similar to metrics input, have a way to enable log collection via runtime log tailing
    • support namespace filtering
    • attempt a meaningful mapping of log entries to the OTLP protocol in the case of JSON logs
    • optionally support dedicated mapping instructions per workload
  • client OTLP support (not needed for feature parity)
    • introduce an endpoint to push logs in OTLP protocol similar to metrics/traces
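
To make the tail-based requirements above more concrete, here is a minimal sketch of how a tailed log line could be filtered by namespace, mapped from JSON, and enriched with resource attributes. It is an illustration only, not the telemetry-manager or otel-collector implementation. The function names, the candidate JSON keys (`message`, `msg`, `level`, `severity`, `timestamp`, `time`, `ts`), the simplified record layout, and the fallback of the service name to the pod name are all assumptions made for this example. The resource attribute names follow the OpenTelemetry semantic conventions.

```python
import json
from datetime import datetime, timezone

# Hypothetical keys often seen in JSON application logs (assumption).
MESSAGE_KEYS = ("message", "msg")
LEVEL_KEYS = ("level", "severity")
TIME_KEYS = ("timestamp", "time", "ts")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _pop_first(entry, keys):
    """Remove and return the value of the first key found, or None."""
    for key in keys:
        if key in entry:
            return entry.pop(key)
    return None


def to_unix_nano(value):
    """Convert an ISO 8601 timestamp string to nanoseconds since the epoch."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC if unspecified
    delta = parsed - EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def map_log_line(line, namespace, pod, cluster, allowed_namespaces):
    """Map one tailed log line to a simplified, OTLP-like structure.

    Returns None when the namespace is filtered out.
    """
    if namespace not in allowed_namespaces:
        return None

    # Default: treat the raw line as the body (plain-text logs).
    record = {"body": line, "attributes": {}}

    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        entry = None

    if isinstance(entry, dict):
        message = _pop_first(entry, MESSAGE_KEYS)
        level = _pop_first(entry, LEVEL_KEYS)
        timestamp = _pop_first(entry, TIME_KEYS)
        if message is not None:
            record["body"] = message
        if level is not None:
            record["severity_text"] = str(level).upper()
        if isinstance(timestamp, str):
            try:
                record["time_unix_nano"] = to_unix_nano(timestamp)
            except ValueError:
                entry["timestamp"] = timestamp  # keep unparsable value as attribute
        elif timestamp is not None:
            entry["timestamp"] = timestamp
        # Everything not mapped to a dedicated field becomes a log attribute.
        record["attributes"] = entry

    # Enrichment: cluster name, k8s metadata, and service name.
    resource = {
        "k8s.cluster.name": cluster,
        "k8s.namespace.name": namespace,
        "k8s.pod.name": pod,
        "service.name": pod,  # assumption: fall back to the pod name
    }
    return {"resource": resource, "log_record": record}
```

Calling `map_log_line` with a JSON line from an allowed namespace yields the message as the body, the level as the severity text, and the remaining keys as attributes. A plain-text line is passed through unchanged as the body. Keeping the k8s metadata on the resource rather than on each log record mirrors the point made in this issue that resource attributes are what the enrichment relies on.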

The new API must be available in parallel to the old API, which will be deprecated but will stay until it is no longer used.
With kyma-project/kyma#15932, Fluent Bit introduced support for OTLP, and a first iteration might introduce the OTLP output based on Fluent Bit. However, first investigations revealed that Fluent Bit neglects OTLP resource attributes; in particular, enrichment with the relevant k8s metadata will not be possible. A direct jump to the otel-collector therefore seems to be needed (which is desired anyway).

Targeted Architecture
[Image: logs drawio]

Actions

@a-thaler a-thaler changed the title Logging OTEL support Logging OTLP support Dec 9, 2022
@a-thaler
Collaborator Author

Testing the OTLP output revealed serious problems. Some were fixed with 2.0.9, but problems remain.


kyma-bot commented May 7, 2023

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale

@kyma-bot kyma-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 7, 2023
@a-thaler a-thaler added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 8, 2023
@a-thaler a-thaler transferred this issue from kyma-project/kyma Nov 20, 2023
@a-thaler a-thaler added area/logs LogPipeline kind/feature Categorizes issue or PR as related to a new feature. and removed area/logging labels Nov 20, 2023
This was referenced Jan 9, 2024
@a-thaler a-thaler self-assigned this Mar 22, 2024