Support byte stream log collection #3267
Comments
@djaglowski what do you think, are requirements 2 and 3 possible to implement?
Contributes to: open-telemetry/opentelemetry-specification#780 open-telemetry/opentelemetry-collector-contrib#3267

Issue open-telemetry/opentelemetry-specification#780 nicely describes why the data type is needed. There are several use cases for binary data, both for trace and log attributes and for log record Body.

This is a backward compatible addition. After this change is merged no senders will initially exist that emit binary data. Nevertheless, if such data is received by the Collector it will correctly pass such data intact through the pipeline when receiving/sending OTLP (no Collector code changes are needed for this). We do not yet have binary data type in the OpenTelemetry API, so no existing sources can emit it yet.

The receivers that do not understand the binary data type should also continue functioning normally. Collector's current implementation treats any unknown data type as NULL (and this would apply to binary data type until we teach the Collector to understand binary specifically). I checked the Collector source code and this should not result in crashes or overly surprising behavior (NULL is either ignored or treated as an "unknown" branch in the code which does not care about it).

We will add full support for binary data to the Collector, particularly to support translating it correctly to other formats (e.g. Jaeger, which supports binary type natively).

Note: the addition of this data type to the protocol is not an obligation to expose the data type in the Attributes API. OTLP has always been a superset of what is possible to express via the API. The addition of the data type in the Attributes API should be discussed separately in the specification repo.
@tigrannajaryan This should be possible.
👍
My initial thought here is that we should leave
I like the In any case, we'll need to implement a new
Makes sense. Should be easy enough - probably just add an
@djaglowski Great, thanks for validating the idea.
What happens currently when
I like the proposal of
@maxgolov the current default for The user can set a different value. For byte stream collection this will essentially define the maximum size of each chunk. The aim is a value that is neither so small that per-chunk overhead becomes significant nor so large that it consumes a lot of memory. 64KiB sounds reasonable; 1MiB may be too much and could consume too much memory when a large number of incoming streams are handled simultaneously.
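To make the chunk-size trade-off concrete, here is a minimal Go sketch of splitting a byte stream into chunks of at most a fixed size, ignoring delimiters and encoding. It is not the log-collection library's implementation: `fixedSizeSplit`, `chunkSizes`, and `peakBufferBytes` are hypothetical names, and the memory estimate assumes one chunk-sized buffer per open stream.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// fixedSizeSplit returns a bufio.SplitFunc that emits tokens of at most
// max bytes. It never looks for delimiters, so it works on arbitrary bytes.
// (Hypothetical helper, used here only for illustration.)
func fixedSizeSplit(max int) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		if len(data) >= max {
			// A full chunk is available: emit exactly max bytes.
			return max, data[:max], nil
		}
		if atEOF && len(data) > 0 {
			// End of stream: emit the final, shorter chunk.
			return len(data), data, nil
		}
		// Not enough data yet: ask the scanner to read more.
		return 0, nil, nil
	}
}

// chunkSizes scans src and returns the length of every emitted chunk.
func chunkSizes(src []byte, max int) []int {
	sc := bufio.NewScanner(bytes.NewReader(src))
	sc.Buffer(make([]byte, 0, max), max) // the buffer never grows beyond max
	sc.Split(fixedSizeSplit(max))
	var sizes []int
	for sc.Scan() {
		sizes = append(sizes, len(sc.Bytes()))
	}
	return sizes
}

// peakBufferBytes is a rough estimate of buffer memory, assuming one
// chunk-sized buffer per concurrently read stream.
func peakBufferBytes(streams, chunkSize int) int {
	return streams * chunkSize
}

func main() {
	fmt.Println(chunkSizes(make([]byte, 150), 64)) // prints [64 64 22]
	fmt.Println(peakBufferBytes(1000, 64*1024))    // 64KiB chunks, 1000 streams
	fmt.Println(peakBufferBytes(1000, 1024*1024))  // 1MiB chunks, 1000 streams
}
```

Under these assumptions, 1000 simultaneous streams need roughly 62.5MiB of buffers at 64KiB per chunk and roughly 1000MiB at 1MiB per chunk, which is the concern raised above.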
@tigrannajaryan, it appears that you are exactly correct on this, and that the current behavior of
// Nop is the nop encoding. Its transformed bytes are the same as the source
// bytes; it does not replace invalid UTF-8 sequences.
Unsurprisingly,
With that, I think your original proposal details are just about spot on. A couple more points of clarification:
I believe the
If the nop / !nop use cases are different enough to warrant it, we can have the default
This functionality has now been added to the log-collection library, and will be included in the next release.
Logs are currently assumed to be UTF text, which is split into records using some delimiter pattern. In some cases it is useful to treat log files simply as byte streams with unspecified encoding. It would be useful to have this capability added.
To enable it we need the following:
filelog
when encoding=nop populate the Body with binary data type instead of string data type (which according to OTLP's Protobuf requirements is limited to valid UTF-8 strings).
multiline setting?
Split into chunks by honouring max_log_size setting to limit the size of each individual record.