Is your feature request related to a problem? Please describe.
Data Prepper has sources which can pull binary data (mostly in base64 format), and we are adding new processors which can decompress binary data. It would be good to handle binary data consistently so that we don't have too much code spread across the project, which could result in some processor combinations breaking a pipeline.
I'd like Data Prepper's sources and sinks to know their own encodings as much as possible.
Describe the solution you'd like
Create a new BinaryData model in data-prepper-api. Allow this to be set and retrieved from the Event model. This model can also be designed to avoid unnecessary encoding/decoding.
When a Data Prepper source gets binary data, it wraps it in the BinaryData model. Similarly, when writing to a sink use that same model.
There are some situations where the source cannot know the encoding. For example, JSON could have binary data encoded as base64 or some other encoding. In such cases, the pipeline author will need to know the encoding and convert it accordingly.
class BinaryData {
  public byte[] getBinaryData();
  public static BinaryData fromBase64Data(String base64) { ... }
}
There may also be a good way to decouple the binary data from the encoding itself.
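To illustrate the "avoid unnecessary encoding/decoding" goal, here is a minimal sketch of what a lazy `BinaryData` model could look like. The field layout, the `fromBytes` factory, and the `getBase64Data` accessor are assumptions for illustration, not a proposed final API:

```java
import java.util.Base64;

// Hypothetical sketch: holds either the raw bytes or the original base64
// string, and only converts between the two on first access.
final class BinaryData {
    private byte[] decoded; // raw bytes, decoded lazily
    private String base64;  // original base64 form, kept to avoid re-encoding

    private BinaryData(byte[] decoded, String base64) {
        this.decoded = decoded;
        this.base64 = base64;
    }

    public static BinaryData fromBytes(byte[] bytes) {
        return new BinaryData(bytes, null);
    }

    public static BinaryData fromBase64Data(String base64) {
        return new BinaryData(null, base64);
    }

    // Decodes on first access; a sink that only needs the base64 form
    // never pays for a decode.
    public byte[] getBinaryData() {
        if (decoded == null) {
            decoded = Base64.getDecoder().decode(base64);
        }
        return decoded;
    }

    // Encodes on first access; data that arrived as base64 is passed
    // through without a decode/re-encode round trip.
    public String getBase64Data() {
        if (base64 == null) {
            base64 = Base64.getEncoder().encodeToString(decoded);
        }
        return base64;
    }
}
```

With this shape, a source that receives base64 wraps it with `fromBase64Data`, and a sink writing base64 back out reads `getBase64Data` without ever touching the raw bytes.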
Describe alternatives you've considered (Optional)
There may be some useful third-party libraries with a similar solution that we could make use of. Though, I'd still propose we keep our own interface and use that for the internals.
Additional context
Coming from this comment: #4016 (comment)