Is your feature request related to a problem? Please describe.
Data Prepper has sources which can pull binary data (mostly in base64 format), and we are adding new processors which can decompress binary data. It would be good to handle binary data consistently so that we don't have too much code spread across the project, which could result in some processor combinations breaking a pipeline.
I'd like Data Prepper's sources and sinks to know their own encodings as much as possible.
Describe the solution you'd like
Create a new BinaryData model in data-prepper-api. Allow this to be set and retrieved from the Event model. This model can also be designed to avoid unnecessary encoding/decoding.
When a Data Prepper source gets binary data, it wraps it in the BinaryData model. Similarly, when writing to a sink use that same model.
There are some situations where the source cannot know the encoding. For example, JSON could have binary data encoded as base64 or some other encoding. In such cases, the pipeline author will need to know the encoding and convert it accordingly.
class BinaryData {
  public byte[] getBinaryData();
  public static BinaryData fromBase64Data(String base64) { ... }
}
There may also be a good way to decouple the binary data from the encoding itself.
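To illustrate the "avoid unnecessary encoding/decoding" goal, here is a minimal sketch of what a lazy `BinaryData` model could look like. The field layout, the `fromBytes` factory, and the `getBase64Data` accessor are assumptions for illustration, not a proposed final API:

```java
import java.util.Base64;

// Hypothetical sketch: holds either the raw bytes or the original base64
// string, and only converts between the two on first access.
final class BinaryData {
    private byte[] decoded; // raw bytes, decoded lazily
    private String base64;  // original base64 form, kept to avoid re-encoding

    private BinaryData(byte[] decoded, String base64) {
        this.decoded = decoded;
        this.base64 = base64;
    }

    public static BinaryData fromBytes(byte[] bytes) {
        return new BinaryData(bytes, null);
    }

    public static BinaryData fromBase64Data(String base64) {
        return new BinaryData(null, base64);
    }

    // Decodes on first access; a sink that only needs the base64 form
    // never pays for a decode.
    public byte[] getBinaryData() {
        if (decoded == null) {
            decoded = Base64.getDecoder().decode(base64);
        }
        return decoded;
    }

    // Encodes on first access; data that arrived as base64 is passed
    // through without a decode/re-encode round trip.
    public String getBase64Data() {
        if (base64 == null) {
            base64 = Base64.getEncoder().encodeToString(decoded);
        }
        return base64;
    }
}
```

With this shape, a source that receives base64 wraps it with `fromBase64Data`, and a sink writing base64 back out reads `getBase64Data` without ever touching the raw bytes.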
Describe alternatives you've considered (Optional)
There may be some useful third-party libraries with a similar solution that we could make use of. Though, I'd still propose we keep our own interface and use that for the internals.
Additional context
Coming from this comment: #4016 (comment)