Support generic parsers/codecs #1532
Here is a concept for how Data Prepper can provide this to pipeline authors.
Define Interfaces in data-prepper-api
The source codec interface can exist in data-prepper-api.
Create plugin projects for each type
Each codec type gets its own plugin project. Taking CSV as an example, we could have a project dedicated to the CSV codec.
Use the plugin framework for loading codecs
The Data Prepper plugin framework supports arbitrary interfaces, so codecs can be loaded as plugins. This can follow the same pattern as HTTP authentication in Armeria. Following the CSV example, the CSV project would contain a codec class that the framework loads.
Sources load using the plugin framework
The S3 source, for example, can load its codec from the plugin framework, similar to how the HTTP source loads authentication. Unlike the HTTP source, the S3 source should not have a default value for the codec.
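The loading behaviour described in this comment (codecs looked up by name, and no default for the S3 source) can be sketched roughly as follows. This is only an illustration: it does not use the real Data Prepper plugin framework, and every name in it (`NamedCodecSketch`, `CodecRegistrySketch`, `S3SourceConfigSketch`) is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical marker for a codec; a real codec would parse input instead.
interface NamedCodecSketch {
    String describe();
}

// Hypothetical stand-in for name-based plugin loading. Not the real plugin framework.
final class CodecRegistrySketch {
    private final Map<String, Supplier<NamedCodecSketch>> factories = new HashMap<>();

    void register(String name, Supplier<NamedCodecSketch> factory) {
        factories.put(name, factory);
    }

    NamedCodecSketch load(String name) {
        Supplier<NamedCodecSketch> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown codec: " + name);
        }
        return factory.get();
    }
}

// Hypothetical source-side lookup: the codec name must be configured explicitly.
final class S3SourceConfigSketch {
    static NamedCodecSketch resolveCodec(CodecRegistrySketch registry, String configuredCodecName) {
        if (configuredCodecName == null || configuredCodecName.isBlank()) {
            // No fallback here, mirroring "the S3 source should not have a default value".
            throw new IllegalArgumentException("A codec must be configured; there is no default.");
        }
        return registry.load(configuredCodecName);
    }
}
```

Under these assumptions, a pipeline that omits the codec fails fast at configuration time rather than silently parsing input with a codec the author did not choose.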
Support for Source Codecs opensearch-project#1532
Based on some of the changes coming to support this, I think the interface should have some modifications to support additional flexibility. Here are some of the things that we may need
Thoughts on this approach? @kkondaka, @graytaylor0, @umairofficial
I'm re-opening this issue to improve the interface before we release 2.3.
To support Parquet codecs, we may need to update the codec interface to support seekable input. Ideally, this means we have two forms of codecs: a base codec and a seekable codec. Not every source may be able to provide seekable input, that is, to let the codec choose which bytes to read.
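One way to picture the base/seekable split is the sketch below. It is an assumption-laden illustration, not the interface that was adopted: all type names are hypothetical, records are plain strings, and the toy "trailer" format only stands in for formats such as Parquet that keep metadata at the end of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical base codec: reads the stream front to back.
interface BaseCodecSketch {
    void parse(InputStream in, Consumer<String> recordConsumer) throws IOException;
}

// Hypothetical input that can be opened at an arbitrary byte offset.
interface SeekableInputSketch {
    long length();
    InputStream openAt(long offset) throws IOException;
}

// Hypothetical seekable codec: still a base codec, plus a random-access entry point.
interface SeekableCodecSketch extends BaseCodecSketch {
    void parse(SeekableInputSketch input, Consumer<String> recordConsumer) throws IOException;
}

final class InMemorySeekableInput implements SeekableInputSketch {
    private final byte[] bytes;
    InMemorySeekableInput(byte[] bytes) { this.bytes = bytes; }
    @Override public long length() { return bytes.length; }
    @Override public InputStream openAt(long offset) {
        int start = (int) offset;
        return new ByteArrayInputStream(bytes, start, bytes.length - start);
    }
}

// Toy codec: with seekable input it reads a 3-byte trailer first, then the body.
final class TrailerCodecSketch implements SeekableCodecSketch {
    static final int TRAILER_BYTES = 3;

    @Override
    public void parse(InputStream in, Consumer<String> recordConsumer) throws IOException {
        recordConsumer.accept("stream:" + new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }

    @Override
    public void parse(SeekableInputSketch input, Consumer<String> recordConsumer) throws IOException {
        long trailerStart = Math.max(0, input.length() - TRAILER_BYTES);
        try (InputStream trailer = input.openAt(trailerStart)) {
            recordConsumer.accept("trailer:" + new String(trailer.readAllBytes(), StandardCharsets.UTF_8));
        }
        try (InputStream body = input.openAt(0)) {
            byte[] bodyBytes = body.readNBytes((int) trailerStart);
            recordConsumer.accept("body:" + new String(bodyBytes, StandardCharsets.UTF_8));
        }
    }
}

// The source decides which entry point to use, based on what it can offer.
final class SourceSketch {
    static List<String> read(BaseCodecSketch codec, byte[] bytes, boolean sourceCanSeek) {
        List<String> out = new ArrayList<>();
        try {
            if (sourceCanSeek && codec instanceof SeekableCodecSketch) {
                ((SeekableCodecSketch) codec).parse(new InMemorySeekableInput(bytes), out::add);
            } else {
                codec.parse(new ByteArrayInputStream(bytes), out::add);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

In this sketch a source that cannot seek still works with every codec through the base `parse` method, while a source that can seek opts in to the random-access path.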
Is your feature request related to a problem? Please describe.
Both the S3 Source and HTTP Source use similar concepts of codecs for parsing input data. The S3 Source currently makes these codecs available as plugins, so they can be extended, but only for the S3 source. If another source wanted to use these plugins, it would be unable to.
Describe the solution you'd like
Create a core concept in Data Prepper of source-based codecs or parsers. These should be generic enough to take any Java InputStream and produce events from it.
I propose that we base this concept on the S3 codec. It has a few advantages. Among them, it uses a Consumer for each event. This also allows the source using the codec to receive Event objects and decide, independently of the codec, the best way to handle them.
Describe alternatives you've considered (Optional)
Data Prepper can have a similar concept for output codecs/parsers. However, I see no reason to force these to be the same concept. (Implementors may choose to pair them together to avoid code duplication).
Additional context
S3 Codec interface:
data-prepper/data-prepper-plugins/s3-source/src/main/java/com/amazon/dataprepper/plugins/source/codec/Codec.java
Lines 19 to 27 in 37c8b09
HTTP Codec interface:
data-prepper/data-prepper-plugins/http-source/src/main/java/com/amazon/dataprepper/plugins/source/loghttp/codec/Codec.java
Lines 16 to 23 in 37c8b09
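The shape proposed in this issue (any InputStream in, a Consumer called once per event) might look roughly like the sketch below. It is not the code at the permalinks above: `SketchEvent`, `SourceCodecSketch`, `NewlineCodecSketch`, and `CodecDemo` are hypothetical names, and the event type is a simplified stand-in for Data Prepper's Event.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for an event; the real Event type is richer.
final class SketchEvent {
    private final Map<String, Object> data;
    SketchEvent(Map<String, Object> data) { this.data = data; }
    Map<String, Object> getData() { return data; }
}

// Hypothetical generic source codec: any InputStream in, one callback per event out.
interface SourceCodecSketch {
    void parse(InputStream inputStream, Consumer<SketchEvent> eventConsumer) throws IOException;
}

// Illustrative codec: each non-empty line becomes one event with a "message" key.
final class NewlineCodecSketch implements SourceCodecSketch {
    @Override
    public void parse(InputStream inputStream, Consumer<SketchEvent> eventConsumer) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    eventConsumer.accept(new SketchEvent(Map.of("message", line)));
                }
            }
        }
    }
}

// The caller (a source) decides what to do with each event; here it just collects them.
final class CodecDemo {
    static List<SketchEvent> parseAll(SourceCodecSketch codec, String input) {
        List<SketchEvent> events = new ArrayList<>();
        try {
            codec.parse(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), events::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }
}
```

Because the codec only sees an InputStream and a Consumer, the same implementation could be reused by any source that can produce a stream, which is the reuse the issue asks for.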