S3 Event Decoding Consistency #1687
Comments
I like the suggestion that it should be configurable where we save the data to avoid conflicts.
I also like the suggestion about the destination. From my perspective, it's easier to deal with shallow (root-level) fields in OpenSearch and other processors.
Support a configurable key for S3 metadata and make this base key s3/ by default. Moved the output of JSON from S3 objects into the root of the Event from the message key. Resolves #1687 Signed-off-by: David Venable <[email protected]>
Is your feature request related to a problem? Please describe.
The `s3` source includes two codecs in 1.5, and a new codec for CSV processing is coming in 2.0. These populate Events somewhat differently:

- `newline-delimited` -> Each line is saved to the `message` key of the Event. This is a single string.
- `json` -> The JSON is expanded into `message`. So, if the JSON has a key named `sourceIp`, it is populated in `/message/sourceIp`.
- `csv` -> Each key is expanded directly into the root of the Event (`/`). Thus, if the CSV has a key named `sourceIp`, it is populated in `/sourceIp`.

Also, the `s3` source adds two special keys to all Events: `bucket` and `key`. These indicate the S3 bucket and key, respectively, for the object. The S3 source populates these, not the codecs.

Describe the solution you'd like
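As context for the solution, here is a sketch of the Event shape each codec currently produces for the same record, alongside a possible consistent shape. Plain Python dicts stand in for Events, and all field values (`sourceIp`, the bucket and object names) are hypothetical:

```python
# Sketch of current Event shapes per codec (all field values hypothetical).

record = '{"sourceIp": "10.0.0.1"}'  # one line of an S3 object

# newline-delimited: the raw line lands under "message" as a single string.
newline_event = {"message": record, "bucket": "my-bucket", "key": "logs/a.json"}

# json: parsed fields land under "message" (so /message/sourceIp).
json_event = {
    "message": {"sourceIp": "10.0.0.1"},
    "bucket": "my-bucket",
    "key": "logs/a.json",
}

# csv: parsed fields land at the root (so /sourceIp).
csv_event = {"sourceIp": "10.0.0.1", "bucket": "my-bucket", "key": "logs/a.json"}

# One consistent shape: every codec writes to the root, and the source's
# metadata moves under a single (configurable) top-level key, "s3" by default,
# so data fields can no longer collide with "bucket" or "key".
proposed_event = {
    "sourceIp": "10.0.0.1",
    "s3": {"bucket": "my-bucket", "key": "logs/a.json"},
}
```

With the proposed shape, a data column named `bucket` would land at `/bucket` without clobbering the source's metadata at `/s3/bucket`.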
First, all codecs should put the data in the same place consistently. Second, we should decide where we want this data to reside (`/message` or `/`). Third, it should avoid conflicting with the `bucket` and `key` keys.

One possible solution is to change the `s3` source to save the `bucket` and `key` to a top-level object named `s3`. The codecs would then save to the root (`/`). This could lead to conflicts if the actual data has a column or field named `s3`, but if we make this key configurable, pipeline authors could avoid such conflicts.

Describe alternatives you've considered (Optional)
An alternative would be more robust support for Event metadata. The bucket and key could be saved as metadata. However, Data Prepper's conditional routing and processors do not currently support Event metadata.
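The limitation above can be sketched with a simplified, hypothetical Event model (not Data Prepper's actual API): routing expressions evaluate only against an Event's data, so `bucket` and `key` stored as metadata would be invisible to them:

```python
# Minimal, hypothetical model of an Event with separate data and metadata.
# Data Prepper's real classes differ; this only illustrates why routing
# cannot currently act on metadata.

class Event:
    def __init__(self, data, metadata):
        self.data = data          # visible to processors and routing
        self.metadata = metadata  # carried with the Event, but not routable

def route(event, key, expected):
    """Stand-in for conditional routing: it inspects only event.data."""
    return event.data.get(key) == expected

event = Event(
    data={"sourceIp": "10.0.0.1"},
    metadata={"bucket": "my-bucket", "key": "logs/a.json"},
)

# Routing on a data field works...
assert route(event, "sourceIp", "10.0.0.1")
# ...but the bucket, if stored as metadata, is unreachable to routing.
assert not route(event, "bucket", "my-bucket")
```

This is why saving `bucket` and `key` into the Event's data (under a dedicated key such as `s3`) is more immediately useful than metadata, despite the conflict risk.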
Additional context