Add an option to send multiple log events as a record #105
Conversation
README.md (Outdated)
@@ -31,7 +31,7 @@ Run `make` to build `./bin/firehose.so`. Then use with Fluent Bit:
* `time_key`: Add the timestamp to the record under this key. By default the timestamp from Fluent Bit will not be added to records sent to Kinesis.
* `time_key_format`: [strftime](http://man7.org/linux/man-pages/man3/strftime.3.html) compliant format string for the timestamp; for example, `%Y-%m-%dT%H:%M:%S%z`. This option is used with `time_key`. You can also use `%L` for milliseconds and `%f` for microseconds. If you are using ECS FireLens, make sure you are running Amazon ECS Container Agent v1.42.0 or later, otherwise the timestamps associated with your container logs will only have second precision.
* `replace_dots`: Replace dot characters in key names with the value of this option. For example, if you add `replace_dots _` in your config then all occurrences of `.` will be replaced with an underscore. By default, dots will not be replaced.
* `multiple_log_events_per_record`: Option to allow plugin send multiple log events in the same record if current record not exceed the maximumRecordSize(1MiB), default to be `false`, set to `true` to enable this option.
Let's call this option `simple_aggregation`. And then the description should note that it joins together as many log records as possible into a single Firehose record and delimits them with a newline.
Also, maybe add that they should consider increasing the Fluent Bit flush interval in the service section to increase the number of logs sent per flush (to get better aggregation).
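For illustration, a config along those lines might look like this (a minimal sketch; the region and delivery stream name are hypothetical, and `Flush` is the standard service-level flush interval in seconds):

```
[SERVICE]
    # A longer flush interval lets more log events accumulate per flush,
    # giving simple_aggregation more records to pack together.
    Flush 5

[OUTPUT]
    Name               firehose
    Match              *
    region             us-east-1
    delivery_stream    my-delivery-stream
    simple_aggregation true
```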
Wondering why this can't be enabled by default. Why do we need a setting? What's the complexity if we make this the default behavior?
Setting it by default would break backwards compatibility with the old behavior...
Also, this option is only for some customers/use cases. If the log events are all joined together into one record, that only works for some Firehose destinations, like S3. For others, like Elasticsearch, concatenating events will break the experience.
Maybe I am missing something: why would it break the existing experience? All the data is still being sent, just in larger chunks (multiple events in one record). Isn't it an implementation detail how we send the data?
Oh I see. I missed your previous comment. Makes sense.
Updated to `simple_aggregation` everywhere.
Force-pushed from e73819b to 13d20be.
README.md (Outdated)
@@ -31,7 +31,7 @@ Run `make` to build `./bin/firehose.so`. Then use with Fluent Bit:
* `time_key`: Add the timestamp to the record under this key. By default the timestamp from Fluent Bit will not be added to records sent to Kinesis.
* `time_key_format`: [strftime](http://man7.org/linux/man-pages/man3/strftime.3.html) compliant format string for the timestamp; for example, `%Y-%m-%dT%H:%M:%S%z`. This option is used with `time_key`. You can also use `%L` for milliseconds and `%f` for microseconds. If you are using ECS FireLens, make sure you are running Amazon ECS Container Agent v1.42.0 or later, otherwise the timestamps associated with your container logs will only have second precision.
* `replace_dots`: Replace dot characters in key names with the value of this option. For example, if you add `replace_dots _` in your config then all occurrences of `.` will be replaced with an underscore. By default, dots will not be replaced.
* `simple_aggregation`: Option to allow plugin send multiple log events in the same record if current record not exceed the maximumRecordSize(1MiB). It joins together as many log records as possible into a single Firehose record and delimits them with newline. default to be `false`, set to `true` to enable this option.
nit: add a space here: maximumRecordSize(1MiB)
Do we want to mention that this option is good to have enabled if you are sending data to a destination which supports aggregation, like S3?
Where should the space be added? The comment looks the same as the code change. Do you mean (1 MiB)?
Sorry I messed it up: maximumRecordSize (1 MiB)
Thanks! Updated, and also mentioned the suggestion.
Data: data,
})
if output.simpleAggregation && len(output.records) > 0 && len(output.records[len(output.records)-1].Data)+newDataSize <= maximumRecordSize {
	output.records[len(output.records)-1].Data = append(output.records[len(output.records)-1].Data, data...)
Where are you adding the newline?
For this part, the newline is added in `processRecord`, in the same line as before; I didn't change this.
And the current output is like:
{"cpu0_p_cpu":0,"cpu0_p_system":0,"cpu0_p_user":0,"cpu1_p_cpu":0,"cpu1_p_system":0,"cpu1_p_user":0,"cpu2_p_cpu":0,"cpu2_p_system":0,"cpu2_p_user":0}
{"cpu0_p_cpu":0,"cpu0_p_system":0,"cpu0_p_user":0,"cpu1_p_cpu":0,"cpu1_p_system":0,"cpu1_p_user":0,"cpu2_p_cpu":0,"cpu2_p_system":0,"cpu2_p_user":0}
I wonder if this is the correct output, or whether an extra newline is expected, like this:
{"cpu0_p_cpu":0,"cpu0_p_system":0,"cpu0_p_user":0,"cpu1_p_cpu":0,"cpu1_p_system":0,"cpu1_p_user":0,"cpu2_p_cpu":0,"cpu2_p_system":0,"cpu2_p_user":0}

{"cpu0_p_cpu":0,"cpu0_p_system":0,"cpu0_p_user":0,"cpu1_p_cpu":0,"cpu1_p_system":0,"cpu1_p_user":0,"cpu2_p_cpu":0,"cpu2_p_system":0,"cpu2_p_user":0}
I see, you're good. We don't want two newlines.
nit: maybe you can update the screenshot in the PR description as you changed the field name.
Thanks! Updated.
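For context, here is a minimal self-contained sketch of the aggregation path discussed above. It is a simplified stand-in, not the plugin's actual types: the `record` and `output` types and the `addRecord` helper are hypothetical, while the append-or-start-new-record check mirrors the diff, and the trailing newline is assumed to be added upstream by `processRecord`.

```go
package main

import "fmt"

const maximumRecordSize = 1024 * 1024 // Firehose's 1 MiB per-record limit

type record struct {
	Data []byte
}

type output struct {
	simpleAggregation bool
	records           []record
}

// addRecord assumes data is already newline-terminated (processRecord
// appends the newline upstream). With simple aggregation on, it packs
// data into the previous record while the combined size stays within
// maximumRecordSize; otherwise it starts a new record.
func (o *output) addRecord(data []byte) {
	last := len(o.records) - 1
	if o.simpleAggregation && last >= 0 &&
		len(o.records[last].Data)+len(data) <= maximumRecordSize {
		o.records[last].Data = append(o.records[last].Data, data...)
		return
	}
	o.records = append(o.records, record{Data: data})
}

func main() {
	o := &output{simpleAggregation: true}
	o.addRecord([]byte(`{"msg":"a"}` + "\n"))
	o.addRecord([]byte(`{"msg":"b"}` + "\n"))
	// Both events land in one record, delimited by a single newline.
	fmt.Printf("%d record(s):\n%s", len(o.records), o.records[0].Data)
}
```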
Signed-off-by: Drew Zhang <[email protected]>
Force-pushed from 13d20be to 7622b07.
Signed-off-by: Drew Zhang [email protected]
Issue #: #12
Description of changes:
Add an option to allow multiple log events to be sent in the same record if the maxRecordSizeLimit (1 MiB) is not exceeded.
Running result:
Unit Test:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.