feature: add bytes limit for a complete log #17387
Comments
@djaglowski Any idea about this feature?
Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This looks like a separate issue from the …
Sounds reasonable. The immediate question is whether or not it should be the concern of the recombine operator to truncate the final line before batching. It's not clear to me that it should, but your proposal indicates otherwise. I may need some help understanding the considerations here.
I don't see why this would be useless. Perhaps …
If I'm understanding correctly, you are suggesting that the truncated data should become the first line in the next batch. Is that right?
Regarding truncation, an alternative would be to drop truncated data and any additional lines until …
Yeah, these two issues can be addressed separately.
Firstly, I think the best time to truncate the log is before the aggregation of entries, because truncating after aggregation would involve redundant work.
For this, if users want to limit the byte size of a log, 'max_log_size' is precise and simple, in which case it fully replaces 'max_batch_log'. Would users use 'max_batch_size' to limit anything else?
Actually, this solution addresses the second problem I mentioned: once a log is truncated, a mark 'isTruncated' would be recorded, and when the remaining entries belonging to the previous log come in, the mark tells us that a truncation happened at the last flush, so we can aggregate them together.
Yeah, I also considered this alternative, but I think dropping the data outright may not be a good choice for an agent. Whether such truncated logs are useful should be left to the users to judge.
We do have to consider existing users. The … I agree … Edit: one additional consideration here is that …
I agree with the overall functionality, but I think we should explore in the implementation whether we really need the mark. It may be enough to just truncate, flush, and insert the remainder as the first line in the new batch.
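The truncate, flush, and carry-over idea can be sketched roughly as follows. This is a hypothetical illustration, not the actual operator code: `maxLogSize` and `combineWithLimit` are names invented here, and the real recombine operator works on entries rather than raw strings.

```go
package main

import "fmt"

// maxLogSize is a hypothetical hard limit on the combined log size, in bytes.
const maxLogSize = 10

// combineWithLimit appends lines to a batch. Whenever the batch reaches the
// byte limit, it is truncated at the limit and flushed, and the remainder
// becomes the first content of the next batch (no "isTruncated" mark needed).
func combineWithLimit(lines []string) (flushed []string) {
	var batch []byte
	for _, line := range lines {
		batch = append(batch, line...)
		for len(batch) >= maxLogSize {
			flushed = append(flushed, string(batch[:maxLogSize]))
			batch = batch[maxLogSize:] // remainder starts the next batch
		}
	}
	if len(batch) > 0 {
		flushed = append(flushed, string(batch))
	}
	return flushed
}

func main() {
	// "hello world" (11 bytes) overflows the 10-byte limit; the trailing
	// "d" is carried into the next batch together with "and more".
	fmt.Println(combineWithLimit([]string{"hello world", "and more"}))
}
```

With a hard limit like this, every flushed log except possibly the last is exactly `maxLogSize` bytes; the soft-limit variant discussed later in the thread instead flushes the whole oversized batch without cutting it.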
I'm good with this. If it proves necessary in the future, it should be easy enough to add a configuration flag for this.
How does limiting to a maximum size in bytes work? In what representation is the size measured? Does this require serialization and length checking the serialized data and then throwing away the serialized data? |
OK, I agree with it.
There is another point that needs to be discussed: when the byte size of the entries in a batch reaches the limit, how do we deal with those entries? I think there are two ways: …
I think if we limit the byte size in the recombine operator, there are no serialization requirements. When the data reaches the recombine operator from the file reader, the body of the entry is a string or byte array whose byte size can be obtained directly. Are there any issues I missed in the process?
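A quick illustration of the point above, assuming the entry body arrives as a Go string or `[]byte`: `len()` already gives the size in bytes, so no serialization pass is needed. Note that `len` on a string counts bytes, not characters, so multi-byte UTF-8 characters count as more than one.

```go
package main

import "fmt"

func main() {
	// A body as produced by a file reader: already a string, so its
	// byte size is just len() -- no serialization round-trip required.
	body := "2023-01-01 ERROR something failed"
	fmt.Println(len(body)) // 33 bytes

	// len counts bytes, not runes: "é" is 2 bytes in UTF-8.
	fmt.Println(len("héllo")) // 6 bytes, 5 characters
}
```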
@djaglowski Hello, I would like to work on this feature. Is there anything I should know before starting (Besides the contributing guidelines)? Thank you |
…ig to recombine operator (#17387) (#18089) "max_log_size" is added to the config of the recombine operator. Once the total byte size of the combined field exceeds the limit, all received entries of the source are combined and flushed. We've had discussion around this feature (see #17387). Here I chose the "soft limit" rather than truncating the log to a fixed byte size when the total size exceeds the limit, because I think a soft limit is enough for users on this occasion.
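Based on the description above, the new setting would be used along these lines. This is an illustrative sketch: `max_log_size` is the field added by the PR, while the other fields and values shown here are typical recombine options chosen for the example, not taken from this issue.

```yaml
operators:
  - type: recombine
    combine_field: body
    is_first_entry: body matches "^[^\\s]"
    # Soft limit in bytes: once the combined body exceeds this,
    # the batch is flushed as-is rather than truncated.
    max_log_size: 131072
```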
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments; if you are unsure of which component this issue relates to, please ping … Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
pkg/stanza
Is your feature request related to a problem? Please describe.
version: v0.61.0
When I use the recombine operator to aggregate log entries, I found some problems:
Describe the solution you'd like
Describe alternatives you've considered
No response
Additional context
No response