Fix overflowing batch size #1310

Merged: 2 commits merged into open-telemetry:master from batch-size-overflow on Jul 14, 2020

Conversation

@pavolloffay (Member) commented Jul 9, 2020

Signed-off-by: Pavol Loffay [email protected]

Description:

This PR adds a configuration property `enforce_batch_size` to the batch processor that ensures the batch size does not overflow the configured maximum batch size. The default value is false, so it does not change the default behavior.

If the incoming batch is bigger than the space available in the cached traces, it is split: the first part fills the batch up to the maximum batch size, and the spans that would overflow the buffer are sent again as trace data over the channel.
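
For illustration only, here is a minimal, self-contained Go sketch of the overflow handling described above. The type and field names (`sketchProcessor`, `pending`, `newItem`) are made up for this sketch and do not reflect the actual batch processor code:

```go
package main

import "fmt"

// Span stands in for a single span; the real processor works on whole trace payloads.
type Span struct{ Name string }

// sketchProcessor is a toy model of the batch processor's state.
type sketchProcessor struct {
	sendBatchSize int
	pending       []Span
	newItem       chan []Span
}

// addSpans keeps only as many incoming spans as fit below sendBatchSize and
// re-queues the overflow on the channel, mirroring the description above.
func (p *sketchProcessor) addSpans(incoming []Span) {
	free := p.sendBatchSize - len(p.pending)
	if free < 0 {
		free = 0
	}
	if len(incoming) > free {
		overflow := incoming[free:]
		incoming = incoming[:free]
		select {
		case p.newItem <- overflow: // send the rest again over the channel
		default: // non-blocking in this toy example
		}
	}
	p.pending = append(p.pending, incoming...)
	if len(p.pending) >= p.sendBatchSize {
		fmt.Printf("flushing batch of %d spans\n", len(p.pending))
		p.pending = p.pending[:0]
	}
}

func main() {
	p := &sketchProcessor{sendBatchSize: 3, newItem: make(chan []Span, 1)}
	p.addSpans(make([]Span, 5)) // 3 spans are batched and flushed, 2 are re-queued
	fmt.Println("re-queued spans:", len(<-p.newItem))
}
```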

Link to tracking Issue:

Resolves #1140
Related to #1020

Testing:

Documentation:

`enforce_batch_size` has been added to the README.

@codecov (bot) commented Jul 9, 2020

Codecov Report

Merging #1310 into master will increase coverage by 0.03%.
The diff coverage is 97.95%.


@@            Coverage Diff             @@
##           master    #1310      +/-   ##
==========================================
+ Coverage   90.01%   90.05%   +0.03%     
==========================================
  Files         216      217       +1     
  Lines       15200    15244      +44     
==========================================
+ Hits        13683    13728      +45     
  Misses       1101     1101              
+ Partials      416      415       -1     
| Impacted Files | Coverage Δ |
| --- | --- |
| processor/batchprocessor/splittraces.go | 97.05% <97.05%> (ø) |
| processor/batchprocessor/batch_processor.go | 98.01% <100.00%> (+0.21%) ⬆️ |
| translator/internaldata/resource_to_oc.go | 86.04% <0.00%> (+2.32%) ⬆️ |


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0c7de06...0d5e6c7.

@dmitryax (Member) commented Jul 10, 2020

The original idea of the batcher is to batch traces together and to do it as performantly as possible. That's why it doesn't enforce any hard limit on the batch size. It might not be clearly described in the docs, but we have this:

- `send_batch_size` (default = 8192): Number of spans or metrics after which a batch will be sent.

I don't think we should add the complexity of enforcing the hard limit by default. That behavior might not be what users want, and it can degrade performance.

What do you think about keeping the existing behavior of the send_batch_size parameter (maybe making it clearer in the docs that it's not a hard limit) and adding another config parameter that would enforce the hard limit and cause traces to be split?
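
For illustration, a rough sketch of how the two settings could live side by side; this is not the actual Config struct in processor/batchprocessor, and the field names and mapstructure tags are assumptions:

```go
package batchprocessor

// Config is a sketch of the processor configuration; field names and
// mapstructure tags here are illustrative assumptions only.
type Config struct {
	// Soft target: a batch is sent once it reaches this many spans or
	// metrics, but a large incoming item may push a batch above it.
	SendBatchSize uint32 `mapstructure:"send_batch_size"`

	// Opt-in hard limit: when true, batches are split so they never exceed
	// SendBatchSize. Defaults to false to preserve existing behavior.
	EnforceBatchSize bool `mapstructure:"enforce_batch_size"`
}
```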

@pavolloffay (Member Author)

> What do you think about keeping the existing behavior of the send_batch_size parameter (maybe making it clearer in the docs that it's not a hard limit) and adding another config parameter that would enforce the hard limit and cause traces to be split?

It looks good to me.

@pavolloffay (Member Author)

@dmitryax thanks for the feedback. I have made the changes and made the splitting configurable.

@@ -141,6 +143,16 @@ func (bp *batchTraceProcessor) startProcessingCycle() {
	for {
		select {
		case td := <-bp.newTraceItem:
			if bp.enforceBatchSize {
@bogdandrutu (Member) commented Jul 12, 2020

What about a simpler approach: add items to the batch as normal, then do a simple split by max size. It's maybe a bit more overhead, but the logic feels simpler.
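
As a rough sketch of that suggestion (the toy `Span` type and `splitAt` helper are made up, not the PR's code): append the incoming data unconditionally, then cut the accumulated batch into chunks of at most the max size before sending.

```go
package main

import "fmt"

type Span struct{}

// splitAt cuts a batch into chunks of at most max spans each.
func splitAt(spans []Span, max int) [][]Span {
	var chunks [][]Span
	for len(spans) > max {
		chunks = append(chunks, spans[:max])
		spans = spans[max:]
	}
	if len(spans) > 0 {
		chunks = append(chunks, spans)
	}
	return chunks
}

func main() {
	// A naive append can overshoot the configured size (8 here).
	batch := append(make([]Span, 7), make([]Span, 5)...)
	for _, chunk := range splitAt(batch, 8) {
		fmt.Println("send batch of", len(chunk)) // 8, then 4
	}
}
```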

@pavolloffay (Member Author)

That might be even better from the perf standpoint. The buckets of max size would probably also have to be split, because other items can arrive in the meantime (unless we push them directly).

@pavolloffay force-pushed the batch-size-overflow branch from f93cb7a to 9728c7d on July 13, 2020 12:56
@pavolloffay (Member Author)

@bogdandrutu I have rebased the PR and simplified it to do only a single split per consume.
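
For reference, a minimal sketch of the "single split per consume" idea, again with made-up names rather than the merged implementation: at most one chunk of exactly the configured size is cut off per consume call, and any remainder stays pending for the next cycle.

```go
package main

import "fmt"

type Span struct{}

// consumeOnce appends the incoming spans and performs at most one split:
// when the pending batch reaches max, exactly max spans are exported and the
// rest stays pending until the next call.
func consumeOnce(pending, incoming []Span, max int, export func([]Span)) []Span {
	pending = append(pending, incoming...)
	if len(pending) >= max {
		export(pending[:max])
		pending = pending[max:]
	}
	return pending
}

func main() {
	export := func(spans []Span) { fmt.Println("export batch of", len(spans)) }
	var pending []Span
	pending = consumeOnce(pending, make([]Span, 10), 8, export) // exports 8
	fmt.Println("still pending:", len(pending))                 // 2
}
```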

@bogdandrutu self-requested a review on July 13, 2020 22:23
Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay force-pushed the batch-size-overflow branch from 650d1c8 to 1a5c564 on July 14, 2020 07:09
Signed-off-by: Pavol Loffay <[email protected]>
@bogdandrutu merged commit 5c7db8c into open-telemetry:master on Jul 14, 2020
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this pull request Nov 11, 2021
swiatekm pushed a commit to swiatekm/opentelemetry-collector that referenced this pull request Oct 9, 2024
Successfully merging this pull request may close these issues.

Batch processor returns a bigger batch than configured