Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(otel_batch_processor): don't divide max_queue_size by word-size #635

Conversation

SergeTupchiy
Copy link
Contributor

There is no need to convert max_queue_size to words, as it is compared with the number of objects in ETS table: https://github.com/open-telemetry/opentelemetry-erlang/blob/main/apps/opentelemetry/src/otel_batch_processor.erl#L278.

@SergeTupchiy SergeTupchiy requested a review from a team October 10, 2023 09:36
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 10, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@bryannaegele
Copy link
Contributor

I think that might actually be a bug in the code you linked as I think the intention is to limit memory consumption, not the number of spans. The rules around limits are focused around things that affect memory pressure.

I think the info check should actually be this:

case ets:info(?CURRENT_TABLE(RegName), memory) of

@SergeTupchiy
Copy link
Contributor Author

SergeTupchiy commented Oct 10, 2023

I think that might actually be a bug in the code you linked as I think the intention is to limit memory consumption, not the number of spans. The rules around limits are focused around things that affect memory pressure.

I think the info check should actually be this:

case ets:info(?CURRENT_TABLE(RegName), memory) of

Hi @bryannaegele,
Thanks for your reply.
It used to be compared against memory size (in words) indeed:

I'd be happy to fix it to use memory instead, but I think we also need to account for some pre allocated memory per table:

> ets:info('otel_batch_processor_otel_export_table1_<0.4149.0>', memory).
1377
> ets:info('otel_batch_processor_otel_export_table1_<0.4149.0>', size).  
0

Basically, an empty table already occupies more words than the default max_queue_size: 2048 div erlang:system_info(wordsize) = 256 (OTP 25)

@tsloughter
Copy link
Member

Wow, yea, this got all messed up...

Hm, because spans have options for limits on attributes maybe we should just use the # of spans instead of size of the table?

@codecov
Copy link

codecov bot commented Oct 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
apps/opentelemetry/src/otel_batch_processor.erl 85.71% <100.00%> (+4.56%) ⬆️

📢 Thoughts on this report? Let us know!.

@bryannaegele
Copy link
Contributor

What if we supported both? integer for number of entries, {memory_unit, integer} for memory size? I suspect most folks would want to use a memory cap as span sizes and volume can vary so greatly and should be the default method.

@SergeTupchiy
Copy link
Contributor Author

SergeTupchiy commented Oct 12, 2023

What if we supported both? integer for number of entries, {memory_unit, integer} for memory size? I suspect most folks would want to use a memory cap as span sizes and volume can vary so greatly and should be the default method.

@bryannaegele , @tsloughter
I think this would be fine. Yet another choice is to support both options simultaneously, e.g., max_queue_len and max_queue_mem_size. It would probably look slightly more clear/familiar for a user. It would also still be possible to use only one by setting the other to infinity.

handle_event_(_State, {timeout, check_table_size}, check_table_size, #data{max_queue_len=MaxQueueLen,
                                                                           max_queue_mem_size=MaxQueueMemSize
                                                                           reg_name=RegName}) ->
   L = ets:info(?CURRENT_TABLE(RegName), size),
   M = ets:info(?CURRENT_TABLE(RegName), memory)
   %% MaxQueueMemSize must be adjusted for pre-allocated (empty) ETS table size  
   case L >= MaxQueueLen orelse M >= MaxQueueMemSize of
       true ->
           disable(RegName),
           keep_state_and_data;
       false ->
           enable(RegName),
           keep_state_and_data
   end;
...

For backward-compatibility with the previous config max_queue_size can be translated to max_queue_len.
Also it'd be necessary to come up with some sensible default values for both parameters.

@SergeTupchiy
Copy link
Contributor Author

What if we supported both? integer for number of entries, {memory_unit, integer} for memory size? I suspect most folks would want to use a memory cap as span sizes and volume can vary so greatly and should be the default method.

@bryannaegele , @tsloughter I think this would be fine. Yet another choice is to support both options simultaneously, e.g., max_queue_len and max_queue_mem_size. It would probably look slightly more clear/familiar for a user. It would also still be possible to use only one by setting the other to infinity.

handle_event_(_State, {timeout, check_table_size}, check_table_size, #data{max_queue_len=MaxQueueLen,
                                                                           max_queue_mem_size=MaxQueueMemSize
                                                                           reg_name=RegName}) ->
   L = ets:info(?CURRENT_TABLE(RegName), size),
   M = ets:info(?CURRENT_TABLE(RegName), memory)
   %% MaxQueueMemSize must be adjusted for pre-allocated (empty) ETS table size  
   case L >= MaxQueueLen orelse M >= MaxQueueMemSize of
       true ->
           disable(RegName),
           keep_state_and_data;
       false ->
           enable(RegName),
           keep_state_and_data
   end;
...

For backward-compatibility with the previous config max_queue_size can be translated to max_queue_len. Also it'd be necessary to come up with some sensible default values for both parameters.

I've realized that max_queue_size is actually a common config param (as well as a few others) as proposed in opentelemetry-config: https://github.com/open-telemetry/oteps/blob/main/text/assets/0225-config.yaml#L135, so we probably shouldn't rename/change it.
Also, it is not mentioned that it is memory limit in bytes, so I would infer that it just means the number of spans in the queue.

A few SDKs in other languages that use it as a max queue length indeed:

@SergeTupchiy
Copy link
Contributor Author

@tsloughter , @bryannaegele,
Found another related issue: check_table_size timeout has not been started at all, attempted to fix it in this commit: 42b8376

@tsloughter
Copy link
Member

I'm thinking we should go with this and can add a max_queue_size_bytes if desired in the future if desired.

@bryannaegele does that sound good?

@bryannaegele
Copy link
Contributor

Yep. I'm fine with length

There is no need to convert `max_queue_size` to words,
as it is compared with the number of objects in ETS table.
@bryannaegele bryannaegele force-pushed the fix-batch-processor-max-queue-size branch from 42b8376 to 22f3417 Compare October 29, 2023 15:18
@bryannaegele bryannaegele merged commit 0fb4e6c into open-telemetry:main Oct 29, 2023
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants