-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK Trace: Adding span collection limits #942
SDK Trace: Adding span collection limits #942
Conversation
Fixes open-telemetry#182 To help safeguard against a common error, adding default limits to the number of events per span in the SDK.
Adding optional additional configurations for span limits. Modifying the unmlimited value to -1 to enable restricting collections to 0 elements.
- modifying logging messaging to SHOULD, to imply configurability - clarifying collections types rather than using "each"
@tigrannajaryan @bogdandrutu @Oberon00 are you satisfied with the PR update? Can you pls re-review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading this, I am not sure if you ask for a "bytes limit" or "number of attributes", "number of events", etc limits.
I would say that the first one may sound very interesting but it comes with some performance overhead and hard to implement it right. So I am more supportive of the "max number of attributes", etc.
Independent of what is suggested here, I think the fact that is not clear has to be fixed.
Co-authored-by: Tigran Najaryan <[email protected]>
Ensuring there is no ambiguation between limits being in bytes, or numbers of elements in a collection.
Good point! I've fixed to element count. I agree that bytes has some interesting ramifications and addressed the immediate concern of stressing memory, but maybe a follow-up in another PR. |
from the issue triage meeting today with TC, if this PR is desired before GA, need 2 approvals an no disapprovals to merge, else this will be closed by end of day today @open-telemetry/technical-committee @open-telemetry/specs-approvers @SergeyKanzhelev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I might object to this spec change is simply that it's a lot of work to require, but it's work that was inherited from the OpenCensus libraries in many cases, e.g., I believe these limits are already implemented for OTel-Go.
from the issue triage mtg today with TC, marking as |
Addressing feedback arguing for a non-configurable limit, as is the case with other limits present in OpenTelemetry.
Thanks for the feedback! It's great to have more input.
Do you believe this will restrict people from adding other ways? I thought the spec was always a dictation of what is a requirement for an implementation, with anything not explicitly stated up to the liberty of the implementations.
I think this is the tradeoff to be made: the con is forcing choices that are not appropriate for the implementation. the pro is standard behavior that's enables knowledge that transfers across language boundaries. We dictate several implementation details that could be argued to prevent innovation, such as dictating that the behavior of the default span processor, and it's defaults. If the consensus (and I'd like to hear a few more chime in too) is "we don't know if this is the right behavior to dictate across implementations", I think it's fine to leave this PR be (or close). But I would argue that this is a practical implementation detail that would prevent a significant category of human errors. And the cost of differing implementations is end user confusion. |
the proposal in current form is very toothless. If anybody disagree with this PR - please mark it with "changes requested". Otherwise I will merge mid day (pacific time) tomorrow. |
@toumorokoshi Not restrict, but possibly discourage. There is limited amount of time SDK authors are willing to spend on this topic. As an SDK author if I spend effort to implement a spec requirement I am less likely to go ahead and implement another additional way to limit resource exhaustion. Developer's time is a finite resource too.
I agree that it is beneficial to have a good standard behavior. I am not convinced the proposed standard behavior is valuable enough to be enforced across all SDKs.
I would also like to hear more opinions. I am willing to be convinced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reasons I posted in my previous comments I am putting a soft block on this to prevent premature merging, but I am open to be convinced.
I would like to point to yet another aspect of the current discussion. Even if this PR is not merged, it will be allowed for SDK to implement these or similar limits. Then it will be beneficial if different languages at least used the same configuration options for these limits. E.g. the same environment variable name. Is the specification open to augment |
Yes, I think so. I would prefer configurable behavior. |
Configurability was removed from this PR as we cannot formalize use cases. For instance, if somebody set the limit to 2 attributes, should it be supported? Why? If somebody sets the limit to infinity, should it be supported? No exporter will probably need those so why to allow to set it? Configurability was important in OpenCensus as default was quite low. Like 10 or 20 as far as I remember. In OpenTelemetry I believe the general agreement that we need to start with much higher default. Which defeats the purpose of having this limits configurable on SDK side. I see this PR in current form as a warning that something need to be done to control for unbound collections. Alternative to this PR is doing nothing - SDKs needs to be safe in general. Switching to controllable behavior would need to start with the scenarios. |
There is a use case where collection of all telemetry is important: audit. In extreme cases it may be preferable to crash the actual operation if the telemetry that describes the operation cannot be exported. I understand that we do not necessarily care about this particular use case.
Makes sense. If it's a warning perhaps we can it a recommendation rather than hard requirement? |
I think there's a good comprise with the optional aspect of it. Aggregating the proposal and calling out how this is addressing concerns, how about:
I'm still concerned about this not being required, but I'd much prefer some awareness of the issue in the spec vs none at all. And it can be a good building block if we want to expand that into something more standardized. |
I am sorry, what does it mean? SDKs cannot be safe by magic, they have to build themselves in such a way. This PR is one way to help/advice SDKs how to do that. SDKs have to implement some safety measures, this PR tries to make them similar across different languages. Do we want to have ever language different? One has low limit, one has no limit, everyone using different configuration variables? |
Addressing feedback to make collection limits optional for now. The motivation is to allow innovation around safe SDK behavior, while providing a guideline and awareness for the issue. Reducing the logging on discarded attributes, events, or links to once per process to further reduce spam.
Giving the SDK implementors more freedom in how they choose to log, while setting guidelines on the frequency.
Thank you @toumorokoshi |
@SergeyKanzhelev @iNikem I re-requested your review since the PR is substantially different now. |
To help make consumers aware of which SDKS implement span collection limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once @iNikem's final suggestions are applied.
Thanks for carrying this through, @toumorokoshi!
Co-authored-by: Nikita Salnikov-Tarnovski <[email protected]>
@arminru added! |
@SergeyKanzhelev could you do a final review, and merge if you feel it's appropriate? I believe I've addressed all outstanding comments. |
* SDK Trace: Adding span collection limits Fixes open-telemetry#182 To help safeguard against a common error, adding default limits to the number of events per span in the SDK. * Adding to the changelog * Addressing feedback Adding optional additional configurations for span limits. Modifying the unmlimited value to -1 to enable restricting collections to 0 elements. * Addressing feedback - modifying logging messaging to SHOULD, to imply configurability - clarifying collections types rather than using "each" * Update specification/trace/sdk.md Co-authored-by: Tigran Najaryan <[email protected]> * Clarifying limits are in elements, not bytes Ensuring there is no ambiguation between limits being in bytes, or numbers of elements in a collection. * Removing customizable span collection limit Addressing feedback arguing for a non-configurable limit, as is the case with other limits present in OpenTelemetry. * fixing lint * Update specification/trace/sdk.md Co-authored-by: Nikita Salnikov-Tarnovski <[email protected]> * Update specification/trace/sdk.md Co-authored-by: Sergey Kanzhelev <[email protected]> * adding collection limits for attributes, links Adding collection limits for spans in the compliance matrix for attributes and links as well as events. * Making collection limits optional Addressing feedback to make collection limits optional for now. The motivation is to allow innovation around safe SDK behavior, while providing a guideline and awareness for the issue. Reducing the logging on discarded attributes, events, or links to once per process to further reduce spam. * Updating log recommendations. Giving the SDK implementors more freedom in how they choose to log, while setting guidelines on the frequency. * adding back limits to spec-compliance-matrix To help make consumers aware of which SDKS implement span collection limits. * Apply suggestions from code review Co-authored-by: Nikita Salnikov-Tarnovski <[email protected]> Co-authored-by: Tigran Najaryan <[email protected]> Co-authored-by: Nikita Salnikov-Tarnovski <[email protected]> Co-authored-by: Sergey Kanzhelev <[email protected]>
To help safeguard against a common error, adding default limits
to the number of events per span in the SDK.
Fixes #182
Changes
add default limits to the number of events per span in the SDK