[processor/probabilisticsampler] fix panic when sampling on non-Bytes log record attribute #18223
This commit fixes a panic in the processor that occurs when it is configured to sample on a log record attribute whose type is not `Bytes`. The processor now accepts `String` attributes as well as `Bytes` attributes. If a sampled attribute has any other type, the processor skips the log record and logs a warning instead of panicking.
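The type check described above can be sketched roughly as follows. This is a standalone illustration, not the processor's code: `valueType`, `attrValue`, and `samplingBytes` are hypothetical stand-ins for the collector's `pcommon` types so that the example compiles with the standard library alone.

```go
package main

import "fmt"

// valueType and attrValue are hypothetical stand-ins for the collector's
// pcommon.ValueType and pcommon.Value; they exist only for this sketch.
type valueType int

const (
	typeStr valueType = iota
	typeBytes
	typeInt
)

type attrValue struct {
	typ   valueType
	str   string
	bytes []byte
}

// samplingBytes returns the bytes to hash for the sampling decision.
// ok is false for unsupported types, in which case the caller is expected
// to skip the record and log a warning instead of panicking.
func samplingBytes(v attrValue) (b []byte, ok bool) {
	switch v.typ {
	case typeStr:
		return []byte(v.str), true
	case typeBytes:
		return v.bytes, true
	default:
		return nil, false
	}
}

func main() {
	values := []attrValue{
		{typ: typeStr, str: "abc"},
		{typ: typeBytes, bytes: []byte{0x01, 0x02}},
		{typ: typeInt},
	}
	for _, v := range values {
		if b, ok := samplingBytes(v); ok {
			fmt.Printf("hash input: %v\n", b)
		} else {
			fmt.Println("unsupported attribute type; skipping log record")
		}
	}
}
```

Returning an `ok` flag rather than an error mirrors the constraint raised in the review below: the filtering callback cannot propagate errors, so the caller can only skip and log.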
lidBytes = value.Bytes().AsRaw()

switch value.Type() {
case pcommon.ValueTypeStr:
I could do with some input here. I'm not sure if we should be handling string types without understanding the encoding. See PR discussion for more context.
I would say that this doesn't matter much, as a hash will be calculated based on the value. Whether it's a string or an int doesn't matter, as long as they are all converted to bytes in the end. Additionally, it doesn't matter if we are using the right encoding for the string, as long as all values are byte encoded in the same way. If we were to use this information for anything other than calculating a hash, it would certainly be important to handle the encoding properly.
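To illustrate the point about hashing, here is a minimal sketch of a hash-based sampling decision. It assumes FNV-1a from the Go standard library and a made-up bucket count (`numBuckets`); the hash function and constants the processor actually uses are not shown in this thread and may differ. `shouldSample` is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numBuckets is a hypothetical bucket count chosen for this sketch.
const numBuckets = 10000

// shouldSample hashes the raw bytes and keeps the record when the hash
// falls into the first `percent` share of the buckets.
func shouldSample(b []byte, percent float64) bool {
	h := fnv.New32a()
	h.Write(b) // Write on a hash.Hash never returns an error
	threshold := uint32(percent / 100 * numBuckets)
	return h.Sum32()%numBuckets < threshold
}

func main() {
	id := "4bf92f3577b34da6a3ce929d0e0e4736"
	// The decision depends only on the bytes, so the same bytes always
	// produce the same decision, whatever the string's encoding means.
	first := shouldSample([]byte(id), 25)
	second := shouldSample([]byte(id), 25)
	fmt.Println(first == second)
}
```

Only the byte sequence reaches the hash, which is why a consistent conversion matters more than a "correct" encoding here.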
case pcommon.ValueTypeBytes:
	lidBytes = value.Bytes().AsRaw()
default:
	lsp.logger.Warn("incompatible log record attribute, only String or Bytes supported; skipping log record",
In this case the user has configured a log record attribute that is not a supported type. Unfortunately, the `RemoveIf` functions used by this processor don't return errors, so I figure the only sane thing here without a bigger refactor is to log something. Is `WARN` the appropriate severity for a misconfigured processor?
I guess the only value that should not be supported is boolean. Otherwise, the probability provided by the user will be skewed by the fact that booleans can only have two values: there won't be a good distribution of values to make a proper probability.
I believe there is a method named `AsString`, which works regardless of the type since it just calls `fmt.Sprint`? Would that work here instead?
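As a rough picture of what this suggestion would mean, the sketch below converts arbitrary values to hash input through `fmt.Sprint`. It takes the commenter's description of `AsString` at face value and does not call the collector API; `toBytes` is a hypothetical helper.

```go
package main

import "fmt"

// toBytes is a hypothetical helper that renders any value as text with
// fmt.Sprint and returns the resulting bytes as hash input.
func toBytes(v any) []byte {
	return []byte(fmt.Sprint(v))
}

func main() {
	fmt.Printf("%q\n", toBytes("42"))         // string value
	fmt.Printf("%q\n", toBytes(42))           // int value, same text as the string "42"
	fmt.Printf("%q\n", toBytes(true))         // bool value
	fmt.Printf("%q\n", toBytes([]byte{1, 2})) // byte slice is rendered as text, not passed through raw
}
```

Two consequences of this approach are worth weighing: values of different types with the same text form hash identically, and a byte slice is hashed via its printed form rather than its raw contents, which would change sampling decisions for existing `Bytes` attributes.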
Hey folks, I came across this one whilst playing with the probabilistic sampler as part of some benchmarking I was doing. Initially I figured that it would make as much sense to sample on a
However, now that I have hunted around the original issues and PRs for this sampler, it looks like the rationale behind being able to sample on log record attributes is to ensure that you sample all the log records associated with any spans that have been sampled. The underlying idea being (I assume) to stuff the same trace IDs into your log signals and your spans. I'm now wondering if I should just change this PR to not accept
If you have a trace ID (
Let me know if I should just remove the
If that's the case, then as an end user that has decided to put the trace IDs on an attribute (rather than on the log record TraceID)
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Foresight Summary
⭕ build-and-test-windows workflow has finished in 9 seconds (41 minutes 22 seconds less than
Job | Failed Steps | Tests | |
---|---|---|---|
windows-unittest-matrix | - 🔗 | N/A | See Details |
windows-unittest | - 🔗 | N/A | See Details |
✅ check-links workflow has finished in 47 seconds (40 seconds less than `main` branch avg.) and finished at 15th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
changed files | - 🔗 | N/A | See Details |
check-links | - 🔗 | N/A | See Details |
✅ changelog workflow has finished in 2 minutes 25 seconds and finished at 15th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
changelog | - 🔗 | N/A | See Details |
✅ telemetrygen workflow has finished in 1 minute 3 seconds (1 minute 2 seconds less than `main` branch avg.) and finished at 15th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
build-dev | - 🔗 | N/A | See Details |
publish-latest | - 🔗 | N/A | See Details |
publish-stable | - 🔗 | N/A | See Details |
❌ build-and-test workflow has finished in 26 minutes 19 seconds (37 minutes 19 seconds less than `main` branch avg.) and finished at 15th Mar, 2023. 3 jobs failed. There are 2 test failures.
Job | Failed Steps | Tests | |
---|---|---|---|
unittest-matrix (1.19, connector) | - 🔗 | ✅ 113 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, connector) | - 🔗 | ✅ 113 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, processor) | Run Unit Tests 🔗 | ✅ 822 ❌ 1 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, processor) | Run Unit Tests 🔗 | ✅ 822 ❌ 1 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, extension) | - 🔗 | ✅ 467 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, receiver-0) | - 🔗 | ✅ 564 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, receiver-0) | - 🔗 | ✅ 454 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, other) | - 🔗 | ✅ 0 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, internal) | - 🔗 | ✅ 551 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, internal) | - 🔗 | ✅ 316 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, exporter) | - 🔗 | ✅ 615 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, other) | - 🔗 | ✅ 0 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, exporter) | - 🔗 | ✅ 615 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, extension) | - 🔗 | ✅ 467 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, receiver-1) | - 🔗 | ✅ 301 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, receiver-1) | - 🔗 | ✅ 301 ❌ 0 ⏭ 0 🔗 | See Details |
correctness-traces | - 🔗 | ✅ 17 ❌ 0 ⏭ 0 🔗 | See Details |
correctness-metrics | - 🔗 | ✅ 2 ❌ 0 ⏭ 0 🔗 | See Details |
integration-tests | - 🔗 | ✅ 55 ❌ 0 ⏭ 0 🔗 | See Details |
setup-environment | - 🔗 | N/A | See Details |
checks | - 🔗 | N/A | See Details |
check-codeowners | - 🔗 | N/A | See Details |
build-examples | - 🔗 | N/A | See Details |
check-collector-module-version | - 🔗 | N/A | See Details |
lint-matrix (receiver-0) | - 🔗 | N/A | See Details |
lint-matrix (receiver-1) | - 🔗 | N/A | See Details |
lint-matrix (processor) | - 🔗 | N/A | See Details |
lint-matrix (exporter) | - 🔗 | N/A | See Details |
lint-matrix (extension) | - 🔗 | N/A | See Details |
lint-matrix (connector) | - 🔗 | N/A | See Details |
lint-matrix (internal) | - 🔗 | N/A | See Details |
lint-matrix (other) | - 🔗 | N/A | See Details |
unittest (1.20) | Interpret result 🔗 | N/A | See Details |
unittest (1.19) | - 🔗 | N/A | See Details |
lint | - 🔗 | N/A | See Details |
cross-compile | - 🔗 | N/A | See Details |
build-package | - 🔗 | N/A | See Details |
windows-msi | - 🔗 | N/A | See Details |
publish-check | - 🔗 | N/A | See Details |
publish-stable | - 🔗 | N/A | See Details |
publish-dev | - 🔗 | N/A | See Details |
✅ prometheus-compliance-tests workflow has finished in 10 minutes 43 seconds (⚠️ 3 minutes 17 seconds more than `main` branch avg.) and finished at 15th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
prometheus-compliance-tests | - 🔗 | ✅ 21 ❌ 0 ⏭ 0 🔗 | See Details |
✅ load-tests workflow has finished in 16 minutes 46 seconds (⚠️ 3 minutes 42 seconds more than `main` branch avg.) and finished at 15th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
loadtest (TestTraceAttributesProcessor) | - 🔗 | ✅ 3 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestIdleMode) | - 🔗 | ✅ 1 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestMetric10kDPS|TestMetricsFromFile) | - 🔗 | ✅ 6 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) | - 🔗 | ✅ 8 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) | - 🔗 | ✅ 10 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) | - 🔗 | ✅ 12 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestBallastMemory|TestLog10kDPS) | - 🔗 | ✅ 18 ❌ 0 ⏭ 0 🔗 | See Details |
setup-environment | - 🔗 | N/A | See Details |
✅ e2e-tests workflow has finished in 13 minutes 50 seconds and finished at 15th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
kubernetes-test (v1.26.0) | - 🔗 | N/A | See Details |
kubernetes-test (v1.24.7) | - 🔗 | N/A | See Details |
kubernetes-test (v1.23.13) | - 🔗 | N/A | See Details |
kubernetes-test (v1.25.3) | - 🔗 | N/A | See Details |
Sorry for the delay, I was out for a few weeks. I left a few comments, let me know what you think.
LGTM - with a slight reduction of scope, just adding string support for now. We can add additional types in subsequent PRs.
Co-authored-by: Antoine Toulme <[email protected]>
Re: addition of string support. This is actually something I wanted yesterday for a use-case (sampling logs probabilistically without having any TraceID associated with a log), so thanks for adding it. 🙏 When I dived a little deeper into this, stanza's log parsers and OTTL, it was unclear to me how in fact one could get a
Please attend to the failing test:
Please ping me once the test failures are fixed so that I can review this again.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@e-dard, are you still interested in this PR? If not, would you mind if someone else continues your work? @daianmartinho, would you be interested in picking this up if @e-dard isn't interested anymore?
We have to get this in. Reopening and will follow up.
@jpkrohling,
This PR was marked stale due to lack of activity. It will be closed in 14 days.
I don't see anything that concerns me, just needs to be rebased with main and it should be good to go :)
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@daianmartinho, would you be interested in working on @MovieStoreGuy's last comment? I think this is very close to being merged, only needs that clarification.
@jpkrohling @MovieStoreGuy apologies for this falling off my plate for such a long time. I have fixed the test case and rebased.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
This PR fixes a panic in the processor that occurs when it is configured to sample on a log record attribute whose type is not `Bytes`.
Now, the processor will accept `String` attributes as well as `Bytes` attributes. If a sampled attribute has any other type, the processor will skip the log record and log a warning instead of panicking.
Description:
Prevented the panic by checking the type of the attribute before making a sampling decision.
Link to tracking Issue: #18222
Testing:
Added a test that previously would panic. Added a test that shows `String` attributes will be sampled.
Documentation:
n/a