-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resource merge semantics make upgrades problematic #2341
Comments
This is a problem we're in the process of addressing. The semantic conventions now include a schema that describes changes from version to version in a way that will enable migrating a resource from one version to the next. We recently landed #2267 which gives us the ability to parse those schema definitions. The next step is to implement the transformations required and then to update the |
@Aneurysm9 I can see where renaming attributes could be helpful in some cases, but also could break dashboards and alerts in an observability platform. Will there be an option to include both the old and new attributes on a span, metric, or log? We try and keep our internal libraries in sync with otel. My concern with |
There are plans for the collector to have a processor with the same logic and the ability to normalize all incoming attributes to a given schema version. |
Closing as this is now addressed with the OTel schema. |
@MrAlias How was this resolved? I got bit by this today. |
In case it helps someone else, I had to track down this version number, and copy it into my program so it matched: opentelemetry-go/sdk/resource/process.go Line 25 in 60f7d42
(notwithstanding the release notes telling me v1.18.0 is part of the 1.14 release) |
This looks like a bug to me – if the schema is being returned as v1.17.0 when v1.18.0 had been released. |
Can you identify how that is the case? This explicitly requires identical schemas between two resources. |
Is there an update here? Is there anything I could look to contribute? |
@Aneurysm9 do you have further details or should I open something separately up? |
I'm not sure there is anything we can do in the Go implementation at this time to improve the situation. The specification does not detail how to perform resource merging with multiple schema URLs involved, instead it defines this as an error and states that the resulting resource is undefined. There is a proposal to separate semantic conventions from the specification, which may result in a new schema family and versions, which I think would only further complicate this situation. While I think it might be argued that it would be spec compliant to return an error and a merged resource with the schema URL removed, doing so would likely condition users to ignore the error and result in surprising behavior when the implementation changes and/or different behavior is specified. |
Cheers @Aneurysm9. Yep, agreed that the spec does not detail how to manage, however, as highlighted here (and elsewhere through open-telemetry/opentelemetry-go-contrib#3657 with a response @ #3944) I'm keen to get more involved in this and understand that the wee hours of the Thursday morning (AEST) is the meeting time so I can likely jump on from next week onwards. |
Telemetry schema and their description of changes from one version to the next are documented in the specification. The gap lies in the resource specification including a requirement for schema URL support and stating that merging resources with different schema URLs is an error without including a requirement that schema translation be applied or that there be a mechanism for indicating that schema translation should be applied. There was significant discussion of this topic at the maintainers' meeting today and I expect there will be more discussion of this issue at the specification SIG meeting tomorrow. I'm sorry if my earlier comment appeared to imply that I did not care about this issue or that we were not intending to do anything about it. We certainly do intend to ensure that resources with associated telemetry schema can be used easily and safely. I think that the proposal @MrAlias has made in #3944 is good, and the implementation is the logical conclusion of work started almost two years ago when we began introducing telemetry schema. However, I think it does not go far enough and the spec should instead direct that telemetry schema migration SHOULD be used to align attributes on resources to be merged, either implicitly at the newer of the two versions or explicitly at the version provided by the user. We do, however, need to be careful to ensure that we do not introduce changes to our stable SDK that are incompatible with the specification or that get ahead of specification language under consideration that may change before it is stabilized. |
Here's a new repo I created to handle schema translations: https://github.com/MrAlias/otel-schema-utils Only thing supported currently is resource translations. |
## Which problem is this PR solving? When using detectors with mis-matched Schema URLs, resource.New will return an error. Previously, this was dropped and consequently it looked like configuration worked but many resource attributes would be missing in the resulting telemetry. With the recent addition of #48, this error is much easier to hit. For example, the detectors built into opentelemetry-go [import](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/resource/builtin.go#L25) semconv `v1.17.0` but this project [pulls in](https://github.com/honeycombio/otel-config-go/blob/main/otelconfig/otelconfig.go#L21) `v1.18.0`. This means that adding something like `otelconfig.WithResourceOption(resource.WithHost())` results in a mis-configured OTel and no error to warn you what happened. This is all … very bad. This PR is really the bare minimum that needs to be done here. That is, it at least causes a runtime error so that you can tell what's going on — but it doesn't solve the problem that you fundamentally cannot use this library with _any_ other library that doesn't happen to use semconv `v1.18.0`. There are [a number](open-telemetry/opentelemetry-go#2341) of [open](open-telemetry/opentelemetry-go#3769) [issues](open-telemetry/opentelemetry-go-contrib#3657) in OTel and adjacent repositories regarding this problem, which probably warrant further investigation into a long-term sustainable solution. ## Short description of the changes Return the `err` from `resource.New` up through the stack. Update tests to expect this.⚠️ I guess technically this could be considered a backwards incompatible change, in that folks may have configurations that are "working" today but will throw an error after this change. I put that in scare quotes though because it's not _really_ working — it's just not throwing an error. I feel like actually returning an appropriate error here and letting folks know their configuration is likely not working as expected is the right choice. ## How to verify that this has the expected result There's a new test that specifically uses a detector with a different schema URL. Co-authored-by: Vera Reynolds <[email protected]>
We hit this issue multiple times already because of routine package updates and without any changes from our side. While I can imagine the cases where it is important to enforce compatible schemas, maybe we can just introduce a flag to ignore schema version incompatibility? |
Something I haven't seen in the discussion currently is that semconv doesn't seem to follow semver. The way these are versioned is that not all versions are compatible, even across minor/patch releases :/ For example, I need to upgrade my code from |
@tonglil, semantic conventions versioning policy is defined here: https://opentelemetry.io/docs/specs/otel/versioning-and-stability/#semantic-conventions-stability. Take notice that |
@pellared sorry to follow up but I'm looking for clarity on a couple of things.
|
Excepting that the current strategy adopted in the related open-telemetry/opentelemetry-go-contrib#3657 (comment) What's more challenging still is that within that single package for a given release, there are currently three different semconvs in use – While it is not necessarily one to discuss here, the problems with interoperability due to brittle design decisions – particularly with distributed systems – is rather concerning to me as a consumer of the standard, especially as it is currently adopted in Go. Right now I'm seeing version bumps due to security related issues for the contrib packages. The problem is that they don't abide by the compatibility contract of golang modules. Arguably the semconv strategy and contract of otel conflicts with golang's own module contract to the extent that IMHO each semconv alteration should see a similar |
At the very least the error message from Edit1: Why is Edit2: Forgive me if this is naive but |
## Which problem is this PR solving? - Fixes #5116 - Replaces #5115 ## Description of the changes - Upgrade OTel SDK to v1.22 (dependabot) - Upgrade semconv version to v1.24 ([Reference issue](open-telemetry/opentelemetry-go#2341)). I tried using semconv v1.22 but it was still failing. - Replace deprecated function `semconv.HTTPStatusCodeKey` with `semconv.HTTPResponseStatusCodeKey` ## How was this change tested? - ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Chirag Ghosh <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Description
We've encountered issues upon a recent upgrade to OTEL v1.1.0 (which upgraded to semconv v1.7.0) and existing code or contrib code which still used semconv v1.4.0. See open-telemetry/opentelemetry-go-contrib#1384 for an example.
The spec (https://github.com/open-telemetry/opentelemetry-specification/blob/bad49c714a62da5493f2d1d9bafd7ebe8c8ce7eb/specification/resource/sdk.md#merge) seems a bit contradictory in how this should be handled:
while also mentioning the merge algorithm (which is implemented to the spec in opentelemetry-go) as:
From a consumer perspective, there is no real way to determine if all of the resources are compatible prior to calling resource.New(...), and thus we're failing to create a resource detector.
Instead of the current algorithm, it feels like if the two resource schemas being merged are compatible (same major version), the merge should be able to proceed. Otherwise, we'll need to coordinate an upgrade of every telemetry package together to ensure we're on a consistent version of semconv.
Possible algorithms:
rpc.jsonrpc.method
->rpc.method
).Environment
Steps To Reproduce
resource.New
fails because two different semconv versions are in use:opentelemetry-go/sdk/resource/resource.go
Lines 190 to 192 in 7ce58f3
Expected behavior
Resource detector should be able to be initialized.
The text was updated successfully, but these errors were encountered: