Provide versioned semantic conventions with sub-package/folder #2946

Open
srikanthccv opened this issue Sep 23, 2022 · 19 comments

@srikanthccv (Member)

The idea is to create a new folder for each semantic conventions release, so that instrumentation packages or end users can choose which one to use.

The folder structure would look like the following:

opentelemetry-semantic-conventions/src/opentelemetry/semconv
  /v1_13_0
    /trace
    /resource
    /metric
    /..
  /v1_12_0
  /v1_11_0

This is convenient because we don't necessarily need to update the instrumentation every time there is a spec release. It will also enable us to mark the package as stable, since there will always be a backward-compatible sub-package. The downside is that the package can grow large over time; its current size is around 26 KB (https://pypi.org/project/opentelemetry-semantic-conventions/#files). The original inspiration is the Go SDK: https://github.com/open-telemetry/opentelemetry-go/tree/main/semconv
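A minimal sketch of how consuming such a layout might look; the versioned module path below is hypothetical and only follows the folder structure proposed above (the SpanAttributes name mirrors the constants class in today's unversioned package):

    # Sketch only: the versioned sub-package does not exist today; the import
    # path follows the folder layout proposed in this issue.
    from opentelemetry import trace
    from opentelemetry.semconv.v1_13_0.trace import SpanAttributes  # hypothetical path

    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("request") as span:
        span.set_attribute(SpanAttributes.HTTP_METHOD, "GET")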

@ocelotl (Contributor) commented Sep 23, 2022

I think it would be better to separate our semantic conventions code into a different repo so that we have an opentelemetry-semantic-conventions package that follows the releases of the spec, because:

  1. Code would not increase in size, since we would not have repeated code in that repo.
  2. When we want to update the semantic conventions now, we have to make a release for all our components. This way we could make a release of the semantic conventions package only, making it much easier to keep an up-to-date semantic conventions package.
  3. If users want to use another version of X, they update the version number of X in their requirements file; they don't update application or library code. This is important because users use pip freeze to find out what the running environment of an application was; they don't expect to also have to check some application code line to find out which semantic conventions version was being used. (See the small illustration after this list.)
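As a small illustration of point 3 (an assumption about how a user would check, not anything new in our packages): with a separately released package, the semantic conventions version in use is visible from installed-package metadata, the same information pip freeze reports, without reading any application code.

    # The semconv version in use is recorded in installed-package metadata
    # (the same information `pip freeze` reports); no application source
    # needs to be inspected.
    from importlib.metadata import version

    print(version("opentelemetry-semantic-conventions"))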

@ocelotl (Contributor) commented Sep 23, 2022

Something else:

  1. A user uses from opentelemetry.semantic_conventions.v_1_2_3 import X today, with opentelemetry-semantic-conventions==5.2.3 in their requirements file.
  2. Someone accidentally commits a change that deletes the symbol X from opentelemetry.semantic_conventions.v_1_2_3. That commit gets merged and included in release 5.2.4 of our packages (including opentelemetry-semantic-conventions).
  3. The user updates their dependencies to opentelemetry-semantic-conventions==5.2.4. The user still wants to use semantic conventions v1.2.3 and import X, but now it is broken.

In summary, using actual released packages with their corresponding versions protects the integrity of those versions, because it is guaranteed that the code in a specific version of a package will never change. Using directories named after versions is a fragile approach.

@codeboten (Contributor)

I like this idea of releasing opentelemetry-semantic-conventions when a new version of the spec is released. I do see some benefits to @srikanthccv's proposal in the following scenario:

  • user application relies on semconv 1.11.0 and uses some instrumentations that also rely on 1.11.0
  • user updates the instrumentation package to the latest, which now depends on 1.12.0
    • user application can continue to use 1.11.0 while the instrumentation uses 1.12.0, as they co-exist in the same package (sketched below)
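A rough sketch of that coexistence, assuming the hypothetical versioned sub-packages from the proposal:

    # Sketch only: both module paths are hypothetical versioned sub-packages.
    # Application code and an instrumentation could each import the semconv
    # version they were written against, from the same installed package.
    from opentelemetry.semconv.v1_11_0.trace import SpanAttributes as SpanAttributes11
    from opentelemetry.semconv.v1_12_0.trace import SpanAttributes as SpanAttributes12

    app_attribute = SpanAttributes11.HTTP_METHOD    # application stays on 1.11.0
    instr_attribute = SpanAttributes12.HTTP_METHOD  # instrumentation moved to 1.12.0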

I'm not sure if this is a great situation, as it may mean that with multiple semconv versions in use, applications will produce telemetry differently. This should be OK so long as schema URLs are set correctly.

I do struggle with the idea of having a continuously increasing package size. If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

There is precedent for including multiple versions within a single package; the kubernetes client package does this (https://github.com/kubernetes-client/python/tree/master/kubernetes/client/api).

@srikanthccv (Member, Author)

I think it would be better to separate our semantic conventions code in a different repo so that we have an opentelemetry-semantic-conventions package that follows the releases of the spec because

I worry this sets an incorrect expectation that other SDK packages also follow and keep in sync with the spec release. Does any other SIG do this today?

If users want to use another version of X, they update the version number of X in their requirements file, they don't update application or library code.

This assumes that a new semantic conventions version doesn't delete anything, which is not the case today. Take a look at the recent release: some attributes were removed from the HTTP attributes, which means users might have to update their application code if they want to use the new version. In a sense, we are also users (the contrib repo); if we want to use the 1.13.0 version we must update the instrumentation library code.

Someone accidentally commits a change that deletes the symbol X from opentelemetry.semantic_conventions.v_1_2_3. That commit gets merged and included in release 5.2.4 of our packages (including opentelemetry-semantic-conventions).

No hand-made changes will be done to this package; it is all generated from the semantic conventions YAML files.

I do struggle with the idea of having a continuously increasing package size. If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

This is a valid concern and I think the idea of limiting the number of versions is a good suggestion.

@ocelotl (Contributor) commented Sep 26, 2022

This assumes that a new semantic conventions version doesn't delete anything, which is not the case today. Take a look at the recent release: some attributes were removed from the HTTP attributes, which means users might have to update their application code if they want to use the new version. In a sense, we are also users (the contrib repo); if we want to use the 1.13.0 version we must update the instrumentation library code.

The application code may or may not need to be updated when another version of the semantic conventions is used, but that is not the point here. The point is that, when using version-named directories, a dependency version is being specified outside the requirements file, in application code.

No hand-made changes will be done to this package; it is all generated from the semantic conventions YAML files.

That does not matter; with the version-named directories approach, that generated code can still be modified by any commit.

This is a valid concern and I think the idea of limiting the number of versions is a good suggestion.

Limiting the number of versions is a solution that introduces another problem: a user will not be able to use an old version after it is deleted. Having a separate package for the semantic conventions does not have the problem of growing code, and it also does not have the problem of having to delete old versions.

@srikanthccv (Member, Author)

Isn't the idea behind schema_url about the ability for applications to use multiple semantic conventions versions? How would that be possible if there is a separate package for each version?
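For context on schema_url: each tracer (and therefore each instrumentation) can declare which schema its telemetry follows when it is obtained, so different semconv versions can in principle be distinguished within one process. A minimal example, assuming the schema_url parameter available in recent opentelemetry-api releases; the URL value is illustrative:

    # The schema URL declares which semantic conventions version the emitted
    # telemetry follows.
    from opentelemetry import trace

    tracer = trace.get_tracer(
        "my.instrumentation",
        schema_url="https://opentelemetry.io/schemas/1.11.0",
    )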

@lzchen (Contributor) commented Sep 27, 2022

@ocelotl
I'm not following your suggestion here. Are you proposing to release a new differently named package per semantic convention version? i.e. opentelemetry-semantic-conventions-1.12 for v1.12 of semantic conventions and also keep it in lockstep with the spec version? How would the instrumentation hard dependencies work in this case?

@srikanthccv
@ocelotl 's concern of:

Limiting the number of versions is a solution that introduces another problem: a user will not be able to use an old version after it is deleted.

makes sense. I can totally see users setting up their application once and never updating the semantic conventions again as long as it fits their business use case. If these conventions get outdated/obsolete, this could be troublesome. However, as to @codeboten's point,

If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

maybe the above problem isn't as bad as we think it realistically is for users? Maybe the appeal of having a stamped "stable" on our product outweighs users having to update their code once in a blue moon? As long as we are transparent with our users and give them ample time/warning to upgrade (before deleting older versions of semconv), is that so bad?

@ocelotl (Contributor) commented Sep 27, 2022

@ocelotl I'm not following your suggestion here. Are you proposing to release a new differently named package per semantic convention version? i.e. opentelemetry-semantic-conventions-1.12 for v1.12 of semantic conventions and also keep it in lockstep with the spec version? How would the instrumentation hard dependencies work in this case?

No, I am suggesting to have our opentelemetry-semantic-conventions package follow this versioning scheme: X.Y.Z.a, where X.Y.Z is the version number of the spec and a is an additional number for our bugfixes.

@lzchen (Contributor) commented Sep 27, 2022

@ocelotl

  1. How does that solve the original goal of being able to mark the semantic conventions package stable?
  2. How does this make it so that we don't have to update our instrumentations every time the semantic conventions are updated?

If users want to use another version of X, they update the version number of X in their requirements file, they don't update application or library code.

Isn't this what we already do today? How would users be able to use semantic conventions version X without having to update their instrumentation version as well?

Maybe I'm not fully understanding. Feel free to add this topic to Thursday's agenda.

@ocelotl (Contributor) commented Sep 28, 2022

Maybe I'm not fully understanding. Feel free to add this topic to Thursday's agenda.

Sure, I will 👍

@ocelotl (Contributor) commented Sep 29, 2022

In the SIG we discussed that the problem with the approach I suggest is that we expect it to be frequent for one instrumentation to pin its dependency to one version of the semantic conventions and another instrumentation to a different version, so that the user will have a problem when trying to install both in the same virtual environment. (I specifically mention frequent here because the same can happen with other packages like opentelemetry-api, opentelemetry-sdk, etc., but we don't expect that to happen frequently.)

There is another solution that has all the advantages of the approach I suggest without this problem, and that is to make a new package named opentelemetry-semantic-conventions-... with every spec release, where ... is some versioning scheme that indicates that the semantic conventions are unstable and that the package corresponds to the released version of the spec (something like 0.A.B.C.D, where A.B.C is the spec version and D a bugfix number, maybe?).

Another advantage of this approach is that if the semantic conventions in the spec someday become stable and follow semantic versioning, we can just start releasing opentelemetry-semantic-conventions with normal semantic versioning.

The disadvantage of this approach is that it is horrible.

But it works. 🤷

WDYT @srikanthccv @lzchen @aabmass ?

@lancetarn (Contributor)

👋 New here, so feel free to ignore this... there is lots of context I don't have!

FWIW, I have used the "subpackage-versioning" strategy

from foo.v1_2_3 import bar

within private packages to cushion against version conflicts while providing backward compatibility. It isn't the strangest thing I've seen, although I agree it also isn't the most straight-forward. All told, the go team's approach linked in the issue description does seem to make sense given the forces at play.

I think creating an entire new package for every version has a few downsides:

  • Need to add entirely new dependency for every update
  • Still need to update library/app code if you want to move to newer version
  • Clutters the package namespace
    • Makes finding the right package more difficult/more typo-squatting options
    • PyPI is not designed for sorting/comparing across packages in this way/more difficult to know what "latest" is

@lzchen (Contributor) commented Sep 30, 2022

@ocelotl

There is another solution that has all the advantages of the approach I suggest without this problem, and that is to make a new package named opentelemetry-semantic-conventions-... with every spec release

Are you proposing to release a new differently named package per semantic convention version?

So is your new proposal aligned with my question from before?

@ocelotl (Contributor) commented Oct 3, 2022

👋 New here, so feel free to ignore this... there is lots of context I don't have!

FWIW, I have used the "subpackage-versioning" strategy

from foo.v1_2_3 import bar

within private packages to cushion against version conflicts while providing backward compatibility. It isn't the strangest thing I've seen, although I agree it also isn't the most straight-forward. All told, the go team's approach linked in the issue description does seem to make sense given the forces at play.

I think creating an entire new package for every version has a few downsides:

  • Need to add entirely new dependency for every update

This is normal; every time a user wants to update a library, they will add a change in their requirements.txt file.

  • Still need to update library/app code if you want to move to newer version

This is true.

  • Clutters the package namespace

    • Makes finding the right package more difficult/more typo-squatting options

This is true.

  • PyPI is not designed for sorting/comparing across packages in this way/more difficult to know what "latest" is

This is true as well. I am proposing a horrible solution, indeed.

@ocelotl (Contributor) commented Oct 3, 2022

@ocelotl

There is another solution that has all the advantages of the approach I suggest without this problem, and that is to make a new package named opentelemetry-semantic-conventions-... with every spec release

Are you proposing to release a new differently named package per semantic convention version?

So is your new proposal aligned with my question from before?

That's right, it is.

@aabmass (Member) commented Oct 6, 2022

I do struggle with the idea of having a continuously increasing package size. If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

@codeboten I agree that would be nice. I see two ways to do this without breaking semantic versioning:

  • keep opentelemetry-semantic-conventions as <1.0, doing minor version releases when we remove something and otherwise patch versions.
  • bump the major version each time we remove something

Also, regarding package size, I did a quick test by copying the semantic conventions 200 times in the src tree to get some rough estimates:

  • src/ went from 116K -> 16M
  • builds went from
    28K     dist/opentelemetry_semantic_conventions-0.34b0-py3-none-any.whl
    24K     dist/opentelemetry_semantic_conventions-0.34b0.tar.gz
    
    to
    3.8M    dist/opentelemetry_semantic_conventions-0.34b0-py3-none-any.whl
    3.3M    dist/opentelemetry_semantic_conventions-0.34b0.tar.gz
    

This is similar to grpcio wheel package size for comparison. But not great.
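For anyone who wants to reproduce a similar estimate, a rough sketch of the duplication test; the local path and copy count are assumptions about a local checkout, not anything in the repo:

    # Duplicate the generated semconv package N times inside the source tree
    # and measure the resulting size, to approximate per-version growth.
    import shutil
    from pathlib import Path

    SRC = Path("opentelemetry-semantic-conventions/src/opentelemetry/semconv")

    for i in range(200):
        shutil.copytree(SRC, SRC.parent / f"semconv_copy_{i}")

    total_bytes = sum(p.stat().st_size for p in SRC.parent.rglob("*") if p.is_file())
    print(f"src size: {total_bytes / 1024:.0f}K")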

@aabmass (Member) commented Jul 27, 2023

This could be an alternate solution (open-telemetry/semantic-conventions#214) to avoid backward-compatibility issues when fields are removed.

@lzchen (Contributor) commented Aug 16, 2023

Do we still want to explore this now that we have agreed that we will be carrying out the migration plan for OT sem conv? https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md

@aabmass (Member) commented Aug 17, 2023

To refocus this conversation: the goal of this issue is to mark the opentelemetry-semantic-conventions PyPI package as stable, with no expectation of ever having a major version bump. The benefits are:

  • Prevent instrumentations which use semantic conventions (including the contrib repo, third-party instrumentation libraries, or instrumentation inlined into the library being instrumented) from having dependency conflicts caused by OpenTelemetry.
    • E.g. say Flask and requests both instrumented their code directly with OTel and required the opentelemetry-api~=1.0 and opentelemetry-semantic-conventions~=1.0 packages. The OpenTelemetry dependencies would never cause Flask and requests to conflict, since pip can choose the newest version of opentelemetry-semantic-conventions.
  • Possibility of versioning the opentelemetry-semantic-conventions package in lockstep with the semantic conventions spec version. This makes it easy for users to know what they are pulling in.
  • Makes it easier for instrumentations to implement OTEL_SEMCONV_STABILITY_OPT_IN, as they can use the constants from any spec version (a sketch follows below).
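A hedged sketch of that last point: if constants for every spec version were importable, an instrumentation could branch on OTEL_SEMCONV_STABILITY_OPT_IN; the versioned module paths below are hypothetical.

    # Sketch only: the versioned semconv modules are hypothetical. The
    # OTEL_SEMCONV_STABILITY_OPT_IN environment variable is how users opt in
    # to the newer HTTP semantic conventions per instrumentation.
    import os

    _opt_in = os.environ.get("OTEL_SEMCONV_STABILITY_OPT_IN", "")
    if any(token.strip().startswith("http") for token in _opt_in.split(",")):
        from opentelemetry.semconv.v1_21_0.trace import SpanAttributes  # hypothetical
    else:
        from opentelemetry.semconv.v1_11_0.trace import SpanAttributes  # hypothetical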
