How/where should bootstrap command get the list of auto-instrumentation packages from? #668

owais · 2020-05-11T12:03:57Z

PR #650 adds a new bootstrap command. The command can be executed by users to automatically detect the libraries their project uses and install instrumentation libraries for the detected packages.

To make this work, the bootstrap command needs to possess some knowledge about supported packages and their respective instrumentations. The current implementation hard-codes this information in the bootstrap command source code. Some people have raised concerns about hard-coding this knowledge.
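For illustration, a hard-coded index of this kind could look roughly like the sketch below. This is not the code from PR #650: the mapping contents, the `example-instrumentation-*` requirement strings, and the `detect_instrumentations` helper are all hypothetical, and detection is assumed to mean "the library is importable in the current environment".

```python
import importlib.util

# Illustrative index only: the library names and requirement strings are
# assumptions for this sketch, not the contents of the real bootstrap command.
INSTRUMENTATIONS = {
    "flask": "example-instrumentation-flask==1.0.0",
    "requests": "example-instrumentation-requests==1.0.0",
}


def detect_instrumentations(index, is_installed=None):
    """Return the requirement strings for every library found in the environment."""
    if is_installed is None:
        # Default check: the library counts as detected if it can be imported.
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    return [req for lib, req in sorted(index.items()) if is_installed(lib)]


if __name__ == "__main__":
    # A real command would pass these to pip; here they are only printed.
    for requirement in detect_instrumentations(INSTRUMENTATIONS):
        print(requirement)
```

The `is_installed` hook exists only so the detection logic can be exercised without installing anything.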

As an alternative, we could have a development script that iterates over the ext/ directory, finds all the instrumentations and generates the list. This would automate the step and ensure that the bootstrap command is always in sync with the published packages.

That said, I think doing this raises more problems than it solves. Here is what such a script would need to do.

1. Find all instrumentation packages from ext/ directory.
2. Figure out the libraries the packages are instrumenting.
3. Figure out which version of the package to install. For example, source code might have an unpublished version. Or we might discover an issue in the latest version of some instrumentation and want the bootstrap command to install an older version instead. It is easy to imagine that bootstrap command might not always want to install the version specified in the repo.
4. Detect and ignore any unpublished/in-development packages.
5. In future, if the project moves away from a mono-repo structure, automating this might not be worth the effort anymore.
6. We might want the command to install some blessed community/contrib packages in addition to the ones shipped by the core project.

1-4 can be solved by adding additional metadata to each package. A build script can then iterate over the packages, extract this metadata and generate a mapping ready to be used by the command. IMO this is not dramatically better. It just trades a central "hard-coded" index for a distributed one. If we envision this information being useful elsewhere, maybe it could be justified, but if its only consumer is the bootstrap command then it doesn't make much sense IMO. We'd just be hard-coding the same information but scattering it all over the repository.
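A minimal sketch of such a build script is shown below, to make the trade-off concrete. It assumes each package declares a `[bootstrap]` section in its setup.cfg with `instruments` and `install` keys; that section name, both keys, and the `generate_index` function are hypothetical and do not exist in the repository. Packages without the section are skipped, which is one way to handle unpublished or in-development packages.

```python
import configparser
from pathlib import Path


def generate_index(ext_dir):
    """Build a library -> requirement mapping from per-package metadata."""
    index = {}
    for cfg_path in sorted(Path(ext_dir).glob("*/setup.cfg")):
        parser = configparser.ConfigParser()
        parser.read(cfg_path)
        # Hypothetical convention: only packages that opt in with a
        # [bootstrap] section end up in the generated index.
        if not parser.has_section("bootstrap"):
            continue
        section = parser["bootstrap"]
        index[section["instruments"]] = section["install"]
    return index


if __name__ == "__main__":
    # Prints the generated mapping for the ext/ directory, if it exists.
    print(generate_index("ext"))
```

Note that the version to install still has to be written by hand in each setup.cfg, so the information is no less hard-coded than in a central index.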

For 5-6, this will not work as we'll need some sort of a discovery mechanism to make it all happen.

Downsides of hard-coding this information.

The only big downside to hard-coding this information that I see is that some contributors might forget to update it when publishing instrumentation packages. People would be somewhat less likely to forget if the info were stored in setup.cfg, but that doesn't solve the problem completely. Also, what is the worst that could happen if someone forgot to update the bootstrap command "index"? We'd ship the opentelemetry-auto-instrumentation package and bootstrap would install a slightly older version of an instrumentation or not know about a new instrumentation. I think that is relatively harmless compared to a bootstrap command that automatically updates the index to the latest version of the instrumentation it finds in each package's setup.py.

On the other hand, hard-coding this information gives us a lot more flexibility than any automated solution we could come up with.

Generally speaking, from a software engineering perspective, I think some form of hard-coding this should be the first step and we should move towards automation only after living with the pain (if any) this causes. It might turn out to be a case of premature abstraction otherwise.

I might have missed other obvious reasons to not hard-code though. Happy to hear other thoughts.


ocelotl · 2020-05-11T19:29:54Z

Actually, I think the biggest problem with hardcoding is that it does not scale well, generally speaking. Someone may forget to add a new instrumentation to the list, or instrumentations may be moved into another repo altogether, forcing the maintainers of this repo to be aware of what is happening in some other repo.

I think it is fine to hardcode now for our first release of this component, but a better approach, at least for the part of finding the existing instrumentations, is to use pip search opentelemetry and filter the output to get the published instrumentations.
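As a rough sketch of the filtering step only: the function below takes text captured from a search command and keeps the matching package names. It assumes output lines of the form `name (version) - summary`, and the default `opentelemetry-ext-` prefix is also an assumption about how instrumentation packages are named; neither is confirmed in this thread. Running the search itself is left out.

```python
import re

# Assumed line format: "package-name (1.2.3)   - one line summary"
LINE_RE = re.compile(r"^(?P<name>[A-Za-z0-9._-]+) \((?P<version>[^)]+)\)\s+-")


def published_instrumentations(search_output, prefix="opentelemetry-ext-"):
    """Map package name -> version for search result lines matching the prefix."""
    found = {}
    for line in search_output.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("name").startswith(prefix):
            found[match.group("name")] = match.group("version")
    return found
```

This covers discovery only; it does not say which library each package instruments or which version the bootstrap command should prefer.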

github-actions · 2021-04-09T03:21:35Z

This issue was marked stale due to lack of activity. It will be closed in 30 days.

owais mentioned this issue May 11, 2020

Introduce a bootstrap command to auto-install packages #650

Merged

github-actions bot added the backlog label Apr 9, 2021

codeboten added feature-request and removed backlog labels Apr 15, 2021

owais closed this as completed Aug 18, 2021
