PR #650 adds a new bootstrap command. The command can be executed by users to automatically detect the libraries their project uses and install instrumentation libraries for the detected packages.
To make this work, the bootstrap command needs to possess some knowledge about supported packages and their respective instrumentations. The current implementation hard-codes this information in the bootstrap command source code. Some people have raised concerns about hard-coding this knowledge.
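For context, a hard-coded index of this kind can be as small as a dictionary. The following is a minimal Python sketch, not the code from PR #650. The INSTRUMENTATION_INDEX name, the opentelemetry-ext-* package names and the pinned versions are hypothetical placeholders. Detection simply compares the index against the names of installed distributions using importlib.metadata (Python 3.8+).

```python
from importlib import metadata

# Hypothetical hard-coded index: library distribution name -> instrumentation
# requirement to install. Package names and versions are placeholders.
INSTRUMENTATION_INDEX = {
    "flask": "opentelemetry-ext-flask==0.8b0",
    "requests": "opentelemetry-ext-requests==0.8b0",
    "psycopg2": "opentelemetry-ext-psycopg2==0.8b0",
}


def installed_distributions():
    """Return the lower-cased names of all installed distributions."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            names.add(name.lower())
    return names


def instrumentations_to_install(installed, index=INSTRUMENTATION_INDEX):
    """Return the instrumentation requirements for the detected libraries."""
    return sorted(req for lib, req in index.items() if lib in installed)


if __name__ == "__main__":
    for requirement in instrumentations_to_install(installed_distributions()):
        print(requirement)
```

Because each entry is an explicit requirement string, pinning an older instrumentation or adding a community package is a one-line change.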
As an alternative, we could have a development script that iterates over the ext/ directory, finds all the instrumentations, and generates the list. This would automate the step and ensure that the bootstrap command is always in sync with the published packages.
That said, I think doing this raises more problems than it solves. Here is what such a script would need to do.
1. Find all instrumentation packages in the ext/ directory.
2. Figure out which libraries the packages instrument.
3. Figure out which version of each package to install. For example, the source code might contain an unpublished version, or we might discover an issue in the latest version of some instrumentation and want the bootstrap command to install an older version instead. It is easy to imagine that the bootstrap command will not always want to install the version specified in the repo.
4. Detect and ignore any unpublished/in-development packages.
5. In the future, if the project moves away from a mono-repo structure, automating this might not be worth the effort anymore.
6. We might want the command to install some blessed community/contrib packages in addition to the ones shipped by the core project.
1-4 can be solved by adding additional metadata to each package. A build script can then iterate over the packages, extract this metadata, and generate a mapping ready to be used by the command. IMO this is not dramatically better; it just trades a central "hard-coded" index for a distributed one. If we envision this information being useful elsewhere, maybe it could be justified, but if it's only consumed by the bootstrap command, it doesn't make much sense IMO. We'd just be hard-coding the same information but scattering it all over the repository.
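A minimal sketch of such a build script follows. It assumes each ext/<package>/setup.cfg carries a hypothetical [bootstrap] section; the section name and its instruments, version and publish keys are invented for illustration and are not an existing convention. The script produces the same library-to-requirement mapping a hard-coded index would hold. The information itself does not change; it just lives in many files instead of one.

```python
import configparser
from pathlib import Path


def build_index(ext_dir):
    """Build {library: requirement} from hypothetical [bootstrap] metadata.

    Assumes each ext/<package>/setup.cfg has a [metadata] name and an
    invented [bootstrap] section such as:

        [bootstrap]
        instruments = flask
        version = 0.8b0
        publish = true
    """
    index = {}
    # Step 1: find instrumentation packages under ext/.
    for setup_cfg in sorted(Path(ext_dir).glob("*/setup.cfg")):
        # Disable interpolation so "%" in other setup.cfg values is harmless.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(setup_cfg)
        if not parser.has_section("bootstrap"):
            continue  # package does not opt in to bootstrap
        section = parser["bootstrap"]
        # Step 4: skip unpublished / in-development packages.
        if not section.getboolean("publish", fallback=False):
            continue
        package = parser["metadata"]["name"]
        # Step 3: the pinned version may differ from the version in the repo.
        requirement = f"{package}=={section['version']}"
        # Step 2: one package may instrument several libraries.
        for library in section["instruments"].split():
            index[library] = requirement
    return index
```

Steps 5 and 6 remain out of reach for this script, because it can only see packages that live under ext/.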
For 5-6, this will not work, as we'll need some sort of discovery mechanism to make it all happen.
Downsides of hard-coding this information.
The only big downside I see to hard-coding this information is that some contributors might forget to update it when publishing instrumentation packages. People would be somewhat less likely to forget if this information were stored in setup.cfg, but that doesn't solve the problem completely. Also, what is the worst that could happen if someone forgot to update the bootstrap command "index"? We'd ship the opentelemetry-auto-instrumentation package, and bootstrap would install a slightly older version of an instrumentation or not know about a new instrumentation. I think that is relatively harmless compared to a bootstrap command that automatically updates the index to the latest version of each instrumentation package it finds in that package's setup.py.
On the other hand, hard-coding this information gives us a lot more flexibility than any automated solution we could come up with.
Generally speaking, from a software engineering perspective, I think some form of hard-coding this should be the first step and we should move towards automation only after living with the pain (if any) this causes. It might turn out to be a case of premature abstraction otherwise.
I might have missed other obvious reasons to not hard-code though. Happy to hear other thoughts.
Actually, I think the biggest problem with hardcoding is that, generally speaking, it does not scale well. A contributor may forget to add a new instrumentation to the list, or instrumentations may be moved into another repo altogether, forcing the maintainers of this repo to be aware of what is happening in some other repo.
I think it is fine to hardcode now for our first release of this component, but a better approach, at least for finding the existing instrumentations, is to use pip search opentelemetry and filter the output to get the published instrumentations.
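The following is a rough sketch of the filtering half of that idea. It works on captured text rather than invoking pip, because pip search depends on a PyPI search endpoint that may not be available. The regex assumes one `name (version)  - summary` entry per line, and the opentelemetry-ext- prefix is also an assumption for illustration.

```python
import re

# Assumed shape of one result line of `pip search` output, e.g.
#   "opentelemetry-ext-flask (0.8b0)  - Flask instrumentation"
# Indented follow-up lines such as "  INSTALLED: ..." do not match.
_RESULT_LINE = re.compile(r"^(?P<name>\S+) \((?P<version>[^)]+)\)")


def published_instrumentations(search_output, prefix="opentelemetry-ext-"):
    """Return {package: version} for result lines whose name has the prefix."""
    found = {}
    for line in search_output.splitlines():
        match = _RESULT_LINE.match(line)
        if match and match["name"].startswith(prefix):
            found[match["name"]] = match["version"]
    return found
```

This only covers discovery. Choosing which version to install (point 3 above) would still need a policy of its own.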