Remove pkg_resource and replace for importlib.metadata in autoinstrumentation #2181

8 changes: 6 additions & 2 deletions CHANGELOG.md
@@ -7,12 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### Breaking changes

- Deprecation of pkg_resource in favor of importlib.metadata
([#2181](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/2181))

## Version 1.27.0/0.48b0 ()

### Added

- `opentelemetry-instrumentation-kafka-python` Instrument temporary fork, kafka-python-ng
inside kafka-python's instrumentation
- `opentelemetry-instrumentation-kafka-python` Instrument temporary fork, kafka-python-ng inside kafka-python's instrumentation
([#2537](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2537))

### Breaking changes
3 changes: 2 additions & 1 deletion opentelemetry-instrumentation/pyproject.toml
@@ -26,8 +26,9 @@ classifiers = [
]
dependencies = [
"opentelemetry-api ~= 1.4",
"setuptools >= 16.0",
"wrapt >= 1.0.0, < 2.0.0",
"importlib-metadata >= 6.0, < 8.0",
@Arnatious Arnatious Aug 29, 2024

importlib-metadata is already aliased in opentelemetry-api (https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-api/src/opentelemetry/util/_importlib_metadata.py) - should we reuse this rather than re-introduce the dependency?

Contributor

There is one already:

    "importlib-metadata >= 6.0, <= 8.4.0",

"packaging >= 18.0",
]

[project.scripts]
@@ -13,14 +13,13 @@
# limitations under the License.

from argparse import REMAINDER, ArgumentParser
from importlib.metadata import entry_points
from logging import getLogger
from os import environ, execl, getcwd
from os.path import abspath, dirname, pathsep
from re import sub
from shutil import which

from pkg_resources import iter_entry_points
xrmx marked this conversation as resolved.

from opentelemetry.instrumentation.version import __version__

_logger = getLogger(__name__)
@@ -48,8 +47,8 @@ def run() -> None:

argument_otel_environment_variable = {}

for entry_point in iter_entry_points(
"opentelemetry_environment_variables"
for entry_point in entry_points(
Contributor

I don't think this method of selection is supported for Python 3.8 and 3.9 -> https://docs.python.org/3/library/importlib.metadata.html#entry-points

Related to my earlier comment #2181 (comment) - if we use the importlib_metadata already taken as a dependency in opentelemetry-api, it's version >=6.4.0 which supports this syntax.

    >>> for entry_point in importlib_metadata.entry_points(group="console_scripts"):
    ...   print(entry_point)
    ...
    EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')
    EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts')
    EntryPoint(name='pip3.9', value='pip._internal.cli.main:main', group='console_scripts')
    >>> importlib_metadata.version('importlib_metadata')
    '6.4.0'

Contributor

Yeah, we should update the import path though 😅

Contributor Author

Maybe I'm misunderstanding, but we discussed that here: #2181 (comment).

And here #2181 (comment) about the version constraint.

Using a constraint on the importlib-metadata version would ensure that we can select by group and thus avoid having to use .get() on Entry_Points, which is a good thing. On the other hand, I suppose that restricting the version could cause compatibility problems.
There is an interesting post that reports this.

The second approach is fine, but I think it would make the code harder to maintain: every time Python releases a new version we would have to update it, and perhaps implement a new method if the implementation changes. I think that would add unnecessary workload for us in the future. Additionally, we would be duplicating existing code.

If I had to pick a side, I would use .get() for now.
If we cannot use the .get() approach, I would go with the version constraint.

@xrmx @Arnatious
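
To make the trade-off in this thread concrete, here is a minimal standard-library sketch of the `.get()` fallback under discussion. The helper name `iter_group` is hypothetical and not part of the PR; the thread later settles on importing from `opentelemetry.util._importlib_metadata` instead of carrying a shim like this.

```python
import sys
from importlib.metadata import entry_points


def iter_group(group):
    """Return the entry points registered under `group` as a tuple.

    Hypothetical helper, for illustration only.
    """
    if sys.version_info >= (3, 10):
        # Python 3.10+ accepts selection keywords such as group=...
        return tuple(entry_points(group=group))
    # Python 3.8/3.9: entry_points() takes no arguments and returns a
    # dict mapping each group name to its entry points.
    return tuple(entry_points().get(group, ()))


# Usage: an unknown group simply yields an empty tuple.
for entry_point in iter_group("opentelemetry_instrumentor"):
    print(entry_point.name)
```

A version branch like this is the kind of per-release maintenance the comment above worries about, which is one argument for keeping it in a single shared module.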

I'm suggesting replacing all uses of importlib.metadata or importlib_metadata with opentelemetry.util._importlib_metadata and taking it as a transitive dependency from opentelemetry-api itself.

Contributor Author

Okay, my bad. It seems reasonable to use the alias.

Contributor

Yes, please use opentelemetry.util._importlib_metadata. We definitely don't want to use either importlib.metadata or importlib_metadata anywhere outside of opentelemetry.util._importlib_metadata. The implementations of importlib.metadata and importlib_metadata are plagued with API changes between Python versions, and we want to contain all the issues this causes in one single place.

BTW, since there is a lot of interest in moving this forward, I added some commits on top of your PR in #2854. I cannot continue debugging the failing test cases right now but I encourage you to use the code in #2854, which removes usage of importlib.metadata in test requirements files and uses opentelemetry.util._importlib_metadata in source code.

Also, you'll notice that the CORE_REPO_SHA environment variable in #2854 is set to a particular commit, that commit should always be the latest commit of this. Once that PR gets merged, the value of CORE_REPO_SHA should be set back to main in the tox.ini file and the .github/workflows/*.yml files.

group="opentelemetry_environment_variables"
):
environment_variable_module = entry_point.load()

@@ -12,11 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from importlib.metadata import EntryPoint, distributions, entry_points
from logging import getLogger
from os import environ

from pkg_resources import iter_entry_points

from opentelemetry.instrumentation.dependencies import (
get_dist_dependency_conflicts,
)
@@ -31,9 +30,32 @@
_logger = getLogger(__name__)


class _EntryPointDistFinder:
def __int__(self):

Typo? __int__ vs __init__

(Can probably just be deleted if https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2181/files#r1739126880 is accepted)

Contributor Author

Yes it is a typo

self._mapping = None

Consider making this a functools.cached_property?

Contributor Author

Something like this?:

    @cached_property
    def dist_for(self, entry_point: EntryPoint):
        dist = getattr(entry_point, "dist", None)
        if dist:
            return dist

        if self._mapping is None:
            self._mapping = {
                self._key_for(ep): dist
                for ep in dist.entry_points
                for dist in distributions()
            }

        return self._mapping.get(self._key_for(entry_point))

This would always return the same dist, right?

@Arnatious Arnatious Aug 30, 2024

I meant the _mapping attribute within _EntryPointDistFinder - it'll do exactly what you're already doing and only compute _mapping once.

So it'd be

class _EntryPointDistFinder:
    @cached_property
    def _mapping(self):
        return {
            self._key_for(ep): dist
            for ep in dist.entry_points
            for dist in distributions()
        }

Contributor Author

Okay yes, I agree

Contributor

I've tried this (with the two for lines swapped) but it looks like the code is not used with our current tests, see
https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2871/files#r1759148502

So the question is: do we really need this cache?

@Arnatious Arnatious Sep 25, 2024

I popped a breakpoint on dist_for - we only ever pass it Mocks, so it never fails to getattr. Seems like some over-mocking is taking place.
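
For reference, a self-contained sketch of the `cached_property` variant discussed in this thread, with the two `for` clauses in the order that makes `dist` defined before it is used. The class name `EntryPointDistFinder` and the entry point values in the usage are illustrative assumptions, not the code that was merged.

```python
from functools import cached_property
from importlib.metadata import EntryPoint, distributions


class EntryPointDistFinder:
    """Illustrative finder: maps an entry point back to its distribution."""

    @cached_property
    def _mapping(self):
        # Computed on first access only, then stored on the instance.
        # The outer loop must come first so `dist` is bound for the inner one.
        return {
            self._key_for(ep): dist
            for dist in distributions()
            for ep in dist.entry_points
        }

    def dist_for(self, entry_point):
        # Prefer the attribute when the running Python provides it.
        dist = getattr(entry_point, "dist", None)
        if dist:
            return dist
        return self._mapping.get(self._key_for(entry_point))

    @staticmethod
    def _key_for(entry_point):
        return f"{entry_point.group}:{entry_point.name}:{entry_point.value}"


# Usage with a hypothetical entry point that no installed distribution declares.
finder = EntryPointDistFinder()
unknown = EntryPoint(
    name="hypothetical", value="hypothetical_mod:fn", group="hypothetical_group"
)
print(finder.dist_for(unknown))
```

A `unittest.mock.Mock` passed to `dist_for` always has a truthy `dist` attribute, so it returns early and never reaches `_mapping`; this matches the over-mocking observation above.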


def dist_for(self, entry_point: EntryPoint):
dist = getattr(entry_point, "dist", None)
if dist:
return dist

if self._mapping is None:
self._mapping = {
self._key_for(ep): dist
for ep in dist.entry_points
lzchen marked this conversation as resolved.
for dist in distributions()
}

return self._mapping.get(self._key_for(entry_point))

@staticmethod
def _key_for(entry_point: EntryPoint):
return f"{entry_point.group}:{entry_point.name}:{entry_point.value}"


def _load_distro() -> BaseDistro:
distro_name = environ.get(OTEL_PYTHON_DISTRO, None)
for entry_point in iter_entry_points("opentelemetry_distro"):
for entry_point in entry_points(group="opentelemetry_distro"):
try:
# If no distro is specified, use first to come up.
if distro_name is None or distro_name == entry_point.name:
@@ -58,23 +80,25 @@ def _load_distro() -> BaseDistro:

def _load_instrumentors(distro):
package_to_exclude = environ.get(OTEL_PYTHON_DISABLED_INSTRUMENTATIONS, [])
entry_point_finder = _EntryPointDistFinder()
if isinstance(package_to_exclude, str):
package_to_exclude = package_to_exclude.split(",")
# to handle users entering "requests , flask" or "requests, flask" with spaces
package_to_exclude = [x.strip() for x in package_to_exclude]

for entry_point in iter_entry_points("opentelemetry_pre_instrument"):
for entry_point in entry_points(group="opentelemetry_pre_instrument"):
entry_point.load()()

for entry_point in iter_entry_points("opentelemetry_instrumentor"):
for entry_point in entry_points(group="opentelemetry_instrumentor"):
if entry_point.name in package_to_exclude:
_logger.debug(
"Instrumentation skipped for library %s", entry_point.name
)
continue

try:
conflict = get_dist_dependency_conflicts(entry_point.dist)
entry_point_dist = entry_point_finder.dist_for(entry_point)
conflict = get_dist_dependency_conflicts(entry_point_dist)
Contributor

Why are we not using entry_point.dist anymore?

Contributor Author

@Rodrigo-Novas Rodrigo-Novas Aug 30, 2024

entry_point.dist doesn't seem to exist in older python versions and is an optional attribute in the latest version.

Contributor

If we are going to use importlib_metadata (via the opentelemetry-api util), we just need to be sure that it's there since 6+, no? Looking at some old pkg_resources code, dist defaulted to None there too.

We're not going to be guaranteed 6+ if open-telemetry/opentelemetry-python#4177 goes through - we'll only have 6 on python 3.12 and higher.

if conflict:
_logger.debug(
"Skipping instrumentation %s: %s",
@@ -90,14 +114,14 @@ def _load_instrumentors(distro):
_logger.exception("Instrumenting of %s failed", entry_point.name)
raise exc

for entry_point in iter_entry_points("opentelemetry_post_instrument"):
for entry_point in entry_points(group="opentelemetry_post_instrument"):
entry_point.load()()


def _load_configurators():
configurator_name = environ.get(OTEL_PYTHON_CONFIGURATOR, None)
configured = None
for entry_point in iter_entry_points("opentelemetry_configurator"):
for entry_point in entry_points(group="opentelemetry_configurator"):
if configured is not None:
_logger.warning(
"Configuration of %s not loaded, %s already loaded",
@@ -15,6 +15,7 @@
import argparse
import logging
import sys
from importlib.metadata import PackageNotFoundError, version
from subprocess import (
PIPE,
CalledProcessError,
@@ -23,7 +24,7 @@
check_call,
)

import pkg_resources
from packaging.requirements import Requirement
xrmx marked this conversation as resolved.

from opentelemetry.instrumentation.bootstrap_gen import (
default_instrumentations,
@@ -91,18 +92,19 @@ def _pip_check():


def _is_installed(req):
if req in sys.modules:
return True
req = Requirement(req)

try:
pkg_resources.get_distribution(req)
except pkg_resources.DistributionNotFound:
dist_version = version(req.name)
except PackageNotFoundError:
return False
except pkg_resources.VersionConflict as exc:

if not req.specifier.filter(dist_version):
logger.warning(
"instrumentation for package %s is available but version %s is installed. Skipping.",
exc.req,
exc.dist.as_requirement(), # pylint: disable=no-member
"instrumentation for package %s is available"
" but version %s is installed. Skipping.",
req,
dist_version,
)
return False
return True
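
The rewritten `_is_installed` above first looks up the installed version by distribution name and treats a missing distribution as "not installed". A minimal standard-library sketch of just that lookup step follows; the helper name `installed_version` and the package name in the usage are hypothetical, and the specifier comparison that the diff performs with `packaging` is deliberately left out.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(name):
    """Return the installed version string for `name`, or None if absent.

    Hypothetical helper, for illustration only.
    """
    try:
        return version(name)
    except PackageNotFoundError:
        # importlib.metadata's counterpart of pkg_resources.DistributionNotFound
        return None


# Usage: a distribution that is not installed yields None instead of raising.
print(installed_version("hypothetical-package-that-is-not-installed"))
```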
@@ -1,13 +1,8 @@
from importlib.metadata import Distribution, PackageNotFoundError, version
from logging import getLogger
from typing import Collection, Optional
from typing import Collection, Optional, Union

from pkg_resources import (
Distribution,
DistributionNotFound,
RequirementParseError,
VersionConflict,
get_distribution,
)
from packaging.requirements import InvalidRequirement, Requirement

logger = getLogger(__name__)

@@ -27,36 +22,43 @@ def __str__(self):
def get_dist_dependency_conflicts(
dist: Distribution,
) -> Optional[DependencyConflict]:
main_deps = dist.requires()
instrumentation_deps = []
for dep in dist.requires(("instruments",)):
if dep not in main_deps:
# we set marker to none so string representation of the dependency looks like
# requests ~= 1.0
# instead of
# requests ~= 1.0; extra = "instruments"
# which does not work with `get_distribution()`
dep.marker = None
instrumentation_deps.append(str(dep))
extra = "extra"
instruments = "instruments"
instruments_marker = {extra: instruments}
for dep in dist.requires:
if extra not in dep or instruments not in dep:
continue

req = Requirement(dep)
if req.marker.evaluate(instruments_marker):
instrumentation_deps.append(req)

return get_dependency_conflicts(instrumentation_deps)


def get_dependency_conflicts(
deps: Collection[str],
deps: Collection[Union[str, Requirement]],
) -> Optional[DependencyConflict]:
for dep in deps:
if isinstance(dep, Requirement):
req = dep
else:
try:
req = Requirement(dep)
except InvalidRequirement as exc:
logger.warning(
'error parsing dependency, reporting as a conflict: "%s" - %s',
dep,
exc,
)
return DependencyConflict(dep)

try:
get_distribution(dep)
except VersionConflict as exc:
return DependencyConflict(dep, exc.dist)
except DistributionNotFound:
return DependencyConflict(dep)
except RequirementParseError as exc:
logger.warning(
'error parsing dependency, reporting as a conflict: "%s" - %s',
dep,
exc,
)
dist_version = version(req.name)
except PackageNotFoundError:
Rodrigo-Novas marked this conversation as resolved.
return DependencyConflict(dep)

if not req.specifier.contains(dist_version):
return DependencyConflict(dep, f"{req.name} {dist_version}")
return None
@@ -18,10 +18,9 @@
"""

from abc import ABC, abstractmethod
from importlib.metadata import EntryPoint
Rodrigo-Novas marked this conversation as resolved.
from logging import getLogger

from pkg_resources import EntryPoint

from opentelemetry.instrumentation.instrumentor import BaseInstrumentor

_LOG = getLogger(__name__)