Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Please make trove classifiers validation optional #4459

Closed
eli-schwartz opened this issue Jul 5, 2024 · 28 comments · Fixed by #4611
Closed

[BUG] Please make trove classifiers validation optional #4459

eli-schwartz opened this issue Jul 5, 2024 · 28 comments · Fixed by #4611

Comments

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Jul 5, 2024

setuptools version

from git, also 70.0.0

Python version

Python 3.12

OS

Gentoo Linux

Additional environment information

No response

Description

I tried building a wheel for a package that uses a trove classifier in the very latest version of the trove_classifiers package. Setuptools detected that trove_classifiers was installed, and insisted on validating it.

Looking at the code which handles this, it offers an escape hatch only to "avoid hitting the network" if trove_classifiers is NOT installed:

class _TroveClassifier:
"""The ``trove_classifiers`` package is the official way of validating classifiers,
however this package might not be always available.
As a workaround we can still download a list from PyPI.
We also don't want to be over strict about it, so simply skipping silently is an
option (classifiers will be validated anyway during the upload to PyPI).
"""
downloaded: typing.Union[None, "Literal[False]", typing.Set[str]]
def __init__(self) -> None:
self.downloaded = None
self._skip_download = False
# None => not cached yet
# False => cache not available
self.__name__ = "trove_classifier" # Emulate a public function
def _disable_download(self) -> None:
# This is a private API. Only setuptools has the consent of using it.
self._skip_download = True
def __call__(self, value: str) -> bool:
if self.downloaded is False or self._skip_download is True:
return True
if os.getenv("NO_NETWORK") or os.getenv("VALIDATE_PYPROJECT_NO_NETWORK"):
self.downloaded = False
msg = (
"Install ``trove-classifiers`` to ensure proper validation. "
"Skipping download of classifiers list from PyPI (NO_NETWORK)."
)
_logger.debug(msg)
return True
if self.downloaded is None:
msg = (
"Install ``trove-classifiers`` to ensure proper validation. "
"Meanwhile a list of classifiers will be downloaded from PyPI."
)
_logger.debug(msg)
try:
self.downloaded = set(_download_classifiers().splitlines())
except Exception:
self.downloaded = False
_logger.debug("Problem with download, skipping validation")
return True
return value in self.downloaded or value.lower().startswith("private ::")
try:
from trove_classifiers import classifiers as _trove_classifiers
def trove_classifier(value: str) -> bool:
"""See https://pypi.org/classifiers/"""
return value in _trove_classifiers or value.lower().startswith("private ::")
except ImportError: # pragma: no cover
trove_classifier = _TroveClassifier()

Expected behavior

A way to disable the unwanted functionality without first uninstalling other packages. For example, instead of $VALIDATE_PYPROJECT_NO_NETWORK, I would like to see $VALIDATE_PYPROJECT_NO_TROVE_CLASSIFIERS.

How to Reproduce

  1. git clone https://github.com/urschrei/pyzotero
  2. python -m build -nwx

Output

* Building wheel...
configuration error: `project.classifiers[10]` must be trove-classifier
DESCRIPTION:
    `PyPI classifier <https://pypi.org/classifiers/>`_.

GIVEN VALUE:
    "License :: OSI Approved :: Blue Oak Model License (BlueOak-1.0.0)"

OFFENDING RULE: 'format'

DEFINITION:
    {
        "type": "string",
        "format": "trove-classifier"
    }

For more details about `format` see
https://validate-pyproject.readthedocs.io/en/latest/api/validate_pyproject.formats.html

Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in <module>
    main()
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 357, in main
    json_out["return_val"] = hook(**hook_input["kwargs"])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 271, in build_wheel
    return _build_backend().build_wheel(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 410, in build_wheel
    return self._build_with_temp_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
    self.run_setup()
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 12, in <module>
  File "/usr/lib/python3.12/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 158, in setup
    dist.parse_config_files()
  File "/usr/lib/python3.12/site-packages/setuptools/dist.py", line 632, in parse_config_files
    pyprojecttoml.apply_configuration(self, filename, ignore_option_errors)
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 68, in apply_configuration
    config = read_configuration(filepath, True, ignore_option_errors, dist)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 133, in read_configuration
    validate(subset, filepath)
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 57, in validate
    raise ValueError(f"{error}\n{summary}") from None
ValueError: invalid pyproject.toml config: `project.classifiers[10]`.
configuration error: `project.classifiers[10]` must be trove-classifier

ERROR Backend subprocess exited when trying to invoke build_wheel
@eli-schwartz eli-schwartz added bug Needs Triage Issues that need to be evaluated for severity and status. labels Jul 5, 2024
@eli-schwartz
Copy link
Contributor Author

See also: urschrei/pyzotero#178

@mgorny
Copy link
Contributor

mgorny commented Jul 5, 2024

Making it into a warning rather than error would also work for us.

@jaraco
Copy link
Member

jaraco commented Jul 17, 2024

After reviewing the code, I can add some detail.

The escape hatch was created to avoid network access for the fallback behavior when the trove_classifiers package isn't present. It assumes that if that package is present, it contains valid data and should be used to do the validation. The fact that it's stale suggests that maybe that package is not yet valid in that configuration.

I'm a little reluctant to add yet another escape hatch for this one configuration item especially for this rare case that's likely to resolve itself over time.

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

The suggestion to make it a warning instead of an error seems reasonable, but my guess is that most users will miss the warning and only encounter it when attempting to upload to an index anyway, suggesting maybe the validation should be removed altogether.

But, the validation logic isn't maintained here, it's maintained instead in abravalheri/validate-pyproject.

@abravalheri What do you think of the proposal and the alternatives?

@jaraco jaraco added enhancement downstream and removed bug Needs Triage Issues that need to be evaluated for severity and status. labels Jul 17, 2024
@eli-schwartz
Copy link
Contributor Author

The fact that it's stale suggests that maybe that package is not yet valid in that configuration.

The fact that it's stale in this case indicates we never intended trove_classifiers to be valid at all, but we build without virtualenv isolation.

I'm a little reluctant to add yet another escape hatch for this one configuration item especially for this rare case that's likely to resolve itself over time.

It's not going to resolve itself over time, because due to dependency graph ordering there's no real reason:

  • when both trove_classifiers and pyzotero are installed globally at the distro level
  • when both have pending updates

for trove_classifiers to end up queued for installation first. Instead, updating fails, and continues to fail because every time a system update is attempted, it fails again.

If setuptools removed its validation entirely, or made it only use the network, that would equally solve the gentoo use case though. :) We already disable networking anyway.

Of course, one might ask, why not just build in an isolated environment without trove_classifiers? That's recommended by PyPA via virtualenvs. But distros generally don't do this, because they already have both network isolation (cannot install build-requires with pip) and clean build isolation via spinning up temporary containers with just distro-packaged dependencies installed. And distros want to use their own packaged dependencies for various reasons.

... but Gentoo (can, and usually does) build from source on every user's machine, and spinning up containers for every package is a hassle, slow, implies nontrivial setup, and generally is impractical. Distros like Debian and Fedora have complex distributed systems to do this and reset to a clean slate in between each build, so it doesn't matter if it takes 15 minutes to build each pure-python wheel because they have tends of buildbots building thousands of packages, and each one only has to get built once.

And most other cases of additional packages installed in a build environment aren't a problem. ;) Build isolation primarily solves the problem of people forgetting to specify everything the build depends on because they already have it installed.

So gentoo usually does what those other distros also do when individual users want to build a .deb or .rpm, and builds on the live system, and relies on --disable-foo style options where applicable (e.g. C/C++ environments, where there is a culture of build options, unlike Python where build options largely aren't a thing unless you're building data science packages that use both C++ and Fortran and cannot be built by pip because pip doesn't know how to install OpenBLAS).

And so we end up here, asking for, well, a --disable-foo style option to disable trove classifier checking. :D

@mgorny
Copy link
Contributor

mgorny commented Jul 17, 2024

But, the validation logic isn't maintained here, it's maintained instead in abravalheri/validate-pyproject.

Yeah, I've noticed that earlier and I was just about to open a pull request there. Good that I've checked mail first, let's see what resolution is preferred.

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

Do you mean one that fetches data straight from PyPI? As long as we can force-disable Internet access, that would work for us.

@abravalheri
Copy link
Contributor

abravalheri commented Jul 18, 2024

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

I would love to take this approach, but the PyPI Web page for listing classifiers is not completely official. I know that flit uses it... but the standards mention the package as the only true source of truth... So I am not ready to do that move unless there is commitment in supporting the HTTP api for classifiers in PyPI.

My view is that this is a tricky problem, but also a minor one... Once the latest trove-classifiers package is adopted downstream, the problem would naturally go away, right? Also, the trove-classifiers package should be mostly backwards compatible, so the risk of adopting a new version is very low...

A question: if the problem is the installation order is it possible to make trove-classifiers a build-only dependency of pyzotero? That would ensure that the latest classifiers are installed right?

@eli-schwartz
Copy link
Contributor Author

My view is that this is a tricky problem, but also a minor one... Once the latest trove-classifiers package is adopted downstream, the problem would naturally go away, right? Also, the trove-classifiers package should be mostly backwards compatible, so the risk of adopting a new version is very low...

I'm pretty sure I was quite clear that the problem does not naturally go away ever, because the solver gets into a stuck state and doesn't get unstuck until someone manually kicks it -- because it will keep trying to install the failing package before trove_classifiers. That usually means patching the setuptools distro package to have a hard dependency on trove_classifiers >= $LATEST_RELEASE which is kind of annoying.

It would be much better if we could unstick things automatically by telling setuptools to stop performing validations we don't want. We aren't uploading to PyPI.

It's confusing anyway because there's absolutely no reason people shouldn't be allowed to use whatever trove classifiers they like, even unofficial ones, on private self-run indexes.

@eli-schwartz
Copy link
Contributor Author

A question: if the problem is the installation order is it possible to make trove-classifiers a build-only dependency of pyzotero? That would ensure that the latest classifiers are installed right?

Just want to reiterate here that adding trove-classifiers as build-requires for arbitrary packages throughout the python ecosystem is not a very comfortable solution in the slightest.

Especially because trove-classifiers may not need to be installed at all -- and in fact the only reason I have it on my system is because hatchling required it.

It is terrible UX to have pyzotero require an additional build-requires, solely because if it doesn't have that build-requires then installing hatchling will result in setuptools erroring out. It's not even the kind of bug that anyone will notice for a while.

@abravalheri
Copy link
Contributor

Hi @eli-schwartz, thank you very much for the reply. I understand your concerns and appreciate your feedback.

Please note that I am not suggesting to add trove-classifiers as a build-requirements to all packages throughout the ecosystem, but rather this package in particular. I don't imagine that the majority of the packages on the ecosystem will require the very latest trove-classifiers.

Also note that when I suggested that the "problem eventually goes away" I was referring to "greenfield" deployments in the future when the trove-classifiers in the repositories that gets pulled by default is already new enough (at least this is my impression). I do agree that for a particular machine that already have the packages installed, it does not go away and require manual intervention.

I understand that you guys have your reasons to disable build isolation, but I would like to point out that build isolation was introduced to fix this kind of problem1... It is perfectly fine for OS-maintainers to attempt to optimise builds, but I think that in this case when the lack of build-isolation is directly involved in a bug, it would be important to explore other workarounds (like the inclusion in this particular package of trove-classifiers as a build dependency to force a particular installation order; or removing the problematic classifier in the package metadata).

I aim to keep the codebase as simple as possible, which is why I’m hesitant to add yet more knobs and levers. However, one potential solution could be introducing an environment variable like SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION to skip the validations all together in https://github.com/pypa/setuptools/blob/main/setuptools/config/pyprojecttoml.py. In general, this would be risky, as it bypasses important checks that reduce the set of accepted inputs; but for classifiers, it should be fine. The responsibility would fall on the developer/user that sets this environment variable to ensure the overall configuration is valid.

Regarding the idea of adding --disable-foo style options to setuptools/validate-pyproject, the current codebase (in both setuptools and validate-pyproject) doesn’t support this. However, if anyone has concrete proposals and is willing to contribute PRs, I’m open to discussing and reviewing these suggestions.

Footnotes

  1. Disabling build-isolation is historically know to cause problems with some features of the PyPA packaging ecosystem like optional dependencies and entrypoint-based plugins.

@eli-schwartz
Copy link
Contributor Author

I understand that you guys have your reasons to disable build isolation, but I would like to point out that build isolation was introduced to fix this kind of problem1...

  1. Disabling build-isolation is historically know to cause problems with some features of the PyPA packaging ecosystem like optional dependencies and entrypoint-based plugins.

This is indeed a long-standing flaw in the python ecosystem, yes. :)

Automagically doing the wrong thing based on unpredictable environment inputs is bad, and everyone agrees on that. Due to several decades of legacy baggage, it is difficult to solve this for setuptools because the automagic environment inputs serve multiple roles including extensibility by way of entrypoint-based plugins.

Build isolation is not, and never has been, a solution to this engineering problem. It's a way to avoid the side effects in a scenario where the design constraints prevent a solution from being achieved. It is also, largely, a problem that only affects setuptools and doesn't affect other build backends which take a more structural approach to defining builds.

You may wish to take note here of the wording for PEP 517. It describes build isolation and marks it as an RFC 2119 compliant "SHOULD" feature, with two PEP rationales:

  1. that it permits building multiple incompatible packages whose build requirements are mutually exclusive (and calls out pinned versions as the probable cause)
  2. that it provides the hygienic ability to catch scenarios where people have forgotten to list all their build requirements

It doesn't mention the possibility of additional environment packages modifying your own build requirements, indicating that this isn't a motivational concern behind the recommendation to use build isolation.

@eli-schwartz
Copy link
Contributor Author

Regarding the idea of adding --disable-foo style options to setuptools/validate-pyproject, the current codebase (in both setuptools and validate-pyproject) doesn’t support this. However, if anyone has concrete proposals and is willing to contribute PRs, I’m open to discussing and reviewing these suggestions.

Just for clarity, my mention of --disable-foo style options was intended as an explicit way to be evocative of GNU autoconf, not because I specifically believe setuptools should be configured via CLI flags instead of environment variables.

I suspect environment variables is ultimately easier for setuptools, if only because it doesn't require passing state around through multiple layers of "execute the setup.py script together with a phase to run".

The thing that I care about is the Autoconf model where functionality is expected to have some form of on/off switch rather than unconditionally probe whether an import is available.

@eli-schwartz
Copy link
Contributor Author

I was not notified that you opened a PR eventually, and it appears that despite my attempting to explain why the entire thought process behind "skip the validations altogether" was based on false promises, that's the approach that ended up used anyway.

This is unfortunate and in my opinion you haven't actually solved my bug report. I asked to make validating trove classifiers optional, I didn't ask to make validating the entire file optional.

In particular, the fact that the fix you merged emits a SetuptoolsWarning feels mean-spirited.

I understand why the warning exists, of course. Since it involves skipping ALL validation including basic consistency checks of load-bearing information that setuptools relies upon to successfully build a wheel (as opposed to trove classifiers which are anonymous display-only data strings where validation is only required when uploading to PyPI, and which PyPI already performs on upload), using that option is in fact dangerous in ways that far exceed the perfectly safe request which I had originally made.

But that's really the point, isn't it. The reason the SetuptoolsWarning feels mean-spirited is because the whole approach to closing this ticket feels mean-spirited.

I asked for a specific change, and provided specific reasons for that change. You disagreed with my reasons (which is your right) and engaged in dialogue about why you disagree (I'm happy to have a discussion about the topic -- maybe one of us will convince the other) and then a month later without notifying me (?) or asking me for feedback (???) created a PR implementing something I didn't ask for that is designed in a manner optimized to make sure that I wouldn't be willing to use it (??????????). Not only that, but you raised a question for feedback on that PR, and waited several months for practically nobody to submit feedback since nobody knew the PR existed. You plainly wanted the feedback, so why... hide its existence from the person who opened the ticket.

If you dislike my request that much then please be straightforward about it and close it as "WONTFIX".

@abravalheri
Copy link
Contributor

abravalheri commented Nov 11, 2024

@eli-schwartz I am sorry this did not fix your problem.

Indeed, I was not keen to take the approach you suggested by the series of reasons I have already explained previously. But I was genuinely concerned that there was no escape hatch for scenarios involving problems with "accidental extra dependencies" for validate-pyproject so that is why I created the PR.

Please note that in #4459 (comment), I laid out the one potential solution I was willing to implement and the PR pretty much followed it. So, the choice of implementation should not come as a surprise.

Personally, I did not see other viable alternative, but in the same comment I welcomed PRs implementing alternative approaches, because I belive that different people will have different insights and different solutions that can be very valuable to solve the problem. But I needed to see a concrete implementation, not only debate general ideas on how things could ideally be, since feasibility was one of the concerns.

Regarding the criticism about not pinging you: by no means I was trying to hide anything from you (well, I pretty much told you about the one implementation that I was willing to put together before doing so). I simply don't like to ping people if I can avoid it. Life is already too full of notifications. Besides, doesn’t Github already do the notifications?

In general, I do write PRs and leave them open for a while, so the entire community has the chance to have a look and provide feedback (unless I feel it is urgent or I need it for something else).

In the future, when you want to be pinged about a specific implementation, please indicate that, and I will be happy to do so.

@abravalheri
Copy link
Contributor

abravalheri commented Nov 11, 2024

Anyway, now we have an escape hatch for "spooky action at distance" in validate-pyproject that may be useful in emergencies. But if both you and @mgorny don't see value in it, please let me know and I can proceed to remove it before people start to rely on it.

@eli-schwartz
Copy link
Contributor Author

I simply don't like to ping people all the time (besides, doesn’t Github already do the notifications?)

Github does notifications, but I wasn't subscribed to the PR. It's okay to leave a comment on an issue saying "I've posted a PR at XXXX", no @username involved, that will reach people subscribed to this ticket and interested in further developments about this ticket. Github does NOT (and never has) send email notifications to people subscribed to a ticket whenever another issue or PR anywhere on github references a ticket -- so there is no way to know that "abravalheri mentioned this issue on Aug 28" without visiting this issue, which is unlikely to occur a month later if I don't get an email notification for new activity.

I'm subscribed to this ticket -- why would I not want to get an email notification about new developments such as a proposed solution? If I didn't want email notifications, I can always submit a ticket and then unsubscribe.

New comments to a ticket I'm subscribed to, don't even do anything overly noisy such as create an android system notifications event (from the GitHub App) the way an overt @username mention does. It's an extremely light, no-pressure, no-nag approach to keeping people in the loop. It's not about being pinged, specifically.

Indeed, I was not keen to take the approach you suggested by the series of reasons I have already explained previously. But I was genuinely concerned that there was no escape hatch for scenarios involving problems with "accidental extra dependencies" for validate-pyproject so that is why I created the PR.

Well, if you're solving a problem that isn't the one reported here I'd personally say it's a bit clearer to implement that change separately, and then close this one as "Won't Implement", if it is indeed not going to be implemented...

@abravalheri
Copy link
Contributor

Thanks for the info on Github notifications. Because github does add a note to the issue thread when a related PR is implemented, my understanding was that some sort of notification is available. But I see now that I was wrong.

Well, if you're solving a problem that isn't the one reported here I'd personally say it a bit clearer to implement that change separately, and then close this one as "Won't Implement", if it is indeed not going to be implemented...

That is a fair comment. I will keep that in mind for the future.

Note, however, that the 2 problems are not exactly distinct in my understanding and that the escape hatch provided is useful to workaround the originally described problem. It might not be exactly how you wanted it, but well, it is what I could do.

Finally, please let me know if you prefer the environment variable to be removed. I will also wait for mgorny on that.

@eli-schwartz
Copy link
Contributor Author

eli-schwartz commented Nov 11, 2024

Anyway, now we have an escape hatch for "spooky action at distance" in validate-pyproject that may be useful in emergencies. But if both you and @mgorny don't see value in it, please let me know and I can proceed to remove it before people start to rely on it.

The main issue is that it replaces one form of spooky action at a distance with a slightly less distant, but possibly even more spooky, action.

As the warning indicates, ignoring all validation for all configuration keys can lead to deeply broken wheels that don't even function as usable wheels. I daresay it would have to be quite an emergency before I'd consider using that. But I don't actually know under what circumstances you think it would make sense to skip validation of the pyproject.toml format (because one does not wish to see validation errors for the format) but does want to attempt to build a wheel using that invalid configuration.

It's entirely different, and significantly more reasonable, to want to skip "validate that this wheel doesn't violate PyPI's policy for uploading wheels to https://pypi.org which requires to limit yourself to search tags (i.e. trove classifiers) that have been agreed upon as common enough to be worth sorting on". Why should setuptools be enforcing an extremely arbitrary time-sensitive upload policy for a single hosting site, especially when you aren't using that hosting site yourself?

Personally, I did not see other viable alternative, but in the same comment I welcomed PRs implementing alternative approaches, because I belive that different people will have different insights and different solutions that can be very valuable to solve the problem. But I needed to see a concrete implementation, not only debate general ideas on how things could ideally be, since feasibility was one of the concerns.

I offered a very specific targeted suggestion. I suggested that instead of $VALIDATE_PYPROJECT_NO_NETWORK, this code right here:

class _TroveClassifier:
"""The ``trove_classifiers`` package is the official way of validating classifiers,
however this package might not be always available.
As a workaround we can still download a list from PyPI.
We also don't want to be over strict about it, so simply skipping silently is an
option (classifiers will be validated anyway during the upload to PyPI).
"""
downloaded: typing.Union[None, "Literal[False]", typing.Set[str]]
def __init__(self) -> None:
self.downloaded = None
self._skip_download = False
# None => not cached yet
# False => cache not available
self.__name__ = "trove_classifier" # Emulate a public function
def _disable_download(self) -> None:
# This is a private API. Only setuptools has the consent of using it.
self._skip_download = True
def __call__(self, value: str) -> bool:
if self.downloaded is False or self._skip_download is True:
return True
if os.getenv("NO_NETWORK") or os.getenv("VALIDATE_PYPROJECT_NO_NETWORK"):
self.downloaded = False
msg = (
"Install ``trove-classifiers`` to ensure proper validation. "
"Skipping download of classifiers list from PyPI (NO_NETWORK)."
)
_logger.debug(msg)
return True
if self.downloaded is None:
msg = (
"Install ``trove-classifiers`` to ensure proper validation. "
"Meanwhile a list of classifiers will be downloaded from PyPI."
)
_logger.debug(msg)
try:
self.downloaded = set(_download_classifiers().splitlines())
except Exception:
self.downloaded = False
_logger.debug("Problem with download, skipping validation")
return True
return value in self.downloaded or value.lower().startswith("private ::")
try:
from trove_classifiers import classifiers as _trove_classifiers
def trove_classifier(value: str) -> bool:
"""See https://pypi.org/classifiers/"""
return value in _trove_classifiers or value.lower().startswith("private ::")
except ImportError: # pragma: no cover
trove_classifier = _TroveClassifier()

could check for $VALIDATE_PYPROJECT_NO_TROVE_CLASSIFIERS, and if set, skip:

  • importing trove_classifiers
  • falling back to _TroveClassifier()

and rather, the $VALIDATE_PYPROJECT_NO_TROVE_CLASSIFIERS environment variable would result in _disable_download() being used.

That's not a general idea. That's a pretty concrete suggestion.

With specific regard to:

I welcomed PRs implementing alternative approaches, because I belive that different people will have different insights and different solutions that can be very valuable to solve the problem.

You did no such thing. I mentioned a tangent about --enable-foo / --with-foo, as an explanation for why Linux distros (all of them!) disagree with and reject the notion of build isolation. The usefulness of such things is a broadly applicable topic related to things such as PURE=1 python -m build causing a project with optional rust bindings to disable those rust bindings and fall back to the pure python implementation (for example, to target an operating system that Rust hasn't been ported to).1

You said you welcome concrete proposals for implementing --with-foo, a topic I don't have concrete proposals for as it would also require coordinating with the PEP 517 build backend interface among numerous other things, and might require redesigning how setuptools processes cmdclass phases as well.

Indeed, I was not keen to take the approach you suggested by the series of reasons I have already explained previously.

I only saw you state a single reason, which was:

I aim to keep the codebase as simple as possible, which is why I’m hesitant to add yet more knobs and levers.

You are of course welcome to that viewpoint! It's your project (half yours?) and you're responsible for maintaining it. I respect your desire to have a simple codebase and avoid adding "too many knobs and levers".

While, at the same time, also desiring for my part that communications around what you as a setuptools maintainer are comfortable with, are transparently done, including, sometimes, saying "sorry, I would rather not do this" and closing an issue as "Won't Implement". And also, that if you're going to use that as the reason to reject something, you not say "for the series of reasons" when there is not, in fact, any such (1 is not a series).

I'm genuinely startled that you implemented $SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION as I cannot visualize how one would conclude that this dangerous knob equates to a simple codebase that avoids adding yet more knobs and levers -- especially when no one has a use case for such an option.

Footnotes

  1. This was a recent concern for me just last week. I pinged you about it on the "dulwich" project, in a PR I submitted to said project, in order to ask about the projected future stability of the PEP 517 interface for detecting when a setup.py file is being processed by the get_requires_for_build_* hook.

@abravalheri
Copy link
Contributor

@eli-schwartz I appreciate you commenting on the topic, but I think we are starting to go to far. There has been too much miscommunication, and the messages are being lost and misinterpreted both sides. I do not wish to keep this going.

Could you please indicate if you wish SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION to be removed?

@eli-schwartz
Copy link
Contributor Author

Could you please indicate if you wish SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION to be removed?

I would suggest removing it, yes. I wouldn't be using it myself, because it is dangerous.

@eli-schwartz
Copy link
Contributor Author

Also, since you mentioned "concrete suggestions".

I was originally hesitant to write code for my concrete suggestion because:

  • it wasn't obvious to me that it would be accepted anyway
  • it requires doing stuff across multiple projects due to setuptools including the output of the validate_pyproject program

My original concrete suggestion now exists in code form, so you can see what I mean by example: eli-schwartz/validate-pyproject@f568d59

I would also argue that with that, the existing support for $NO_NETWORK could just be dropped. But I'm not invested in that. Honestly, I would also argue that the entire validate_pyproject logic for doing anything other than checking that it is an array of strings should be outright deleted, because it is better served by pypa/twine#430 and pypa/twine#1166 and has no role to serve in validate_pyproject.

@mgorny
Copy link
Contributor

mgorny commented Nov 12, 2024

Could you please indicate if you wish SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION to be removed?

I have two concerns with it:

  1. It disables much more than what we need. I don't know whether it's dangerous for us. The way I see it, for things to break it would either take 1) upstream not testing their package (admittedly, accidents happen), or 2) setuptools making breaking changes. Then, there's the question whether that could actually cause the level of breakage that wouldn't cause setuptools to explode anyway, and instead would produce a subtly broken package that we wouldn't notice.
  2. Using it would mean emitting the warning for every single package our users build. This would probably mean that either people would start reporting false positives, or we'd have to go out of the way to actually disable the warning. All things considered, it would probably be simpler for us to patch the whole thing out, and if we're stuck with local patching, then we could instead patch the thing we need, and that would be better and simpler in the end.

So yeah, I would very much prefer that it was possible to disable the part of validation that actually causes problems, rather than all of it.

@eli-schwartz
Copy link
Contributor Author

It disables much more than what we need. I don't know whether it's dangerous for us. The way I see it, for things to break it would either take 1) upstream not testing their package (admittedly, accidents happen), or 2) setuptools making breaking changes.

  1. PATCHES=() modifies pyproject.toml, sed in src_prepare...

@abravalheri
Copy link
Contributor

In #4746 I propose to revert the change that introduced SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION.

So yeah, I would very much prefer that it was possible to disable the part of validation that actually causes problems, rather than all of it.

Thank you very much @mgorny, just not that at this stage the PR is only for removing SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION. I am not considering any replacement yet.

@abravalheri
Copy link
Contributor

abravalheri commented Nov 12, 2024

Thank you very much @eli-schwartz for having a look on this and providing a contribution. I appreciate the effort you’ve put into proposing a solution and raising important discussion points. To ensure we use our efforts effectively, I’d like to clarify the approaches I would consider moving forward.

If I understood the implementation in eli-schwartz/validate-pyproject@f568d59 correctly, it introduces branching, adds one level of nesting and adds another environment variable that affects how classifiers are validated. This is an approach that I have considered before as a result of the earlier discussion in the thread, but decided to not follow. My resistance to follow this approach is what I tried to convey when I previously said I aim to keep the codebase as simple as possible, which is why I’m hesitant to add yet more knobs and levers. Maybe I was not successful in my communication on the first time, but I hope it is more clear now. This path forward is not something I wish to take.

Regarding NO_NETWORK, this feature was added in response to user requests, and removing it could be seen as a regression. Therefore, I am not considering its removal at this time.

Honestly, I would also argue that the entire validate_pyproject logic for doing anything other than checking that it is an array of strings should be outright deleted, because it is better served by pypa/twine#430 and pypa/twine#1166 and has no role to serve in validate_pyproject.

I understand your perspective, but I believe that ensuring we do not produce wheels that will be rejected later is an important role for validate_pyproject. This is not an approach I will consider at this moment.

The reason why I initially suggested SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION is because I don't want to be caught in a loop with increasing requests for disabling bits and pieces. For example, initially we had a request for NO_NETWORK, now for NO_TROVE_CLASSIFIERS. I don't know what comes next. So giving the users the ultimate escape hatch sounded like a compromise between not wanting to implement these piece-wise bespoke conditional checks and allowing the users to proceed without being constrained by my implementation choices1.

At this point, I don’t think there is a good way to reconcile the request with my personal requirements for the source code. That is why I previously said that something like SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION is the one approach I would consider. Of course, other people might have a different perspective on the challenge and they might propose something more clever that I haven't thought about yet. That is why I want to keep the possibility open and I will review PRs with an alternative take on the problem. But so far, the approaches we have been discussing in this thread don’t look promising to me2.

Footnotes

  1. It had to be caveated by a warning and discouraging variable naming because it is a pretty advanced setting, to be avoided whenever possible. If parts of the pyproject.toml are not well-formed, the code for parsing/interpreting it, may start to fail, for example with weird AttributeErrors, KeyErrors etc...

  2. The approach of having a single unified "download from PyPI" validator with its escape hatches and warnings is something I am completely fine with. However it does require commitment from PyPI and or standards to make it an official source of truth. I am happy to support if someone champion this in the relevant forums, but this is not a conversation that I want to spearhead.

@eli-schwartz
Copy link
Contributor Author

The reason why I initially suggested SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION is because I don't want to be caught in a loop with increasing requests for disabling bits and pieces. For example, initially we had a request for NO_NETWORK, now for NO_TROVE_CLASSIFIERS. I don't know what comes next. So giving the users the ultimate escape hatch sounded like a compromise between not wanting to implement these piece-wise bespoke conditional checks and allowing the users to proceed without being constrained by my implementation choices1.

The way I would look at it instead is that both of those requests for disabling bits and pieces are happening because people feel that trove classifiers, specifically, and nothing else, is something that is "not suitable for setuptools to check".

It's not increasing requests for disabling bits and pieces at all. It's just people who don't want to validate the approved list of allowed classifiers.

  • first, NO_NETWORK was implemented on the theory that checking trove-classifiers via PyPI is unpredictable if you're offline, based on self-directed feedback? I can't find an actual user request for it, it shipped with the initial implementation of checking PyPI at all
  • second, people said checking trove-classifiers via PyPI is unpredictable because you get different behavior if some remote server deprecates something
  • third, people said that checking trove-classifiers is generally unpredictable since you get different behavior in additional scenarios

It's all just requests by people who don't want to check trove-classifiers.

Some of those requests were by people who did not care to solve the unpredictability issue except for their own specific scenario where the unpredictability rears its head. But at the end of the day, it's still specific to trove-classifiers.

The logical conclusion here is that instead of having various different ways to disable trove-classifiers checking via bits and pieces here and there under specific scenarios, there should just be a way to disable trove-classifiers.

Anyone using NO_NETWORK is doing it because... they don't want to validate trove-classifiers. So if they just used VALIDATE_PYPROJECT_NO_TROVE_CLASSIFIERS, they... get exactly the behavior they want, right?

That's why I said the next logical step is to delete the "old" environment variable. If people worry about it being a regression, just... document it and bump the major version?

@abravalheri
Copy link
Contributor

Thank you very much @eli-schwartz.

In the interest of complying with your request of being as much forthcoming as possible, I maintain that I will not be pursuing such routes, as I commented in #4459 (comment).

@eli-schwartz
Copy link
Contributor Author

Thanks for clarifying. Minor nit: can you use the "close issue as not planned" toggle for searchability reasons?

@abravalheri abravalheri reopened this Nov 12, 2024
@abravalheri abravalheri closed this as not planned Won't fix, can't repro, duplicate, stale Nov 12, 2024
@abravalheri
Copy link
Contributor

I will leave the PR #4746 open for one week (max) to receive feedback strictly on if it should be merged or not.

The aim of that particular PR is to remove SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION since the feedback I received is that it is not going to be used.

I am taking this approach because my understanding is that eli-schwartz is in favour of removing the feature unconditionally, but I have the impression that mgorny is in favour of removing the feature if a different solution is provided. I would like to clarify that I personally will not be providing a different solution.

In the case no feedback is provided by next week, I will use Github search to identify if SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION has been adopted. If so, I will close the PR without merging. If not, I will merge the PR.

mgorny added a commit to mgorny/validate-pyproject that referenced this issue Nov 21, 2024
As requested in pypa/setuptools#4459, add
a VALIDATE_PYPROJECT_NO_TROVE_CLASSIFIERS environment variable that can
be used to disable using trove_classifiers package even if it is
available.  This can be used when the system features an outdated
trove_classifiers, and therefore incorrectly triggers validation error.
The change is designed to be absolutely minimal and non-intrusive.
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Nov 29, 2024
… strings

Trove classifiers, and their officialness, have no effect on a wheel
other than determining whether they are allowed to be uploaded to a
non-Gentoo website, and enabling the search index of that other site.

We don't need this, and we don't need to validate it. Setuptools will
disable validation if both of:

- network downloads failed

- cannot successfully import the `trove_classifiers` module

occurs. If trove-classifiers is installed by coincidence, this breaks
builds when it doesn't get updated on an extremely rapid basis and some
random package in dev-python/* uses a classifier that was made official
just the other day.

We could solve this another way, by making dev-python/setuptools
PDEPEND on trove-classifiers, and constantly bump the >= dependency. But
this is a pointless hassle. In fact, we're actually doing it, and it's
been a pointless hassle. We need to maintain up-to-the-minute minimum
bounds on the very latest version, and bump setuptools to a new -rX just
to update the minimum version of a package it doesn't even depend on. We
need to package new versions of trove-classifiers before *other* Gentoo
Devs outside of the python project, can successfully revbump their own
packages. We need to coordinate stabilization of trove-classifiers in
combination with those other packages. We force people to install a
pointless package. We overuse PDEPEND.

Instead, apply a *rejected* upstream patch to add an environment
variable that skips this specific validation code block entirely.
Upstream doesn't want to maintain code that contains branches, so we
will maintain it locally.

Since it is Gentoo-specific, the variable is also prefixed with GENTOO_
and is expected to be used solely inside of distribution packaging while
not affecting manual usage of setuptools outside of portage.

Bug: pypa/setuptools#4459
Signed-off-by: Eli Schwartz <[email protected]>
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Nov 29, 2024
…d strings

In the previous commit, a change was patched into setuptools to enable
skipping pypi.org specific validations we do not want. Export the
environment variable which activates this, whenever the build backend is
setuptools.

Bug: pypa/setuptools#4459
Signed-off-by: Eli Schwartz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants