Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for pyproject.toml-style configuration (PEP 621) #2924

Closed
wants to merge 55 commits into from

Conversation

abravalheri
Copy link
Contributor

@abravalheri abravalheri commented Dec 11, 2021

First of all, I would like to apologize for the massive size of this PR. It does contain new vendorised dependencies, so it is not as big as it seems, but still it took me a lot of commits to implement this feature in a more conservative way (i.e., I tried to add some safeguards to avoid breaking existing code).

I really wanted to make it smaller, but the problem of retro-fitting the spec in PEP 621 is very complex... If there is something I can do to minimise the burden of the review please let me know.
(It does not help that GitHub does not offer a simple way of working with dependent pull requests, but maybe there is a methodology that we can agree here to split the change in smaller pieces).

I am definitely aware that this PR contains lots of sharp edges and that there is room for a lot of improvement... So please just take it as a proof of concept, that can be improved upon feedback.

Summary of changes

As general guidelines, I tried to roughly follow the ideas described in #1688 and #2671.
The following steps summarize the methodology:

  • The existing setuptools.config module contains functions that do 3 things at the same time:

    • (a) Parse some parts of the setup.cfg files (the actual ini format parsing however happens in distutils.core)
    • (b) Expand dynamic values based on the configuration parsed (e.g. file:, attr:, find: directives, file globbing, etc...)
    • (c) Apply the configuration (after the dynamic parts are expanded) to the dist object (and its metadata member)
      I created a new module setuptools.config.expand by extracting (a) from setuptools.config to make it re-usable.
  • I renamed the existing setuptools.config to setuptools.config.legacy_setupcfg and implemented 2 new modules: setuptools.config.setupcfg and setuptools.config.pyprojecttoml.

    • (a) setuptools.config.pyproject uses tomli (new vendored dependency) to parse pyproject.toml files. The parsed values are validated using JSONSchema via validate-pyproject (new vendored dependency).
    • (b) setuptools.config.setupcfg uses ini2toml (new vendored dependency) to convert existing setup.cfg into a data structure that follows the same format as the one proposed for pyproject.toml in accordance with the discussion in Discussion: support for pyproject.toml configuration #1688 (the details of the parameters not covered by PEP 621 were left unspecified, so this proposal was addopted). An actual pyproject.toml file is not produced because it would require vendorising yet another dependency (tomli-w).
  • I created 2 modules with logic to apply the pyprojec.toml-format configuration into an existing distribution object setuptools.metadata and setuptools.options.

    • As a form of safeguard I also added functions to this module to be able to compare 2 configurations. The idea here is to compare metadata/options from setup.cfg and the automatically generated pyproject.toml and check if they are compatible, falling back to the "native/legacy" setup.cfg configuration.
    • setuptools.metadata is based in the specification for the core metadata in packaging.python.org, which differs a bit from the one currently used by setuptools. This is something that can be further explored in a follow up PR
  • I created a new higher level API to coordinate these different options in setuptools/config/__init__.py. It can read pyproject.toml-style configuration and apply it to distribution objects.

    • When working with setup.cfg files, this API takes an conservative approach and uses setuptools.{metadata,options} to compare the automatically converted config with the original one, raising an error if they differ (this error is properly handled in build_meta). The idea here is to have a "transition period" where this feature is experimental, so users can report problems without having their build breaking. This check is implemented in setuptools.config._backward_compatibility.
  • I adapted setuptools.build_meta to use all of the previous implementations to provide support for pyproject.toml-style configuration.

    • In particular, I rely on distutils.core.run_setup to obtain the distribution object without having to expand any configuration or run any command.
    • As a side-effect, with the distribution object available, the workaround to obtain setup_requires was no longer necessary (so I replaced it with simply reading the setup_requires attribute of the dist object).
  • Finally, I made the default values for include_package_data=True, but only for configuration read from pyproject.toml (configuration from setup.cfg or setup.py are kept using the default value of include_package_data=False for maximum backward compatibility).

    • This was motivated by the suggestion of several members of the community and seems to be widely considered a QoL improvement. It is not a new idea (the iconic posts from Ionel in 2014/2015 already discussed this idea, but moving from one file format to another and having an automatic conversion is a perfect opportunity to do this change without breaking old packages).

Closes #1688, (and maybe?) #2671.

Pull Request Checklist

@jaraco
Copy link
Member

jaraco commented Dec 18, 2021

In general, this looks good. I might have a few nitpicks, and I certainly haven't devoted the time to absorb it all. Please have a look at the failing tests and identify any open concerns you have.

@abravalheri
Copy link
Contributor Author

Thank you very much for having a look @jaraco . I think I have an idea on what might be going wrong, but I haven't had the chance to code it yet 😅 (I plan to get back to it on Monday)

I was thinking to create a separate PR without the automatic conversion from setup.cfg to pyproject.toml...

I understand that automatic conversion is the prefered approach as discussed in the related issue, but (as this PR suggests) it also adds loads of complexity, that might not be worthy...

I am providing a separated tool (ini2toml) to help those that want to update. Maybe that is enough?

@jaraco
Copy link
Member

jaraco commented Dec 18, 2021

Definitely happy to consider a simpler approach to start.

This will facilitate the implementation of other configuration formats
(such as pyproject.toml as initially defined by PEP 621)
We can split the process of interpreting configuration files into 2 steps:

1. The parsing the file contents from strings to value objects
   that can be understand by Python (for example a string with a comma
   separated list of keywords into an actual Python list of strings).

2. The expansion (or post-processing) of these values according to the
   semantics ``setuptools`` assign to them (for example a configuration field
   with the ``file:`` directive should be expanded from a list of file paths to
   a single string with the contents of those files concatenated)

The idea of this change is to extract the functions responsible for (2.)
into a new module, so they can be reused between different config file
formats.
… from config to expand
This eventually will allow reading project metadata directly from
`pyproject.toml`
In order to minimise dependencies, `validate-pyproject` has the ability
to "dump" only the code necessary to run the validations to a given
directory. This special strategy is used instead of the default
`pip install -t`.

The idea of using JSONSchema for validation was suggested in pypa#2671,
and the rationale for that approach is further discussed in
https://github.com/abravalheri/validate-pyproject/blob/main/docs/faq.rst

Using a library such as `validate-pyproject` has the advantage of
incentive sing reuse and collaboration with other projects.

Currently `validate-pyproject` ships a JSONSchema for the proposed
use of `pyproject.toml` as means of configuration for setuptools.
In the future, if there is interest, setuptools could also ship its own
schema and just use the shared infrastructure of `validate-pyproject`
(by advertising the schemas via entry-points).
This is the first step towards making setuptools understand
`pyproject.toml` as a configuration file.

The implementation deliberately allows splitting the act of loading the
configuration from a file in 2 stages: the reading of the file itself
and the expansion of directives (and other derived information).
The user might specify dynamic `entry-points` via a `file:`
directive (a similar feature for `setup.cfg` is documented in
[declarative config]).

The changes introduced here add the ability to expand them
when reading the configuration from `pyproject.toml`.

[declarative config]: https://setuptools.pypa.io/en/latest/userguide/declarative_config.html
The `setuptools.metadata` module contains functions that allows applying
metadata-related configuration read from a `pyproject.toml` file into an
existing `dist` object. It also allows comparing metadata obtained from
different places to check if they are equivalent or not.
The `setuptools.options` module contains functions that allow applying
non-metadata related configuration that comes from a `pyproject.toml`
file into an existing `dist` object. Similarly to `setuptools.metadata`,
comparing options is also allowed.
This change allows comparing and applying command options stored under
the `[tool.distutils.<COMMAND NAME>]` table in `pyproject.toml`.
In pypa#2685 the plan that the community seems to agree on is to always
automatically translate `setup.cfg` into a `pyproject.toml` equivalent
and then proceed reading/parsing the configuration.

The objective of this change is to open space so we can implement this
way of reading `setup.cfg` in a new `setuptools.config.setupcfg` module
while keeping the legacy way of handling `setup.cfg` around.

The rationale behind keeping the legacy way around is that, to avoid
breaking existing packages during a transition period, we can compare
the old and the new way of parsing the configuration (e.g. via
`setuptools.{metadata,options}.compare`) and in the case they are
conflicting, use the old way and emit a warning asking the user to
report the error.
Some `ini2toml` modules are removed, since they are not necessary for
setuptools use case and might require extra dependencies (e.g. CLI
usage, writing TOML files, format preserving).

--

The automatic conversion of `setup.cfg` into `pyproject.toml` as the
*one true way* of doing declarative builds was first suggested in pypa#1688.

--

There are advantages and disadvantages in using an external tool such
as `ini2toml` for the automatic conversion.

The main advantage is that users can use the exact same tool for
converting their old packages and it will work in the same way
setuptools works. This means that on top of doing an automatic
conversion, the users are also offered an alternative to explicitly
update their package configuration.

The main disadvantage is that `ini2toml` works in a way that is a bit
more complex than simply parsing the `setup.cfg` file and dumping it as
`pyproject.toml` (we can think about `ini2toml` as if it was
transforming an "AST-equivalent" representation of the INI and TOML
formats -- this is necessary for other use cases covered by the library).
In order to minimise this complexity, some parts of the
package are stripped out during the vendoring process.

Also note that currently `ini2toml` ships a "built-in" plugin for
setuptools. In the future, if there is interest, there is also the
possibility of moving this plugin directly under setuptools repository.
This is the initial implementation of the "configuration driver"
that indirectly reads `setup.cfg` by first converting it to a data
structure corresponding to `pyproject.toml` and then expanding it.

This idea is based on the approach defined in pypa#2685.

LIMITATION: Differently from the `legacy_setupcfg` "configuration driver",
`setupcfg` does not support reading other distutils file.
The `find_others` flag is removed because of that.
After the changes in build_meta, some checks in the associated test
suite are required.

It seems that some assertions are not really needed, but it is important
to clarify that. For the time being, some skips are added until further
clarification.
The `setuptools.config` module needs at least 3 vendored dependencies to
work, but it seems that this might cause some bootstrapping problems.

This change implements a workaround for that (and add better debugging
messages).
The `tmp_src` fixture copies the setuptools directory to prevent errors
that appear when running tests concurrently. However it seems that is
copying everything including the `.git` directory (and possibly others
like `.tox`). These directories can be quite heavy and error prone to
copy.

The changes introduced here prevent copying these unnecessary
folders/files. As a side effect, the tests should run slightly faster.
There seems to be an opinion in the community that
`include_package_data=True` would be a better default and a quality of
life improvement.

This change relies on the fact that `ini2toml` will automatically
backfill `include_package_data=False` for converted `setup.cfg` files
to implement this change in a backward compatible way.
in _distutils/dist/_parse_command_opts a issubclass check is done
to verify the given command is a subclass of distutils.cmd.Command.
This check will fail if the command is mocked (the mock object is not a
class object and will generate a TypeError).
@abravalheri abravalheri force-pushed the support-pep621 branch 2 times, most recently from caf2e5a to 4a3487e Compare December 20, 2021 20:16
For some reason it is not easy to replicate the cygwin failures pointed out
by the CI in the local environment, this commit just adds some extra
debugging statements so we can have more information about the problem
@hukkin
Copy link

hukkin commented Dec 21, 2021

I'm not a lawyer, but shouldn't the license and copyright of any bundled code be included?

@abravalheri
Copy link
Contributor Author

abravalheri commented Dec 21, 2021

Hi @hukkin, probably yes 😅

But I am mostly relying on the existing vendoring script, so the best would be tackling it in a separated PR. (Would you like to open an issue for that?)

I can manually add a license for tomli (temporarily while the vendoring script is not fixed).

BTW, this is mostly a PoC, my plan is to simply it further in a different PR (once I am done wrestling with the CI/tests).

As pointed out in
pypa#2924 (comment),
license files are likely to be required for vendored packages.
This change manually adds the missing license files for the dependencies
introduced in the PR 2924.
@hukkin
Copy link

hukkin commented Dec 21, 2021

Oh yeah seems the already included vendored libs are missing a license as well so agree this should be a separate issue. I'm making one right now!

EDIT: Issue here #2950

@abravalheri
Copy link
Contributor Author

abravalheri commented Jan 4, 2022

For all those that have subscribed to this PR. I submitted a simplified version of it on #2970 (which would be my preferred approach right now).

It is still a bit on the heavy side, but I tried to provide a diagram and links that can help individually reviewing the different steps adopted for the implementation.

Any reviews/discussions on that PR are more than welcome!

@abravalheri abravalheri closed this Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discussion: support for pyproject.toml configuration
3 participants