Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use prepare_metadata_for_build_wheel instead of parsing setup.py #2296

Closed
wants to merge 12 commits into from

Conversation

finswimmer
Copy link
Member

@finswimmer finswimmer commented Apr 12, 2020

This PR is a complete rework, how poetry gather metadata about non-poetry project, that are necessary for dependency resolution, in case no sdist or wheel is available.

In the original implementation this was done by trying to parse the setup.py or setup.cfg. This has the strong limitation to situation where version and dependencies are not determine during runtime. And poetry has to run with python3, because the ast module for python2 works in a different way.

Build tools like setuptools that implements PEP 517 should provide a method prepare_metadata_for_build_wheel for getting access to the metadata one need for dependency resolution.

To gain access to this API hook following steps are needed:

  • create a separate environment in which the build tool is installed
  • run prepare_metadata_for_build_wheel for the build backend on the folder containing the package
  • parse the output with importlib.metadata

Lot of this work is already done in the pep517 package.

This implementation has the advantage:

  • metadata for complex setup.py can be obtained
  • the implementation in not necessary limited to setuptools. Virtually any PEP517 compliant build system can be integrated easily
  • the code is less complex

The only downside is, that the whole process needs more time than parsing. In my opinion this is negligible over the advantages.

Pull Request Check List

This is just a reminder about the most common mistakes. Please make sure that you tick all appropriate boxes. But please read our contribution guide at least once, it will save you unnecessary review cycles!

  • [*] Added tests for changed code.
    - [ ] Updated documentation for changed code.

@pradyunsg
Copy link
Contributor

This seems to be reimplementing a lot of the functionality from pep517. Consider directly using pep517 for the implementation here?

@finswimmer
Copy link
Member Author

@pradyunsg Your comment was fast :)

First I was a bit irritated by it, what you mean by pep517. After reading PEP517 once again I've found this part:

A Python library will be provided which frontends can use to easily call hooks this way.

And then I found this: https://pypi.org/project/pep517/ I guess this is what you meant?

Looks promising. Will have a look at this.

Thanks a lot! 👍

@pradyunsg
Copy link
Contributor

We're all kinds of bad at naming things. :)

I was on mobile, so linking to the PyPI page was... possible but annoyingly difficult. Sorry!

@finswimmer finswimmer marked this pull request as ready for review April 14, 2020 09:53
@finswimmer finswimmer requested a review from a team April 14, 2020 09:54
@finswimmer finswimmer changed the title [WIP]: use prepare_metadata_for_build_wheel instead of parsing setup.py use prepare_metadata_for_build_wheel instead of parsing setup.py Apr 14, 2020
Copy link
Member

@abn abn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, i think this is a step in the right direction. The AST parser was too fragile anyway. One minor thing to note is that this will remove the improvenents at #2257. Which is I think is okay, because we are receiving more accurate results.

A few minor questions and suggestions in the code. Leaving this as a comment since this is not ready for review yet afaict.

@@ -144,7 +144,7 @@ def test_interactive_with_git_dependencies(app, repo, mocker, poetry):
command = app.find("init")
command._pool = poetry.pool

mocker.patch("poetry.utils._compat.Path.open")
# mocker.patch("poetry.utils._compat.Path.open")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have not considered it yet, you could consider adding a test fixture to do this since this is reused. Something like this might be useful.

@pytest.fixture
def mocked_pyproject_toml(mocker):
    def side_effect(self, *args, **kwargs):
        if self.name == "pyproject.toml":
            return MagicMock()
        return self.open(*args, **kwargs)

    patched_open = mocker.patch("poetry.utils._compat.Path.open")
    patched_open.side_effect = side_effect
    yield patched_open

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already added for another change at

poetry/tests/conftest.py

Lines 122 to 134 in 95e3490

@pytest.fixture
def mocked_open_files(mocker):
files = []
original = Path.open
def mocked_open(self, *args, **kwargs):
if self.name in {"pyproject.toml"}:
return mocker.MagicMock()
return original(self, *args, **kwargs)
mocker.patch("poetry.utils._compat.Path.open", mocked_open)
yield files

Could reuse this.

@@ -116,16 +116,16 @@ def test_search_for_vcs_read_setup_with_extras(provider, mocker):
}


def test_search_for_vcs_read_setup_raises_error_if_no_version(provider, mocker):
def test_search_for_vcs_read_setup_with_dynamic_version(provider, mocker):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider retaining the old case as well since it is an expected failure?

]

package_data = {"": ["*"]}

install_requires = ["python-dateutil>=2.6,<3.0", "pytzdata>=2018.3"]

extras_require = {':python_version < "3.5"': ["typing>=3.6,<4.0"]}
extras_require = {'typing:python_version < "3.5"': ["typing>=3.6,<4.0"]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -134,28 +131,10 @@ def test_setup_reader_read_setup_kwargs(setup):
assert expected_python_requires == result["python_requires"]


@pytest.mark.skipif(not PY35, reason="AST parsing does not work for Python <3.4")
def test_setup_reader_read_setup_call_in_main(setup):
Copy link
Member

@abn abn Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this case still work as expected with this change? ie. a setup file with the call in main.

except ImportError:
from ConfigParser import ConfigParser
from importlib_metadata import PathDistribution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be redundant?

not result["install_requires"]
and not result["extras_require"]
and not result["python_requires"]
def read_from_pep517_hook(cls, directory):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, I think it might be great to retain the name for now. Since we are reading form the givent directory.

setup_call, body, "python_requires"
)
with pep517.envbuild.BuildEnvironment() as env, temporary_directory() as tmp_dir:
env.pip_install([cls.build_requires])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does pep517 do something like python -m ensurepip when using pip commands?


if distribution.requires:
for record in distribution.requires:
requirements = record.split(";", 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we simply use poetry.core.packages.dependency_from_pep_508(record) here?

except IndexError:
result["install_requires"].append(project_name)

egg_info = Path(directory).glob("*.egg-info")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not need this anymore correct? Since the metadata will be generated in the isolated environment?

"pendulum.tz.data",
"pendulum.tz.zoneinfo",
"pendulum.utils",
# "pendulum._extensions",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because we use a truncated source in the fixture?

@sdispater
Copy link
Member

Thanks for the thorough and comprehensive work on this PR!

However, I think the purpose of the SetupReader is lost in this PR. The idea was to not execute the setup.py file to get the package metadata for security and consistency reasons (since a setup.py file is not necessarily deterministic).

But for projects using setuptools as a build backend getting the metadata will execute setup.py (see https://github.com/pypa/setuptools/blob/master/setuptools/build_meta.py#L158).

By going this route, we might end up losing performances (using the BuildEnvironment might be costly) and security for a marginal gain which is getting more accurate metadata for non deterministic setup.py files.

I am not too keen on executing anything just for the sake of getting metadata (even using the prepare_metadata_for_build_wheel does not make sense in a context of dependency resolution because it means downloading and using an external tool for something that should be static).

I know it's a tricky subject and not everyone agrees but, in my opinion, the build backend should only be used to build a wheel for installation and nothing else.

That being said I appreciate the work you put into it, don't get me wrong, and if someone can guarantee me that nothing will ever be executed by going this PR route, I am willing to do it. But for now I would prefer not to go this way, even if it follows the current packaging guidelines.

And, again, this is great work and I appreciate the time put into it.

And I am open for discussion :-)

@abn
Copy link
Member

abn commented Apr 14, 2020

@sdispater the ast parsing solution can still be retained. In our current implementation, as I understand it, we do execute the setup.py file before the AST parsing anyway using python setup.py egg_info. From what I can tell, using pep517, we get a more rounded solution here that could potentially handle more cases than we already do.

@sdispater
Copy link
Member

@abn This is true only for git and directory dependencies (since in this case we assume people know what they are doing). So if we remove the change from the Inspector class which is used by the LegacyRepository and PyPIRepository classes then I think it would be more acceptable.

The other thing that bothers me (even though I must admit it's already the case) is that it will build a new environment each time. A better solution would be to create a unique environment lazily that will be reused during the dependency resolution and cleaned up after.

I know the AST parser is flaky (I wrote most of it 😅) but it works reasonably well so we should try to rely on it as much as possible.

@abn
Copy link
Member

abn commented Apr 14, 2020

Ha, I agree that the AST parser is a decent fallback - we can obviously improve it as we go along. Considering we can cache this information for packages and that this would be the exception cases, I do not see the perf-hit to be considerable or unmanagable spinning up an isolated environment (we can always improve this later). I guess there are different views on that one.

Fwiw, if we can get better metadata, I feel that this or a similar solution might be worth it. It will also mean we can support non poetry PEP-517 source packages too (the ones without the metadata). Also, this discussion is relevant for #2301 as well.

@finswimmer
Copy link
Member Author

Hello @sdispater ,

I can understand your reservation about executing something to get the necessary and I would also prefer something static. That's why I started the discussion here

The current result of the discussion is, that using prepare_metadata_for_build_wheel is the intended way to get the data we need for a dependency resolution. I would be happy if we can go on in the discussion and can convince more people, that this is not the optimal way.

The biggest plus for my implementation I see in the fact that we can support other packaging tools aside poetry and setuptools.

Maybe we can use prepare_metadata_for_build_wheel as a fallback for now, if ast parsing isn't enough?

@abn: Thanks a lot for your code comments. I will go through them later.

@pradyunsg
Copy link
Contributor

I agree that using the AST parser, when the backend is setuptools, is a good idea. Maybe the setuptools backend should probably be doing these AST shenanigans instead, but that's not something under poetry's control, so it's definitely a good idea to use the AST parser for situations where it's pretty unambigous what the results would be.

For non-setuptools backends though, I do think it'd be good to have PEP 517 support in poetry. :)

@finswimmer finswimmer mentioned this pull request Jun 1, 2020
3 tasks
@sdispater
Copy link
Member

So, I agree with @pradyunsg: If the backend is setuptools we use the AST parser (unless all the information we need is in the setup.cfg file). Note that I am pretty sure we can further improve the parser.

For other backends, we use the appropriate PEP-517 hooks. I don't think we can use the pep517 library however. It uses in a few places sys.executable directly meaning we can't use it with the environment system we have in place in Poetry.

@abn abn mentioned this pull request Jul 4, 2020
2 tasks
@abn abn closed this in #2632 Jul 24, 2020
Copy link

github-actions bot commented Mar 1, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants