Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support building from an sdist archive #1020

Closed
wants to merge 1 commit into from

Conversation

pombredanne
Copy link

It is now possible to call the cibuildwheel command line and point it
to a package source distribution archive tarball (sdist) instead of
only a directory.

Reference: #597
Signed-off-by: Philippe Ombredanne [email protected]

cibuildwheel/__main__.py Outdated Show resolved Hide resolved
cibuildwheel/__main__.py Outdated Show resolved Hide resolved
cibuildwheel/__main__.py Outdated Show resolved Hide resolved
@nsoranzo
Copy link
Contributor

@pombredanne Thanks for working on this, it's a feature I'd need for a wheel-building project and I was hoping to have time to implement it at some point.
I haven't looked into it much, but I was wondering if unpacking the sdist in a temporary directory would simplify things. You then could re-assign package_dir to the temporary directory, so wouldn't need to skip the search for pyproject.toml and maybe also the build backend works?

@pombredanne
Copy link
Author

@nsoranzo Thanks for the review!

I haven't looked into it much, but I was wondering if unpacking the sdist in a temporary directory would simplify things. You then could re-assign package_dir to the temporary directory, so wouldn't need to skip the search for pyproject.toml and maybe also the build backend works?

yeah, I have thought about it and decided to go simpler, but it would be fairly trivial to extract the tarball and rejoin the "package_dir" groove after this. This would make "build" work... though there are subtle things about an sdist layout that the "build" frontend may not recognize, chiefly the PKG-INFO METADATA-like file?

I am fine either way.

@@ -299,6 +299,13 @@ def build(options: Options, tmp_path: Path) -> None:
env=env,
)
elif build_options.build_frontend == "build":
if build_options.package_dir.is_file():
print(
"cibuildwheel: cannot build wheel from sdist with 'build'. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can and probably will be fixed inside build eventually. pypa/build#311

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gone now... I am extracting a tarball.

pyproject_toml_path = Path(args.package_dir) / "pyproject.toml"
if pyproject_toml_path.exists():
return pyproject_toml_path
package_dir = Path(args.package_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we pull this out of the tar.gz? Or even zip, but I'm fine in following build and only allowing .tar.gz's.

(There's also the temporary dir idea - the only benefit of this I can immediately think of is just validation, but if the build tools support it (and build should at some point), this seems cleaner)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We possibly could move the reading of it, since you won't get a "path" out of an archive, but you can read the file.

Copy link
Author

@pombredanne pombredanne Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we pull this out of the tar.gz? Or even zip, but I'm fine in following build and only allowing .tar.gz's.

For now, I am only allowing tar.gz like build, and a proper sdist should have a well defined root directory.
Note that the latest code is a bit different.

@pombredanne pombredanne force-pushed the sdist-and-pure branch 2 times, most recently from 3cb0d27 to f951a34 Compare February 11, 2022 15:32
@pombredanne
Copy link
Author

@henryiii Thanks for your review.... I just pushed a vastly simplified version that extracts a tarball in place... much less churn and code involved with this one. This is extracting the tarball as suggested by @nsoranzo and I think as you suggested too.

package_dir=package_dir,
add_env={
# this shouldn't depend on the version of python, so build only CPython 3.6
"CIBW_BUILD": "cp36-*",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably go with something a bit newer, just so we don't have to mess with this when 3.6 goes away (mid next year, I think, but still). 3.9 or something.

I'd also really like to see that --config-file=cibuildwheel.toml still works with this (I think so?), maybe this could be dropped into a toml file instead, and the equivalent of this passed in?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably go with something a bit newer, just so we don't have to mess with this when 3.6 goes away (mid next year, I think, but still). 3.9 or something.

I modeled this after the other tests for the pure wheel and it is using a 3.6 version. Therefore I thought that was the right way to process this and there must have been a reason for sticking to 3.6

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was probably shiny and new at the time. 🤣 We can update them all at the same time, then, fine as is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was probably shiny and new at the time. rofl We can update them all at the same time, then, fine as is.

This was the intent in adopting the existing ways

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also really like to see that --config-file=cibuildwheel.toml still works with this (I think so?), maybe this could be dropped into a toml file instead, and the equivalent of this passed in?

can you elaborate what you mean a bit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: there are no changes that relates to configuration. The process is simply extract and carry on with the regular flow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to see --config-file=cibuildwheel.toml in a test. We should check that it's resolved relative to the cwd of the invocation, not to the sdist or a temp dir (if that's what's used).


extract_dir, _, _ = str(package_dir).rpartition(".tar.gz")
with tarfile.open(package_dir) as tar:
tar.extractall()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should with pass a path= argument here to extract in a temporary directory instead of the current working directory?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deal for a proper sdist is that the root directory inside the tarball is the base of name of the archive (e.g., minus the tar.gz extension). Exacting in place means that an extract dir will be created alright.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsoranzo I am fine to create a temp directory but this will then need to be moved back after under the project dir which is the contract for the "package dir" argument so far. I am not sure trying to keep things elsewhere would help and this would require extensive surgery

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think a temp dir is better than extracting inplace. That way cibuildwheel isn't leaking its implementation into the filesystem. But you're right that it's a more substantial change. I'll have a think about the best way to go about this.

@henryiii
Copy link
Contributor

NotADirectoryError: [Errno 20] Not a directory: '/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-1/popen-gw1/test2/project/spam-0.1.0.tar.gz/pyproject.toml'

It is now possible to call the cibuildwheel command line and point it
to a package source distribution archive tarball (sdist) instead of
only a directory. The sdist must be a .tar.gz tarball and will be
extracted in place before being built.

Reference: pypa#597
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne
Copy link
Author

@henryiii re:

NotADirectoryError: [Errno 20] Not a directory: '/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-1/popen-gw1/test2/project/spam-0.1.0.tar.gz/pyproject.toml'

This has been fixed.

print(msg, file=sys.stderr)
sys.exit(2)

extract_dir, _, _ = str(package_dir).rpartition(".tar.gz")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use str.removesuffix() yet, but this still looks simpler to me:

Suggested change
extract_dir, _, _ = str(package_dir).rpartition(".tar.gz")
extract_dir = str(package_dir)[:-len(".tar.gz")]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the code here and below seems to make some assumptions that are not guaranteed, namely:

  • that the tarball is located in the current directory (can be easily fixed by a os.path.basename())
  • that the tarball contains a directory with the same name as the archive (minus the .tar.gz)
  • that the tarball doesn't have files that will be created outside of the c.w.d. (see the warning in https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractall )

I think we could at least verify that the tarball is "well-behaved" by calculating the common prefix directory of the members of the archive: if there is one, it can replace extract_dir here; otherwise, we just bail out.

Some inspiration can be taken e.g. from https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/unpacking.py


extract_dir, _, _ = str(package_dir).rpartition(".tar.gz")
with tarfile.open(package_dir) as tar:
tar.extractall()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think a temp dir is better than extracting inplace. That way cibuildwheel isn't leaking its implementation into the filesystem. But you're right that it's a more substantial change. I'll have a think about the best way to go about this.

Linux. Default: the working directory.
Path to the package that you want wheels for. Must be either a
a subdirectory of the working directory or a source distribution
(sdist) tar.gz archive file that will be extracted in-place.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extracted in-place

Seems surprising for a user, I think. I wonder if we can improve this...

package_dir=package_dir,
add_env={
# this shouldn't depend on the version of python, so build only CPython 3.6
"CIBW_BUILD": "cp36-*",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to see --config-file=cibuildwheel.toml in a test. We should check that it's resolved relative to the cwd of the invocation, not to the sdist or a temp dir (if that's what's used).

Comment on lines +8 to +9
sdist_filename = "spam-0.1.0.tar.gz"
sdist = Path(__file__).parent / sdist_filename
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than including a tar.gz in the repo, I'd rather build this within the test, so we can all see what's actually being tested by reading the code.

@joerick
Copy link
Contributor

joerick commented Feb 22, 2022

Ah, this is actually more tricky to build in than I initially thought! In my opinion, the best way to do this is to build in a temporary directory, with the CWD set as the sdist root, so pyproject.toml is at ./pyproject.toml. Having CWD at the sdist root is best, because 1) we can expand the tarball to a tmpdir, and 2) pyproject.toml might contain commands like before_build = "sh scripts/prepare.sh", and these should be executed relative to that location.

But, this change to the CWD makes me a bit uncomfortable, because it's changing the behaviour of all options that contain paths, because their cwd is not resolved relative to the cwd of the invocation. And, I'm also not super keen on making __main__ more complicated for the sake of this feature.

I'm leaning towards a different solution, which doesn't actually modify cibuildwheel, but gives us another, separate utility. So instead of calling cibuildwheel my-package.tar.gz, you'd call cibuildwheel-from-sdist my-package.tar.gz.

usage: cibuildwheel-from-sdist [-h] [--output-dir OUTPUT_DIR] 
                               [--config-file CONFIG_FILE]
                               [cibuildwheel-args...] sdist

Build wheels from an sdist archive.

Extracts the sdist to a temp dir and calls cibuildwheel on the resulting
package directory. Note that cibuildwheel will be invoked with its working
directory as the package directory, so options aside from --output-dir
and --config-file are relative to the package directory.

positional arguments:
  package               Path to the sdist archive that you want wheels for.
                        Must be a tar.gz archive file.
  cibuildwheel-args...
                        Any further arguments will be passed on to cibuildwheel.

optional arguments:
  -h, --help            show this help message and exit
  --output-dir OUTPUT_DIR
                        Destination folder for the wheels. Default: wheelhouse.
  --config-file CONFIG_FILE
                        TOML config file. Default: "", meaning
                        the pyproject.toml inside the sdist, if it exists.

I think this would be a lot less complexity for cibuildwheel itself to manage, and clearer API for users too.

Would this fit your use case @pombredanne and @nsoranzo ?

@henryiii
Copy link
Contributor

My longer-term hope would be that pypa/build would pick up the ability to build from SDist too (pip wheel already supports it), and I'd hope that then cibulidwheel could just support it naturally - the only thing we'd need to add is the ability to look inside for the pyproject.toml, which doesn't even need to be unpacked on disk, just memory.

Having a new command will not work well with either the GitHub Action, or pipx, where you'll have to run something like pipx run cibuildwheel-from-sdist --spec cibuildwheel to make it work. I'm a strong -1 on that design. That's basically already how https://github.com/astrofrog/autowheel worked.

In the interim, is there really that much of a problem just manually unpackaing an sdist into the current directory and then calling cibuildwheel? Simpler, composable commands vs. complicated integrated ones, and all that? It's just an extra command you'd need to stick before the GitHub action or pipx call, however you are running cibuildwheel. Might be a hair harder to make work on windows, but probably not terrible.

@joerick
Copy link
Contributor

joerick commented Feb 22, 2022

My longer-term hope would be that pypa/build would pick up the ability to build from SDist too (pip wheel already supports it), and I'd hope that then cibulidwheel could just support it naturally

Still, the issue of broken paths in options persists here. Pulling options from pyproject.toml and executing them from outside the sdist makes very little sense to me.

Having a new command will not work well with either the GitHub Action, or pipx, where you'll have to run something like pipx run cibuildwheel-from-sdist --spec cibuildwheel to make it work.

This seems like a pretty minor concern to me, this wouldn't be a mainstream feature, so I don't mind if users need to have a slightly longer command line invocation to do it.

@henryiii
Copy link
Contributor

If you are running from the SDist, it's quite unlikely you have commands in your pyproject.toml for cibuildwheel. If you support cibuildwheel, you'd probably not build separately, and if you did, you'd want your configuration separate too - otherwise, you would not be able to edit it while you are working on making a release. The main thing to pull from it would be the python requires settings (and even then we could just not support any discovery of settings when running on a SDist, requiring everything to be in a local config file)

@joerick
Copy link
Contributor

joerick commented Mar 21, 2022

If you are running from the SDist, it's quite unlikely you have commands in your pyproject.toml for cibuildwheel.

I'm not so sure. I think that building from sdist could open up some interesting opportunities longer-term. I'm not sure what exactly, but when I think about a process like-

┌─────┐         ┌─────┐
│sdist│────┬───▶│wheel│
└─────┘    │    └─────┘
           │    ┌─────┐
           ├───▶│wheel│
           │    └─────┘
           │    ┌─────┐
           ├───▶│wheel│
           │    └─────┘
           │    ┌─────┐
           ├───▶│wheel│
           │    └─────┘
           │    ┌─────┐
           └───▶│wheel│
                └─────┘

it opens up new possibilities in the process. I'm thinking about things like... a company audits the sdist, then builds all their wheels in a system they trust. Or verifiable builds in general. Maybe one day there's a way to prove that wheels on PyPI come from a specific tarball.

@nsoranzo
Copy link
Contributor

nsoranzo commented Apr 7, 2022

If you are running from the SDist, it's quite unlikely you have commands in your pyproject.toml for cibuildwheel. If you support cibuildwheel, you'd probably not build separately, and if you did, you'd want your configuration separate too - otherwise, you would not be able to edit it while you are working on making a release. The main thing to pull from it would be the python requires settings (and even then we could just not support any discovery of settings when running on a SDist, requiring everything to be in a local config file)

Sorry for the late reply on this discussion. Just wanted to mention that use case is indeed building wheels for third-party packages that don't distribute them on PyPI, and put them on our wheel server. So for this use case the cibuildwheel configuration for each package will definitely be outside the package itself.

@henryiii henryiii mentioned this pull request Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants