Build steps? #119

Closed
takluyver opened this issue Jul 11, 2017 · 35 comments

Comments

@takluyver
Member

This is another speculative issue for discussion, not something that's necessarily going to be implemented. The question is whether & how to support build steps to run when producing a wheel.

Should we?

So far, I have always said that flit is a simple tool for simple packages, i.e. those with no build step. I think these are the vast majority of packages we publish, and the simplicity argument is still an important one against adding such features. I also don't know much about compilers, which are one of the most common sorts of build steps.

The best argument I see in favour is that you may start packaging something with flit, then add something that requires a build step. With no support for that, you have to throw away your packaging metadata and workflows to switch to another tool.

I expect that the majority of people looking at this issue will be those who want such features, so I'm not going to pay too much attention to +1s.

How would it work?

Briefly, the wheel build would copy files (the files in the package and... some others?) to a temporary location, and invoke external tools there to do the build.

It would need some way to figure out the compatibility effects of the hooks:

  • Some do not restrict compatibility (e.g. minifying JS code)
  • Some restrict compatibility towards the platform they're run on (e.g. standard compilation)
  • Some restrict compatibility towards a specific platform regardless of where they're run.
  • Some can be asked to target a particular platform (cross compilation)

It would also need to elegantly combine multiple build hooks - e.g. if you have a C extension module and a Qt designer .ui file to compile, it should be possible to specify those as two independent build steps, without either needing to be aware of the other. Or maybe this composition should be the responsibility of a proper build system which we invoke.
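One way to picture composing independent hooks is to let each report the platform tag it forces, if any, and then reconcile the answers. The following is only a sketch under that assumption; `combine_platform_tags` and the step names are hypothetical and not part of flit. The `any` value is the platform tag used by platform-independent wheels.

```python
def combine_platform_tags(step_tags):
    """Combine the platform restrictions reported by independent build steps.

    step_tags maps a (hypothetical) step name to the platform tag it forces,
    or to None when the step does not restrict compatibility.
    """
    restricted = {tag for tag in step_tags.values() if tag is not None}
    if not restricted:
        # No step restricted anything: the wheel stays platform-independent.
        return "any"
    if len(restricted) > 1:
        # Two steps pinned the wheel to different platforms; no wheel fits both.
        raise ValueError("conflicting platform tags: %s" % sorted(restricted))
    return restricted.pop()


# A Qt .ui compile step (unrestricted) next to a C extension step.
print(combine_platform_tags({"compile_ui": None, "build_ext": "linux_x86_64"}))
```

Neither step needs to know about the other here; only the caller sees both results. Cross compilation would fit the same shape if the caller passed the target platform to each step.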

@dholth
Member

dholth commented Jul 11, 2017

Not rocket science to convert your metadata format to mine either. Or a standardized toml metadata based on setup() arguments could be used. Or flit could provide a service to write the dist-info folder and perhaps the manifest, and the build system could do everything else. Or flit could provide useful command line tools like 'publish' and the build step could do almost everything else including writing the wheel.
Tricky part would be communicating which files had been built and where they go between flit and build tool. Collaboration could be "write your .so to a new folder that is zipped up to be a wheel", build step may or may not do the .py copy itself, and flit could handle slightly tricky issue of wheel manifests and metadata.
https://bitbucket.org/dholth/enscons/src/f31e60b7e29a9dc1bc4e314b7457727e5619db9e/pyproject.toml?at=default&fileviewer=file-view-default
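The "wheel manifest" part of that split is the RECORD file in the dist-info folder, which lists each file with a hash and size. As a rough sketch of what flit would have to do for a folder that a build tool has filled, assuming a hypothetical helper named `record_lines`:

```python
import base64
import hashlib
from pathlib import Path


def record_lines(build_dir):
    """Yield 'path,sha256=<digest>,<size>' rows for every file under build_dir.

    Simplified: real RECORD files are CSV, so paths containing commas or
    quotes need CSV quoting, and the RECORD file lists itself without a hash.
    """
    root = Path(build_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        # Wheels use the URL-safe base64 digest with the '=' padding removed.
        digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
        yield "{},sha256={},{}".format(
            path.relative_to(root).as_posix(),
            digest.rstrip(b"=").decode("ascii"),
            len(data),
        )
```

With this division of labour the build step only has to drop its outputs into the folder; it never needs to know the manifest format.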

@dholth
Member

dholth commented Jul 11, 2017

Also enscons doesn't really require the metadata to be in toml, it's all dicts internally.

@takluyver
Member Author

Yeah, there could be a flit2enscons tool which would help people upgrade to the more flexible packaging when they need it. :-)

@techtonik

I expect that the majority of people looking at this issue will be those who want such features, so I'm not going to pay too much attention to +1s.

I am actually looking to see an SVG diagram of those "build steps" whatever they are.

@takluyver
Member Author

I'm not quite sure what you mean. The discussion is about supporting whatever build steps people might want (compiling, cython, JS minification...), not writing any specific build steps.

@dholth
Member

dholth commented Jul 12, 2017

The cargo build script is a decent example. http://doc.crates.io/build-script.html

Cargo runs the script, the script does anything it wants to, and then cargo goes back to its own strength which is building Rust packages. Seems analogous to the place flit wants to occupy.

@techtonik here are a couple of examples of non-C-compiler build steps, with ascii art instead of SVG.

My pysdl2-cffi wrapper has a build dependency tree excerpted in part here, printed with SCons --tree=all:

    +-sdl/ttf.py
      +-sdl/__init__.py
      | +-builder/__init__.py
      | +-builder/build_image.py

ttf.py needs to be generated after sdl/__init__.py and they both depend on the contents of a special purpose code generator in builder/*.py (since editing the code generator is a big part of doing the project). The build system knows how to run the code generator to transform the inputs to the output by traversing this graph.

  +-pysdl2_cffi.egg-info
  | +-pysdl2_cffi.egg-info/PKG-INFO
  | | +-pyproject.toml
  | +-pysdl2_cffi.egg-info/entry_points.txt
  | | +-pyproject.toml
  | +-pysdl2_cffi.egg-info/requires.txt
  |   +-pyproject.toml

The .egg-info directory is built by generating PKG-INFO, entry_points.txt and requires.txt, which depend on the contents of pyproject.toml.

@takluyver
Member Author

Yup, the cargo build script is the sort of thing I'm thinking of.

@ncoghlan
Member

One option to consider would be to re-use PEP 517 and delegate the entire wheel building step (and get_requires_for_build_wheel), while leaving flit in control of the sdist generation and other aspects of the project's local development experience.

This would align with Daniel's suggestion of "flit could provide useful command line tools like 'publish' and the build step could do almost everything else including writing the wheel."
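A minimal sketch of that delegation could look like the following. The three hook names and their signatures come from PEP 517; the `DelegatingBackend` class, its constructor arguments, and the idea of passing the other backend in as an object are assumptions made for illustration only.

```python
class DelegatingBackend:
    """Keep sdist building local, forward the wheel hooks to another backend.

    wheel_backend: any object exposing the PEP 517 wheel hooks.
    sdist_builder: callable(sdist_directory, config_settings) -> file name.
    """

    def __init__(self, wheel_backend, sdist_builder):
        self.wheel_backend = wheel_backend
        self.sdist_builder = sdist_builder

    def get_requires_for_build_wheel(self, config_settings=None):
        # This hook is optional in PEP 517; a missing hook means "no extras".
        hook = getattr(self.wheel_backend, "get_requires_for_build_wheel", None)
        return hook(config_settings) if hook is not None else []

    def build_wheel(self, wheel_directory, config_settings=None,
                    metadata_directory=None):
        # The delegate writes the wheel and returns its file name.
        return self.wheel_backend.build_wheel(
            wheel_directory, config_settings, metadata_directory)

    def build_sdist(self, sdist_directory, config_settings=None):
        # sdist generation stays with the local tool.
        return self.sdist_builder(sdist_directory, config_settings)
```

Because PEP 517 allows a backend to be named as `module:object`, an instance like this could in principle be exposed directly as the build backend.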

@takluyver
Member Author

One option for the 'build script' model is for flit to allow build steps which don't limit platform compatibility, like minifying Javascript, or generating Python UI code. But you would still need to use another tool if you have build steps like compiling native code, where the result is less portable than the source.

I'm still not sure whether this is worth the added complexity, though.

@techtonik

There are plenty of build systems out there. Why not make flit a plugin for those tools? For example, https://github.com/SCons

The reason for that is to increase transparency into what flit does. Integrating with other build systems will show which things are reinvented, which things are really hacks, and which best practices can be reused.

@takluyver
Member Author

I definitely don't want to reinvent a build system; I'm certainly not qualified to tackle that, and as you say, there are plenty of build systems out there. What I wonder about is whether we can (and should) make things easier for people who are going from a pure Python package with no build step to a package that maybe bundles some minified Javascript, or compiles a Cython file. It seems rough to make them throw away their flit packaging and start again with another tool, as we currently do.

If you want to integrate building Python packages with SCons, see @dholth's enscons. I have no plans to turn flit into that kind of tool, but I'd be interested to hear if you make any other such integrations with build systems.

@takluyver
Member Author

I'm warming slightly to the idea of a cargo-style build script: Flit would make the sdist, unpack to a temporary directory, and then run the build script there before packing up a wheel. In theory, this isn't much extra complexity, but... what does theory know?

Other complexities that have occurred to me:

  • Caching: should flit attempt to provide any support for caching, so that the build step doesn't have to do 100% of the work every single time? Or is it entirely up to whatever tools you're invoking to figure out caching if they need to?
  • Build steps that want to deal with the VCS: this won't work from the unpacked tempdir. I don't think packages should need VCS info, but there may be a desire to do things like embedding a commit ID in the package to precisely identify the version.
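The cargo-style flow described above can be sketched roughly as follows. This is not flit code: `run_build_script` is a hypothetical helper, and copying the source tree stands in for "make the sdist and unpack it".

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_build_script(source_dir, script="build.py"):
    """Copy source_dir into a fresh temp dir and run the build script there.

    Returns the directory the script ran in, so the caller can pack its
    contents into a wheel. Cleaning up the temp dir is left to the caller.
    """
    workdir = Path(tempfile.mkdtemp(prefix="flit-build-"))
    build_root = workdir / "src"
    shutil.copytree(source_dir, build_root)
    # check=True turns a failing build script into an exception.
    subprocess.run([sys.executable, script], cwd=build_root, check=True)
    return build_root
```

Running in a copy is what makes both listed complexities visible: nothing cached in the original tree is reused, and the copy has no VCS metadata for the script to query.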

@ncoghlan
Member

ncoghlan commented May 7, 2018

For the latter concern, you may want to define a set of environment variables that flit makes available to the build script, and have one of them be FLIT_VCS_DIR.

That could also work if you receive requests for help with managing artifact caching - pass in a FLIT_CACHE_DIR setting. However, it's probably simpler to just let tools manage their own caches (e.g. someone wanting to speed up C/C++ builds is going to get more benefit out of setting up ccache than they are from learning a flit-specific artifact caching regime)
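As a sketch of that idea, the environment handed to the build script could be assembled like this. The variable names FLIT_VCS_DIR and FLIT_CACHE_DIR are the ones suggested in this comment; neither is an existing flit feature, and `build_env` is a hypothetical helper.

```python
import os


def build_env(vcs_dir, cache_dir=None, base_env=None):
    """Return the environment for a build script run outside the checkout."""
    env = dict(os.environ if base_env is None else base_env)
    # Lets the script find the original checkout, e.g. to read a commit ID.
    env["FLIT_VCS_DIR"] = str(vcs_dir)
    if cache_dir is not None:
        # Optional: tools with their own caches (ccache, pip) can ignore it.
        env["FLIT_CACHE_DIR"] = str(cache_dir)
    return env
```

The result would be passed as `env=` to `subprocess.run` when invoking the script, and the script would read it back with `os.environ.get("FLIT_VCS_DIR")`.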

@ipmb

ipmb commented May 7, 2018

One word of caution about copying things to a temporary directory for building... it takes several minutes for projects with large git histories, lots of node_modules, etc. Pip suffers from this: pypa/pip#2195

@takluyver
Member Author

I'd copy only the current VCS checkout - no history, no files that aren't version controlled - so the speed hopefully wouldn't be an issue. Of course, relying on and working with VCSs brings its own set of problems, but I've already chosen that path for building sdists.

It would need a fallback path for when there's no VCS info available, though; either because it's not a VCS checkout, or because the VCS software is not found. In that case, it would probably have to copy all the files and hope.
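A sketch of "tracked files if possible, everything otherwise", assuming git only (flit's sdist builder would have its own VCS handling; `files_to_copy` is a hypothetical name):

```python
import os
import subprocess


def files_to_copy(project_dir):
    """List files to copy into the build dir, relative to project_dir."""
    try:
        out = subprocess.run(
            ["git", "ls-files", "-z"],
            cwd=project_dir, check=True, capture_output=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git is not installed, or this is not a checkout:
        # copy all the files and hope.
        found = []
        for dirpath, _dirnames, filenames in os.walk(project_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, project_dir)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)
    # -z separates names with NUL bytes, which is safe for unusual file names.
    return sorted(p for p in out.decode("utf-8").split("\0") if p)
```

The fallback branch is where untracked directories such as node_modules would get swept up, which is the slow case raised earlier in the thread.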

@ipmb

ipmb commented May 7, 2018

It's probably a tradeoff. Thinking of an example webapp, either you copy node_modules or count on the build script to run npm install or whatever to recreate them. Both are slow, but a build script gives the developer more control to optimize the process.

@takluyver
Member Author

Yup, precisely. And a build script is also more predictable, because we can be sure that it was run on these files, not the files as they were two weeks ago.

Build tools can also cache data outside the package directory. I don't think npm does, but if a build script used pip install --target to bundle other packages, it would benefit from pip's caches.

@techtonik

Because of read or because of write?

If flit could detect if tmp is memory based - can this speed things up? https://superuser.com/questions/45342/when-should-i-use-dev-shm-and-when-should-i-use-tmp

@dholth
Member

dholth commented May 8, 2018 via email

@pradyunsg
Member

It might be worth exploring 2 builds -- one before sdist, one before wheel.

I was thinking about Cython modules. Usually, one would want to compile the Cython code to C code before creating an sdist. And then before creating a wheel, compile the C files to libraries.

@takluyver
Member Author

I know a lot of people do things like that, but I think it's a leftover from the days before wheels, when we needed to make it easy to install from sdist. Now that we have both wheels and a good way to specify build requirements, I don't think this is necessary: sdists should be pure source, minus generated files. So any build steps would run as part of generating wheels.

If I do this, it will probably only handle build steps that produce platform-independent wheels, at least for the first version. So it would work for things like bundling Javascript, but not for building Cython modules.

@pradyunsg
Member

Sounds fair to me. :)

@neumond

neumond commented Jun 13, 2018

I have very complicated compiling workflow for one of my packages: the first step is code generation, the second is launching a virtual machine with a different OS to actually compile the binaries, and then I use setuptools+wheel to pack the wheels. I build the binaries manually and "attach" them to the package using package_data.

I can't ship source dists, since they would be absolutely useless; I can only ship wheels. It would be helpful to be able to disable sdists in the flit config entirely. In my case it is far better to prevent people from installing fallback source dists and then complaining that the package doesn't work for them.

Another thing to note is managing tags for several wheels. Currently I have to edit setup.cfg every time I build wheels:

[bdist_wheel]
python-tag = cp36  ; cp35 cp37 etc
plat-name = win32  ; win64 etc

What I expect from flit is just to make a set of wheels (for different Pythons and platforms) and upload them to PyPI, given a set of binaries I want to "attach". It would be helpful if flit could do some validation, e.g. remind me if I didn't (re)build something, to prevent uploading incorrect wheels (a Python hook for flit?).

Variant A

[tool.flit.binaries]
disable-sdist = true
python-tag = ["cp35", "cp36", "cp37"]
plat-name = ["win32", "win64"]
exclude = [
  { python-tag = "cp35", plat-name = "win64" },
]
paths = []
check-binary-hook = "project:check_binary"

import time

def check_binary(binary_path):
    # binary_path is expected to be a pathlib.Path
    s = binary_path.stat()
    if not s.st_size:
        return False  # empty file: have to rebuild
    if time.time() - s.st_mtime > 300:
        return False  # older than five minutes: treat as stale
    return True

Variant B

[tool.flit.binaries]
disable-sdist = true
wheel-list = "project:wheel_list"
file-mapper = "project:file_mapper"
file-checker = "project:file_checker"

import time
from pathlib import Path

def wheel_list():
    # every (python tag, platform) pair to build, minus the excluded one
    for python_tag in ('cp35', 'cp36', 'cp37'):
        for plat_name in ('win32', 'win64'):
            if python_tag == 'cp35' and plat_name == 'win64':
                continue
            yield python_tag, plat_name

def file_mapper(python_tag, plat_name):
    # binaries for each wheel live in a folder named after its tags
    folder = Path('{}_{}'.format(python_tag, plat_name))
    yield folder / 'module.pyd'
    yield folder / 'submodule.pyd'

def file_checker(binary_path):
    # instead of a checker we could probably define a content function
    # returning a chunk of bytes,
    # e.g. for fetching binaries as artifacts from CI servers
    s = binary_path.stat()
    if not s.st_size:
        return False  # empty file: have to rebuild
    if time.time() - s.st_mtime > 300:
        return False  # older than five minutes: treat as stale
    return True

The goal of PEP 517/PEP 518 is to make things more automated for end users and developers, with fewer assumptions about build-time requirements. But these PEPs work entirely at python side, they cannot pull compilers, visual studio, virtual machines with different OSes, external binary packages and third party applications, e.g. blender, firefox, even build systems like make/scons/cmake. Developers unavoidably have to install all of these manually, following some guide in the project's readme.

Before wheels, pip could compile some packages for you from source dists. That requires an installed compiler, Python headers, maybe Cython, etc. But pip fails if your package depends on something external, like Blender headers: you have to install the prerequisites first, then pip install again. I think you can't automate all possible build workflows, so it is unavoidable to add the ability to include raw binary files in wheels.
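For comparison with Variant B's hand-written generator, the declarative keys of Variant A could be expanded mechanically. This is a sketch only; `expand_wheel_tags` is a hypothetical helper and the key names are the ones proposed in this comment, not an existing flit schema.

```python
from itertools import product


def expand_wheel_tags(python_tags, plat_names, exclude=()):
    """Return every (python-tag, plat-name) pair except the excluded ones."""
    excluded = {(e["python-tag"], e["plat-name"]) for e in exclude}
    return [pair for pair in product(python_tags, plat_names)
            if pair not in excluded]


pairs = expand_wheel_tags(
    ["cp35", "cp36", "cp37"],
    ["win32", "win64"],
    exclude=[{"python-tag": "cp35", "plat-name": "win64"}],
)
for python_tag, plat_name in pairs:
    print(python_tag, plat_name)
```

Each resulting pair would correspond to one wheel file in the dist folder.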

@takluyver
Member Author

I have very complicated compiling workflow for one of my packages...

Flit is probably not going to be the right tool for you, then. Even if we add build steps, it's focused on the simple use cases, not the complicated ones. You might want to look at a system like enscons instead.

But these PEPs work entirely at python side, they cannot pull compilers, visual studio, virtual machines with different OSes, external binary packages and third party applications, e.g. blender, firefox, even build systems like make/scons/cmake.

The PEPs don't specify a mechanism for that, but there's no reason that a build backend complying with these PEPs couldn't download and run a VM or a container to do the build. In fact, I'd be surprised if no-one makes a docker build backend for PEP 517.

@neumond

neumond commented Jun 13, 2018

Probably I wrote an overloaded post, but for me (regarding changes only in flit) it's quite enough to have:

  1. Ability to disable sdists entirely
  2. Ability to inject arbitrary files into wheels (package_data-alike)
  3. Ability to produce several wheels for different architectural tags (flit build and I have multiple files in dist folder)

I have no plans to distribute buildable sdists with scripts, makefiles and docker.

@dholth
Member

dholth commented Jun 13, 2018

@neumond you might actually be part of the small target "market" for enscons. It provides a few tools for building a wheel, which can contain anything, as a result of arbitrarily complex build steps. You don't have to include the sdist target if you don't want one. There is also the pre-enscons approach taken by https://bitbucket.org/dholth/sdl2_lib/src/default/, a library that only packages sdl2 dlls in a wheel and cannot have a reasonable sdist, which just implements a standalone wheel in a 130 line waf script.

@neumond

neumond commented Jun 15, 2018

@dholth Nice idea of building on top of scons, but for me scons is kinda counter-intuitive, I spend too much time searching APIs for every single function I want to use. Have you considered building on top of Waf?

UPD: oh, sorry, just realised your links above point exactly to waf-built wheels.

@ncoghlan
Member

ncoghlan commented Jun 16, 2018

@neumond In addition to being useful in its own right for folks that like SCons, enscons is considered an illustration of the principle of adapting an existing full-featured build system to the Python ecosystem's build system interface expectations, rather than writing a fresh build system from scratch. The entire purpose of the PEP 517/518 build system abstraction layer is to make that easier to do.

However, this aspect of the discussion has now shifted to be entirely off-topic for flit's issue tracker - this issue is about asking whether or not there's a useful middle ground between flit supporting full PEP 517 style build delegation to arbitrary build backends, and flit not supporting build steps at all.

@neumond

neumond commented Jun 16, 2018

@ncoghlan Right now I feel the need for an abstract wheel-assembling library, not tied to flit or enscons. It looks like the wheel package should do this, but I couldn't find a clear and easy-to-use API there that would clarify wheel contents & metadata, enforce constraints, and warn on metadata contradictions (for humans™ ☺). Nevertheless, I find wheel's source code very helpful for filling the gaps in PEP 425 and PEP 427.

@FRidh

FRidh commented Nov 5, 2019

What I wonder about is whether we can (and should) make things easier for people who are going from a pure Python package with no build step to a package that maybe bundles some minified Javascript, or compiles a Cython file. It seems rough to make them throw away their flit packaging and start again with another tool, as we currently do.

I started a thread about using Meson as build-system https://discuss.python.org/t/should-python-packaging-aim-for-meson-as-build-system-in-case-of-extension-modules/2579 which would function just like enscons.

See e.g. https://github.com/FRidh/mesonpep517examples/blob/master/pyproject.toml and the meson.build file in the root and subfolder. The pyproject.toml looks a lot like flit already and I think the steps that are needed to use meson here are small. We just need more and better examples probably.

@pradyunsg
Member

pradyunsg commented Sep 6, 2020

I've now got a project that fits the use case of "minifying the JS/CSS". Given that I now do have experience with the use case, here's what I can say we'd need for that use case.

  1. Specify a build script. I want to run npm install then gulp build.
  2. Need to include the minified files (which are NOT tracked in version control)
  3. Should exclude the non-minified files (which are tracked in version control)

I'm imagining a new tool.flit.wheel table, that has keys for each of these (following tool.flit.sdist):

[tool.flit.wheel]
build = "build.py"
include = ["src/projectname/scripts/bundle.min.js", "src/projectname/styles/bundle.css"]
exclude = ["src/projectname/assets/*"]

With this, I can then have a build.py file in the root of the repository like:

from subprocess import run

run(["npm", "install"], check=True)
run(["gulp", "build"], check=True)

I do think there's a few not-yet-decided-on items, so... well, here's my take on them: :P

file ↓ | build key →   specified     not specified
exists                 run given     ???
not exists             fail          status quo

@takluyver do you think this looks like a reasonable overview of the approach that flit could take for this?
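To make the include/exclude semantics concrete, here is one possible reading, sketched with shell-style patterns from the standard library. The function name, its arguments, and the rule that `include` only adds files that exist after the build are assumptions; the proposal above does not pin these down.

```python
from fnmatch import fnmatch


def select_wheel_files(tracked, built, include=(), exclude=()):
    """Pick the files that go into the wheel.

    tracked: files under version control (what flit would ship today).
    built:   files present after the build script has run.
    Note that fnmatch's '*' also matches '/', so 'assets/*' covers nested files.
    """
    kept = [f for f in tracked
            if not any(fnmatch(f, pattern) for pattern in exclude)]
    extra = [f for f in built
             if any(fnmatch(f, pattern) for pattern in include)]
    return sorted(set(kept) | set(extra))


files = select_wheel_files(
    tracked=["src/projectname/__init__.py", "src/projectname/assets/app.js"],
    built=["src/projectname/scripts/bundle.min.js"],
    include=["src/projectname/scripts/bundle.min.js"],
    exclude=["src/projectname/assets/*"],
)
print(files)
```

Whether `include` should win over `exclude` when both match a file is left open here, as it is in the proposal.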

@pradyunsg
Member

Another option is the build key be a table instead, that has an optional "requires" key whose value is returned by https://www.python.org/dev/peps/pep-0517/#get-requires-for-build-wheel:

build = { script = "build.py", requires = ["pylibsass"] }

If we make the requires key optional, we could also get away with:

build.script = "build.py"

which reads really nicely IMO. :)
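A sketch of how a backend might accept both spellings of the key, taking the already-parsed TOML value as input. `parse_build_key` and `make_get_requires` are hypothetical helpers; only the hook name `get_requires_for_build_wheel` and its signature come from PEP 517.

```python
def parse_build_key(value):
    """Normalise the proposed `build` key to a (script, requires) pair.

    Accepts the plain string form (build = "build.py") and the table form
    (build = { script = "build.py", requires = [...] }).
    """
    if isinstance(value, str):
        return value, []
    return value["script"], list(value.get("requires", []))


def make_get_requires(build_value):
    """Build a PEP 517 get_requires_for_build_wheel hook from the key."""
    _script, requires = parse_build_key(build_value)

    def get_requires_for_build_wheel(config_settings=None):
        return requires

    return get_requires_for_build_wheel


hook = make_get_requires({"script": "build.py", "requires": ["pylibsass"]})
print(hook())
```

In TOML, `build.script = "build.py"` parses to the same table as `build = { script = "build.py" }`, so the dotted form needs no extra handling.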

@pradyunsg
Member

pradyunsg commented Sep 15, 2020

@takluyver (gentle nudge) If you have any interest in this, I'm happy to put up a PR for the above. :)

@takluyver
Member Author

With the coming of PEP 621 - implemented but undocumented in Flit 3.2, hopefully to be documented once people have kicked the tyres a bit - I'm leaning towards saying no to build steps. I originally wrote in the issue description that:

The best argument I see in favour is that you may start packaging something with flit, then add something that requires a build step. With no support for that, you have to throw away your packaging metadata and workflows to switch to another tool.

PEP 621 should make the packaging metadata portable between build tools, so it will be easier to switch from Flit to something like enscons or setuptools, assuming they also get support. It probably won't be entirely seamless - changing build processes never is - but it should avoid having to rewrite the same metadata in a slightly different format. And that metadata can be most of what you give Flit.

I've also come to appreciate that building software is a massive, complicated topic, and I would rather not be responsible for a build system. We could limit the scope to make it an easier problem, e.g. only allowing build steps which don't affect platform compatibility, like minifying JS. But I think there are relatively few use cases like this between the much more common ones where there's either no build step at all, or we want to build native code for a specific target platform. So I don't think there's much value in adding build steps without support for the complex cases.

So I'm planning to close this issue in a week or two, unless someone makes a compelling argument in favour of supporting build steps despite PEP 621. Of course, that won't mean it's written in stone - we can always revisit the question if circumstances change or if there's a brilliant new idea for how to implement it.

@takluyver
Member Author

Closing for the reasons described in my previous comment. Maybe we'll revisit this some day in different circumstances, but for now, I think the drawbacks outweigh the advantages, and I don't think it's useful to have questions like this hanging in uncertainty indefinitely.


8 participants