Replace setup.py with pyproject.toml #9021

Merged 61 commits on Apr 20, 2023

Conversation

@hcho3 (Collaborator) commented Apr 11, 2023

Closes #8090

  • Create pyproject.toml
  • Implement a custom build backend (see below) in packager directory. Build logic from setup.py has been refactored and migrated into the new backend.
  • Tested: pip wheel . (build wheel), python -m build --sdist . (source distribution)

TODOs

  • Update CI to use new build commands.
  • Write a doc for building wheels and sdists. Also show how to set CMake args:
pip install -v . --config-settings use_system_libxgboost=True
  • Handle CMake args in packager/pep517.py.
  • Test editable installation.
  • Add typing annotations to packager.

Context

  • Directly calling setup.py for installing the Python package is deprecated. python setup.py install will throw a warning:
SetuptoolsDeprecationWarning: setup.py install is deprecated.
Use build and pip and other standards-based tools.
  • The replacement for setup.py is defined by PEP 517 and PEP 621.
    • PEP 621: The package's metadata is now specified in pyproject.toml. The TOML file is declarative and, unlike setup.py, cannot contain arbitrary code.
    • PEP 517: It is now possible to specify a build backend in pyproject.toml:
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
  • Users can now choose between a variety of build tools: setuptools, poetry, hatch, etc. In addition, individual Python packages can specify a custom (bespoke) build backend via backend-path:
[build-system]
requires = ["packaging"]
backend-path = ["xgboost"]
build-backend = "packager.pep517"

A build backend implements two methods:

  • def build_wheel(wheel_directory, config_settings=None, metadata_directory=None)
  • def build_sdist(sdist_directory, config_settings=None)
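
For illustration only, a custom backend can be a thin wrapper around an existing one, adding project-specific steps before delegating. The sketch below is not the backend implemented in this PR; it simply wraps setuptools.build_meta from the earlier example, with a placeholder comment where a native build (e.g. CMake) would run.

# packager/pep517.py -- illustrative sketch, not XGBoost's actual backend
from setuptools import build_meta as _backend

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # A real backend would build the native library here (e.g. via CMake)
    # before handing off to the wrapped backend to assemble the wheel.
    return _backend.build_wheel(wheel_directory, config_settings, metadata_directory)

def build_sdist(sdist_directory, config_settings=None):
    return _backend.build_sdist(sdist_directory, config_settings)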

@hcho3 (Collaborator, Author) commented Apr 11, 2023

@jameslamb You might be interested in this work.

@jameslamb (Contributor) commented Apr 11, 2023

Thanks for tagging me! I'm very interested in this and happy to provide a review or answer questions if you want some extra help.

For reference, in LightGBM I'm pursuing scikit-build-core instead of a custom build backend: microsoft/LightGBM#5759. But I don't know much about custom build backends. I'll read through the approach you took here.

@vyasr commented Apr 11, 2023

Some thoughts here:

  • While PEP 517 does not specify editable installs, PEP 660 does. This is the standard that you'll want your custom backend to use if we move forward with a custom backend.
  • I would strongly recommend trying to use existing backends rather than rolling your own. Maintaining a separate backend may be expedient now, but in the long run it's likely to cause more headache than it's worth.
  • I suspect that the main reason you decided to implement a custom backend is that you weren't able to get the C++ code into the sdist; please correct me if I am wrong. If that is the case, the problem I have previously observed in this situation is that the C++ code is not contained in a subdirectory of the Python package but instead sits at the same level in the directory tree, and AFAIK the Python build process does not allow you to include directories above the current dir. It may be possible to fix this by symlinking the cpp directory into python-package, but when I tried this with rmm I wasn't able to get it to respect a simple symlink. However, I just tried an alternate strategy with rmm (which currently uses scikit-build) and it worked:
    • Create an sdist using python -m build -s
    • Manually copy important files from the project root into the tarball. In the case of rmm, this is the CMakeLists.txt file for building the C++ code, the cmake directory containing helper files, and the include directory containing the actual C++ headers
    • Modify the one line in the Python CMakeLists.txt file containing the relative path to the C++ code
    • Run pip install
  • The above strategy could be automated using a custom backend that implements build_sdist by calling scikit-build-core's build_sdist and then runs the necessary tar append commands.
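
A hedged sketch of that automation, assuming a placeholder inner PEP 517 backend and hypothetical file paths (note that tarfile cannot append to a gzip archive, so the sdist is rewritten with the extra files added):

import os
import shutil
import tarfile

from setuptools import build_meta as _inner  # stand-in; the suggestion above would use scikit-build-core's backend

# Hypothetical top-level C++ files/directories to pull in from one level up.
EXTRA_PATHS = ["CMakeLists.txt", "cmake", "include"]

def build_sdist(sdist_directory, config_settings=None):
    name = _inner.build_sdist(sdist_directory, config_settings)
    sdist_path = os.path.join(sdist_directory, name)
    base = name[: -len(".tar.gz")]  # sdists are named <pkg>-<version>.tar.gz
    tmp_path = sdist_path + ".tmp"
    with tarfile.open(sdist_path, "r:gz") as src, tarfile.open(tmp_path, "w:gz") as dst:
        # Copy the original sdist contents, then add the extra paths.
        for member in src.getmembers():
            dst.addfile(member, src.extractfile(member) if member.isfile() else None)
        for path in EXTRA_PATHS:
            dst.add(os.path.join("..", path), arcname=f"{base}/{path}")
    shutil.move(tmp_path, sdist_path)
    return name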

@trivialfis (Member) left a comment:

Just out of curiosity, is it possible to split the XGBoost package into two, like the conda-forge package does: one as libxgboost and another as (py)xgboost?

CMake is now on PyPI, which has two implications. Firstly, it's a C++ project with only a CLI interface; the proposed libxgboost would be similar: a C++ project with only a C interface. Secondly, I am not sure whether pyproject.toml projects distinguish between build-time and run-time dependencies. If so, we can make cmake a build-time dependency for the source distribution.

Lastly, it's a little bit weird that the packaging logic might be installed and a user can import it. Not really an issue, it just feels unnatural.

@jameslamb (Contributor):

I am not sure if there is a distinction between build-time and run-time dependency for toml projects

There is! This is one of the nice benefits of the new backends... you don't have to have something like setuptools or wheel as a runtime dependency.

Also on the topic of CMake being available on PyPI, two very important things to consider:

  1. anyone running pip install cmake on a platform for which there isn't a CMake wheel is going to end up compiling it from source, which can be really painful and definitely more difficult than installing it from package managers with broader platform support, like brew on macOS or apt on various Linux flavors
  2. because of that, scikit-build-core has the behavior "if an already-installed cmake is found on PATH, use that... otherwise, install the one from PyPI"

@vyasr commented Apr 12, 2023

Secondly, I am not sure if there is a distinction between build-time and run-time dependency for toml projects.

It's worth noting that this is in fact one of the motivating reasons for the creation of the modern build infrastructure, namely PEP 517 as previously mentioned, as well as PEP 518, which defines the build-system part of pyproject.toml. The canonical issue this was designed to fix is a package that requires numpy and/or Cython to be installed in order to run setup.py, so that setup.py can provide numpy headers (np.get_include) or Cythonize files, respectively. Quoting the PEP:

But when a project chooses to use setuptools, the use of an executable file like setup.py becomes an issue. You can’t execute a setup.py file without knowing its dependencies, but currently there is no standard way to know what those dependencies are in an automated fashion without executing the setup.py file where that information is stored. It’s a catch-22 of a file not being runnable without knowing its own contents which can’t be known programmatically unless you run the file.

The build-system section of pyproject.toml allows specifying the requirements for the build. PEP 517 builds occur in a self-contained virtual environment where all build requirements are installed. Run requirements are a separate list that is installed into the user's environment along with the built package after a build is complete (when using pip install . to build locally) or a pre-built binary is found (pip install ${pkg}). The contained build is analogous to how conda builds work (in venvs instead of conda envs).

@hcho3 (Collaborator, Author) commented Apr 12, 2023

@vyasr Thanks for your input. I will try to remove boilerplate from the custom backend by re-using logic from other backends.

@trivialfis I suspect that a libxgboost-only package on PyPI will not be terribly useful. The reason is that different virtual environment managers for Python install libxgboost.so in different locations, so it would be difficult for other packages to link with libxgboost. One of the main benefits of Conda is that native libs get installed in standard locations, so that other packages can easily find them.

@trivialfis (Member):

It doesn't need to be useful on its own; it's just a way for us to split the C++ packaging from the Python package.

@hcho3 (Collaborator, Author) commented Apr 12, 2023

@trivialfis Let's not split the packaging. It will introduce headaches, since different virtual env managers for Python install native libs in different locations. We don't want to make libpath.py even more complicated than it is now.

@hcho3 (Collaborator, Author) commented Apr 13, 2023

@vyasr I managed to use the hatchling backend (with some customization). As a result, I was able to remove the boilerplate code, such as the metadata generation. Now XGBoost's build backend only contains the necessary customization. Thanks for your advice!

@trivialfis The packager/ directory has been removed from the xgboost/ directory, so we don't distribute the packaging logic in the XGBoost wheel.

@hcho3 (Collaborator, Author) commented Apr 14, 2023

@trivialfis Editable installation is now supported. Run: pip install -e .

@hcho3 (Collaborator, Author) commented Apr 15, 2023

I am removing vcomp140.dll from the XGBoost wheel. I just tested the wheel on a minimal installation of Windows, and import xgboost fails with

Error message(s): ["Could not find module 'C:\Users\Administrator\Downloads\foobar\lib\site-packages\xgboost\lib\xgboost.dll' (or one of its dependencies). Try using the full path with constructor syntax."]

I tried a couple of things to work around the error, but none worked.

  • Preload dependent DLLs using the approach taken by scikit-learn. The approach only works with compiled extensions, not a vanilla C library that's loaded via ctypes.cdll.LoadLibrary().
  • Use delvewheel to patch the wheel. The import still failed.
  • Pass the winmode arg to ctypes.CDLL. To no avail.
  • I tested both the new wheel and the 1.7.5 release. Both failed.

I regret adding the vcomp140.dll insertion step without thoroughly testing it on a truly fresh Windows installation. My bad.

Conclusion: Let's not bundle vcomp140.dll at all, since it doesn't work. Instead, let's tell Windows users to install Visual C++ Redistributable package. This is what LightGBM does. Alternatively, they can use Conda which handles this automatically.

@trivialfis (Member):

note to myself: need to update the release script.

@hcho3 (Collaborator, Author) commented Apr 15, 2023

@trivialfis I updated dev/release-artifacts.py. Can you take a look?

@trivialfis (Member) left a comment:

The change looks good to me. Thank you again for working on this! I'm not entirely sure how all the tools interact with each other. Seems complicated as there's code generation during build/install. Would be great if others can take a look as well.

@vyasr left a comment:

This looks good! I'm not too familiar with hatch, so I needed to dig a bit through that backend to review some of the more internal components, but this approach seems pretty reasonable to me.

@vyasr left a comment:

This is looking good! There are a couple of outstanding comments, but almost everything has been addressed and nothing is left that I would block merging on. Great to see this work here!

Comment on lines 149 to 153
if locate_local_libxgboost(TOPLEVEL_DIR, logger=logger) is None:
    raise AssertionError(
        "To use the editable installation, first build libxgboost with CMake. "
        "See https://xgboost.readthedocs.io/en/latest/build.html for detailed instructions."
    )
@vyasr commented:

Out of curiosity, what happens if you remove this? Is it possible to use an editable installation that also builds libxgboost? In that situation running pip install -e would need to rebuild the library as well.

Not saying this feature should be implemented, just want to understand the current setup of the backend to figure out how best to work with this in the future for other RAPIDS libraries too.

@hcho3 (Collaborator, Author) commented Apr 20, 2023:

what happens if you remove this?

Editable installation won't build libxgboost.so, so import xgboost will error out.

Is it possible to use an editable installation that also builds libxgboost?

I decided against it, as there is no reliable way for the packager module to detect changes in the C++ code. If we want to re-build libxgboost.so upon changes made to the C++, scikit-build-core will probably be a better choice.

I ultimately decided against scikit-build-core because:

  • It doesn't yet support editable installation.
  • I couldn't customize it enough to fit the current use cases of XGBoost.
  • Much of what scikit-build-core offers is not useful for XGBoost, since XGBoost does not use compiled extensions. XGBoost uses ctypes exclusively to access C API functions from libxgboost.so.

The calculus will be different for RAPIDS libraries, which use compiled extensions extensively. IMHO, if I were using compiled extensions a lot, I would use scikit-build-core and give up on editable installation.
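
For context, the ctypes-based access pattern mentioned above looks roughly like the sketch below. The path handling is illustrative only; in the real package the shared library location is resolved by libpath.py.

import ctypes

# Illustrative: the actual path to libxgboost.so is resolved by libpath.py.
_LIB = ctypes.cdll.LoadLibrary("libxgboost.so")

# Declare the return type of a C API function, then call it directly.
_LIB.XGBGetLastError.restype = ctypes.c_char_p
print(_LIB.XGBGetLastError().decode("utf-8"))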

@jameslamb (Contributor) commented Apr 20, 2023:

For what it's worth, scikit-build-core just added experimental support for editable installs: https://github.com/scikit-build/scikit-build-core/releases/tag/v0.3.0

Not sure exactly what "experimental" means, to be fair, and to what extent that's acceptable for a project as widely-used as XGBoost. But just wanted to note it.

I am planning to pursue scikit-build-core for LightGBM (microsoft/LightGBM#5759), so as not to have any lingering Python glue code that's exposed to changes in Python packaging. That is still very much a work in progress though, so I'm definitely not recommending that you switch to that over hatchling... just noting the link as something you may want to watch as we all figure out the setup.py-free world together 😊

@hcho3 (Collaborator, Author) commented:

@jameslamb Guess my knowledge was outdated :) In the future, I'll definitely consider scikit-build-core as a potential alternative.

@vyasr commented:

I will be migrating RAPIDS to use scikit-build-core at some point in the not-too-distant future. We're already using scikit-build, so the transition is the most natural way to get us onto a PEP 517 build.

Support for editable installs is a must for RAPIDS, and is one of the reasons I haven't attempted the migration yet. That feature request is something they're aware of and are working on. My understanding is that the current experimental support only covers some possible cases around library changes but not all. I will definitely be keeping a close eye on this. I expect full support for editable installs is a precondition of a 1.0 release, and the project has been moving at a reasonable pace so it shouldn't be too long before that exists.

That said, at this stage scikit-build-core is definitely less mature than hatchling and I can't fault your choice since you seem to have been able to get everything working fairly painlessly 🙂

@jameslamb (Contributor) commented:

@vyasr @hcho3 @trivialfis FYI I have a PR ready over in LightGBM for using scikit-build-core: microsoft/LightGBM#5759. If you have time and are interested, I'd appreciate any comments you have on it.

We don't care about editable installs for lightgbm, and I'm willing to go contribute to scikit-build-core (both code and discussions) as we discover issues in it, to help that tool mature.

@vyasr commented:

Thanks for the ping! I've started testing the migration for a couple RAPIDS libraries as well, see rapidsai/rmm#1287 and rapidsai/cudf#13531. Editable installs are already in significantly better shape than they were a couple of months ago FWIW.

@jameslamb (Contributor) commented:

Nice! I learned a lot about scikit-build-core in this process; @ me any time on those RAPIDS PRs if you want an opinion or another set of eyes.

@vyasr commented Apr 20, 2023

Note that I don't have permissions to resolve threads on this PR so I tried to leave comments or otherwise clearly indicate where I thought things were resolved.

Successfully merging this pull request may close these issues.

[python] 'setup.py install' is deprecated