Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ Refactor setup.py for improved build process #435

Closed
wants to merge 3 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 51 additions & 31 deletions setup.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,17 @@
from setuptools import setup, find_packages, Extension
from setuptools.command.build_ext import build_ext as _build_ext
import multiprocessing
import os
import sys
from pathlib import Path

import numpy
import os
import os.path as path
import multiprocessing
from setuptools import Extension, find_packages, setup
from setuptools.command.build_ext import build_ext as _build_ext

multiprocessing.set_start_method('fork')

if os.name == "nt":
multiprocessing.set_start_method("spawn")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raysect 0.8.x does not support Windows. The multiprocessing paradigm it uses relies on the Linux kernel's copy-on-write when forking processes, and this is not something Windows supports. The fix is to use a new parallel render engine which doesn't require fork's COW semantics, and this is not a small project. Adjusting setup.py to support building on Windows may give Windows users the false impression that they should expect Raysect to work well on this platform.

Copy link
Contributor Author

@munechika-koyo munechika-koyo Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review!
Is the incompatibility for Windows tied to all multiprocessing methods? If so, can I also fix this issue by modifying the workflow.py module?
(I guess things are not so simple...)

Copy link
Contributor

@vsnever vsnever Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing "fork" to "spawn" in MulticoreEngine will cause local copies of the entire scene graph, including all its data, to be created in child processes. Meshes used in Raysect/Cherab scenes can take up several gigabytes. With dozens of parallel processes, this can lead to out of memory problem, not to mention possible performance issues. The scene graph must be shared between all the processes instead of copying. This is currently achieved using the POSIX fork() Copy-on-Write semantics. I don't know of any other way to do this in pure Python. I think the best way to use Raysect on Windows is via WSL2.

Copy link
Contributor

@vsnever vsnever Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.13 added an experimental free-threading build. In the future, multithreading may replace multiprocessing in Raysect's MulticoreEngine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Matt and I ever get time to return to Raysect development (its not impossible, just impractical for now) I'll like to experiment with the new threading model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vsnever, thank you for your comment! "fork" seems to use a shared memory scheme, doesn't it?
The GIL-free threading is very attractive and could be an option in the future.

@CnlPepper Thank you for commenting! I can't wait for you to come back as a developer😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the "spawn" setting in the if case to raise OSError to throw an error if the OS is Windows.

else:
multiprocessing.set_start_method("fork")

use_cython = True
force = False
Expand Down Expand Up @@ -34,56 +39,70 @@
annotate = True
sys.argv.remove("--annotate")

source_paths = ['raysect', 'demos']
source_paths = ["raysect", "demos"]
compilation_includes = [".", numpy.get_include()]
compilation_args = ['-O3']
compilation_args = ["-O3"]
cython_directives = {
# 'auto_pickle': True,
'language_level': 3
"language_level": 3
}
setup_path = path.dirname(path.abspath(__file__))
setup_path = Path(__file__).parent

if line_profile:
compilation_args.append("-DCYTHON_TRACE=1")
compilation_args.append("-DCYTHON_TRACE_NOGIL=1")
cython_directives["linetrace"] = True

if use_cython:

from Cython.Build import cythonize

# build .pyx extension list
extensions = []
for package in source_paths:
for root, dirs, files in os.walk(path.join(setup_path, package)):
for file in files:
if path.splitext(file)[1] == ".pyx":
pyx_file = path.relpath(path.join(root, file), setup_path)
module = path.splitext(pyx_file)[0].replace("/", ".")
extensions.append(Extension(module, [pyx_file], include_dirs=compilation_includes, extra_compile_args=compilation_args),)
for pyx in (setup_path / package).glob("**/*.pyx"):
pyx_path = pyx.relative_to(setup_path)
module = ".".join(pyx_path.with_suffix("").parts)
extensions.append(
Extension(
module,
[str(pyx_path)],
include_dirs=compilation_includes,
extra_compile_args=compilation_args,
),
)

if profile:
cython_directives["profile"] = True

# generate .c files from .pyx
extensions = cythonize(extensions, nthreads=multiprocessing.cpu_count(), force=force, compiler_directives=cython_directives, annotate=annotate)
extensions = cythonize(
extensions,
nthreads=multiprocessing.cpu_count(),
force=force,
compiler_directives=cython_directives,
annotate=annotate,
)

else:

# build .c extension list
extensions = []
for package in source_paths:
for root, dirs, files in os.walk(path.join(setup_path, package)):
for file in files:
if path.splitext(file)[1] == ".c":
c_file = path.relpath(path.join(root, file), setup_path)
module = path.splitext(c_file)[0].replace("/", ".")
extensions.append(Extension(module, [c_file], include_dirs=compilation_includes, extra_compile_args=compilation_args),)

for c_file in (setup_path / package).glob("**/*.c"):
c_file_path = c_file.relative_to(setup_path)
module = ".".join(c_file_path.with_suffix("").parts)
extensions.append(
Extension(
module,
[str(c_file_path)],
include_dirs=compilation_includes,
extra_compile_args=compilation_args,
),
)
# parse the package version number
with open(path.join(path.dirname(__file__), 'raysect/VERSION')) as version_file:
with (Path(__file__).parent / "raysect/VERSION").open("r") as version_file:
version = version_file.read().strip()


# Use multiple processes by default for building extensions
class build_ext(_build_ext):
def finalize_options(self):
Expand All @@ -92,13 +111,14 @@ def finalize_options(self):
nproc = int(os.getenv("RAYSECT_BUILD_JOBS", str(multiprocessing.cpu_count())))
self.parallel = nproc


setup(
name="raysect",
version=version,
url="http://www.raysect.org",
author="Dr Alex Meakins et al.",
author_email="[email protected]",
description='A Ray-tracing Framework for Science and Engineering',
description="A Ray-tracing Framework for Science and Engineering",
license="BSD",
classifiers=[
"Development Status :: 5 - Production/Stable",
Expand All @@ -111,12 +131,12 @@ def finalize_options(self):
"Programming Language :: Cython",
"Programming Language :: Python :: 3",
"Topic :: Multimedia :: Graphics :: 3D Rendering",
"Topic :: Scientific/Engineering :: Physics"
"Topic :: Scientific/Engineering :: Physics",
],
install_requires=['numpy', 'matplotlib'],
install_requires=["numpy", "matplotlib"],
packages=find_packages(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pyproject.toml we use oldest-supported-numpy. The project is thus built against numpy 1.x for all versions of Python where that is supported, and means in turn that numpy<2 is required at runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should keep numpy<2.0 in runtime.
However, when I specify numpy > 2.0 in build time, raysect's unit tests seem to be succeeded according to the conda building process.
Needless to say, the unit test does not seem to cover all raysect's functionalities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, some demo scripts were successfully executed even in numpy==2.1.2.

include_package_data=True,
zip_safe= False,
zip_safe=False,
ext_modules=extensions,
cmdclass={"build_ext": build_ext},
)
Loading