pyembed: support loading in-memory extension modules on Windows
There exists a dark arts mechanism for loading Windows PE files from
memory. It is implemented in the MemoryModule library at
https://github.com/fancycode/MemoryModule.

I have published a `memory-module-sys` Rust crate to expose
bindings to this library. This enables Rust to load DLLs from
memory.

Previously in PyOxidizer, we taught the embedded Python resources
data structure to define the contents of a shared library extension
module to be imported from memory.

This commit combines the two efforts and enables the `pyembed` crate
to import Python extension modules which reside in memory.

Getting this working took a fair amount of effort. There were
a handful of attempts that did not pan out. Some of the failed
attempts appeared to work but were subtly broken, e.g. because
the `LazyLoader` importer assumes that the `sys.modules` entry
won't be modified. In the end, the final implementation
emulates CPython's extension module loading mechanism as closely
as possible. This was the only way I was able to preserve compatibility
with `LazyLoader` (just implementing `exec_module()` without
`create_module()` appears impossible - at least without writing our
own lazy module implementation).
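
The `create_module()` / `exec_module()` split and the `LazyLoader`
constraint described above can be illustrated with a minimal pure-Python
sketch. `TwoPhaseLoader`, `lazy_import()`, and the module name used with
them are hypothetical and are not part of `pyembed`; the real importer
loads a DLL from memory rather than building a module in Python.

```python
import importlib.abc
import importlib.util
import sys
import types


class TwoPhaseLoader(importlib.abc.Loader):
    """Hypothetical loader mimicking the create/exec split of extension modules."""

    def __init__(self):
        # Records which loader hooks ran, and in what order.
        self.events = []

    def create_module(self, spec):
        # Phase 1: hand back the module object the import system will use.
        self.events.append("create")
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        # Phase 2: populate the module. LazyLoader defers this call until
        # the first attribute access on the module.
        self.events.append("exec")
        module.answer = 42


def lazy_import(name, loader):
    """Import `name` lazily, following the recipe in the importlib docs."""
    lazy = importlib.util.LazyLoader(loader)
    spec = importlib.util.spec_from_loader(name, lazy)
    module = importlib.util.module_from_spec(spec)
    # The object stored here must remain the sys.modules entry for `name`.
    sys.modules[name] = module
    lazy.exec_module(module)
    return module
```

In this sketch the object registered in `sys.modules` is the one returned
by `create_module()`. `LazyLoader` raises `ValueError` on first attribute
access if that entry has been swapped for a different object in the
meantime, which is the failure mode the earlier attempts ran into.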

While this commit produces working results, it is far from feature
complete. We still do not handle library dependencies properly.
We will likely need to teach the embedded resources data structure
about the existence of shared library resources and dependencies
from extension modules so that shared libraries can be imported
from memory when an extension module is imported.

Because this commit uses some CPython APIs that are off the beaten
path, we had to contribute support for these symbols to python3-sys
(dgrunwald/rust-cpython#210). This is why
we now depend on a specific Git commit of the python3-sys and
cpython crates. This means we can't release pyembed to crates.io
until a new version of these crates is published... We're likely
a ways off from a new release, as I don't want to solidify the new
embedded resources format until it has more features. So hopefully
this isn't a problem...
indygreg committed Feb 28, 2020
1 parent 0771233 commit d719386
Showing 8 changed files with 371 additions and 23 deletions.
49 changes: 42 additions & 7 deletions Cargo.lock


8 changes: 8 additions & 0 deletions docs/history.rst
@@ -45,6 +45,14 @@ Bug Fixes
New Features
^^^^^^^^^^^^

* Windows binaries can now import extension modules defined as shared libraries
  (e.g. `.pyd` files) from memory. PyOxidizer will detect `.pyd` files during
  packaging and embed them into the binary as resources. When the module
  is imported, the extension module/shared library is loaded from memory
  and initialized. This feature enables PyOxidizer to package pre-built
  extension modules (e.g. from Windows binary wheels published on PyPI)
  while still maintaining the property of a (mostly) self-contained
  executable.
* Multiple bytecode optimization levels can now be embedded in binaries.
  Previously, it was only possible to embed bytecode for a given module
  at a single optimization level.
3 changes: 2 additions & 1 deletion docs/packaging.rst
@@ -387,7 +387,8 @@ Adding Extension Modules At Run-Time
====================================

Normally, Python extension modules are compiled into the binary as part
of the embedded Python interpreter.
of the embedded Python interpreter or embedded Python resources data
structure.

``PyOxidizer`` also supports providing additional extension modules at run-time.
This can be useful for larger Rust applications providing extension modules
66 changes: 59 additions & 7 deletions docs/packaging_pitfalls.rst
@@ -224,13 +224,65 @@ C is used to implement extension modules.)
The way this typically works is some build system (often ``distutils`` via a
``setup.py`` script) produces a shared library file containing the extension.
On Linux and macOS, the file extension is typically ``.so``. On Windows, it
is ``.pyd``. Python's importing mechanism looks for these files in addition
to normal ``.py`` and ``.pyc`` files when an ``import`` is requested.

PyOxidizer currently has :ref:`limited support <status_extension_modules>` for
extension modules. Under some circumstances, building extension modules as
part of regular package build machinery *just works* and the resulting
extension module can be embedded in the produced binary.
is ``.pyd``. When an ``import`` is requested, Python's importing mechanism
looks for these files in addition to normal ``.py`` and ``.pyc`` files. If
an extension module is found, Python will ``dlopen()`` the file
(``LoadLibraryEx()`` on Windows) and load the shared library into the
process. It will then call into an initialization function exported by that
shared library to obtain a Python module instance.
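
As a rough illustration of the lookup described above, the following sketch
lists the filenames the running interpreter would accept for an extension
module and the name of the initialization function it would look up. The
helpers ``init_symbol_name()`` and ``candidate_filenames()`` are hypothetical
names introduced here, and the sketch assumes an ASCII module name.

```python
import importlib.machinery


def init_symbol_name(module_name: str) -> str:
    """Return the init function CPython looks up for an ASCII-named module."""
    # Only the final dotted component is used, e.g. "pkg.foo" -> "PyInit_foo".
    return "PyInit_" + module_name.rpartition(".")[2]


def candidate_filenames(module_name: str) -> list:
    """List the filenames this interpreter would accept for the extension."""
    stem = module_name.rpartition(".")[2]
    # EXTENSION_SUFFIXES is platform specific: it contains ".pyd" variants
    # on Windows and ".so" variants on Linux and macOS.
    return [stem + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]
```

Calling ``candidate_filenames("pkg.foo")`` on a given platform shows which
file names the importer would recognize there.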

Python packaging has defined various conventions for distributing pre-compiled
extension modules in *wheels*. If you see a file named e.g.
``<package>-<version>-cp38-cp38-win_amd64.whl``,
``<package>-<version>-cp38-cp38-manylinux2014_x86_64.whl``, or
``<package>-<version>-cp38-cp38-macosx_10_9_x86_64.whl``, you are
installing a Python package with a pre-compiled extension module. Inside the
*wheel* is a shared library providing the extension module. That shared
library is configured to work with a Python distribution (typically ``CPython``)
built in a specific way, e.g. with a ``libpythonXY`` shared library exporting
Python symbols.
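
The tags in those filenames can be pulled apart with a few lines of Python.
This is a minimal sketch based on the standard wheel filename layout
(``name-version[-build]-python-abi-platform.whl``); ``WheelTags``,
``wheel_tags()``, and ``is_platform_specific()`` are hypothetical helpers,
not PyOxidizer APIs.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    python_tag: str
    abi_tag: str
    platform_tag: str


def wheel_tags(filename: str) -> WheelTags:
    """Extract the python, ABI, and platform tags from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are always the compatibility tags.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError("unexpected wheel filename: %r" % filename)
    _, python_tag, abi_tag, platform_tag = parts
    return WheelTags(python_tag, abi_tag, platform_tag)


def is_platform_specific(filename: str) -> bool:
    """Return True if the wheel targets a specific platform."""
    # Pure Python wheels use the platform tag "any".
    return wheel_tags(filename).platform_tag != "any"
```

A platform-specific wheel is the usual sign that a compiled shared library
is inside, although the filename alone does not guarantee it.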

PyOxidizer currently has :ref:`some support <status_extension_modules>` for
extension modules. The way this works depends on the platform and Python
distribution.

Dynamically Linked Python Distributions on Windows
--------------------------------------------------

When using a dynamically linked Python distribution on Windows (e.g.
via the ``flavor="standalone_dynamic"`` argument to
:ref:`config_default_python_distribution`), PyOxidizer:

* Supports importing shared library extension modules (e.g. ``.pyd`` files)
  from memory.
* Automatically detects and uses ``.pyd`` files from pre-built binary
  packages installed as part of packaging.
* Automatically detects and uses ``.pyd`` files produced during package
  building.

However, there are caveats to this support!

PyOxidizer doesn't currently support resolving additional library
dependencies from ``.pyd`` extension modules / shared libraries when
importing from memory. If an extension module depends on another shared
library (almost certainly a ``.dll``) outside the normal set of libraries
(namely the C Runtime and other common Windows system DLLs), you will
need to manually package this library next to the application ``.exe``.
Failure to do this could result in a failure at ``import`` time.

PyOxidizer does support loading shared library extension modules from
``.pyd`` files on the filesystem like a typical Python program. So
if you cannot make in-memory extension module importing work, you
can fall back to packaging a ``.pyd`` file in a directory registered
on ``sys.path``, as set through the :ref:`config_python_interpreter_config`
Starlark primitive.
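
When relying on the filesystem fallback, it can help to confirm that a
directory actually contains a file the importer would pick up as an
extension module. The sketch below uses the standard library's
``FileFinder`` for that check without loading the library;
``find_extension_on_path()`` is a hypothetical helper, not a PyOxidizer API.

```python
import importlib.machinery


def find_extension_on_path(module_name, directory):
    """Return the path of an extension module file in `directory`, or None."""
    # Restrict the finder to extension modules only, so .py and .pyc files
    # in the same directory are ignored.
    finder = importlib.machinery.FileFinder(
        directory,
        (
            importlib.machinery.ExtensionFileLoader,
            importlib.machinery.EXTENSION_SUFFIXES,
        ),
    )
    spec = finder.find_spec(module_name)
    return spec.origin if spec is not None else None
```

A ``None`` result means a normal ``import`` would not find an extension
module of that name in the directory, even if the directory is on
``sys.path``.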

Extension Modules Everywhere Else
---------------------------------

If PyOxidizer is not able to easily reuse a Python extension module
built or distributed in a traditional manner, it will attempt to
compile the extension module from source in a way that is compatible
with the PyOxidizer distribution and application configuration.

The way PyOxidizer achieves this is a bit crude, but effective.

4 changes: 2 additions & 2 deletions docs/status.rst
@@ -44,8 +44,8 @@ binaries.
Native Extension Modules
------------------------

Building and using compiled extension modules (e.g. C extensions) is
partially supported.
Using compiled extension modules (e.g. C extensions) is partially
supported.

Building C extensions to be embedded in the produced binary works
for Windows, Linux, and macOS.
7 changes: 5 additions & 2 deletions pyembed/Cargo.toml
@@ -14,13 +14,16 @@ links = "pythonXY"
[dependencies]
# Update documentation in lib.rs when new dependencies are added.
byteorder = "1"
cpython = "0.4"
cpython = { git = "https://github.com/dgrunwald/rust-cpython", rev = "7fb4dd2e59ccf0fbf6bbe874b602e52b8aa4a8c1" }
jemalloc-sys = { version = "0.3", optional = true }
lazy_static = "1.4"
libc = "0.2"
python3-sys = "0.4"
python3-sys = { git = "https://github.com/dgrunwald/rust-cpython", rev = "7fb4dd2e59ccf0fbf6bbe874b602e52b8aa4a8c1" }
uuid = { version = "0.8", features = ["v4"] }

[target.'cfg(windows)'.dependencies]
memory-module-sys = "0.1"

[dev-dependencies]
pyoxidizer = { version = "0.7.0-pre", path = "../pyoxidizer" }
