python是脚本语言,python的模块可以是python脚本也可以是符合Python规范的动态链接库.利用这一特性python通常被作为胶水语言使用--python作为交互端,实际的仅调用由其他更加高效的语言写好编译成的动态链接库.这种方式并不是本文关注的内容,但由此引入一种思路--有没有这样一个工具可以将python脚本编译为功能相同,性能上全面优化过的动态链接库,那么我们不就可以调用这个动态链接库而不是源码的脚本从而提高代码的运行效率了.
那有没有呢?真的有,而且不止一种
Cython太过复杂,需要系统的学习,因此后面会有专门的文章介绍,本文将专注于mypyc
技术方案
mypyc在语言层面和本文中静态类型检测部分中介绍的完全兼容,仅增加了int
类型的细化类型mypy_extensions.i64
,mypy_extensions.i32
,mypy_extensions.i16
,mypy_extensions.u8
而已(当然也可以不用他们还是直接用int
).正常带typehints的python代码可以平滑的直接使用mypyc进行静态化.
mypyc一样有使用限制,大致可以归结为如下几点:
- 不能使用类型标注
Any
,Any
并没有限制类型,自然无法静态化 - 编译要求类型严格匹配,也不能使用
# type: ignore
取消局部注释.编译过后的模块也只能严格的按标注的类型执行,否则会抛出TypeError
. - 编译出来的模块无法使用
python3 <module>.py
或python3 -m <module>
方式运行,只能使用python3 -c "import <module>"
方式运行.同时源码中的if __name__ == "__main__": ...
也不会生效 - 猴子补丁
Monkey patching
不会生效 - 不支持在条件分支中的函数或类定义
- 不支持生成器表达式
- 编译后的对象将不再支持自省
- 编译后的对象将不再支持打断点debug,以及
profile
,cProfile
,trace
这些用于追踪性能的工具
而经过编译的类,mypyc会将其处理为原生类,静态类又会有一些额外限制:
- 不支持
__del__
,__index__
,__getattr__
,__getattribute__
,__setattr__
,__delattr__
- 不支持嵌套类
- 元类仅支持
abc.ABCMeta
和typing.GenericMeta
- 类装饰器仅支持
@mypy_extensions.trait
,mypy_extensions.mypyc_attr
,@dataclasses.dataclass
和@attr.s(auto_attribs=True)
- 类中的方法不支持除了
@property
,@static_method
和@class_method
外的装饰器 - 无法动态绑定属性和方法,原生类中字段将变为不可变,同时也没有
__dict__
属性,如果要删除类属性需要在定义类时使用__deletable__ = ['x', 'y']
标明 - 只支持单继承或一个基类加若干
@mypy_extensions.trait
装饰的特质类
我们以例子fib.py
开始实战部分
fib.py
import time
def fib(n: int) -> int:
if n <= 1:
return n
else:
return fib(n - 2) + fib(n - 1)
t0 = time.time()
fib(32)
print(time.time() - t0)
我们的编译应该遵循如下顺序步骤:
- 要编译我们首先需要确保它的静态类型检测没有问题
!mypy fib.py
�[1m�[32mSuccess: no issues found in 1 source file�[m
- 使用命令行工具
mypyc
编译模块,mypyc
是mypy
安装的一个可选项,可以用pip install -U 'mypy[mypyc]'
安装
!mypyc fib.py
running build_ext
building 'fib' extension
creating build/temp.macosx-10.9-x86_64-cpython-310
creating build/temp.macosx-10.9-x86_64-cpython-310/build
/usr/local/Cellar/gcc/13.1.0/bin/gcc-13 -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/mac/micromamba/envs/py3.10/include -fPIC -O2 -isystem /Users/mac/micromamba/envs/py3.10/include -I/Users/mac/micromamba/envs/py3.10/lib/python3.10/site-packages/mypyc/lib-rt -I/Users/mac/micromamba/envs/py3.10/include/python3.10 -c build/__native.c -o build/temp.macosx-10.9-x86_64-cpython-310/build/__native.o -O3 -g1 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable -Wno-ignored-optimization-argument -Wno-cpp
creating build/lib.macosx-10.9-x86_64-cpython-310
/usr/local/Cellar/gcc/13.1.0/bin/gcc-13 -bundle -undefined dynamic_lookup -Wl,-rpath,/Users/mac/micromamba/envs/py3.10/lib -L/Users/mac/micromamba/envs/py3.10/lib -Wl,-rpath,/Users/mac/micromamba/envs/py3.10/lib -L/Users/mac/micromamba/envs/py3.10/lib build/temp.macosx-10.9-x86_64-cpython-310/build/__native.o -o build/lib.macosx-10.9-x86_64-cpython-310/fib.cpython-310-darwin.so
copying build/lib.macosx-10.9-x86_64-cpython-310/fib.cpython-310-darwin.so ->
- 测试我们的编译效果
!python3 fib.py
0.6489529609680176
!python3 -c "import fib"
0.04175305366516113
可以看到效果拔群
这个例子在这里包编译可以在增加参数-p
,它会编译指定包中的所有子模块中的所有.py
文件,包括__init__.py
:
mypyc -p mymath
编译包个人建议包的顶层__init__.py
中导入所有要暴露的接口,避免暴露内部模块结构
如果你希望你的包在有条件的情况下可以被编译,没条件的情况下不编译也可以执行,可以这样写setup.py
相关的文件:
-
pyproject.toml
[build-system] requires = ["setuptools>=61.0.0", "wheel"] build-backend = "setuptools.build_meta" [project] name = "mymath" authors = [ {name = "hsz", email = "[email protected]"}, ] classifiers = [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", ] description = "A sample Python project for test." keywords = ["math", "test"] license = {file = "LICENSE"} dynamic = ["version", "readme", "dependencies"] requires-python = ">=3.10" [project.urls] changelog = "https://github.com/me/spam/blob/master/CHANGELOG.md" documentation = "https://readthedocs.org" homepage = "https://example.com" repository = "https://github.com/me/spam.git" [tool.setuptools] platforms = ["all"] [tool.setuptools.dynamic] readme = {file = ["README.md"], content-type = "text/markdown"} version = {attr = "mymath.version.__version__"} [tool.setuptools.packages.find] exclude = ['contrib', 'docs', 'test']
-
setup.py
from setuptools import setup try: from mypyc.build import mypycify except Exception as e: print("没有 mypyc, model install as a pure python model") setup() else: from mypyc.build import mypycify print("有 mypyc, model install as a dll model") setup( packages=['mymath'], ext_modules=mypycify([ 'mymath/__init__.py', 'mymath/fib.py', 'mymath/square_op/quadratic_sum.py', 'mymath/square_op/square_error.py', 'mymath/square_op/square_root.py', 'mymath/square_op/square.py', ]), )
我们没有将mypy[mypyc]
放入build-system.requires
,而是会根据是否能导入mypyc.build.mypycify
来决定build行为是否要用mypyc进行编译.因此如果要分发到包索引仓库,我们应该执行如下步骤:
-
打源码包并上传:
python -m build --sdist twine upload dist/*
-
在不同平台,不同指令集,不同python版本下在打包环境中安装好mypyc并不使用临时虚拟环境打二进制包并上传:
python -m venv env source env/bin/activate python -m pip install 'mypy[mypyc]' python -m pip install 'setuptools>=61.0.0' python -m pip install 'wheel' python -m build --wheel -n # 或者python -m pip wheel --no-build-isolation twine upload dist/*
一般我们开发只有一台机器,要多python版本多平台多指令集都编译一遍确实有点难为人,为了简化我们的build过程,可以有两种解决方案:
- 使用包含多平台支持的ci/cd工具,比如
github action
- 使用工具cibuildwheel,cibuildwheel是一个专为打包wheel设计的工具,可以针对不同平台不同指令集打包支持的大部分python版本的wheel包.
以github action
配置为例,我们可以写成这样
-
pyproject.toml
... [tool.cibuildwheel] # 使用`build --wheel -n`命令来打包whee build-frontend = { name = "build", args = ["-n"] } before-build = "pip install build mypy[mypyc]" [tool.cibuildwheel.macos] archs = ["x86_64", "arm64"] # On an Linux Intel runner with qemu installed, build Intel and ARM wheels [tool.cibuildwheel.linux] archs = ["x86_64", "aarch64"] [tool.cibuildwheel.windows] archs = ["x86_64"]
-
.github/workflows/build_and_publish.yml
name: Build_And_Publish on: release: types: [created] jobs: build_source: name: Build Source on linux runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Set up Python uses: actions/setup-python@v2 with: python-version: '3.x' - name: Install dependencies run: | python -m pip install --upgrade pip pip install setuptools wheel twine build - name: Build and publish run: | python -m build --sdist - name: 'Upload dist' uses: 'actions/upload-artifact@v2' with: name: packages path: dist/* - name: Publish env: TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} run: twine upload dist/* build_wheels: name: Build wheels on ${{ matrix.os }} runs-on: ${{ matrix.os }} strategy: matrix: os: [ubuntu-20.04, windows-2019, macos-11] steps: - uses: actions/checkout@v4 - name: Build wheels uses: pypa/[email protected] # env: # CIBW_SOME_OPTION: value # ... # with: # package-dir: . # output-dir: wheelhouse # config-file: "{package}/pyproject.toml" - uses: actions/upload-artifact@v3 with: path: ./wheelhouse/*.whl - name: Install dependencies run: | pip install twine - name: Publish env: TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} run: twine upload ./wheelhouse/*
这样每当你在github上创建release时就会自动触发执行上面两个步骤.cibuildwheel
会在各自的系统重执行各自平台的打包操作,python版本也会根据pyproject.toml
中的project.requires-python
字段中定义的版本限制查找它支持的python版本进行编译.
优点:
- 适用范围相对广,计算密集型任务,业务数据处理,io密集型任务都多少可以有加速
- 没有太多的学习成本,会python的typehints基本就不能用
缺点:
- 需要编译
- 分发会比较复杂
- 加速效果上限低
- 仅针对模块
- 无法与猴子补丁兼容
适用场景:
mypyc非常适合业务开发场景,在业务相对稳定,且需要一定性能提升的情况下,只要你的代码是纯python写的且没有打猴子补丁,mypyc是最简单快速的加速工具.