Skip to content

Latest commit

 

History

History
337 lines (244 loc) · 12.4 KB

通过静态化参数提升模块性能.md

File metadata and controls

337 lines (244 loc) · 12.4 KB

通过静态化参数提升模块性能

python是脚本语言,python的模块可以是python脚本也可以是符合Python规范的动态链接库.利用这一特性python通常被作为胶水语言使用--python作为交互端,实际的仅调用由其他更加高效的语言写好编译成的动态链接库.这种方式并不是本文关注的内容,但由此引入一种思路--有没有这样一个工具可以将python脚本编译为功能相同,性能上全面优化过的动态链接库,那么我们不就可以调用这个动态链接库而不是源码的脚本从而提高代码的运行效率了.

那有没有呢?真的有,而且不止一种

  • Cython,老牌项目,功能强大,专注于桥接python和C,上面的功能仅是它能力的一小部分.
  • mypyc,由mypy项目扩展而来.

Cython太过复杂,需要系统的学习,因此后面会有专门的文章介绍,本文将专注于mypyc技术方案

使用mypyc静态化python模块

mypyc在语言层面和本文中静态类型检测部分中介绍的完全兼容,仅增加了int类型的细化类型mypy_extensions.i64,mypy_extensions.i32,mypy_extensions.i16,mypy_extensions.u8而已(当然也可以不用他们还是直接用int).正常带typehints的python代码可以平滑的直接使用mypyc进行静态化.

使用限制

mypyc一样有使用限制,大致可以归结为如下几点:

  • 不能使用类型标注Any,Any并没有限制类型,自然无法静态化
  • 编译要求类型严格匹配,也不能使用# type: ignore取消局部注释.编译过后的模块也只能严格的按标注的类型执行,否则会抛出TypeError.
  • 编译出来的模块无法使用python3 <module>.pypython3 -m <module>方式运行,只能使用python3 -c "import <module>"方式运行.同时源码中的if __name__ == "__main__": ...也不会生效
  • 猴子补丁Monkey patching不会生效
  • 不支持在条件分支中的函数或类定义
  • 不支持生成器表达式
  • 编译后的对象将不再支持自省
  • 编译后的对象将不再支持打断点debug,以及profile,cProfile, trace这些用于追踪性能的工具

而经过编译的类,mypyc会将其处理为原生类,静态类又会有一些额外限制:

  • 不支持__del__,__index__,__getattr__,__getattribute__,__setattr__,__delattr__
  • 不支持嵌套类
  • 元类仅支持abc.ABCMetatyping.GenericMeta
  • 类装饰器仅支持@mypy_extensions.trait,mypy_extensions.mypyc_attr,@dataclasses.dataclass@attr.s(auto_attribs=True)
  • 类中的方法不支持除了@property,@static_method@class_method外的装饰器
  • 无法动态绑定属性和方法,原生类中字段将变为不可变,同时也没有__dict__属性,如果要删除类属性需要在定义类时使用__deletable__ = ['x', 'y']标明
  • 只支持单继承或一个基类加若干@mypy_extensions.trait装饰的特质类

第一个例子

我们以例子fib.py开始实战部分

  • fib.py
import time


def fib(n: int) -> int:
    if n <= 1:
        return n
    else:
        return fib(n - 2) + fib(n - 1)


t0 = time.time()
fib(32)
print(time.time() - t0)

我们的编译应该遵循如下顺序步骤:

  1. 要编译我们首先需要确保它的静态类型检测没有问题
!mypy fib.py
�[1m�[32mSuccess: no issues found in 1 source file�[m
  1. 使用命令行工具mypyc编译模块,mypycmypy安装的一个可选项,可以用pip install -U 'mypy[mypyc]'安装
!mypyc fib.py
running build_ext
building 'fib' extension
creating build/temp.macosx-10.9-x86_64-cpython-310
creating build/temp.macosx-10.9-x86_64-cpython-310/build
/usr/local/Cellar/gcc/13.1.0/bin/gcc-13 -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/mac/micromamba/envs/py3.10/include -fPIC -O2 -isystem /Users/mac/micromamba/envs/py3.10/include -I/Users/mac/micromamba/envs/py3.10/lib/python3.10/site-packages/mypyc/lib-rt -I/Users/mac/micromamba/envs/py3.10/include/python3.10 -c build/__native.c -o build/temp.macosx-10.9-x86_64-cpython-310/build/__native.o -O3 -g1 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable -Wno-ignored-optimization-argument -Wno-cpp
creating build/lib.macosx-10.9-x86_64-cpython-310
/usr/local/Cellar/gcc/13.1.0/bin/gcc-13 -bundle -undefined dynamic_lookup -Wl,-rpath,/Users/mac/micromamba/envs/py3.10/lib -L/Users/mac/micromamba/envs/py3.10/lib -Wl,-rpath,/Users/mac/micromamba/envs/py3.10/lib -L/Users/mac/micromamba/envs/py3.10/lib build/temp.macosx-10.9-x86_64-cpython-310/build/__native.o -o build/lib.macosx-10.9-x86_64-cpython-310/fib.cpython-310-darwin.so
copying build/lib.macosx-10.9-x86_64-cpython-310/fib.cpython-310-darwin.so -> 
  1. 测试我们的编译效果
!python3 fib.py
0.6489529609680176
!python3 -c "import fib"
0.04175305366516113

可以看到效果拔群

模块包编译

这个例子在这里包编译可以在增加参数-p,它会编译指定包中的所有子模块中的所有.py文件,包括__init__.py:

mypyc -p mymath

编译包个人建议包的顶层__init__.py中导入所有要暴露的接口,避免暴露内部模块结构

如果你希望你的包在有条件的情况下可以被编译,没条件的情况下不编译也可以执行,可以这样写setup.py相关的文件:

  • pyproject.toml

    [build-system]
    requires = ["setuptools>=61.0.0", "wheel"]
    build-backend = "setuptools.build_meta"
    
    [project]
    name = "mymath"
    authors = [
      {name = "hsz", email = "[email protected]"},
    ]
    classifiers = [
      "License :: OSI Approved :: MIT License",
      "Programming Language :: Python :: 3",
      "Programming Language :: Python :: 3.10",
      "Programming Language :: Python :: 3.11",
    ]
    description = "A sample Python project for test."
    keywords = ["math", "test"]
    license = {file = "LICENSE"}
    dynamic = ["version", "readme", "dependencies"]
    requires-python = ">=3.10"
    
    [project.urls]
    changelog = "https://github.com/me/spam/blob/master/CHANGELOG.md"
    documentation = "https://readthedocs.org"
    homepage = "https://example.com"
    repository = "https://github.com/me/spam.git"
    
    [tool.setuptools]
    platforms = ["all"]
    
    [tool.setuptools.dynamic]
    readme = {file = ["README.md"], content-type = "text/markdown"}
    version = {attr = "mymath.version.__version__"}
    
    [tool.setuptools.packages.find]
    exclude = ['contrib', 'docs', 'test']
  • setup.py

    from setuptools import setup
    
    
    try:
        from mypyc.build import mypycify
    except Exception as e:
        print("没有 mypyc, model install as a pure python model")
        setup()
    else:
        from mypyc.build import mypycify
        print("有 mypyc, model install as a dll model")
        setup(
            packages=['mymath'],
            ext_modules=mypycify([
                'mymath/__init__.py',
                'mymath/fib.py',
                'mymath/square_op/quadratic_sum.py',
                'mymath/square_op/square_error.py',
                'mymath/square_op/square_root.py',
                'mymath/square_op/square.py',
            ]),
        )

我们没有将mypy[mypyc]放入build-system.requires,而是会根据是否能导入mypyc.build.mypycify来决定build行为是否要用mypyc进行编译.因此如果要分发到包索引仓库,我们应该执行如下步骤:

  1. 打源码包并上传:

    python -m build --sdist 
    twine upload  dist/*
  2. 在不同平台,不同指令集,不同python版本下在打包环境中安装好mypyc并不使用临时虚拟环境打二进制包并上传:

    python -m venv env
    source env/bin/activate
    python -m pip install 'mypy[mypyc]'
    python -m pip install 'setuptools>=61.0.0'
    python -m pip install 'wheel'
    python -m build --wheel -n # 或者python -m pip wheel --no-build-isolation
    twine upload  dist/*

一般我们开发只有一台机器,要多python版本多平台多指令集都编译一遍确实有点难为人,为了简化我们的build过程,可以有两种解决方案:

  1. 使用包含多平台支持的ci/cd工具,比如github action
  2. 使用工具cibuildwheel,cibuildwheel是一个专为打包wheel设计的工具,可以针对不同平台不同指令集打包支持的大部分python版本的wheel包.

github action配置为例,我们可以写成这样

  • pyproject.toml

    ...
    [tool.cibuildwheel]
    # 使用`build --wheel -n`命令来打包whee
    build-frontend = { name = "build", args = ["-n"] }
    before-build = "pip install build mypy[mypyc]"
    
    [tool.cibuildwheel.macos]
    archs = ["x86_64", "arm64"]
    
    # On an Linux Intel runner with qemu installed, build Intel and ARM wheels
    [tool.cibuildwheel.linux]
    archs = ["x86_64", "aarch64"]
    
    [tool.cibuildwheel.windows]
    archs = ["x86_64"]
  • .github/workflows/build_and_publish.yml

    name: Build_And_Publish
    
    on:
      release:
        types: [created]
    
    jobs:
      build_source:
        name: Build Source on linux
        runs-on: ubuntu-latest
        steps:
        - uses: actions/checkout@v2
        - name: Set up Python
          uses: actions/setup-python@v2
          with:
            python-version: '3.x'
        - name: Install dependencies
          run: |
            python -m pip install --upgrade pip
            pip install setuptools wheel twine build
        - name: Build and publish
          run: |
            python -m build --sdist
        - name: 'Upload dist'
          uses: 'actions/upload-artifact@v2'
          with:
            name: packages
            path: dist/*
        - name: Publish
          env:
            TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
            TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
          run:
            twine upload dist/*
    
      build_wheels:
        name: Build wheels on ${{ matrix.os }}
        runs-on: ${{ matrix.os }}
        strategy:
          matrix:
            os: [ubuntu-20.04, windows-2019, macos-11]
    
        steps:
          - uses: actions/checkout@v4
    
          - name: Build wheels
            uses: pypa/[email protected]
            # env:
            #   CIBW_SOME_OPTION: value
            #    ...
            # with:
            #   package-dir: .
            #   output-dir: wheelhouse
            #   config-file: "{package}/pyproject.toml"
    
          - uses: actions/upload-artifact@v3
            with:
              path: ./wheelhouse/*.whl
    
          - name: Install dependencies
            run: |
              pip install twine
    
          - name: Publish
            env:
              TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
              TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
            run:
              twine upload ./wheelhouse/*
    

这样每当你在github上创建release时就会自动触发执行上面两个步骤.cibuildwheel会在各自的系统重执行各自平台的打包操作,python版本也会根据pyproject.toml中的project.requires-python字段中定义的版本限制查找它支持的python版本进行编译.

总结

优点:

  1. 适用范围相对广,计算密集型任务,业务数据处理,io密集型任务都多少可以有加速
  2. 没有太多的学习成本,会python的typehints基本就不能用

缺点:

  1. 需要编译
  2. 分发会比较复杂
  3. 加速效果上限低
  4. 仅针对模块
  5. 无法与猴子补丁兼容

适用场景:

mypyc非常适合业务开发场景,在业务相对稳定,且需要一定性能提升的情况下,只要你的代码是纯python写的且没有打猴子补丁,mypyc是最简单快速的加速工具.