Use PyPy to speed up *_postgres2es.py scripts #137
Labels: Enhancement (New feature or request)
Comments
anthonyfok added a commit to anthonyfok/python-env that referenced this issue on Oct 22, 2021:

Fast-forward to debian:sid-20201012-slim, which is the last snapshot before Python 3.9 became the default in Debian

Install PyPy (pypy3 7.3.2) for speed improvement; see OpenDRR/opendrr-api#137

Refresh most Python libraries from Debian packages, and avoid building psycopg2 from source:
- python3-numpy: 1.19.2 (was 1.18.5)
- python3-pandas: 1.0.5 (was 1.0.4)
- python3-psycopg2: 2.8.5 (was 2.6)
- python3-psycopg2cffi: 2.8.1 (newly added)
- python3-requests: 2.23.0 (was 2.23.0)
- python3-sqlalchemy: 1.3.19 (was 1.3.17)

From PyPI using pip3:
- elasticsearch: 7.12.0 (was 7.7.1)

Install git 2.30.2 and git-lfs 2.13.2 from Debian 11 (bullseye) as the old git-lfs 2.11.0 hangs at "GIT_LFS_SKIP_SMUDGE=1 git checkout".

Move installation of extra utilities needed by add_data.sh from OpenDRR/opendrr-api python/Dockerfile to this repo:
- dos2unix eatmydata jq moreutils nano time

Add org.opencontainers.image.* labels

Clean up /var/lib/apt/lists/* to further save space.

Image size has been reduced from 662MB to 561MB.
This was referenced Oct 22, 2021
jvanulde added Priority: Must Have and removed Priority: Must Have Priority: Should Have labels on Jan 4, 2022
While tweaking the Dockerfile for ghcr.io/opendrr/python-env, replacing some pip-installed Python libraries with Debian-prepackaged ones, and searching for python3-psycopg2 (`apt search psycopg2`), I came across the python3-psycopg2cffi package, which eventually led me to PyPy, the alternative Python implementation with a Just-In-Time compiler that promises much faster speed than the normal interpreted CPython implementation.

Early test results look promising. The time to run the following command:

is reduced from 11 seconds with `python3` down to 6 seconds with `pypy3`. For populateElasticsearchIndex(), there is not much change.

Anyhow, it appears we could shave up to 30% off the time needed to run the *_postgres2es.py scripts (the gain apparently varies among scripts and datasets).
CPython:
PyPy:
Pull request will follow soon.
Prerequisite: `pypy3` and `python3-psycopg2cffi` preinstalled in the ghcr.io/opendrr/python-env container image (probably tagged 1.1.0?)
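To illustrate why both packages are needed, here is a minimal sketch of how a script might pick its PostgreSQL driver depending on the interpreter. It is not taken from the *_postgres2es.py scripts; the helper names `running_on_pypy` and `load_postgres_driver` are hypothetical, and it assumes only that `psycopg2` and `psycopg2cffi` expose the same DB-API interface.

```python
import importlib
import platform


def running_on_pypy() -> bool:
    """Return True when the current interpreter is PyPy."""
    return platform.python_implementation() == "PyPy"


def load_postgres_driver():
    """Return a module exposing the psycopg2 API, or None if none is installed.

    Hypothetical helper: prefers the CFFI-based psycopg2cffi on PyPy and
    the C-extension psycopg2 on CPython, falling back to the other one.
    """
    candidates = ["psycopg2", "psycopg2cffi"]
    if running_on_pypy():
        candidates.reverse()
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


if __name__ == "__main__":
    driver = load_postgres_driver()
    # Report which interpreter and which driver module were picked up.
    print(platform.python_implementation(),
          getattr(driver, "__name__", "no driver found"))
```

Running the same file with `python3` and then `pypy3` should show which driver each interpreter picks up. psycopg2cffi also documents a `compat.register()` helper that lets existing `import psycopg2` statements resolve to it, which may be the less invasive option if the scripts are to stay unchanged.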