Put fewer packages on PyPI #487

Open
mattip opened this issue Aug 13, 2021 · 9 comments

@mattip

mattip commented Aug 13, 2021

This project requires an out-of-proportion amount of storage space on PyPI. This is problematic since the storage space is donated and the general assumption is that projects will not over-use the resources. In order to analyze what is going on, let's look at some data.

Each release of the project creates these artifacts (taken from the 4.35.1 release):

cp27-cp27m-macosx_10_9_x86_64.whl, 684.1 kB
cp27-cp27m-manylinux1_i686.whl, 2.4 MB
cp27-cp27m-manylinux1_x86_64.whl, 2.6 MB
cp27-cp27m-manylinux2010_i686.whl, 2.4 MB
cp27-cp27m-manylinux2010_x86_64.whl, 2.6 MB
cp27-cp27mu-manylinux1_i686.whl, 2.4 MB
cp27-cp27mu-manylinux1_x86_64.whl, 2.6 MB
cp27-cp27mu-manylinux2010_i686.whl, 2.4 MB
cp27-cp27mu-manylinux2010_x86_64.whl, 2.6 MB
cp35-cp35m-macosx_10_9_x86_64.whl, 698.8 kB
cp35-cp35m-manylinux1_i686.whl, 2.9 MB
cp35-cp35m-manylinux1_x86_64.whl, 3.1 MB
cp35-cp35m-manylinux2010_i686.whl, 2.9 MB
cp35-cp35m-manylinux2010_x86_64.whl, 3.1 MB
cp35-cp35m-manylinux2014_aarch64.whl, 3.7 MB
cp35-cp35m-win32.whl, 354.9 kB
cp35-cp35m-win_amd64.whl, 422.1 kB
cp36-cp36m-macosx_10_9_x86_64.whl, 700.7 kB
cp36-cp36m-manylinux1_i686.whl, 3.0 MB
cp36-cp36m-manylinux1_x86_64.whl, 3.3 MB
cp36-cp36m-manylinux2010_i686.whl, 3.0 MB
cp36-cp36m-manylinux2010_x86_64.whl, 3.3 MB
cp36-cp36m-manylinux2014_aarch64.whl, 3.8 MB
cp36-cp36m-win32.whl, 383.6 kB
cp36-cp36m-win_amd64.whl, 451.7 kB
cp37-cp37m-macosx_10_9_x86_64.whl, 704.6 kB
cp37-cp37m-manylinux1_i686.whl, 3.0 MB
cp37-cp37m-manylinux1_x86_64.whl, 3.2 MB
cp37-cp37m-manylinux2010_i686.whl, 3.0 MB
cp37-cp37m-manylinux2010_x86_64.whl, 3.2 MB
cp37-cp37m-manylinux2014_aarch64.whl, 3.8 MB
cp37-cp37m-win32.whl, 381.7 kB
cp37-cp37m-win_amd64.whl, 452.2 kB
cp38-cp38-macosx_10_9_x86_64.whl, 730.4 kB
cp38-cp38-manylinux1_i686.whl, 4.0 MB
cp38-cp38-manylinux1_x86_64.whl, 4.2 MB
cp38-cp38-manylinux2010_i686.whl, 4.0 MB
cp38-cp38-manylinux2010_x86_64.whl, 4.2 MB
cp38-cp38-manylinux2014_aarch64.whl, 4.8 MB
cp38-cp38-win32.whl, 394.0 kB
cp38-cp38-win_amd64.whl, 479.7 kB
cp39-cp39-macosx_10_9_x86_64.whl, 734.2 kB
cp39-cp39-manylinux1_i686.whl, 3.5 MB
cp39-cp39-manylinux1_x86_64.whl, 3.8 MB
cp39-cp39-manylinux2010_i686.whl, 3.5 MB
cp39-cp39-manylinux2010_x86_64.whl, 3.8 MB
cp39-cp39-manylinux2014_aarch64.whl, 4.3 MB
cp39-cp39-win32.whl, 392.6 kB
cp39-cp39-win_amd64.whl, 479.4 kB
pp27-pypy_73-macosx_10_9_x86_64.whl, 501.4 kB
pp27-pypy_73-manylinux1_x86_64.whl, 543.0 kB
pp27-pypy_73-manylinux2010_x86_64.whl, 543.0 kB
pp27-pypy_73-win32.whl, 342.4 kB
pp36-pypy36_pp73-macosx_10_9_x86_64.whl, 498.5 kB
pp36-pypy36_pp73-manylinux1_x86_64.whl, 542.1 kB
pp36-pypy36_pp73-manylinux2010_x86_64.whl, 542.1 kB
pp36-pypy36_pp73-win32.whl, 300.8 kB
pp37-pypy37_pp73-macosx_10_9_x86_64.whl, 498.5 kB
pp37-pypy37_pp73-manylinux1_x86_64.whl, 542.1 kB
pp37-pypy37_pp73-manylinux2010_x86_64.whl, 542.0 kB
pp37-pypy37_pp73-win32.whl, 300.8 kB

I think I left out the source tarball. This sums up to 122 MB per release (corrected from 122 GB). The project has had about 50 releases in the first half of 2021, sometimes several on a single day, which comes out to about 12 GB a year (corrected from 12 TB). It seems this project has under 2000 downloads a month. SciPy, by comparison, ships 18 wheels, each about 30 MB, twice a year for 30 GB of yearly storage and has about 30 million downloads a month (take those statistics with a grain of salt, they say the last version of this package is 1.2.0).
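For anyone who wants to re-check the arithmetic, here is a minimal sketch. It assumes decimal units (1 GB = 1000 MB) and about 100 releases a year, extrapolated from the 50 releases in the first half of 2021. The function name is made up for illustration. Taken at face value, the quoted SciPy inputs (18 wheels of about 30 MB, twice a year) work out to roughly 1 GB a year rather than 30 GB, so one of those figures may be off.

```python
MB_PER_GB = 1000  # decimal units, as used in the wheel listing


def yearly_storage_gb(release_size_mb: float, releases_per_year: int) -> float:
    """Storage added to the index in one year, in GB."""
    return release_size_mb * releases_per_year / MB_PER_GB


# Inputs are the figures quoted in the comment; 100 releases/year is an
# extrapolation from "about 50 releases in the first half of 2021".
project_gb = yearly_storage_gb(release_size_mb=122, releases_per_year=100)

# SciPy: 18 wheels of about 30 MB each, released twice a year.
scipy_gb = yearly_storage_gb(release_size_mb=18 * 30, releases_per_year=2)

print(f"this project: {project_gb:.1f} GB/year")
print(f"SciPy:        {scipy_gb:.1f} GB/year")
```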

So how can you reduce the resource requirements by three orders of magnitude?

  • Release a pure-python version of the package. This would reduce both the number of wheels and the size. Is it clear that the cython speed is a requirement of the project? Note this would not preclude building wheels for the "more important" platforms, pip install will prefer binary wheels to pure python ones. You may be interested in refactoring the code to use the "pure python" mode available in cython 3.0, which will make supporting both modes in the codebase simpler.
  • Release 4 times a year instead of ~100 times a year. (a 25x reduction)
  • Do not release both manylinux1 and manylinux2010 packages (a 2x reduction). I would stick with manylinux2010, but you know your users better than I do.
  • Drop older versions of python (3.5, 3.6, pypy2.7, pypy3.6) (around a 2x reduction)
  • Strip the builds. I see you use cibuildwheel; there is a discussion on how to do this in pypa/cibuildwheel#331, "Strip debug symbols of wheels" (maybe ~3x reduction, maybe more?).
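As a rough sanity check on how far the suggestions above go when combined, the sketch below multiplies the estimated factors. It assumes the factors are independent, which is only approximately true, and it leaves out the pure-Python option because no factor was given for it. The dictionary keys are just labels.

```python
import math

# Rough reduction factors, as estimated in the list above.
factors = {
    "release 4 times a year instead of ~100": 25,
    "drop manylinux1, keep manylinux2010": 2,
    "drop older Python versions": 2,
    "strip debug symbols": 3,
}

# Assumption: the factors compound independently.
combined = math.prod(factors.values())
orders_of_magnitude = math.log10(combined)

# Starting from roughly 12 GB (12,000 MB) of uploads a year.
yearly_mb_after = 12_000 / combined

print(f"combined reduction: {combined}x ({orders_of_magnitude:.1f} orders of magnitude)")
print(f"estimated yearly storage afterwards: {yearly_mb_after:.0f} MB")
```

Under these assumptions the four measured suggestions give about two and a half orders of magnitude; the remainder would have to come from the pure-Python wheel or a larger gain from stripping.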
@mattip
Author

mattip commented Aug 13, 2021

Another thing to think about is whether the package can be built using the limited API. Cython seems to have some support for it, and there are some hints on how to use it with cibuildwheel. There would then be only one wheel per platform for all the Python versions.
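To illustrate the "one wheel per platform" point, here is a small sketch of the tag arithmetic only. It takes a subset of the per-version tags from the 4.35.1 listing and collapses them into one `abi3` tag per platform, using the lowest Python version as the minimum supported one. The helper name is hypothetical, and this says nothing about whether the Cython-generated code actually builds under the limited API.

```python
from collections import defaultdict

# Subset of the CPython 3 tags from the 4.35.1 listing.
tags = [
    "cp35-cp35m-manylinux2010_x86_64",
    "cp36-cp36m-manylinux2010_x86_64",
    "cp37-cp37m-manylinux2010_x86_64",
    "cp38-cp38-manylinux2010_x86_64",
    "cp39-cp39-manylinux2010_x86_64",
    "cp35-cp35m-win_amd64",
    "cp36-cp36m-win_amd64",
    "cp37-cp37m-win_amd64",
    "cp38-cp38-win_amd64",
    "cp39-cp39-win_amd64",
]


def collapse_to_abi3(wheel_tags):
    """Map per-version CPython tags to one abi3 tag per platform."""
    versions_by_platform = defaultdict(list)
    for tag in wheel_tags:
        python_tag, _abi_tag, platform_tag = tag.split("-", 2)
        # "cp39" -> 39; integer comparison is good enough for 3.5 to 3.9.
        versions_by_platform[platform_tag].append(int(python_tag[2:]))
    return sorted(
        f"cp{min(versions)}-abi3-{platform}"
        for platform, versions in versions_by_platform.items()
    )


abi3_tags = collapse_to_abi3(tags)
print(f"{len(tags)} wheels -> {len(abi3_tags)} wheels")
for tag in abi3_tags:
    print(tag)
```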

@da-woods

> 122GB

Unless I'm missing something, I make it 122MB, which seems a little more reasonable. Although smaller is obviously still better.

I'd be reluctant to recommend Cython's limited API support yet (by all means try it and submit bug reports, but I wouldn't release it to actual users).

@mattip
Author

mattip commented Aug 13, 2021

Sorry, miscalculated. Fixing the comment

@da-woods

The other approach that people sometimes use is to ship the Cython-generated C files rather than binaries (although obviously that then requires the user to have a C compiler, but doesn't require the user to have Cython). Obviously that could be combined with pure-Python mode as an additional fallback level.
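A related pattern at import time, sketched here as an assumption and not something this project is known to do, is to try the compiled module first and fall back to a pure-Python one if it is missing. The helper name and the module name `mypkg._speedups` are hypothetical; `json` merely stands in for a pure-Python implementation so the snippet runs on its own.

```python
import importlib


def import_first_available(*module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed or not built on this platform; try the next one.
            continue
    raise ImportError(f"none of {module_names} could be imported")


# "mypkg._speedups" is a hypothetical compiled extension that is absent
# here, so the stand-in fallback module is returned instead.
backend = import_first_available("mypkg._speedups", "json")
print(backend.__name__)
```

With this arrangement a platform without a prebuilt wheel still gets a working, if slower, package, without needing a C compiler.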

@rmk135
Member

rmk135 commented Aug 13, 2021

@mattip, this project had 360,000 downloads last month: https://pypistats.org/packages/dependency-injector

You were looking at the wrong project with a similar name.

@rmk135
Member

rmk135 commented Aug 13, 2021

> The other approach that people sometimes use is to ship the Cython-generated C files rather than binaries (although obviously that then requires the user to have a C compiler, but doesn't require the user to have Cython). Obviously that could be combined with pure-Python mode as an additional fallback level.

I have shipped the project as generated C code, and it created problems for users. As you noted, with C sources everybody needs to install a C compiler, and build times are much higher. It also consumes far more resources globally, because of thousands of compilations versus a single one on the CD server. Simply removing the pre-compiled wheels from a project that already had them doesn't seem like a good solution. It will break people's software.

@rmk135 rmk135 self-assigned this Aug 13, 2021
@rmk135
Member

rmk135 commented Aug 13, 2021

> So how can you reduce the resource requirements by three orders of magnitude?
>
>   • Release a pure-python version of the package. This would reduce both the number of wheels and the size. Is it clear that the cython speed is a requirement of the project? Note this would not preclude building wheels for the "more important" platforms, pip install will prefer binary wheels to pure python ones. You may be interested in refactoring the code to use the "pure python" mode available in cython 3.0, which will make supporting both modes in the codebase simpler.
>   • Release 4 times a year instead of ~100 times a year. (a 25x reduction)
>   • Do not release both manylinux1 and manylinux2010 packages (a 2x reduction). I would stick with manylinux2010, but you know your users better than I do.
>   • Drop older versions of python (3.5, 3.6, pypy2.7, pypy3.6) (around a 2x reduction)
>   • Strip the builds. I see you use cibuildwheel; there is a discussion on how to do this in pypa/cibuildwheel#331, "Strip debug symbols of wheels" (maybe ~3x reduction, maybe more?).

@mattip, thank you, that's something to think about. The problem is that I'm stuck at the moment because I cannot make a single release.

@mattip
Author

mattip commented Aug 13, 2021

> You were looking at the wrong project with a similar name.

That makes more sense, thanks.

Correct me if I am wrong, but it seems this new release is for issue #477. Perhaps you could delay the release to the end of the month and, in the meantime, shrink the release so it fits into the space you freed up by deleting older versions. I would expect the approval process for PyPI to take a few weeks anyway; they are very overburdened.

@rmk135
Member

rmk135 commented Aug 13, 2021

@mattip Yeah, that's correct. You brought up a lot of interesting suggestions. I don't think I'll be able to apply all of them, but I should be able to make some improvements for sure. Thanks again for your input!
