Strip debug symbols of wheels #331

Open
YannickJadoul opened this issue Apr 30, 2020 · 16 comments

Comments

@YannickJadoul
Member

I was just made aware by @mattip that some Python distributions have -g in sysconfig.get_config_vars('CFLAGS'), and thus include debug symbols (including the versions in the manylinux images, it seems). The reason for this apparently is that the binaries are then stripped before being packaged for the Debian/Fedora/... package managers (and the symbols themselves are often shipped as a separate package for debugging).

The thing about building wheels is that these sysconfig values are used when building wheels. So should we somehow strip symbols as well, or add -Wl,-strip-all to the build flags, or ... ?
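
To check whether a given interpreter is affected, the flags can be inspected directly. A minimal sketch, assuming a GCC/Clang-style toolchain; the helper name `has_debug_flag` is made up for illustration, and on Windows `CFLAGS` is typically unset, so the value may be `None`:

```python
import re
import shlex
import sysconfig

# Matches -g, -g0..-g3 and -ggdb, -ggdb0..-ggdb3 (other -g* options are ignored here).
_DEBUG_LEVEL = re.compile(r"-g(gdb)?[0-3]?")


def has_debug_flag(cflags):
    """Return True if the flag string ends up requesting debug info."""
    enabled = False
    for flag in shlex.split(cflags or ""):
        if _DEBUG_LEVEL.fullmatch(flag):
            # With several -g options, GCC uses the last one; level 0 turns debug info off.
            enabled = not flag.endswith("0")
    return enabled


if __name__ == "__main__":
    cflags = sysconfig.get_config_var("CFLAGS")
    print("CFLAGS:", cflags)
    print("debug symbols requested:", has_debug_flag(cflags))
```

Running this inside each build environment (for example, each manylinux interpreter) would show which of them build extensions with debug info by default.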

@YannickJadoul
Member Author

See also numpy/numpy#16110

@mattip
Contributor

mattip commented Apr 30, 2020

xref MacPython/numpy-wheels#82. FWIW, multibuild uses -Wl,-strip-all by default

@joerick
Contributor

joerick commented May 1, 2020

It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports. But it might be nice to document it somewhere, for example in the Tips and Tricks section of the docs "Why are my wheels so big?"

@YannickJadoul
Member Author

YannickJadoul commented May 1, 2020

That could also work, ofc. If we then document that adding -Wl,-strip-all does the job, that would help.

On the other hand, wouldn't stripping them be a sensible default? By default Python won't print any C stack trace, and it would be reasonably hard for a typical Python user to get one (going through gdb would then be the way to go, no, or is there a simpler way?).

The problem with not stripping by default is that (almost) no one reads the docs unless there's a problem (like missing debug symbols).

@joerick
Contributor

joerick commented May 2, 2020

The problem with not stripping by default is that (almost) no one reads the docs unless there's a problem (like missing debug symbols).

That's true, but I'm not sure it's that big of a problem. We don't minify our Python code, even though it could save loads of space/bandwidth. I guess it's a philosophy thing for me - software should be open and hackable by default, especially in open source. But definitely document, as it could be really handy for some projects :)

@mattip
Contributor

mattip commented May 2, 2020

This comment suggests using --strip-debug as a compromise between information and size.

@YannickJadoul
Member Author

YannickJadoul commented May 3, 2020

For what it's worth, I've tried compiling with multiple settings, and ran strip on the results (with or without --strip-debug):

debug.so                       91M
debug_strip-debug.so           42M
debug_strip.so                 36M
minsizerel.so                  28M
minsizerel_strip.so            28M
release.so                     31M
release_strip.so               31M
relwithdebinfo.so              91M
relwithdebinfo_strip-debug.so  32M
relwithdebinfo_strip.so        29M

and if you zip them, as they will be in a wheel (the ratios seem to stay approximately the same):

debug.zip                        28M
debug_strip-debug.zip            11M
debug_strip.zip                 9.2M
minsizerel.zip                  7.8M
minsizerel_strip.zip            7.8M
release.zip                     9.2M
release_strip.zip               9.2M
relwithdebinfo.zip               28M
relwithdebinfo_strip-debug.zip  9.0M
relwithdebinfo_strip.zip        8.5M

These are CMake build types, so:

  • Debug: -g
  • Release: -O3 -DNDEBUG
  • RelWithDebInfo: -g -O2 -DNDEBUG
  • MinSizeRel: -Os -DNDEBUG

I don't know how representative my project is: there's an enormous code base I'm wrapping that isn't mine but that's more or less normal C/C++, and there are also lots of template instantiations coming from pybind11 that will result in long names, I suppose. Still, stripping symbols brings builds with -g down to approximately a third of their size.
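
The "approximately a third" figure can be checked against the uncompressed sizes listed above. A small sketch; the dictionary just restates the posted numbers, and `remaining_fraction` is an illustrative helper name:

```python
# Sizes in MB, copied from the uncompressed .so listing above.
sizes_mb = {
    "debug": 91,
    "debug_strip-debug": 42,
    "debug_strip": 36,
    "relwithdebinfo": 91,
    "relwithdebinfo_strip-debug": 32,
    "relwithdebinfo_strip": 29,
}


def remaining_fraction(before, after):
    """Fraction of the original size that is left after stripping."""
    return after / before


for base in ("debug", "relwithdebinfo"):
    for variant in ("strip-debug", "strip"):
        frac = remaining_fraction(sizes_mb[base], sizes_mb[f"{base}_{variant}"])
        print(f"{base} -> {variant}: {frac:.0%} of the original size")
```

For these numbers the stripped builds come out at roughly 46% and 40% of the original for Debug, and 35% and 32% for RelWithDebInfo.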

@chaitan94

That's true, but I'm not sure it's that big of a problem. We don't minify our Python code, even though it could save loads of space/bandwidth. I guess it's a philosophy thing for me - software should be open and hackable by default, especially in open source. But definitely document, as it could be really handy for some projects :)

While what you say makes sense, I slightly disagree in this particular context. I agree that software should be open and hackable by default, but built binaries need not be. Considering that most people would use cibuildwheel as the final step for releasing software, I would say stripping out the debug symbols by default would be the wiser choice (at least in my opinion). Alternatively, there should be an easy option to configure it.

@joerick
Contributor

joerick commented Jul 11, 2020

I guess... I agree that there's some debate to be had here. cibuildwheel doesn't have a position on this though - Python (via sysconfig) is setting some defaults for CFLAGS to enable this behaviour. I'm not sure I'd want to add code into cibuildwheel to override that - it could get confusing to users where it's coming from.

there should be an easy option to configure it

Module authors have full control over how their extensions compile through setup.py. If you want to strip these symbols, I believe you can do:

from setuptools import setup, Extension

setup(
  ext_modules=[
    # -g0 asks GCC/Clang to emit no debug information
    Extension('_foo', ['foo.c'], extra_compile_args=['-g0'])
  ],
)

Refs: https://docs.python.org/3/distutils/setupscript.html#other-options https://clang.llvm.org/docs/UsersManual.html#cmdoption-g0

If somebody can confirm the above syntax that'd be great! Or if there's a better way, let me know. Then we can add some documentation showing how best to do this.
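
One caveat with hard-coding `-g0` is that it is a GCC/Clang flag, and a user may also want to opt back in to debug info through the environment. A hedged sketch of a helper that handles both cases; the function name `debug_stripping_args` is hypothetical and this is not code from cibuildwheel or any other project mentioned in this thread:

```python
import os
import shlex
import sys


def debug_stripping_args(platform=sys.platform, env=os.environ):
    """Return extra compile args that disable debug info, unless the user chose a -g flag."""
    if platform.startswith("win"):
        # MSVC does not understand -g0, so add nothing there.
        return []
    user_flags = shlex.split(env.get("CFLAGS", "")) + shlex.split(env.get("CXXFLAGS", ""))
    if any(flag.startswith("-g") for flag in user_flags):
        # Respect an explicit choice made through the environment.
        return []
    return ["-g0"]


# Usage in setup.py (module and file names are the placeholders from the snippet above):
# Extension('_foo', ['foo.c'], extra_compile_args=debug_stripping_args())
```

Passing `platform` and `env` as parameters keeps the helper easy to test without touching the real environment.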

@chaitan94

I just tried adding -g0 on one of my projects, and it worked out well. (My ~13MB wheels are now ~1.2MB, and when unzipped, that's a reduction from around 52MB to 2MB 😅). So I agree that the user still has full control. Just to continue the debate, though: one other point to consider is that cibuildwheel aims to be an easy-to-use alternative to the more customizable multibuild, so (I think) a lot of people who are not well versed in C/C++ and setup.py (like me) might want to use it and expect cibuildwheel to apply the best practices for building wheels for them. So if not automating it, at least some documentation guiding them towards it would help. Again, I'm just extending the argument from this perspective; you can probably make the right call.

@cher-nov
Contributor

cher-nov commented Nov 2, 2020

It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports.

It doesn't make any sense if the extension is linked against the release edition of Python's run-time library, does it?

@Czaki
Contributor

Czaki commented Nov 2, 2020

It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports.

It doesn't make any sense if the extension is linked against the release edition of Python's run-time library, does it?

Since Python 3.8, debug and release builds use the same ABI:
https://docs.python.org/3/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build

@YannickJadoul
Member Author

@cher-nov You still get the full names of the functions and methods of the extension module in the stack traces, though. So often, to locate a bug, that's more than enough, and you don't need a debug build of Python itself.

@henryiii
Contributor

henryiii commented Feb 1, 2021

If we had a "tutorial" page, this might somehow make it in, otherwise, it should just be an entry in FAQ? Pybind11's setup helpers add this by default. https://github.com/pybind/pybind11/blob/721834b422482a522abd4e83f11d545ef876f997/pybind11/setup_helpers.py#L146

@joerick
Contributor

joerick commented Feb 3, 2021

Yes, an entry in the FAQ would be good, @henryiii. I guess it would be CIBW_ENVIRONMENT: CFLAGS=-g0?

@dvarrazzo
Contributor

Hello,

FYI I'm working on this issue in Psycopg 3: psycopg/psycopg#142

I think Cython libraries are big offenders, because the generated names are massive. Some stats from our side:

The worst offender in psycopg 3.0.2 is the x86-64 Python 3.8 wheel package. After stripping our .so files, the download size shrank by 33%:

$ ls -l */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl
-rw-rw-r-- 1 piro piro 6340205 Nov  8 14:32 tmp/psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl
-rw-rw-r-- 1 piro piro 4275873 Nov  8 18:45 tmpstrip/psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl

The unpacked footprint of all the installed libs shrank by 60%:

$ for f in */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl; do unzip -l $f; done | grep files
 28606803                     30 files
 11346257                     30 files

The footprint of the psycopg binaries alone shrank by 88%:

$ for f in */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl; do unzip -l $f; done | grep so$
 15737560  2021-11-08 14:32   psycopg_binary/_psycopg.cpython-38-x86_64-linux-gnu.so
  2978856  2021-11-08 14:32   psycopg_binary/pq.cpython-38-x86_64-linux-gnu.so
  1139416  2021-11-08 18:44   psycopg_binary/_psycopg.cpython-38-x86_64-linux-gnu.so
   316456  2021-11-08 18:44   psycopg_binary/pq.cpython-38-x86_64-linux-gnu.so
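
The same measurements can be taken without `unzip`, since a wheel is a zip archive. A minimal sketch using only the standard library; the function names are illustrative:

```python
import zipfile


def shared_object_sizes(wheel_path):
    """Map each .so member of a wheel to its uncompressed size in bytes."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return {
            info.filename: info.file_size
            for info in wheel.infolist()
            if info.filename.endswith(".so")
        }


def total_size(wheel_path):
    """Uncompressed size of all files in the wheel, like the summary line of `unzip -l`."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sum(info.file_size for info in wheel.infolist())
```

Calling both functions on the unstripped and the stripped wheel gives the before/after numbers to compare.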

Running auditwheel repair --strip didn't work for us because it broke some of the system libraries, which then failed to import with the message "ELF load command address/offset not properly aligned". However, the system libraries seem to be stripped already, and there was no relevant decrease in size (some of them actually increased...). So we are experimenting with a pre-repair script to strip only our own .so files.
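
The actual pre-repair script is not shown in this thread. As a rough sketch of the idea only: it assumes the wheel has already been unpacked into a directory (for example with `wheel unpack`), that binutils `strip` is on the PATH, and that the wheel is repacked afterwards (for example with `wheel pack`) so that its RECORD file is regenerated. Function names are hypothetical:

```python
import subprocess
from pathlib import Path


def find_own_extensions(unpacked_wheel_dir, package_name):
    """Return the .so files that live under the project's own package directory."""
    package_dir = Path(unpacked_wheel_dir) / package_name
    return sorted(package_dir.rglob("*.so"))


def strip_extensions(paths, strip_cmd="strip"):
    """Run `strip` on each file in place; raises CalledProcessError if strip fails."""
    for path in paths:
        subprocess.run([strip_cmd, str(path)], check=True)
```

Because this runs before auditwheel repair, the bundled system libraries are not in the wheel yet, so only the project's own extension modules get touched.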
