When pandas is built from source and the resulting binaries are run through strip, the installed folder is 54% smaller than the one installed from the manylinux wheel (via pip).
From Wikipedia: "[...] the strip program removes inessential information from executable binary programs and object files, thus potentially resulting in better performance and sometimes significantly less disk space usage." https://en.wikipedia.org/wiki/Strip_(Unix)
The extra size is probably harmless on most systems, but it matters a lot on size-constrained platforms such as AWS Lambda.
Some developers have resorted to distributing their own stripped binaries (e.g. lambda packages for the serverless framework zappa), but that seems like a makeshift solution.
I think the problem should be solved upstream, since each library should be responsible for packaging its own optimized binaries.
An issue has also been opened for numpy, since it is a dependency of pandas and both end up using a lot of unnecessary disk space.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.87-linuxkit-aufs
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 36.2.7
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
How to replicate
System specifications
Every command below has been executed using the amazonlinux docker image: https://hub.docker.com/_/amazonlinux/
Pandas version: 0.22.0
Python version: 3.6.2
Prepare docker image
Install wheel & measure package size
--> 111 MB
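The measurement command did not survive in this copy of the issue. As a rough stand-in, assuming the package sits in a single directory such as site-packages/pandas, the installed size can be summed in Python. The helper name dir_size_bytes is hypothetical, and it adds up apparent file sizes, so the figure may differ slightly from what a tool like du reports.

```python
import os


def dir_size_bytes(root):
    """Sum the apparent sizes of all regular files below root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a linked file is not counted twice
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling dir_size_bytes on the installed pandas directory and dividing by 1e6 gives a size in MB comparable to the numbers quoted here.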
Try strip:
--> 52 MB
Roughly 53% of the installed size (111 MB down to 52 MB) can be stripped.
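The exact strip invocation is also missing from this copy. One way the step could look, as a sketch only: walk the installed package, collect the compiled extension modules, and run the system strip tool on each. The function names are hypothetical, and calling plain strip with no flags is an assumption; it also assumes binutils is installed in the image.

```python
import os
import shutil
import subprocess


def find_shared_objects(root):
    """Return sorted paths of shared objects (*.so, *.so.N) below root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match foo.so as well as versioned names like libfoo.so.1
            if name.endswith(".so") or ".so." in name:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def strip_shared_objects(root, strip_cmd="strip"):
    """Run the strip tool on every shared object below root.

    Returns the list of processed files. Raises if strip is not on PATH.
    """
    if shutil.which(strip_cmd) is None:
        raise FileNotFoundError(f"{strip_cmd!r} not found on PATH")
    targets = find_shared_objects(root)
    for path in targets:
        # check=True surfaces a failure instead of silently skipping a file
        subprocess.run([strip_cmd, path], check=True)
    return targets
```

Stripping modifies the files in place, so it is safest to run this on a copy of the installed package and to import pandas afterwards to confirm nothing broke.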
Build from source & measure package size
--> 107 MB
Try strip:
--> 51 MB
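The four measurements can be checked against the percentages quoted in this report with a few lines of arithmetic. The helper name reduction_percent is made up for this illustration; the sizes are the ones reported above.

```python
def reduction_percent(before_mb, after_mb):
    """Percentage by which the size shrinks going from before_mb to after_mb."""
    return 100.0 * (before_mb - after_mb) / before_mb


# Sizes in MB as reported in this issue
wheel, wheel_stripped = 111, 52
source, source_stripped = 107, 51

print(f"wheel -> stripped wheel:   {reduction_percent(wheel, wheel_stripped):.1f}%")    # 53.2%
print(f"source -> stripped source: {reduction_percent(source, source_stripped):.1f}%")  # 52.3%
print(f"wheel -> stripped source:  {reduction_percent(wheel, source_stripped):.1f}%")   # 54.1%
```

The last line is the comparison behind the 54% figure: the manylinux wheel as installed by pip versus a source build that has been stripped.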