{Packaging} Optimize Linux package and docker image by removing py file #25801

Open
Wants to merge 39 commits into base: dev
Showing changes from 27 commits

Commits
08062d2
Init
bebound Mar 13, 2023
0f175e0
Minor fix
bebound Mar 13, 2023
8e7d36f
Minor fix
bebound Mar 13, 2023
04adb00
Skip None
bebound Mar 14, 2023
4aef92d
Minor fix
bebound Mar 14, 2023
7410096
Move pyc in pip
bebound Mar 14, 2023
159c08b
Merge branch 'dev' into trim_package
bebound Mar 15, 2023
a07664b
Add use_pyc.py
bebound Mar 15, 2023
c66c260
Trim docker
bebound Mar 16, 2023
8861360
Use use_pyc on Windows
bebound Mar 16, 2023
c456088
Print folder size
bebound Mar 16, 2023
9645801
MInor fix
bebound Mar 16, 2023
293f252
Merge remote-tracking branch 'upstream/dev' into trim_package
bebound Mar 16, 2023
5efeab5
Remove azure.mgmt.network
bebound Mar 16, 2023
324ec66
Minor fix
bebound Mar 16, 2023
2051488
Minor fix
bebound Mar 16, 2023
7fb7902
Fix python version number
bebound Mar 16, 2023
9e0338c
Minor fix
bebound Mar 16, 2023
a1d6758
Merge remote-tracking branch 'upstream/dev' into trim_package
bebound Mar 17, 2023
39c5b2f
Skip pip
bebound Mar 17, 2023
eec7166
Minor fix
bebound Mar 20, 2023
a960bbf
Minor fix
bebound Mar 20, 2023
8229caf
Test
bebound Mar 20, 2023
3f9af6d
Fix import error
bebound Mar 20, 2023
eb0566d
Minor fix
bebound Mar 20, 2023
6d6cb07
Disable use_pyc in rpm
bebound Mar 20, 2023
a971e4e
Minor fix
bebound Mar 21, 2023
219679d
Skip pip folder on Windows
bebound Mar 21, 2023
25c4ffe
Merge remote-tracking branch 'upstream/dev' into trim_package
bebound Mar 21, 2023
e4dcccb
Minor fix
bebound Mar 21, 2023
27a9bab
Minor update
bebound Mar 28, 2023
8a059d3
Merge remote-tracking branch 'upstream/dev' into trim_package
bebound Mar 30, 2023
76e6239
Merge remote-tracking branch 'upstream/dev' into trim_package
bebound Apr 12, 2023
2676b09
Init
bebound Apr 18, 2023
ee56c59
Merge remote-tracking branch 'upstream/dev' into trim_package
bebound Apr 18, 2023
80506cf
Minor fix
bebound Apr 18, 2023
441d23e
Merge branch 'linux-remove-sdk' into trim_package
bebound Apr 19, 2023
fdeb09d
Merge branch 'dev' into trim_package
bebound Apr 19, 2023
c9c8c5d
Merge branch 'dev' into trim_package
bebound Apr 20, 2023
6 changes: 5 additions & 1 deletion Dockerfile
@@ -52,7 +52,11 @@ COPY . /azure-cli

# 1. Build packages and store in tmp dir
# 2. Install the cli and the other command modules that weren't included
RUN ./scripts/install_full.sh \

# Python image has build-in env $PYTHON_VERSION=3.10.10.
# `ARG PYTHON_VERSION="3.10"` works on ARM64, but it can't override the default value on AMD64.
RUN ./scripts/install_full.sh && python ./scripts/trim_sdk.py \
&& python ./scripts/use_pyc.py /usr/local/lib/python${PYTHON_VERSION:0:4}/site-packages/ \
Member

I would prefer to make the pyc change in a separate PR, just in case something goes wrong and we need to roll back.

&& cat /azure-cli/az.completion > ~/.bashrc \
&& runDeps="$( \
scanelf --needed --nobanner --recursive /usr/local \
32 changes: 2 additions & 30 deletions build_scripts/windows/scripts/build.cmd
@@ -139,36 +139,8 @@ copy %REPO_ROOT%\build_scripts\windows\resources\CLI_LICENSE.rtf %BUILDING_DIR%
copy %REPO_ROOT%\build_scripts\windows\resources\ThirdPartyNotices.txt %BUILDING_DIR%
copy %REPO_ROOT%\NOTICE.txt %BUILDING_DIR%

REM Remove .py and only deploy .pyc files
pushd %BUILDING_DIR%\Lib\site-packages
for /f %%f in ('dir /b /s *.pyc') do (
set PARENT_DIR=%%~df%%~pf..
echo !PARENT_DIR! | findstr /C:\Lib\site-packages\pip\ 1>nul
if !errorlevel! neq 0 (
REM Only take the file name without 'pyc' extension: e.g., (same below) __init__.cpython-310
set FILENAME=%%~nf
REM Truncate the '.cpython-310' postfix which is 12 chars long: __init__
REM https://stackoverflow.com/a/636391/2199657
set BASE_FILENAME=!FILENAME:~0,-12!
REM __init__.pyc
set pyc=!BASE_FILENAME!.pyc
REM Delete ..\__init__.py
del !PARENT_DIR!\!BASE_FILENAME!.py
REM Copy to ..\__init__.pyc
copy %%~f !PARENT_DIR!\!pyc! >nul
REM Delete __init__.pyc
del %%~f
) ELSE (
echo --SKIP !PARENT_DIR! under pip
)
)
popd

REM Remove __pycache__
echo remove pycache
for /d /r %BUILDING_DIR%\Lib\site-packages\pip %%d in (__pycache__) do (
if exist %%d rmdir /s /q "%%d"
)
REM replace .py with .pyc and remove __pycache__ dir to save space
%BUILDING_DIR%\python.exe %REPO_ROOT%\scripts\use_pyc.py %BUILDING_DIR%\Lib\site-packages\ %PYTHON_VERSION%
bebound (Contributor Author), Mar 21, 2023

I replaced the cmd script with use_pyc.py not only for readability but also to fix a bug.
invoke==1.2.0 ships a stray file, invoke/completion/__pycache__/__init__.cpython-36.pyc, and the cmd script saves it as __init_.pyc in the final build: it always strips 12 characters (the length of '.cpython-310'), but '.cpython-36' is only 11 characters long.
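
The truncation problem described in this comment can be reproduced in a few lines. This is a minimal sketch, not code from the PR: the helper names `fixed_width_base` and `suffix_checked_base` are hypothetical, and the 12-character cut mirrors the `!FILENAME:~0,-12!` expression in the removed cmd script.

```python
from typing import Optional


def fixed_width_base(stem: str, cut: int = 12) -> str:
    """Mimic the cmd script: always drop the last `cut` characters of the stem."""
    return stem[:-cut]


def suffix_checked_base(stem: str, tag: str) -> Optional[str]:
    """Strip '.<tag>' only when the stem really ends with it; otherwise skip the file."""
    suffix = '.' + tag
    if stem.endswith(suffix):
        return stem[:-len(suffix)]
    return None


print(fixed_width_base('__init__.cpython-310'))  # __init__
print(fixed_width_base('__init__.cpython-36'))   # __init_
print(suffix_checked_base('__init__.cpython-36', 'cpython-310'))  # None
```

Matching on the full suffix, as use_pyc.py does with its glob pattern, means a file compiled for another Python version is simply left alone instead of being renamed incorrectly.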

Won't it be pretty hard to debug anything when there's no source code left?

bebound (Contributor Author), Mar 28, 2023

That's true, but I think most users never edit the CLI source code to debug it.
As evidence, we have shipped the Windows package without source code for years and no one has complained.
Users do care about package size, though: we get a lot of feedback about it in #7387.

Those who want to change the source code need to install from PyPI.
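
For readers unfamiliar with sourceless distributions, the following sketch shows the mechanism this discussion relies on: Python can import a .pyc file that sits where the .py file used to be. It is an illustration under stated assumptions, not the PR's code; `make_sourceless_module` and the module name `demo_sourceless` are hypothetical.

```python
import importlib
import py_compile
import sys
import tempfile
from pathlib import Path


def make_sourceless_module(folder: Path, name: str, source: str) -> Path:
    """Write name.py, compile it to name.pyc beside it, then delete the source."""
    py_path = folder / f'{name}.py'
    py_path.write_text(source)
    pyc_path = py_path.with_suffix('.pyc')
    py_compile.compile(str(py_path), cfile=str(pyc_path), doraise=True)
    py_path.unlink()
    return pyc_path


workdir = Path(tempfile.mkdtemp())
make_sourceless_module(workdir, 'demo_sourceless', 'ANSWER = 6 * 7\n')

# Make the folder importable and refresh the import system's directory cache.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

demo = importlib.import_module('demo_sourceless')
print(demo.ANSWER, demo.__file__)
```

A traceback raised from such a module still reports file names and line numbers, but it cannot show the source text of each line, which is the debugging cost raised in the question above.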


REM Remove dist-info
echo remove dist-info
5 changes: 5 additions & 0 deletions scripts/release/debian/build.sh
@@ -52,6 +52,11 @@ export PATH=$PATH:$WORKDIR/python_env/bin

find ${WORKDIR}/src/ -name setup.py -type f | xargs -I {} dirname {} | grep -v azure-cli-testsdk | xargs pip3 install --no-deps
pip3 install -r ${WORKDIR}/src/azure-cli/requirements.py3.$(uname).txt
$WORKDIR/python_env/bin/python3 ${WORKDIR}/scripts/trim_sdk.py

# replace .py with .pyc and remove __pycache__ dir to save space
site_packages_dir=$(cd $WORKDIR/python_env/lib/python3.*/site-packages/ ; pwd)
$WORKDIR/python_env/bin/python3 ${WORKDIR}/scripts/use_pyc.py $site_packages_dir

# Create create directory for debian build
mkdir -p $WORKDIR/debian
12 changes: 7 additions & 5 deletions scripts/release/rpm/azure-cli.spec
@@ -52,6 +52,13 @@ source %{repo_path}/scripts/install_full.sh
# cffi 1.15.0 doesn't work with RPM: https://foss.heptapod.net/pypy/cffi/-/issues/513
%{python_cmd} -m pip install cffi==1.14.6

# Remove unused SDK version
%{python_cmd} %{repo_path}/scripts/trim_sdk.py

# replace .py with .pyc and remove __pycache__ dir to save space
site_packages_dir=$(cd %{buildroot}%{cli_lib_dir}/lib/python3.*/site-packages/ ; pwd)
%{python_cmd} %{repo_path}/scripts/use_pyc.py $site_packages_dir

deactivate

# Fix up %{buildroot} appearing in some files...
@@ -78,11 +85,6 @@ AZ_INSTALLER=RPM PYTHONPATH=\"\$bin_dir/../lib64/az/lib/${python_version}/site-p
" > %{buildroot}%{_bindir}/az
rm %{buildroot}%{cli_lib_dir}/bin/python* %{buildroot}%{cli_lib_dir}/bin/pip*

# Remove unused Network SDK API versions
pushd %{buildroot}%{cli_lib_dir}/lib/${python_version}/site-packages/azure/mgmt/network/ > /dev/null
rm -rf v2016_09_01 v2016_12_01 v2017_03_01 v2017_06_01 v2017_08_01 v2017_09_01 v2017_11_01 v2018_02_01 v2018_04_01 v2018_06_01 v2018_10_01 v2018_12_01 v2019_04_01 v2019_08_01 v2019_09_01 v2019_11_01 v2019_12_01 v2020_03_01
popd > /dev/null

# Set up tab completion
mkdir -p %{buildroot}%{_sysconfdir}/bash_completion.d/
cat %{repo_path}/az.completion > %{buildroot}%{_sysconfdir}/bash_completion.d/azure-cli
9 changes: 7 additions & 2 deletions scripts/trim_sdk.py
@@ -70,6 +70,9 @@ def remove_unused_api_versions(resource_type):
if resource_type in profile:
# value is str like '2022-01-01' or SDKProfile
value = profile[resource_type]
if value is None:
_LOGGER.info(f'{resource_type}\'s API version is None, skip')
return
if isinstance(value, str):
used_api_versions.add(value)
else:
@@ -108,7 +111,9 @@ def _print_folder_size(folder):


def _get_all_sdks_to_trim():
resource_types = [k for k, v in AZURE_API_PROFILES['latest'].items() if k.import_prefix.startswith('azure.mgmt')]
# azure.mgmt.network has been removed in https://github.com/Azure/azure-cli/pull/25451
resource_types = [k for k, v in AZURE_API_PROFILES['latest'].items() if k.import_prefix.startswith('azure.mgmt')
and k.import_prefix != 'azure.mgmt.network']
return resource_types


@@ -155,7 +160,7 @@ def main():
_print_folder_size(mgmt_sdk_dir)

# Removed unused API versions
resource_types = _get_biggest_sdks_to_trim()
resource_types = _get_all_sdks_to_trim()

for r in resource_types:
remove_unused_api_versions(r)
81 changes: 81 additions & 0 deletions scripts/use_pyc.py
@@ -0,0 +1,81 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
import logging
import glob
import os
import platform
import re
import sys
from pathlib import Path
import shutil

_LOGGER = logging.getLogger(__name__)


def calculate_folder_size(start_path):
"""Calculate total size of a folder and file count."""
# https://stackoverflow.com/questions/1392413/calculating-a-directorys-size-using-python
total_size = 0
total_count = 0
for dirpath, dirnames, filenames in os.walk(start_path):
for f in filenames:
fp = os.path.join(dirpath, f)
# skip if it is symbolic link
if not os.path.islink(fp):
total_count += 1
total_size += os.path.getsize(fp)

return total_size, total_count


def _print_folder_size(folder):
size, count = calculate_folder_size(folder)
size_in_mb = size / 1048576 # 1 MB = 1024 * 1024 B = 1048576 B
_LOGGER.info(f"{size_in_mb:.2f} MB, {count} files")


def main(folder, version=None):
_LOGGER.info(f'Replace .py with .pyc, base folder: {folder}')
_print_folder_size(folder)
if version is None:
version = re.search(r'python(\d\.\d+)', folder).group(1)
else:
# 3.10.10
version = '.'.join(version.split('.')[:2])
# invoke==1.2.0 has a weird file: invoke/completion/__pycache__/__init__.cpython-36.pyc
# define pyc suffix to skip it
pyc_suffix = f'cpython-{version.replace(".", "")}.pyc'
_LOGGER.info(f'pyc suffix: {pyc_suffix}')
for file in glob.glob(f'{folder}/**/__pycache__/*{pyc_suffix}', recursive=True):
# If pip's py files are also removed, the error is raised when installing some packages.
# See https://github.com/Azure/azure-cli/pull/25801 for details.
if 'site-packages/pip' in file:
continue

# file is /opt/az/lib/python3.10/site-packages/websocket/__pycache__/_app.cpython-310.pyc
# py_filename is _app.py
py_filename = Path(file).name[:-len(pyc_suffix)] + 'py'
# py_path is /opt/az/lib/python3.10/site-packages/websocket/_app.py
py_path = Path(file).parent.parent / py_filename
if py_path.exists():
py_path.unlink()
shutil.move(file, py_path.with_suffix('.pyc'))

for f in glob.glob(f'{folder}/**/__pycache__', recursive=True):
# Remove pip __pycache__ folder for Windows only to save more space
if 'site-packages/pip' in f and not platform.system() == 'Windows':
bebound (Contributor Author), Nov 28, 2023

Removing the .py files has side effects.
For example, /opt/az/bin/python3 -m pip install --no-cache pgcli==3.4.1 fails:

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 2
╰─> [1 lines of output]
    /opt/az/bin/python3: can't open file '/opt/az/lib/python3.10/site-packages/pip/__pip-runner__.py': [Errno 2] No such file or directory
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

The MSI also keeps the .py files under the pip folder.
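
The pip exclusion in this diff is a substring test on `'site-packages/pip'`, which is written with forward slashes. As a hedged sketch only, a component-based check could look like the following; `is_under_pip` is a hypothetical helper that is not part of the PR, and it assumes paths may arrive with either separator style.

```python
from pathlib import PureWindowsPath


def is_under_pip(path: str) -> bool:
    """Return True when `path` lies inside a site-packages/pip directory.

    PureWindowsPath splits on both '/' and '\\', so one parser covers
    POSIX-style and Windows-style paths on any platform.
    """
    parts = PureWindowsPath(path).parts
    # Look for the two adjacent components 'site-packages' and 'pip'.
    return any(a == 'site-packages' and b == 'pip'
               for a, b in zip(parts, parts[1:]))


print(is_under_pip('/opt/az/lib/python3.10/site-packages/pip/__pip-runner__.py'))
print(is_under_pip(r'C:\build\Lib\site-packages\pip\__pycache__'))
print(is_under_pip('/opt/az/lib/python3.10/site-packages/pipenv/core.py'))
```

Comparing whole components gives the same answer for both separator styles and does not match sibling packages whose names merely start with pip.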

continue
shutil.rmtree(f)

_LOGGER.info('Finish processing')
_print_folder_size(folder)


if __name__ == '__main__':
logging.basicConfig(level=logging.DEBUG)
if len(sys.argv) == 2:
main(sys.argv[1])
else:
main(sys.argv[1], sys.argv[2])
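
use_pyc.py builds the .pyc suffix from a version string taken from the folder name or from an argument. When the script runs under the same interpreter that owns the target site-packages, the standard library exposes the same information directly. The sketch below is an alternative under that assumption, not what the PR does; `current_pyc_suffix` is a hypothetical name.

```python
import importlib.util
import sys


def current_pyc_suffix() -> str:
    """Return the bytecode suffix (for example 'cpython-310.pyc') of this interpreter."""
    tag = sys.implementation.cache_tag
    if tag is None:
        # Some implementations do not cache bytecode at all.
        raise RuntimeError('this interpreter does not cache bytecode')
    return f'{tag}.pyc'


print(current_pyc_suffix())
# Where this interpreter would cache the bytecode of a given source file.
print(importlib.util.cache_from_source('websocket/_app.py'))
```

This avoids parsing a version out of a path, but it only works when the running interpreter matches the packaged one, which is why it is offered as a sketch rather than a replacement.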
@@ -33,13 +33,13 @@
AbstractPreparer,
SingleValueReplacer)
from azure.core.exceptions import HttpResponseError
from ..._client_factory import cf_mysql_flexible_private_dns_zone_suffix_operations, cf_postgres_flexible_private_dns_zone_suffix_operations
from ...flexible_server_virtual_network import prepare_private_network, prepare_private_dns_zone, prepare_public_network, DEFAULT_VNET_ADDRESS_PREFIX, DEFAULT_SUBNET_ADDRESS_PREFIX
from ...flexible_server_custom_postgres import DbContext as PostgresDbContext
from ...flexible_server_custom_mysql import DbContext as MysqlDbContext
from ...flexible_server_custom_mysql import _determine_iops
from ..._flexible_server_util import get_mysql_list_skus_info
from ..._util import retryable_method
from azure.cli.command_modules.rdbms._client_factory import cf_mysql_flexible_private_dns_zone_suffix_operations, cf_postgres_flexible_private_dns_zone_suffix_operations
from azure.cli.command_modules.rdbms.flexible_server_virtual_network import prepare_private_network, prepare_private_dns_zone, prepare_public_network, DEFAULT_VNET_ADDRESS_PREFIX, DEFAULT_SUBNET_ADDRESS_PREFIX
from azure.cli.command_modules.rdbms.flexible_server_custom_postgres import DbContext as PostgresDbContext
from azure.cli.command_modules.rdbms.flexible_server_custom_mysql import DbContext as MysqlDbContext
from azure.cli.command_modules.rdbms.flexible_server_custom_mysql import _determine_iops
from azure.cli.command_modules.rdbms._flexible_server_util import get_mysql_list_skus_info
from azure.cli.command_modules.rdbms._util import retryable_method
bebound (Contributor Author), Mar 20, 2023

When the import statement is from ..._util import retryable_method, the following error is raised when running pytest with: /opt/az/bin/python3 -m pytest -x -v --forked -p no:warnings --log-level=WARN -n auto --junit-xml /azure_cli_test_result/rdbms.xml --pyargs azure.cli.command_modules.rdbms

________ ERROR collecting tests/latest/test_rdbms_flexible_commands.py _________
ImportError while importing test module '/opt/az/lib/python3.10/site-packages/azure/cli/command_modules/rdbms/tests/latest/test_rdbms_flexible_commands.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/az/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/opt/az/lib/python3.10/site-packages/azure/cli/command_modules/rdbms/tests/latest/test_rdbms_flexible_commands.py:36: in <module>
    from ..._client_factory import cf_mysql_flexible_private_dns_zone_suffix_operations, cf_postgres_flexible_private_dns_zone_suffix_operations
E   ImportError: attempted relative import beyond top-level package

With the command above, __name__ is tests.latest.test_rdbms_flexible_commands, so ... goes beyond the top-level package.
With azdev test rdbms, __name__ is azure.cli.command_modules.rdbms.tests.latest.test_rdbms_flexible_commands, and the import works.
When the .py files are kept, __name__ is command_modules.rdbms.tests.latest.test_rdbms_flexible_commands.

According to the pytest documentation on import mechanisms and sys.path/PYTHONPATH, pytest searches for __init__.py files to determine the root directory. If the .py files are kept, it prepends /opt/az/lib/python3.10/site-packages/azure/cli to sys.path.
Otherwise, it cannot find __init__.py, so it prepends /opt/az/lib/python3.10/site-packages/azure/cli/command_modules/rdbms and the error occurs.
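
The root-directory search described in this comment can be modelled in a few lines. This is a simplified sketch of pytest's default prepend behaviour, not pytest's actual code: `prepend_basedir` and `module_name` are hypothetical helpers, the directory tree is a temporary stand-in, and it assumes the test folders keep their __init__.py files while the packages above them are reduced to __init__.pyc.

```python
import tempfile
from pathlib import Path


def prepend_basedir(test_file: Path) -> Path:
    """Walk upward while the directory has __init__.py; return the first that does not."""
    current = test_file.parent
    while (current / '__init__.py').is_file():
        current = current.parent
    return current


def module_name(test_file: Path) -> str:
    """Dotted module name of the test file relative to the computed base directory."""
    rel = test_file.relative_to(prepend_basedir(test_file)).with_suffix('')
    return '.'.join(rel.parts)


root = Path(tempfile.mkdtemp())
latest = root / 'cli' / 'command_modules' / 'rdbms' / 'tests' / 'latest'
latest.mkdir(parents=True)
tests_dir, rdbms_dir, command_modules_dir = latest.parents[0], latest.parents[1], latest.parents[2]
for pkg in (latest, tests_dir, rdbms_dir, command_modules_dir):
    (pkg / '__init__.py').touch()
test_file = latest / 'test_rdbms_flexible_commands.py'
test_file.touch()

with_sources = module_name(test_file)

# Simulate use_pyc.py: the packages above tests/ keep only __init__.pyc.
for pkg in (rdbms_dir, command_modules_dir):
    (pkg / '__init__.py').rename(pkg / '__init__.pyc')

without_sources = module_name(test_file)
print(with_sources)
print(without_sources)
```

With the shorter module name, three leading dots in a relative import climb past the top-level package tests, which is the ImportError shown in the traceback.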

Member

Where is this pytest command run/where does it fail? If src/azure-cli/azure/cli/command_modules/rdbms/tests is packaged as a separate package, ... would certainly go beyond the package hierarchy, so using ... is wrong anyway.

Contributor Author

It fails in the TestDebPackages task:

pytest_base_cmd = '/opt/az/bin/python3 -m pytest -x -v --forked -p no:warnings --log-level=WARN'
pytest_parallel_cmd = '{} -n auto'.format(pytest_base_cmd)
serial_test_modules = ['botservice', 'network', 'cloud', 'appservice']
for mod_name in mod_list:
    cmd = '{} --junit-xml /azure_cli_test_result/{}.xml --pyargs azure.cli.command_modules.{}'.format(
        pytest_base_cmd if mod_name in serial_test_modules else pytest_parallel_cmd, mod_name, mod_name)
    print('Running:', cmd, flush=True)
    exit_code = subprocess.call(cmd, shell=True)

Yes, we should not use ... here.

# Constants
SERVER_NAME_PREFIX = 'azuredbclitest-'
SERVER_NAME_MAX_LENGTH = 20