Skip to content

Commit

Permalink
Remove dnf update from docker build scripts (microsoft#17551)
Browse files Browse the repository at this point in the history
### Description
1. Remove 'dnf update' from docker build scripts, because it upgrades TRT
packages from CUDA 11.x to CUDA 12.x.
To reproduce it, you can run the following commands in a CentOS CUDA
11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8.
```
export v=8.6.1.6-1.cuda11.8
dnf  install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v} 
dnf update -y
```
The last command will generate the following outputs:
```
========================================================================================================================
 Package                                     Architecture       Version                          Repository        Size
========================================================================================================================
Upgrading:
 libnvinfer-devel                            x86_64             8.6.1.6-1.cuda12.0               cuda             542 M
 libnvinfer-headers-devel                    x86_64             8.6.1.6-1.cuda12.0               cuda             118 k
 libnvinfer-headers-plugin-devel             x86_64             8.6.1.6-1.cuda12.0               cuda              14 k
 libnvinfer-plugin-devel                     x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-plugin8                          x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-vc-plugin-devel                  x86_64             8.6.1.6-1.cuda12.0               cuda             107 k
 libnvinfer-vc-plugin8                       x86_64             8.6.1.6-1.cuda12.0               cuda             251 k
 libnvinfer8                                 x86_64             8.6.1.6-1.cuda12.0               cuda             543 M
 libnvonnxparsers-devel                      x86_64             8.6.1.6-1.cuda12.0               cuda             467 k
 libnvonnxparsers8                           x86_64             8.6.1.6-1.cuda12.0               cuda             757 k
 libnvparsers-devel                          x86_64             8.6.1.6-1.cuda12.0               cuda             2.0 M
 libnvparsers8                               x86_64             8.6.1.6-1.cuda12.0               cuda             854 k
Installing dependencies:
 cuda-toolkit-12-0-config-common             noarch             12.0.146-1                       cuda             7.7 k
 cuda-toolkit-12-config-common               noarch             12.2.140-1                       cuda             7.9 k
 libcublas-12-0                              x86_64             12.0.2.224-1                     cuda             361 M
 libcublas-devel-12-0                        x86_64             12.0.2.224-1                     cuda             397 M

Transaction Summary
========================================================================================================================

```
As you can see from the output,  they are CUDA 12 packages. 

The problem can also be solved by lock the packages' versions by using
"dnf versionlock" command right after installing the CUDA/TRT packages.
However, going forward, to get the better reproducibility, I suggest
manually fix dnf package versions in the installation scripts like we do
for TRT now.

```bash
v="8.6.1.6-1.cuda11.8" &&\
    yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\
    yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\
        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v}
```
When we have a need to upgrade a package due to security alert or some
other reasons, we manually change the version string instead of relying
on "dnf update". Though this approach increases efforts, it can make our
pipeines more stable.

2. Move python test to docker
### Motivation and Context
Right now the nightly gpu package mixes using CUDA 11.x and CUDA 12.x
and the result package is totally not usable(crashes every time)
  • Loading branch information
snnn authored and kleiti committed Mar 22, 2024
1 parent fef6dac commit ea95e98
Show file tree
Hide file tree
Showing 32 changed files with 351 additions and 244 deletions.
7 changes: 5 additions & 2 deletions tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml
Original file line number Diff line number Diff line change
Expand Up @@ -200,8 +200,11 @@ stages:
- stage: arm64_test
dependsOn: ['arm64_build']
jobs:
- template: templates/py-packaging-linux-test.yml
- template: templates/py-packaging-linux-test-cpu.yml
parameters:
arch: 'aarch64'
machine_pool: 'onnxruntime-linux-ARM64-CPU-2019'
device: 'CPU'
base_image: 'arm64v8/almalinux:8'
devtoolset_rootpath: /opt/rh/gcc-toolset-12/root
ld_library_path_arg: /opt/rh/gcc-toolset-12/root/usr/lib64:/opt/rh/gcc-toolset-12/root/usr/lib:/opt/rh/gcc-toolset-12/root/usr/lib64/dyninst:/opt/rh/gcc-toolset-12/root/usr/lib/dyninst:/usr/local/lib64
prepend_path: '/opt/rh/gcc-toolset-12/root/usr/bin:'
37 changes: 19 additions & 18 deletions tools/ci_build/github/azure-pipelines/py-package-test-pipeline.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,24 +3,38 @@ resources:
- pipeline: build
source: 'Python packaging pipeline'
trigger: true
branch: main # branch to pick the artifact, Used only for manual triggered pipeline runs for testing the pipeline itself
#TODO: Remove the following dependency. Running python tests should not need to use manylinux.
repositories:
- repository: manylinux # The name used to reference this repository in the checkout step
type: Github
endpoint: Microsoft
name: pypa/manylinux
ref: 5eda9aded5462201e6310105728d33016e637ea7

stages:
- stage: Linux_Test_CPU_x86_64_stage
jobs:
- template: templates/py-packaging-linux-test.yml
- template: templates/py-packaging-linux-test-cpu.yml
parameters:
arch: 'x86_64'
machine_pool: 'onnxruntime-Ubuntu2004-AMD-CPU'
device: 'CPU'
base_image: 'registry.access.redhat.com/ubi8/ubi'
devtoolset_rootpath: /opt/rh/gcc-toolset-12/root
ld_library_path_arg: /opt/rh/gcc-toolset-12/root/usr/lib64:/opt/rh/gcc-toolset-12/root/usr/lib:/opt/rh/gcc-toolset-12/root/usr/lib64/dyninst:/opt/rh/gcc-toolset-12/root/usr/lib/dyninst:/usr/local/lib64
prepend_path: '/opt/rh/gcc-toolset-12/root/usr/bin:'

- stage: Linux_Test_CPU_aarch64_stage
dependsOn: []
jobs:
- template: templates/py-packaging-linux-test.yml
- template: templates/py-packaging-linux-test-cpu.yml
parameters:
arch: 'aarch64'
machine_pool: 'aiinfra-linux-ARM64-CPU-2019'
device: 'CPU'
base_image: 'arm64v8/almalinux:8'
devtoolset_rootpath: /opt/rh/gcc-toolset-12/root
ld_library_path_arg: /opt/rh/gcc-toolset-12/root/usr/lib64:/opt/rh/gcc-toolset-12/root/usr/lib:/opt/rh/gcc-toolset-12/root/usr/lib64/dyninst:/opt/rh/gcc-toolset-12/root/usr/lib/dyninst:/usr/local/lib64
prepend_path: '/opt/rh/gcc-toolset-12/root/usr/bin:'

- stage: Packages_Somking_Test
dependsOn: []
Expand All @@ -31,19 +45,6 @@ stages:
machine_pool:
vmImage: 'macOS-13'
itemPattern: '*/*mac*x86_64.whl'
- template: templates/py-package-smoking-test.yml
parameters:
job_name: Test_WIN_64_Wheels
itemPattern: '*/*win_amd64.whl'
machine_pool:
vmImage: 'windows-2022'
- template: templates/py-package-smoking-test.yml
parameters:
job_name: Test_WIN_32_Wheels
itemPattern: '*/*win32.whl'
python_arch: 'x86'
machine_pool:
vmImage: 'windows-2022'
- template: templates/py-package-smoking-test.yml
parameters:
job_name: Test_LINUX_x86_64_Wheels
Expand All @@ -61,7 +62,7 @@ stages:
- Linux_Test_CPU_aarch64_stage
- Packages_Somking_Test
jobs:
- template: templates/py-packaging-linux-test.yml
- template: templates/py-packaging-linux-test-cuda.yml
parameters:
arch: 'x86_64'
machine_pool: 'Onnxruntime-Linux-GPU'
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ jobs:
script: |
mkdir -p $HOME/.onnx
docker run --rm -e CFLAGS="${{parameters.OnnxruntimeCFlags}}" -e CXXFLAGS="${{parameters.OnnxruntimeCXXFlags}}" --volume /data/onnx:/data/onnx:ro --volume $(Build.SourcesDirectory):/onnxruntime_src --volume $(Build.BinariesDirectory):/build \
--volume $HOME/.onnx:/home/onnxruntimedev/.onnx -e NIGHTLY_BUILD onnxruntimecpubuildcentos8${{parameters.OnnxruntimeArch}} /bin/bash -c "python3 \
--volume $HOME/.onnx:/home/onnxruntimedev/.onnx -e NIGHTLY_BUILD onnxruntimecpubuildcentos8${{parameters.OnnxruntimeArch}} /bin/bash -c "python3.9 \
/onnxruntime_src/tools/ci_build/build.py --build_java --build_nodejs --build_dir /build --config Release \
--skip_submodule_sync --parallel --build_shared_lib ${{ parameters.AdditionalBuildFlags }} && cd /build/Release && make install DESTDIR=/build/linux-${{parameters.OnnxruntimeArch}}"
workingDirectory: $(Build.SourcesDirectory)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,36 +39,22 @@ jobs:
versionSpec: $(PythonVersion)
architecture: ${{ parameters.python_arch }}

- task: DownloadPipelineArtifact@2
displayName: 'Download Pipeline Artifact'
inputs:
artifactName: 'onnxruntime'
targetPath: '$(Build.BinariesDirectory)/whl'
itemPattern: ${{parameters.itemPattern}}
# The public ADO project
${{ if eq(variables['System.CollectionId'], 'f3ad12f2-e480-4533-baf2-635c95467d29') }}:
buildType: current
# The private ADO project
${{ if eq(variables['System.CollectionId'], 'bc038106-a83b-4dab-9dd3-5a41bc58f34c') }}:
project: '530acbc4-21bc-487d-8cd8-348ff451d2ff'
definition: 841
preferTriggeringPipeline: true
runVersion: 'latest'
buildType: specific
- download: build # pipeline resource identifier.
artifact: 'onnxruntime'

- task: Bash@3
inputs:
targetType: 'inline'
script: |
set -ex
files=(whl/*.whl)
files=(*.whl)
FILE_NAME="${files[0]}"
FILE_NAME=$(basename $FILE_NAME)
PYTHON_PACKAGE_NAME=$(echo "$FILE_NAME" | cut -f 1 -d '-')
python3 -m pip install --find-links "$(Build.BinariesDirectory)/whl" $PYTHON_PACKAGE_NAME
pip show $PYTHON_PACKAGE_NAME
python -c "import onnxruntime as ort; print(ort.__version__)"
workingDirectory: $(Build.BinariesDirectory)
python3 -m pip install --find-links "$(Pipeline.Workspace)/build/onnxruntime" $PYTHON_PACKAGE_NAME
python3 -m pip show $PYTHON_PACKAGE_NAME
python3 -c "import onnxruntime as ort; print(ort.__version__)"
workingDirectory: $(Pipeline.Workspace)/build/onnxruntime
displayName: Test Package Installation

- task: mspremier.PostBuildCleanup.PostBuildCleanup-task.PostBuildCleanup@3
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
parameters:
- name: arch
type: string

- name: base_image
type: string

- name: devtoolset_rootpath
type: string

- name: ld_library_path_arg
type: string

- name: prepend_path
type: string

- name: machine_pool
type: string

- name: extra_job_id
type: string
default: ''

- name: python_wheel_suffix
type: string
default: ''


# TODO: Ideally it should fetch information from the build that triggers it
- name: cmake_build_type
type: string
default: 'Release'
values:
- Debug
- Release
- RelWithDebInfo
- MinSizeRel

- name: timeout
type: number
default: 120

jobs:
- job: Linux_Test_CPU${{ parameters.extra_job_id }}_${{ parameters.arch }}
timeoutInMinutes: ${{ parameters.timeout }}
variables:
skipComponentGovernanceDetection: true
workspace:
clean: all
pool: ${{ parameters.machine_pool }}
steps:
- checkout: self
clean: true
submodules: none
# The public ADO project
- ${{ if eq(variables['System.CollectionId'], 'f3ad12f2-e480-4533-baf2-635c95467d29') }}:
- download: current # pipeline resource identifier.
artifact: 'drop-linux-cpu-${{ parameters.arch }}'

- download: current # pipeline resource identifier.
artifact: 'onnxruntime${{ parameters.python_wheel_suffix }}'

- bash: |
set -e -x
mv "$(Pipeline.Workspace)/drop-linux-cpu-${{ parameters.arch }}" $(Build.BinariesDirectory)/${{parameters.cmake_build_type}}
mv "$(Pipeline.Workspace)/onnxruntime${{ parameters.python_wheel_suffix }}" "$(Build.BinariesDirectory)/whl"
cp -r "$(Build.BinariesDirectory)/whl" $(Build.BinariesDirectory)/tmp
find "$(Build.BinariesDirectory)/tmp" -name '*.whl' -exec bash -c 'unzip -d "${1%.*}" "$1"' _ {} \;
# The private ADO project
- ${{ if eq(variables['System.CollectionId'], 'bc038106-a83b-4dab-9dd3-5a41bc58f34c') }}:
- download: build # pipeline resource identifier.
artifact: 'drop-linux-cpu-${{ parameters.arch }}'

- download: build # pipeline resource identifier.
artifact: 'onnxruntime${{ parameters.python_wheel_suffix }}'

- bash: |
set -e -x
ls $(Pipeline.Workspace)/build
mv "$(Pipeline.Workspace)/build/drop-linux-cpu-${{ parameters.arch }}" $(Build.BinariesDirectory)/${{parameters.cmake_build_type}}
mv "$(Pipeline.Workspace)/build/onnxruntime${{ parameters.python_wheel_suffix }}" "$(Build.BinariesDirectory)/whl"
cp -r "$(Build.BinariesDirectory)/whl" $(Build.BinariesDirectory)/tmp
find "$(Build.BinariesDirectory)/tmp" -name '*.whl' -exec bash -c 'unzip -d "${1%.*}" "$1"' _ {} \;
# The BinSkim task uses a dotnet program which doesn't support ARM CPUs yet
- ${{ if eq(parameters.arch, 'x86_64') }}:
- task: BinSkim@4
displayName: 'Run BinSkim'
inputs:
AnalyzeTargetGlob: '$(Build.BinariesDirectory)/tmp/**/*.so'
continueOnError: true

#- task: PostAnalysis@2
# inputs:
# GdnBreakAllTools: true
# GdnBreakPolicy: M365
# GdnBreakPolicyMinSev: Error

- template: get-docker-image-steps.yml
parameters:
Dockerfile: tools/ci_build/github/linux/docker/inference/x64/python/cpu/Dockerfile.manylinux2_28_cpu
Context: tools/ci_build/github/linux/docker/inference/x64/python/cpu
DockerBuildArgs: "--build-arg POLICY=manylinux_2_28 --build-arg BUILD_UID=$( id -u ) --build-arg BASEIMAGE=${{ parameters.base_image }} --build-arg PLATFORM=${{ parameters.arch }} --build-arg PREPEND_PATH=${{ parameters.prepend_path }} --build-arg LD_LIBRARY_PATH_ARG=${{ parameters.ld_library_path_arg }} --build-arg DEVTOOLSET_ROOTPATH=${{ parameters.devtoolset_rootpath }}"
Repository: onnxruntimecpubuildpython${{ parameters.arch }}
${{ if eq(parameters.arch, 'aarch64') }}:
UpdateDepsTxt: false

- task: Bash@3
displayName: 'Bash Script'
inputs:
targetType: filePath
filePath: tools/ci_build/github/linux/run_python_dockertest.sh
arguments: -d CPU -c ${{parameters.cmake_build_type}} -i onnxruntimecpubuildpython${{ parameters.arch }}

- task: mspremier.PostBuildCleanup.PostBuildCleanup-task.PostBuildCleanup@3
displayName: 'Clean Agent Directories'
condition: always()
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
parameters:
- name: arch
type: string

- name: device
type: string
values:
- CPU
- GPU

- name: machine_pool
type: string

- name: extra_job_id
type: string
default: ''

- name: python_wheel_suffix
type: string
default: ''


# TODO: Ideally it should fetch information from the build that triggers it
- name: cmake_build_type
type: string
default: 'Release'
values:
- Debug
- Release
- RelWithDebInfo
- MinSizeRel

- name: timeout
type: number
default: 120

jobs:
- job: Linux_Test_GPU${{ parameters.extra_job_id }}_${{ parameters.arch }}
timeoutInMinutes: ${{ parameters.timeout }}
variables:
skipComponentGovernanceDetection: true
workspace:
clean: all
pool: ${{ parameters.machine_pool }}
steps:
- checkout: self
clean: true
submodules: none
# The public ADO project
# - ${{ if eq(variables['System.CollectionId'], 'f3ad12f2-e480-4533-baf2-635c95467d29') }}:

# The private ADO project
- ${{ if eq(variables['System.CollectionId'], 'bc038106-a83b-4dab-9dd3-5a41bc58f34c') }}:
- download: build # pipeline resource identifier.
artifact: 'drop-linux-gpu-${{ parameters.arch }}'

- download: build # pipeline resource identifier.
artifact: 'onnxruntime${{ parameters.python_wheel_suffix }}'

- bash: |
set -e -x
ls $(Pipeline.Workspace)/build
mv "$(Pipeline.Workspace)/build/drop-linux-gpu-${{ parameters.arch }}" $(Build.BinariesDirectory)/${{parameters.cmake_build_type}}
mv "$(Pipeline.Workspace)/build/onnxruntime${{ parameters.python_wheel_suffix }}" "$(Build.BinariesDirectory)/whl"
cp -r "$(Build.BinariesDirectory)/whl" $(Build.BinariesDirectory)/tmp
find "$(Build.BinariesDirectory)/tmp" -name '*.whl' -exec bash -c 'unzip -d "${1%.*}" "$1"' _ {} \;
# The BinSkim task uses a dotnet program which doesn't support ARM CPUs yet
- ${{ if eq(parameters.arch, 'x86_64') }}:
- task: BinSkim@4
displayName: 'Run BinSkim'
inputs:
AnalyzeTargetGlob: '$(Build.BinariesDirectory)/tmp/**/*.so'
continueOnError: true

#- task: PostAnalysis@2
# inputs:
# GdnBreakAllTools: true
# GdnBreakPolicy: M365
# GdnBreakPolicyMinSev: Error

- template: get-docker-image-steps.yml
parameters:
Dockerfile: tools/ci_build/github/linux/docker/Dockerfile.manylinux2_28_cuda11_8_tensorrt8_6
Context: tools/ci_build/github/linux/docker
DockerBuildArgs: "--network=host --build-arg POLICY=manylinux_2_28 --build-arg PLATFORM=x86_64 --build-arg PREPEND_PATH=/usr/local/cuda/bin --build-arg LD_LIBRARY_PATH_ARG=/usr/local/lib64 --build-arg DEVTOOLSET_ROOTPATH=/usr --build-arg BUILD_UID=$( id -u ) --build-arg PLATFORM=${{ parameters.arch }}"
Repository: onnxruntimecuda118xtrt86build${{ parameters.arch }}

- task: Bash@3
displayName: 'Bash Script'
inputs:
targetType: filePath
filePath: tools/ci_build/github/linux/run_python_dockertest.sh
arguments: -d GPU -c ${{parameters.cmake_build_type}} -i onnxruntimecuda118xtrt86build${{ parameters.arch }}

- task: mspremier.PostBuildCleanup.PostBuildCleanup-task.PostBuildCleanup@3
displayName: 'Clean Agent Directories'
condition: always()
Loading

0 comments on commit ea95e98

Please sign in to comment.