Creating ROCm whl upon release #259

Merged
merged 9 commits into from
Nov 1, 2024
66 changes: 26 additions & 40 deletions .github/workflows/publish.yml
@@ -16,7 +16,9 @@
release:
# Retrieve tag and create release
name: Create Release
runs-on: ubuntu-latest
runs-on: self-hosted
container:
image: rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0
outputs:
upload_url: ${{ steps.create_release.outputs.upload_url }}
steps:
@@ -41,57 +43,39 @@

wheel:
name: Build Wheel
runs-on: ${{ matrix.os }}
runs-on: self-hosted
container:
image: rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0
needs: release

strategy:
fail-fast: false
matrix:
os: ['ubuntu-20.04']
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
pytorch-version: ['2.4.0'] # Must be the most recent version that meets requirements-cuda.txt.
cuda-version: ['11.8', '12.1']

steps:
- name: Checkout
uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1

- name: Setup ccache
uses: hendrikmuhs/ccache-action@ed74d11c0b343532753ecead8a951bb09bb34bc9 # v1.2.14
with:
create-symlink: true
key: ${{ github.job }}-${{ matrix.python-version }}-${{ matrix.cuda-version }}

- name: Set up Linux Env
if: ${{ runner.os == 'Linux' }}
run: |
bash -x .github/workflows/scripts/env.sh

- name: Set up Python
uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3 # v5.2.0
with:
python-version: ${{ matrix.python-version }}

- name: Install CUDA ${{ matrix.cuda-version }}
- name: Prepare
run: |
bash -x .github/workflows/scripts/cuda-install.sh ${{ matrix.cuda-version }} ${{ matrix.os }}
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
pip3 install -U triton

- name: Install PyTorch ${{ matrix.pytorch-version }} with CUDA ${{ matrix.cuda-version }}
run: |
bash -x .github/workflows/scripts/pytorch-install.sh ${{ matrix.python-version }} ${{ matrix.pytorch-version }} ${{ matrix.cuda-version }}
- name: Checkout
uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1

- name: Build wheel
shell: bash
env:
CMAKE_BUILD_TYPE: Release # do not compile with debug symbol to reduce wheel size
run: |

Check failure on line 67 in .github/workflows/publish.yml (GitHub Actions / actionlint):
shellcheck reported issue in this script: SC2129:style:6:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects
bash -x .github/workflows/scripts/build.sh ${{ matrix.python-version }} ${{ matrix.cuda-version }}
bash -x .github/workflows/scripts/build.sh
wheel_name=$(find dist -name "*whl" -print0 | xargs -0 -n 1 basename)
asset_name=${wheel_name//"linux"/"manylinux1"}
gradlib_wheel_name=$(find gradlib/dist -name "*whl" -print0 | xargs -0 -n 1 basename)
gradlib_asset_name=${gradlib_wheel_name//"linux"/"manylinux1"}
echo "wheel_name=${wheel_name}" >> "$GITHUB_ENV"
echo "asset_name=${asset_name}" >> "$GITHUB_ENV"
echo "gradlib_wheel_name=${gradlib_wheel_name}" >> "$GITHUB_ENV"
echo "gradlib_asset_name=${gradlib_asset_name}" >> "$GITHUB_ENV"

- name: Upload Release Asset
- name: Upload vllm Release Asset
uses: actions/upload-release-asset@e8f9f06c4b078e705bd2ea027f0926603fc9b4d5 # v1.0.2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -100,11 +84,13 @@
asset_path: ./dist/${{ env.wheel_name }}
asset_name: ${{ env.asset_name }}
asset_content_type: application/*
- name: Upload gradlib Release Asset
uses: actions/upload-release-asset@e8f9f06c4b078e705bd2ea027f0926603fc9b4d5 # v1.0.2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.release.outputs.upload_url }}
asset_path: ./gradlib/dist/${{ env.gradlib_wheel_name }}
asset_name: ${{ env.gradlib_asset_name }}
asset_content_type: application/*

# (Danielkinz): This last step will publish the .whl to pypi. Warning: untested
# - name: Publish package
# uses: pypa/gh-action-pypi-publish@release/v1.8
# with:
# repository-url: https://test.pypi.org/legacy/
# password: ${{ secrets.PYPI_API_TOKEN }}
# skip-existing: true
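
The actionlint annotation on the Build wheel step (SC2129) can be addressed by grouping the four `echo` lines under a single redirect. A minimal sketch, assuming hypothetical wheel file names (the `0.0.0` versions are placeholders, not taken from this PR) and a temporary file standing in for `$GITHUB_ENV` when run outside a runner:

```shell
#!/bin/bash
set -eu

# Stand-in for the file GitHub Actions provides; only used outside a runner.
GITHUB_ENV=${GITHUB_ENV:-$(mktemp)}

# Hypothetical wheel names; in the workflow these come from `find dist ...`.
wheel_name="vllm-0.0.0-cp39-cp39-linux_x86_64.whl"
gradlib_wheel_name="gradlib-0.0.0-cp39-cp39-linux_x86_64.whl"

# ${var//pattern/replacement} replaces every occurrence of the pattern.
asset_name=${wheel_name//linux/manylinux1}
gradlib_asset_name=${gradlib_wheel_name//linux/manylinux1}

# One redirect for the whole group, as SC2129 suggests.
{
  echo "wheel_name=${wheel_name}"
  echo "asset_name=${asset_name}"
  echo "gradlib_wheel_name=${gradlib_wheel_name}"
  echo "gradlib_asset_name=${gradlib_asset_name}"
} >> "$GITHUB_ENV"
```

The result is the same as four separate `>>` redirects; the file is simply opened once for the whole group.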
18 changes: 9 additions & 9 deletions .github/workflows/scripts/build.sh
@@ -1,23 +1,23 @@
#!/bin/bash
set -eux

python_executable=python$1


Admittedly, "python_executable=python$1" is a cryptic way to select the desired Python version (not ideal), but why do we limit the script to python3?

Collaborator Author

What other options are there in our ROCm release docker (or anywhere)?

@Alexei-V-Ivanov-AMD, Nov 1, 2024

python2? If somebody decides to use python2 at some point (for whatever reason), this code will break.

Collaborator Author

This code will not be within the top 1000 things that will break :)
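
One way to reconcile the two positions in this thread is to keep the version argument optional and default to `python3`. A minimal sketch; the `python_name` helper is hypothetical and not part of this PR:

```shell
#!/bin/bash
set -eu

# Build the interpreter name from an optional version suffix.
# An empty or missing argument falls back to "3", giving "python3".
python_name() {
  printf 'python%s\n' "${1:-3}"
}

# "${1:-}" keeps `set -u` from failing when the script gets no argument.
python_executable=$(python_name "${1:-}")

# Warn (rather than abort) if the chosen interpreter is not on PATH.
command -v "$python_executable" >/dev/null 2>&1 \
  || echo "warning: $python_executable not found on PATH" >&2
```

Called with no argument, the script behaves like the hard-coded `python_executable=python3`; called with `3.9`, it behaves like the original `python$1`.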

cuda_home=/usr/local/cuda-$2
python_executable=python3

# Update paths
PATH=${cuda_home}/bin:$PATH
LD_LIBRARY_PATH=${cuda_home}/lib64:$LD_LIBRARY_PATH

# Install requirements
$python_executable -m pip install -r requirements-build.txt -r requirements-cuda.txt
$python_executable -m pip install -r requirements-rocm.txt

# Limit the number of parallel jobs to avoid OOM
export MAX_JOBS=1
# Make sure release wheels are built for the following architectures
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
export VLLM_FA_CMAKE_GPU_ARCHES="80-real;90-real"
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"

rm -f $(which sccache)

bash tools/check_repo.sh
export MAX_JOBS=32

# Build
$python_executable setup.py bdist_wheel --dist-dir=dist
cd gradlib
$python_executable setup.py bdist_wheel --dist-dir=dist
cd ..


This makes it perfect, doesn't it?

Collaborator Author

We'll see if whls will get published this time.


That was the last time I approved without seeing functionality proof.
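
A cheap form of proof before the upload steps would be to check that each build produced exactly one wheel, since the workflow's `find ... | xargs basename` lines assume a single match. A minimal sketch; the `require_single_wheel` helper is hypothetical and not part of this PR:

```shell
#!/bin/bash
set -eu

# Succeeds only if the given directory holds exactly one *.whl file.
require_single_wheel() {
  local dir=$1 count
  # Count matching files; strip the padding some `wc` builds emit.
  count=$(find "$dir" -maxdepth 1 -name '*.whl' 2>/dev/null | wc -l | tr -d '[:space:]')
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one wheel in $dir, found $count" >&2
    return 1
  fi
}

# Intended use after the build, e.g.:
#   require_single_wheel dist
#   require_single_wheel gradlib/dist
```

With `set -e` in effect, a failed check stops the script before any asset name is exported.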
