
[aarch64] build pytorch wheel with mkldnn and acl backend #1104

Merged: 1 commit merged into pytorch:main on Oct 25, 2022

Conversation

snadampal
Contributor

No description provided.

Contributor

@malfet left a comment

Hmm, can you split this PR into a few smaller ones and provide a more detailed description of some of the proposed changes?

Contributor

@malfet left a comment

See comments, and please provide a test plan for the PR (for example, share a link to a wheel built by this script).
One of the challenges I had in the past was building a wheel that works on both the Ubuntu and RedHat OSes. Perhaps it would be good to add a test step that could be invoked for wheels already available on the caller's local machine.
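
For context, such a local-wheel test step could look roughly like the sketch below: create a throwaway venv, install the already-built wheel, and run a minimal import/matmul check. This is only an illustration; the helper name and the checks are not part of the script.

```python
# Hypothetical helper: smoke-test an already-built wheel on the caller's local machine.
import subprocess
import sys
import tempfile
import venv


def smoke_test_local_wheel(wheel_path: str) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        venv.create(tmp, with_pip=True)  # throwaway environment, same flow on Ubuntu and RedHat
        py = f"{tmp}/bin/python"
        subprocess.check_call([py, "-m", "pip", "install", wheel_path])
        subprocess.check_call([
            py, "-c",
            "import torch; print(torch.__version__); print(torch.rand(2, 2) @ torch.rand(2, 2))",
        ])


if __name__ == "__main__":
    smoke_test_local_wheel(sys.argv[1])
```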

Comment on lines 410 to 418
# Install and switch to gcc-8 on Ubuntu-18.04
if not host.using_docker() and host.ami == ubuntu18_04_ami and compiler == 'gcc-8':
host.run_cmd("sudo apt-get install -y g++-8 gfortran-8")
host.run_cmd("sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100")
host.run_cmd("sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100")
host.run_cmd("sudo update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-8 100")
Contributor

Sorry, but why are you deleting this codepath? Please explain why supporting Ubuntu 18.04 is wrong here.

Contributor Author

Hi @malfet, thanks for the review! We need gcc-10 for Arm Compute Library multi-arch support; that's why I replaced gcc-8 with gcc-10. But as you suggested, I will split this into incremental commits with more context.
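
For reference, the gcc-10 counterpart of the removed block would presumably look something like the sketch below; `host.run_cmd` is the script's existing helper and the g++-10/gfortran-10 package names are the standard Ubuntu 20.04 ones, but the exact codepath in the PR may differ.

```python
def install_gcc10(host) -> None:
    """Sketch: install and switch to gcc-10, mirroring the old gcc-8 codepath."""
    host.run_cmd("sudo apt-get install -y g++-10 gfortran-10")
    host.run_cmd("sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 100")
    host.run_cmd("sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 100")
    host.run_cmd("sudo update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-10 100")
```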

Contributor

I think increasing the minimum supported version to gcc-10 requires a more in-depth discussion, i.e. is it compatible with the aarch64 manylinux2014 requirements outlined in https://peps.python.org/pep-0599/?

@@ -602,7 +608,7 @@ def parse_arguments():
     parser.add_argument("--debug", action="store_true")
     parser.add_argument("--build-only", action="store_true")
     parser.add_argument("--test-only", type=str)
-    parser.add_argument("--os", type=str, choices=list(os_amis.keys()), default='ubuntu18_04')
+    parser.add_argument("--os", type=str, choices=list(os_amis.keys()), default='ubuntu20_04')
Contributor

Why does the default need to be changed from 18.04 to 20.04?

@snadampal force-pushed the pt_acl branch 2 times, most recently from c1447f3 to e2bd2f3 on September 28, 2022 05:33
@snadampal
Contributor Author

@malfet, updated the PR with multiple commits addressing the latest tools required for the PyTorch build and enabling the mkldnn+acl backend.

Contributor

@malfet left a comment

LD_PRELOAD-ing libtorch_cpu.so to work around the 'cannot allocate memory in static TLS block' issue is not a valid workaround, and it is a huge usability regression from the previous iteration.

@@ -18,11 +18,10 @@
 # AMI images for us-east-1, change the following based on your ~/.aws/config
 os_amis = {
-    'ubuntu18_04': "ami-0f2b111fdc1647918", # login_name: ubuntu
     'ubuntu20_04': "ami-0ea142bd244023692", # login_name: ubuntu
Contributor

Not sure why you are removing ubuntu18_04.

Suggested change
-    'ubuntu20_04': "ami-0ea142bd244023692", # login_name: ubuntu
+    'ubuntu18_04': "ami-0f2b111fdc1647918", # login_name: ubuntu
+    'ubuntu20_04': "ami-0ea142bd244023692", # login_name: ubuntu

Contributor Author

Hi @malfet, even for the manylinux2014 docker-based builds, we still depend on the host OS for the gfortran workaround in the script at:
https://github.com/pytorch/builder/blob/main/build_aarch64_wheel.py#L444

And supporting the Ubuntu 18.04 host build (one of the supported options in this script) would require a lot of tool updates, which is probably not worth it going forward.

PyTorch library builds require python >= 3.7 and cmake >= 3.13, while the default versions of python and cmake on Ubuntu 18.04 are much older. Given that Ubuntu 18.04 reaches EOL in April 2023, the effort to update the script for Ubuntu 18.04 is not worth it, hence deprecating Ubuntu 18.04 host OS support.
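
As a rough illustration of those prerequisites, a host (non-docker) build would need checks along these lines. A minimal standalone sketch, with the minimum versions taken from the paragraph above:

```python
# Sketch: verify the host meets the quoted minimums (python >= 3.7, cmake >= 3.13).
import re
import subprocess
import sys

assert sys.version_info >= (3, 7), "PyTorch builds need python >= 3.7"

out = subprocess.check_output(["cmake", "--version"], text=True)
major, minor = map(int, re.match(r"cmake version (\d+)\.(\d+)", out).groups())
assert (major, minor) >= (3, 13), "PyTorch builds need cmake >= 3.13"
print("host prerequisites OK")
```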

'ubuntu20_04': "ami-0ea142bd244023692", # login_name: ubuntu
'redhat8': "ami-0698b90665a2ddcf1", # login_name: ec2-user
}
ubuntu18_04_ami = os_amis['ubuntu18_04']
ubuntu20_04_ami = os_amis['ubuntu20_04']
Contributor

Suggested change
-ubuntu20_04_ami = os_amis['ubuntu20_04']
+ubuntu18_04_ami = os_amis['ubuntu18_04']
+ubuntu20_04_ami = os_amis['ubuntu20_04']

Comment on lines 191 to 192
host.run_cmd("wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null")
host.run_cmd("sudo add-apt-repository 'deb https://apt.kitware.com/ubuntu/ focal main'")
Contributor

Why is this change necessary? If you want a newer cmake, let's install it from pip/conda.

Contributor Author

Yeah, I added this for a newer cmake. Let me check a conda/pip-based install. Thanks!
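
If the pip route works out, the change could be as small as the sketch below, written in the script's `host.run_cmd` style. This is only an illustration of the suggestion, not the change that ended up in the PR; note that a user-level pip install lands in ~/.local/bin, which may need to be on PATH for later build steps.

```python
def install_cmake_from_pip(host) -> None:
    """Sketch: drop the Kitware apt repository and get a newer cmake from pip instead."""
    host.run_cmd("pip3 install --user cmake")
    host.run_cmd("~/.local/bin/cmake --version")
```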

@@ -214,15 +215,19 @@ def install_condaforge_python(host: RemoteHost, python_version="3.8") -> None:
     else:
         install_condaforge(host)
         # Pytorch-1.10 or older are not compatible with setuptools=59.6 or newer
-        host.run_cmd(f"conda install -y python={python_version} numpy pyyaml setuptools=59.5.0")
+        host.run_cmd(f"conda install -y python={python_version} numpy pyyaml setuptools=59.5.0; python3 --version")
Contributor

Why is this necessary? And please keep two blank lines between functions/classes but one blank line between class methods.

Contributor Author

@snadampal Oct 6, 2022

It's for debugging; I saw that sometimes the conda env activation was failing silently and leaving the older Python version in place.

Comment on lines 308 to 309
# LD_PRELOAD the libtorch_cpu.so to work around the 'cannot allocate memory in static TLS block' issue
host.run_cmd(f"export LD_PRELOAD=$HOME/miniforge3/lib/python{python_version}/site-packages/torch/lib/libtorch_cpu.so; cd vision; {build_vars} python3 setup.py bdist_wheel")
Contributor

This is not a valid workaround; we need to figure out what is causing it.

Contributor Author

I did some more digging today, and it looks like the OpenBLAS libraries built from source are consuming more TLS memory. Not sure why it wasn't observed earlier.
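
One way to sanity-check that theory is to compare the size of the PT_TLS segment in the suspect shared objects (e.g. the OpenBLAS .so and libtorch_cpu.so). A small sketch, assuming the usual GNU binutils `readelf -lW` column layout:

```python
# Sketch: print the TLS segment size (MemSiz) of a shared library.
import subprocess
import sys


def tls_segment_size(so_path: str) -> int:
    out = subprocess.check_output(["readelf", "-lW", so_path], text=True)
    for line in out.splitlines():
        if line.strip().startswith("TLS"):
            # Columns: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
            return int(line.split()[5], 16)
    return 0


if __name__ == "__main__":
    print(tls_segment_size(sys.argv[1]))
```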

@snadampal
Contributor Author

Hi @malfet, I've limited this PR to the docker (manylinux2014) build option only, so that we don't need to carry a lot of tool dependencies for the host build. This is very clean now.
Here are the commands for building wheels with and without the mkldnn backend:

with mkldnn+acl:
./build_aarch64_wheel.py --python-version 3.8 --use-docker --keep-running --os ubuntu20_04 --enable-mkldnn --branch release/1.13

without mkldnn backend (only openblas based):
./build_aarch64_wheel.py --python-version 3.8 --use-docker --keep-running --os ubuntu20_04 --branch release/1.13
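
As a side note, whether the resulting wheel actually picked up the mkldnn backend can be verified from Python once it is installed; `torch.backends.mkldnn.is_available()` and `torch.__config__.show()` are the standard PyTorch APIs for this.

```python
# Sketch: quick check that an installed wheel was built with the mkldnn backend.
import torch

print(torch.__version__)
print("mkldnn available:", torch.backends.mkldnn.is_available())
print(torch.__config__.show())  # build flags; the mkldnn+acl wheel should report USE_MKLDNN=ON
```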

One pending issue I'm still debugging is embedding the external libraries (the Arm Compute Library in this case) into the torch wheel. I've used the embed_library.py present in this wheel builder repo. The libraries are copied with additional hash tags, and hence the import is failing. Please let me know if you have seen this before and have any ideas.
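
For context, with auditwheel-style embedding the usual cause of such an import failure is that the copied library gets a hashed filename while the wheel's other libraries still list the original name in their DT_NEEDED entries, or the copy's SONAME is left pointing at the old name. A hedged sketch of inspecting and patching that with the standard patchelf options (file names are illustrative):

```python
# Sketch: after embed_library.py copies e.g. libarm_compute.so into the wheel as
# libarm_compute-<hash>.so, keep SONAME and the consumers' NEEDED entries consistent.
import subprocess


def needed_libs(so_path):
    # List the DT_NEEDED entries of a shared object inside the wheel.
    return subprocess.check_output(["patchelf", "--print-needed", so_path], text=True).split()


def retarget(consumer, old_name, new_name):
    # Point a consumer (e.g. libtorch_cpu.so) at the hashed copy.
    subprocess.check_call(["patchelf", "--replace-needed", old_name, new_name, consumer])


def set_soname(so_path, new_name):
    # Keep the embedded copy's SONAME consistent with its hashed file name.
    subprocess.check_call(["patchelf", "--set-soname", new_name, so_path])
```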

@snadampal
Contributor Author

Hi @malfet, I'm done with my changes. Built the image in manylinux2014 docker mode and tested it on AWS Graviton instances. Please let me know if there are any comments.

with mkldnn+acl:
./build_aarch64_wheel.py --python-version 3.8 --use-docker --keep-running --os ubuntu20_04 --enable-mkldnn --branch release/1.13

without mkldnn backend (only openblas based):
./build_aarch64_wheel.py --python-version 3.8 --use-docker --keep-running --os ubuntu20_04 --branch release/1.13

@malfet merged commit 45aa074 into pytorch:main on Oct 25, 2022
@snadampal
Contributor Author

@malfet, thanks for merging the PR!

jithunnair-amd added a commit to ROCm/builder that referenced this pull request on Apr 11, 2023
* Make sure package_type is set (pytorch#1139)

* Update check_binary.sh

* Update check_binary.sh

* Modifying smoke test to add more advanced validation as requested (pytorch#1124)

* Modify smoke test matrix

More vision smoke tests

Temporary pointing to my repo for testing

Try 2 use atalman builder

Modify path

Fixing commits

Testing

Testing

Smoke test modifications

Refactor test code

Fix typo

Fixing image read

A little more refactoring

Addressing comments

Testing

* Add same test for windows and macos

* Addressing comments

* Add manywheel special build for including pypi package (pytorch#1142)

* Add manywheel special build

Testing

Builder change

Testing

Adding manywheel cuda workflow

Simplify

Fix expr

* address comments

* checking for general setting

* Pass correct parameters for macos validations (pytorch#1143)

* Revert "Update check_binary.sh"

This reverts commit 6850bed.

* Revert "Update check_binary.sh"

This reverts commit 051b9d1.

* setup periodic test to run binary verification  pytorch/pytorch#84764: (pytorch#1144)

* add a reusable workflow to run all smoke tests/or smoke tests for a specific os/channel
* add workflows to schedule the periodic smoke tests for nightly and release channels

* Update aarch64 script to latest one (pytorch#1146)

* minor: fix the typo job name for windows binaries validation workflow (pytorch#1147)

* fix the typo in the the job name for the release binaries validation workflow (pytorch#1148)

issue was introduced in pytorch#1144

* Move to rc2 of 3.11 python (pytorch#1149)

Need it to get several convenience functions

* Integrates CUDA pip wheels (pytorch#1136)

* Refactors rpath to externally set var. Adds mechanism to add metadata

* Sets RUNPATH when using cudnn and cublas wheels

* Escapes dollar sign

* Fix rpath for cpu builds

Co-authored-by: atalman <[email protected]>

* Uses RPATH instead of RUNPATH so that user strictly uses pypi libs (pytorch#1150)

* Binary Validation Workflow - Adding check binary script (pytorch#1127)

* Update action.yml

* Update validate-macos-binaries.yml

* Update validate-linux-binaries.yml

* Fix check binary for arm64 (pytorch#1155)

* Fix check binary for arm64

* Update check_binary.sh

Co-authored-by: Nikita Shulga <[email protected]>

Co-authored-by: Nikita Shulga <[email protected]>

* Fix for including nvtx dll and cudart (pytorch#1156)

* Fix for including nvtx dll and cudart

* Fix for include nvtx

* Fix spaces

* Back out inclusion of cudart (pytorch#1157)

* Add cuda and date check to smoke test (pytorch#1145)

* shorten binary validation workflow names, so they are more readable in the HUD and GH job view (pytorch#1159)

* Fix anaconda torchaudio smoke test (pytorch#1161)

* Fix anaconda torchaudio smoke test

* Format using ufmt

* Fix whels tests for torchaudio (pytorch#1162)

* Pin condaforge version

Most recent version fails with  invalid cert error when trying to update
python

* Option to run resnet classifier on specific device

* Fix typo

`.test/smoke_test` -> `test/smoke_test`

Noticed when pushed pytorch@3b93537 and no tests were run

* Test resnet classifier on CUDA (pytorch#1163)

* [ROCm] support for rocm5.3 wheel builds (pytorch#1160)

* Updates to support rocm5.3 wheel builds (#6)

* Changes to support ROCm 5.3

* Updated as per comments

* Installing python before magma build

- In ROCm 5.3 libtorch build are failing during magma build due to
  to missing python binary so added install statement

* Move python install to libtorch/Dockerfile (#8)

* Updating the condition for noRCCL build (#9)

* Updating the condition for noRCCL build

* Updated changes as per comments

* Use MIOpen branch for ROCm5.3; Change all conditions to -eq

* Use staging branch of MIOpen for ROCm5.3

* Fix merge conflict

Fix merge conflict

Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>

* Validate python 3.11 (pytorch#1165)

* Validate python 3.11

* Validate linux binaries change

Add options

Import torchvision

Adding python 3.11 install

pass package to check nightly binaries date

Test

test

Add python 3.11 code

testing

Adding python 3.11 test

Add python 3.11 validation

Adding zlib develop install

Install zlib etc..

Adding zlib1g as well

testing

testing

Adding validate windows binary

Trying to workaround

testing

Refactor smoke test

Add import statement

fix datetime call

* Fix stripping dev

* fix import

* Strip pypi-cudnn from the version.py (pytorch#1167)

* Strip pypi-cudnn from the version.py

* small fix

* Regenerates RECORD file to reflect hash changes caused by sed'ing the version suffix (pytorch#1164)

* Add pypi cudnn package to tests (pytorch#1168)

* Add pypi cudnn package to tests

* Fix pypi installation check

* Fix pypi instructions setting

* Update DEVELOPER_DIR in build_pytorch.sh

Not sure why we are still expecting Xcode9 to be present there, update it to the same folder as wheel builds

May be fixes pytorch/pytorch#87637

* Fix to not use sccache if it's not setup properly (pytorch#1171)

* Revert "Fix to not use sccache if it's not setup properly (pytorch#1171)" (pytorch#1172)

This reverts commit 377efea.

* Remove cuda102 and cuda115 docker builds and regenerate manylinux docker (pytorch#1173)

* Rebuild manywheel

* Remove cuda102 and cuda115

* [aarch64] add mkldnn acl backend build support for pytorch cpu libary (pytorch#1104)

* Only push to Docker and Anaconda repo from main (pytorch#1175)

We currently allow pushes from any branch to go to Docker (and Anaconda) prod. This is a dangerous practice because it allows unfinished work to jump to prod and be used by other workflows.

* Release 1.13 script changes (pytorch#1177)

* Test ResNet on MPS (pytorch#1176)

After pytorch/pytorch#86954 is fixed, we should be able to test resnet on MPS

* Revert "Test ResNet on MPS (pytorch#1176)" (pytorch#1180)

This reverts commit efa1bc7.

* Add v1.13 versions

* Update CMake to 3.18, needed for C++17 compilation (pytorch#1178)

* release: separate out version suffixes for torch pypi promotion (pytorch#1179)

* Fixup wheel published to PyPI (pytorch#1181)

* Fixup wheel published to PyPI

* Update prep_binary_for_pypi.sh

* Fix folder deletion for pypi prep

Co-authored-by: Andrey Talman <[email protected]>

* Update cmake version to 3.18 for libtorch docker

* Pins cuda runtime to 111.7.99 (pytorch#1182)

* Fixes cuda pypi rpaths and libnvrtc name (pytorch#1183)

* Allow ROCm minor releases to use the same MIOpen branch as the major release (pytorch#1170)

* Allow ROCm minor releases to use the same MIOpen branch as the major release

* correct logic to ensure rocm5.4 doesn't fall in wrong condition

* add 11.8 workflow for docker image build (pytorch#1186)

* Using windows runners from test-infra for validation workflows (pytorch#1188)

* Testing new windows runners

Refactor code

* Adding details for the test-infra issue

* Update current CUDA supported matrix

* add magma build for CUDA11.8 (pytorch#1189)

* Test setting job name (pytorch#1191)

* Use official Python-3.11 tag (pytorch#1195)

* remove CUDA 10.2-11.5 builds (pytorch#1194)

* remove CUDA 10.2-11.5 builds

* remove 11.5 and 11.3 builds

* build libtorch and manywheel for 11.8 (pytorch#1190)

* build libtorch and manywheel for 11.8

* Update common/install_magma.sh

* use magma-cuda build-1 by default; remove CUDA 10.2-11.5 builds

Co-authored-by: Andrey Talman <[email protected]>

* [Validation] Pass ref:main to general worker (pytorch#1197)

* Pass ref:main to general worker

* Try to pass reference to workflow

* Pass ref:main to general worker

* Test

* Pass reference as input parameter

* Make new variable not required

* Fix typo

* Add workflow for manywheel cpu-cxx11-abi (pytorch#1198)

* [Validation] Use linux_job for linux workers (pytorch#1199)

* Use linux_job for linux workers

Test

Testing

Test

testing

Testing

testing

Change linux binary action

test

Simplify version check

* Fix if statement

* Fix typo

* Fix cuda version check

Fix Audio and Vision version check

Add check binary to libtorch

test

test

testing

testing

testing

Testing

Testing

testing

* Use macos generic workers (pytorch#1201)

* Use macos generic workers

fix workflow

testing

Add arm64 builds

test

Remove validate binary action

* add check binary step

* fix ld_library path

* add package type

* Adding ref to validate binaries (pytorch#1204)

* ROCm5.3 nightly wheels (pytorch#1193)

* Enable ROCm5.3 nightly wheels

* Enable ROCm5.3 docker builds

* Update amdgpu repo url for ROCm5.3

* ROCm5.3 not supported on Ubuntu 18.04

* empty

* Another empty commit

* Try disabling MLIR build to shorten docker build time

* Clean up disk space

* MLIR project changed names from ROCm5.4

* Retrigger CI to get around flaky magma git access error

* One more cmake-3.18.4 update

* Use cmake-3.18 for ROCM builds

* More cmake ROCM tweaks

* cmake-3.18 installation on ROCM (take 3)

* add conda builds for CUDA 11.8 (pytorch#1205)

* Enable nightly CUDA 11.8 builds (pytorch#1206)

* enable nightly builds for CUDA 11.8

* add CUDA 11.8 version to manywheel, remove 11.3 and 11.5

* Windows CUDA 11.8 changes (pytorch#1207)

* Add continue on error to validation jobs (pytorch#1209)

* Add continue on error to validation jobs

* test

* Delete unmaintaned torchvision build scripts (pytorch#1210)

All build logic has long moved to torchvision repo and now is executed
by reusable workflow from https://github.com/pytorch/test-infra/tree/main/.github/workflows

* build_pytorch.sh replace tabs with spaces (pytorch#1211)

* Make PyTorch depend on TorchTrition (pytorch#1213)

Remove me when Triton is properly released elsewhere

* Remove smoke test script that is no longer used (pytorch#1212)

* Another tabs-to-spaces change

`s/\t/\ \ \ \ \ \ \ \ /`

* Disable continue on error (pytorch#1214)

* Add torchtrition dependency for wheels

* Make PyTorchConda depend on Triton (Take 2)

Multi-line environment variables are hard, so lets do it traditional way

* Revert "Add torchtrition dependency for wheels"

This reverts commit 475100b.

* Add TorchTrition dependency for wheels (take 2)

Now tests should be green thanks to pytorch/pytorch#90017

* Add sympy to pytorch linux dependencies

* Mitigate windows nightly build regressions

By pinning conda to 22.9.0

Fixes pytorch/pytorch#90059

* Consolidating validation scripts (pytorch#1219)

* Consolidating validation scripts

* Fix validate script name

* Correct script path

* Correct script path

* adding windows use case

* windows use case

* test

* testing

* Windows fixes

* more fixes

* Add package type

* testing more

* Truncate RECORD instead of delete (pytorch#1215)

* Refactor and fix windows smoke tests (pytorch#1218)

* Fix windows smoke test

* Fix first if statement

* Refactor not to cal install nightly package

* Revert "Refactor not to cal install nightly package"

This reverts commit ac580c8.

* Fix pip install command remove cu102

* Refactor the conda installation

* Add cuda profiler apu to cuda install 11.8 (pytorch#1221)

* Update CUDA upgrade runbook to mention subpackages changes

As per following doc: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

* conda: Add CUDA_HOME, cuda binaries to path (pytorch#1224)

* Refactor macos-arm64 into separate group (pytorch#1226)

* Adding libcufft constraint (pytorch#1227)

* Adding libcufft constraint

* Adding rest of the dependencies

* Advance build number in pytorch-cuda (pytorch#1229)

* Make sympy mandatory dependency of PyTorch

Should fix 
https://github.com/pytorch/audio/actions/runs/3684598046/jobs/6234531675

* Revert me later: Fix conda package smoke tests

* Install `sympy` via pip rather than conda

Needs to be reverted as well

* Refactor smoke tests to configure module included in the release (pytorch#1223)

* Changes to prep for pypi script for release 1.13.1 (pytorch#1231)

* PyPi binary validation and size check (pytorch#1230)

* Validate binary size

* Validate binary size linux_job

* evaluate the fix from pytorch#1231

* Add an optional artifact upload, consolidate fixes to `prep_binary_for_pypi.sh`

* Adding new workflow to call from domain libraries to validate on domain libraries such as text (pytorch#1234)

* Testing new workflow

Fix naming

fix input

* Changed comments

* Ad ability to call validate domain library manually (pytorch#1235)

* Adding test for validate dm workflow and fixing dm validation workflow (pytorch#1236)

* Test manywheel packages (pytorch#1239)

Change only docker file

* Bump scripts in release (pytorch#1241)

* release: Strip whitespace from version_with_suffix (pytorch#1242)

* Cuda 11.8 and removal of dev packages (pytorch#1243)

* Adding more OS's to validate domain library workflow (pytorch#1238)

* Adding more OS's to validate domain library workflow

* conda and wheel together

* add macos workflows

* fix workflow

* Add target os variable to windows validation (pytorch#1244)

* Update MKL to 2022.1 (pytorch#1245)

As previous one occasionally crashes on AMD CPUs

May be addresses pytorch/pytorch#89817

Please note that in order to get maximum perf on AMD CPUs one needs to compile and LD_PRELOAD the following library:
```
int mkl_serv_intel_cpu_true() {
	return 1;
}
```

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Remove invalid git option (pytorch#1246)

* Revert "Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)" (pytorch#1247)

This reverts commit ee59264.

* Add with_cuda flag (pytorch#1249)

* Add GPU architecture env variables (pytorch#1250)

* Add cuda to jobname for validate domain library (pytorch#1252)

* Remove pylief dependency (pytorch#1255)

* Fix PEP503 for packages with dashes

* Rename `torchtriton` to `pytorch-triton`

Companion change for pytorch/pytorch#91539

* s3_management: Hide specific packages between dates (pytorch#1256)

* s3_management: Pin requirements.txt

Packaging got updated and that's not what we want

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: except ValueError

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: Use the correct format for strptime

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: Bump bad dates to october 17th (pytorch#1257)

* s3_management: hide torchtriton (pytorch#1258)

* s3_management: Add PACKAGE_ALLOW_LIST for indices (pytorch#1259)

* s3_management: Bump bad date end to 12/30 (pytorch#1260)

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1248)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Fixes logic for 11.8 and adds missing names for DEPS_SONAME

* s3_management: Account for underscore packages

pytorch-triton is listed as pytorch_triton

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: simplify allowlist, correct underscores

Signed-off-by: Eli Uriegas <[email protected]>

* Fix cuda version in nightly (pytorch#1261)

* Adding py311 validations (pytorch#1262)

* Use MATRIX_* variables instead of redefining a new var each time (pytorch#1265)

* Fix validation domain library (pytorch#1266)

remove ref main

fix workflow

more refactor

* Nightly: do test install with the dependencies better and skip CUDA tests on cpu only box (pytorch#1264)

* Refactor PyTorch wheel and libtorch build scripts for ROCm (pytorch#1232)

* Refactor wheel and libtorch build scripts (#7)

* Update to so patching for ROCm

Wildcard used in grep to grab the actual numbered so file referenced
in patchelf. This allows the removal of specifying the so number in
DEPS_LIST & DEPS_SONAME

This commit also adds the functionality for trimming so names to
build_libtorch.sh from build_common.sh

* Refactor to remove switch statement in build_rocm.sh

This commit refactors build_rocm.sh and brings in a few major updates:
 - No longer required to specify the full .so name (with number) for ROCm libraries
       - The .so versions are copied and the patching code will fix the links to point to this version
 - No longer required to specify paths for ROCm libraries allowing the removal of the large switch
       - Paths are acquired programmatically with find
 - No longer required to specify both the path and filename for the OS specific libraries
       - Programmatically extract file name from the path
 - Automatically extract Tensile/Kernels files for the architectures specified in PYTORCH_ROCM_ARCH
   and any non-arch specific files e.g. TensileLibrary.dat

* rocfft/hipfft link to libhiprtc.so in ROCm5.4 (#15)

Co-authored-by: Jack Taylor <[email protected]>

* add sm_90 to CUDA11.8 builds (pytorch#1263)

* add sm_90 to CUDA11.8 builds

* Manually invoke bash for Miniconda

* Revert "add sm_90 to CUDA11.8 builds (pytorch#1263)" (pytorch#1275)

This reverts commit e1453a4.

* Set ubuntu distribution correctly for ROCm5.3 and above (pytorch#1268)

* Fix unbound variable error (pytorch#1276)

Regression introduced (and ignored) by pytorch#1262
Test plan:
```
% bash -c 'set -u; if [[ -z "${FOO}" ]]; then echo "bar"; fi' 
bash: FOO: unbound variable
(base) nshulga@nshulga-mbp builder % bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'
bar
(base) nshulga@nshulga-mbp builder % FOO=1 bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'

```

* Manually invoke bash for miniconda (pytorch#1277)

Fixes build issues failing with:
```
./Miniconda3-latest-Linux-x86_64.sh: 438: ./Miniconda3-latest-Linux-x86_64.sh: [[: not found
```
as seen in e.g.: pytorch#1271

* Fix perm

Which somehow got changed by pytorch@62103bf

* add sm_90 to CUDA11.8 builds (pytorch#1278)

* libtinfo.so version update and logic fix for ROCm libtorch (pytorch#1270)

* Use libtinfo.so.6 for Ubuntu 2004

* Fix to origname grep

* Condition on ROCM_VERSION for libtinfo6

* Looks like it is not used anywhere. (pytorch#1273)

* Build Windows binaries with Visual Studio 2022 Build Tools (pytorch#1240)

* Build Windows binaries with Visual Studio 2022 Build Tools

* Unify casing in Batch files, remove VS 2017 installation

* Remove VS 2017 Conda scripts, unify casing in conda Batch scripts, minor Conda scripts tweaks

* Slim down `pytorch-cuda`

It should only contain runtime dependencies that PyTorch+domain
libraries depend on, namely:
 - cudart
 - cublas
 - cusparse
 - cufft
 - curand
 - nvtx
 - nvrtc
 - nvjpeg (for TorchVision)

This removes dependencies on NVCC, build/debug tools, etc., which are not
needed for running PyTorch

Test Plan:
  `conda create -n tmp -c nvidia -c malfet cuda-toolkit==11.7` and
observe that only relevant packages are installed

Fixes pytorch/pytorch#91334

* [BE] Delete `unicode-flags` build options (pytorch#1284)

There were relevant only for Python<=3.3

* [BE] Define `openssl_flags` (pytorch#1285)

Rather than have two invocations of `./configure`

* Build with `--enabled-shared` if `patchelf` is found (pytorch#1283)

This is needed to make `manylinux-wheel` images usable for building new Triton binaries.

Test plan: Build docker and verify that following `CMakeLists.txt` finishes successfully:
```
cmake_minimum_required(VERSION 3.6)
find_package(Python3 REQUIRED COMPONENTS Interpreter Development)
message(WARNING Executable ${Python3_EXECUTABLE})
message(WARNING IncludeDirs ${Python3_INCLUDE_DIRS})
message(WARNING Libraries ${Python3_LIBRARIES})
```

* Update cudnn to 8.7.0.84 for CUDA 11.8 builds (pytorch#1271)

* update cudnn to 8.7.0.84 for CUDA 11.8 builds

* workaround for pytorch#1272

* Revert "workaround for pytorch#1272"

This reverts commit c0b10d8.

* update cudnn==8.7.0.84 for windows

* [BE] Remove references to Python<3.6 (pytorch#1287)

* Upgrade desired python version to 3.8

For libtorch builds

* Fix how libtorch picks the python version

* Tweak conda builds to support 3.11

Add `-c malfet` when building for 3.11 (though perhaps it's better to
move numpy to pytorch channel)

Tweak some build time dependencies

* Fix typo

* Skip triton dependency for 3.11 CUDA builds

* Update build-number to 3

* Add ability to override cuda archs for conda (pytorch#1282)

* [ROCm] reduce disk space used in image (pytorch#1288)

Fixes pytorch#1286

* Extend MacOS/Windows builds to 3.11

By installing dependencies from pip
Should be a no-op for <=3.10

* ci: Migrate to checkout@v3 (pytorch#1290)

checkout@v2 is deprecated moving to checkout@v3

Signed-off-by: Eli Uriegas <[email protected]>

* Fix typo

* Add 3.11 option for Windows builds

* Add python-3.11 download location for windows

* Add pypi with cudnn package test (pytorch#1289)

* Add pypi with cudnn package test

* Add pypi with cudnn package test

* test

* test

* More pypi cudnn changes

* test

* Fix pipy smoke test

* Remove debug comments

* Delete some ancient checks for MacOS builds

As we no longer build for Python-2.7 or 3.5

* Add libnvjpeg-dev package as fallback (pytorch#1294)

* Add libnvjpeg-dev package as fallback

* Move libnvjpeg and libnvjpeg-dev to required packages

* Update conda/pytorch-cuda/meta.yaml

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Upgrade nightly wheels to rocm5.4.2 (pytorch#1225)

* Upgrade nightly wheels to rocm5.4

* Adding graphic architectures for ROCm 5.4

* Updated to use ROCm5.4.1

* Updated to use ROCm5.4.2

* Fixed syntax error

* Perform build on image with magma and miopen preinstalled

* Add dev packages for windows pytorch-cuda dependencies (pytorch#1295)

* Add dev packages for windows dependencies

* Adding architecture dependent builds

* Add notes around windows

* fix typo

* Bumping version to v3

* rocm libtorch prebuild magma; fix manylinux cmake version (pytorch#1296)

* Add manywheel:cpu-cxx11-abi checkup for check_binary.sh (pytorch#1251)

* Remove with_py311 flag (pytorch#1301)

* rocm manylinux now uses devtoolset 9 (pytorch#1300)

* fix ACL_ROOT_DIR setting and upgrade the ACL version to 22.11 (pytorch#1291)

* Add `-c malfet` for Windows builds as well

* Set torch._C._PYBIND11_BUILD_ABI version check only for GLIBCXX_USE_CXX11_ABI=0 (pytorch#1303)

* Adding limit windows builds logic (pytorch#1297)

* Adding limit windows builds logic

* Remove empty space

* Simplify mkl build dependencies (pytorch#1305)

On Linux and Mac, PyTorch must be built against `mkl=2020.x` in order to be compatible with both `mkl-2021` and `mkl-2022`, which added `.so.1` and `.so.2` files respectively and would make a binary linked against those versions incompatible with the newer/older toolchains.

This is not an issue on Windows, as all mkl binaries there end with simple `.dll`

* "Fix" PyTorch CPU conda testing

It's still horribly broken, but make it a bit better by not installing
pytorch from the default anaconda channel (which installs 1.12.1, which does
not have the dependencies the 2.0 dev package is supposed to have)

For example, see this runlog
https://github.com/pytorch/pytorch/actions/runs/4155371267/jobs/7189101147

* Update torch._C._PYBIND11_BUILD_ABI version check (pytorch#1306)

* Skip tests for manywheel built with _GLIBCXX_USE_CXX11_ABI=1

* Put back smoke test label (pytorch#1310)

* [aarch64] add support for torchdata wheel building (pytorch#1309)

* Python 3.11 validation workflow tests (pytorch#1304)

* Test windows py311

* Nightly binaries

* Fix py311 tests

* fix python calling

* Revert "Nightly binaries"

This reverts commit cbf80ca.

* add a scheduled workflow for the nightly pypi binary size validation (compliments pytorch/test-infra#2681) (pytorch#1312)

* Add regression test for pytorch/pytorch#94751

* Add 3.11 and `--pytorch-only` options

* Add `lit` to list of allowed packages

As it is now a mandatory (albeit spurious) dependency of pytorch-triton

See https://pypi.org/project/lit/ for more details

* s3: Allow tar.gz as an accepted file extension (pytorch#1317)

* Changes for Python 3.11 and smoke Test RC cut (pytorch#1316)

* Smoke Test RC cut

* Validate binaries 3.11

* test

* Smoke test binaries

* Fix pytorch-cuda chan download

* Remove temp change

* Make sure we don't use GPU runners for any of libtorch validations (pytorch#1319)

* Make sure we don't use GPU runners for any of libtorch

* Make sure we don't use GPU runners for any of libtorch

* s3: Add pytorch_triton_rocm to index (pytorch#1323)

Signed-off-by: Eli Uriegas <[email protected]>

* s3: Add tqdm package req for text (pytorch#1324)

* Add `--analyze-stacks` option

That using `git rev-base`, prints total number of stacks, and its
average, mean and max depth

At the time of submission here is top 10 ghstack uses of pytorch:
```
ezyang has 462 stacks max depth is 15 avg depth is 1.70 mean is 1
awgu has 240 stacks max depth is 28 avg depth is 4.30 mean is 1
peterbell10 has 146 stacks max depth is 7 avg depth is 1.84 mean is 1
zou3519 has 128 stacks max depth is 7 avg depth is 1.98 mean is 1
jerryzh168 has 113 stacks max depth is 16 avg depth is 1.45 mean is 1
bdhirsh has 111 stacks max depth is 7 avg depth is 1.85 mean is 2
wconstab has 108 stacks max depth is 7 avg depth is 2.15 mean is 1
SherlockNoMad has 99 stacks max depth is 4 avg depth is 1.24 mean is 1
zasdfgbnm has 80 stacks max depth is 11 avg depth is 2.52 mean is 6
desertfire has 73 stacks max depth is 3 avg depth is 1.14 mean is 1
```

* Add filelock and networkx deps (pytorch#1327)

To match dependencies for wheel files defined in https://github.com/pytorch/pytorch/blob/ed1957dc1989417cb978d3070a4e3d20520674b4/setup.py#L1021-L1024

* Remove building magma from source

* Revert

* Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)

* Upgrade cmake version to 3.22.1 to build triton

* Pin patchelf version

* Fix comment typo

* Smoke test for cuda runtime errors (pytorch#1315)

* Add test for cuda runtime errors

* Add cuda exception smoke test

* Move cuda runtime error to end

* Move cuda runtime error to end

* Address comments

* Address comments

* Add Jinja2 Dependency (pytorch#1332)

As part of the effort to fix pytorch/pytorch#95986

* Add MarkupSafe to S3 Index (pytorch#1335)

* Remove rocm5.1 rocm5.2 from libtorch Dockerfile

* [aarch64] Adding CI Scripts to build aarch64 wheels (pytorch#1302)

* add aarch64 ci scripts

* added readme. get branch from /pytorch

* Add smoke tests conv,linalg,compile. And better version check. (pytorch#1333)

* Add smoke tests conv,linalg,compile

* Add version check

* Fix typo

Fix version check

Add not

* Add exception for python 3.11

* fix typo

* Try to exit after CUDA Runtime exception

* Restrict crash test only to conda

* Restrict crash test only to conda

* Fix tests

* Turn off cuda runtime issue

* tests

* more tests

* test

* remove compile step

* test

* disable some of the tests

* testing

* Remove extra index url

* test

* Fix tests

* Additional smoke tests

Remove release blocking changes

* Aarch64 changes for PyTorch release 2.0 (pytorch#1336)

* Aarch64 changes for PyTorch release 2.0

* Fix spacing

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <[email protected]>

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <[email protected]>

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Aarch64 build py3.11 fix (pytorch#1341)

* Fix nightly smoke test (pytorch#1340)

* Fix nightly smoke test

* Fix nightly builds

* Release 2.0 release scripts changes (pytorch#1342)

* Release 2.0 release scripts changes

* Release script modifications

* Add more packages to allow list (pytorch#1344)

* Add `jinja2` dependency to conda package

To be consistent with wheels, see
https://github.com/pytorch/pytorch/95961

* Restrict jinja to py 3.10 or less (pytorch#1345)

* Update `torchtriton` version to 2.1.0

* And update trition version here as well

* added smoke test for max-autotune (pytorch#1349)

Co-authored-by: agunapal <[email protected]>

* Refactor conda backup script (pytorch#1350)

* Refactor conda backup

* Fix space

* Minor style

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)" (pytorch#1351)

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)"

This reverts commit 18c5017.

* Selective revert

* Get cmake from pip

* Use 3.18.2 from conda

* Release script changes, add more release dependencies, bump version for aarch64 builds (pytorch#1352)

* Release script changes

* Add Jinja2 dependency

* Fix typo

* Add pytorch conda dependencies (pytorch#1353)

* Add latest dependencies for pytorch 2.0 release (pytorch#1357)

* Fix typo

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit d7f2a7c.

* [aarch64] update readme with the "--enable-mkldnn" option (pytorch#1362)

This needs to be enabled for official wheel building.

* Replace `--enable-mkldnn` with `--disable-mkldnn`

Also, change default to ubuntu-20.04

* Update AMIs

Using following images:
```
% aws ec2 describe-images --image-ids ami-078eece1d8119409f ami-052eac90edaa9d08f ami-0c6c29c5125214c77 --query "Images[].[ImageId, Description]"
[
    [
        "ami-078eece1d8119409f",
        "Canonical, Ubuntu, 18.04 LTS, arm64 bionic image build on 2023-03-02"
    ],
    [
        "ami-0c6c29c5125214c77",
        "Canonical, Ubuntu, 22.04 LTS, arm64 jammy image build on 2023-03-03"
    ],
    [
        "ami-052eac90edaa9d08f",
        "Canonical, Ubuntu, 20.04 LTS, arm64 focal image build on 2023-03-01"
    ]
]
```

* Update tags for domain libraries

* Add PyTorch version pinning to release wheels

* Fix flake8

* [BE] Introduce `build_domains` function

And call it to rebuild only domains if torch wheel is available

* Switch deprecated ubuntu-18.04 runner to ubuntu-latest (pytorch#1334)

* Switch deprecated ubuntu-18.04 runner to self-hosted 2xlarge

* Leave build-nvidia-docker for now

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <[email protected]>

* Use ephemeral runners

* Use ubuntu-latest

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <[email protected]>

* Switch from latest to 22.04 to pin the version

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Introduce optional --build-number parameter

* Revert me later: Fix conda package smoke tests

(cherry picked from commit d7f2a7c)

Alas, it's still used and causes nightly build failures

* Fix aarch64 torchvision build (pytorch#1363)

* Fix torchvision image extension compilation

* Fix torchvision image extension compilation

* Set enable_mkldnn to pypi build

* Remove unused `enable_mkldnn` for configure_system

* [aarch64] Try to link statically with png/jpeg

Also, add testing (which is currently broken)

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit ce427de.

* [AARCH64] Fix image.so wheel

By adding explicit libz dependency

* [AARCH64] Pass `BUILD_S3` to torchdata

To make build consistent with Linux-x86_64

* Revert "[AARCH64] Pass `BUILD_S3` to torchdata"

This reverts commit ae8e825.

As it does not want to be built on aarch64

* Add portalocker (pytorch#1364)

* [BE] Error handling in build_aarch64_wheel

I've noticed that build errors in `build_ArmComputeLibrary` would be
ignored, as a semicolon is used between the commands instead of &&.
Also, replace the nightly version evaluation that relies on torch with one
that relies on the individual libraries.

* [AArch64] Pass `args.instance_type` to `start_instance`

* use c++17 when building windows smoke tests (pytorch#1365)

Summary:
We are seeing failures during CI dealing with some headers that have
nested namespaces. This is expected to remedy them.

One such example:
https://github.com/pytorch/pytorch/actions/runs/4510336715/jobs/7942660912

Test Plan: Test this with CI.

---------

Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Co-authored-by: Andrey Talman <[email protected]>
Co-authored-by: andysamfb <[email protected]>
Co-authored-by: izaitsevfb <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Syed Tousif Ahmed <[email protected]>
Co-authored-by: Syed Tousif Ahmed <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Wei Wang <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
Co-authored-by: ptrblck <[email protected]>
Co-authored-by: zhuhong61 <[email protected]>
Co-authored-by: Greg Roodt <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
Co-authored-by: Dmytro Dzhulgakov <[email protected]>
Co-authored-by: albanD <[email protected]>
Co-authored-by: Radek Bartoň <[email protected]>
Co-authored-by: divchenko <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Mike Schneider <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: agunapal <[email protected]>
Co-authored-by: dagitses <[email protected]>