
ROCm5.3 nightly wheels #1193

Merged
merged 10 commits into pytorch:main from upgrade_nightly_wheels_to_rocm5.3
Nov 23, 2022

Conversation

jithunnair-amd
Collaborator

No description provided.

@jithunnair-amd
Collaborator Author

https://github.com/pytorch/builder/actions/runs/3507244988/jobs/5874816804 failed with a disk space error:

```
#67 11429.5 Disk Requirements:
#67 11429.5   At least 613MB more space needed on the / filesystem.
```

Rerunning to see if the error is flaky.

@jithunnair-amd jithunnair-amd force-pushed the upgrade_nightly_wheels_to_rocm5.3 branch from 78d1b28 to 2e96468 Compare November 22, 2022 19:48
@jithunnair-amd
Collaborator Author

jithunnair-amd commented Nov 23, 2022

The manywheel jobs succeeded, but the libtorch job failed: https://github.com/pytorch/builder/actions/runs/3526778914/jobs/5915095539

```
#18 59.07 remote: unable to authorize current user, internal server error
#18 59.07 fatal: unable to access 'https://bitbucket.org/icl/magma.git/': The requested URL returned error: 500
#18 ERROR: executor failed running [/bin/sh -c bash ./install_rocm_magma.sh && rm install_rocm_magma.sh]: exit code: 128
```

@jithunnair-amd
Collaborator Author

jithunnair-amd commented Nov 23, 2022

Finally the CI gods have smiled on me :) @seemethere @atalman @malfet Could we please merge this as a priority? We have a bunch of dependent PRs for nightly wheel upgrades on pytorch/vision, pytorch/audio, etc.

```diff
@@ -27,7 +27,7 @@ case ${GPU_ARCH_TYPE} in
 rocm)
 BASE_TARGET=rocm${GPU_ARCH_VERSION}
 DOCKER_TAG=rocm${GPU_ARCH_VERSION}
-GPU_IMAGE=rocm/dev-ubuntu-18.04:${GPU_ARCH_VERSION}
+GPU_IMAGE=rocm/dev-ubuntu-20.04:${GPU_ARCH_VERSION}
```
jithunnair-amd
Collaborator Author

ROCm 5.3 doesn't support Ubuntu 18.04

@jithunnair-amd jithunnair-amd marked this pull request as ready for review November 23, 2022 17:07
@malfet
Contributor

malfet commented Nov 23, 2022

@jithunnair-amd one cannot merge a Draft PR, can he?

@jithunnair-amd
Collaborator Author

Sorry, I just realized :) Moved it out of Draft.

@malfet malfet merged commit 1342fb5 into pytorch:main Nov 23, 2022
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Nov 24, 2022
JakubPietrakIntel added a commit to JakubPietrakIntel/pytorch that referenced this pull request Dec 7, 2022
commit 63ebc8d6a000199e963d29b6c8a0f54d3150872b
Author: Jakub Pietrak <[email protected]>
Date:   Thu Dec 1 13:32:03 2022 +0100

    rm print

commit 2c8ffeaf1b2168ed9ad4ca6b192a1231fb036760
Author: Jakub Pietrak <[email protected]>
Date:   Thu Dec 1 11:35:02 2022 +0100

    pytorch_sparse.matmul to torch.sparse.matmul

commit ee0e184a1ce5dc6ad7005a67621fac19d6fdbb0b
Merge: 4562359b9f 3a858ba8e3
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 14:09:42 2022 +0100

    Merge branch 'gh/mingfeima/85/head' of https://github.com/pytorch/pytorch into pyg-36

commit 4562359b9fb3de301690334a892d44911eda45c8
Merge: deba083400 b5616cd5f4
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 12:22:11 2022 +0000

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit deba0834008ad95af7e3a6603223a0f8a5555967
Merge: 0e1a8522bb a97d0508cb
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 12:19:25 2022 +0000

    Merge branch 'pyg-36' of https://github.com/JakubPietrakIntel/pytorch into pyg-36

commit 0e1a8522bb695387816a29bbfcf182962429b3ab
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 12:16:35 2022 +0000

    Merge remote-tracking branch 'origin/gh/mingfeima/85/head' into pyg-36

commit b5616cd5f4fc150138b79d3396a603eda6a7a8a8
Author: Michael Voznesensky <[email protected]>
Date:   Mon Nov 28 05:12:37 2022 +0000

    Add simple assert to detect fake tensors on modules (#89723)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723
    Approved by: https://github.com/ezyang

commit db1f1144f1303db45e0b9d96e4bb6bdd87c80e5a
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 13:52:28 2022 -0800

    Beef up AOTAutograd logging with aot_id and input descriptions (#89710)

    A few things in this PR, that I found useful while debugging some
    recent issues:

    - We now allocate an aot_id to each aot_function/aot_module invocation,
      and print it whenever we report error messages and graph output
      logging.  Check the comment for why this sort of thing is useful,
      and also why it's different from nth_graph.  This number is now
      incorporated into aot_graph_name

    - I noticed that nth_graph only gets incremented when backwards is
      compiled.  Because backwards is compiled lazily, this means that
      multiple forward graphs would have gotten the same ID!  I change
      nth_graph to always increment to avoid confusion here.

    - I added a simple describe_input function, which makes use of
      num_params_buffers to tell the user if the input index they're
      looking at is a param/buffer or an input.  With the help of
      https://github.com/pytorch/pytorch/pull/89709 we could give
      even more detailed information about inputs  (we could also
      easily give detailed information about parameters if we stored
      a mapping of index to parameter name, but I didn't need this
      when debugging so I'll let someone else add it if they need
      it.)

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89710
    Approved by: https://github.com/bdhirsh
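The aot_id scheme this commit describes amounts to a process-wide monotonic counter captured once per compilation entry point, so every later log line can be correlated back to one invocation. A minimal sketch (illustrative names, not the actual AOTAutograd code):

```python
import itertools

# Hypothetical counter; stands in for AOTAutograd's internal id allocator.
AOT_COUNTER = itertools.count()

def aot_function(fn):
    """Sketch: stamp each compiled entry point with a unique aot_id so
    error messages and graph-output logging can name the invocation."""
    aot_id = next(AOT_COUNTER)

    def compiled(*args, **kwargs):
        # Every log line for this invocation carries its aot_id.
        print(f"[aot{aot_id}] running {fn.__name__}")
        return fn(*args, **kwargs)

    compiled.aot_id = aot_id
    return compiled
```

Unlike nth_graph, the id is allocated eagerly at wrap time, so two forward graphs can never share one.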

commit 5f8848f32901e35cead64d520885f718679c2bbe
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 15:26:55 2022 -0500

    Don't suppress log messages for dynamo CI config (#89653)

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89653
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit 1a2dd6b15e0089a9e45ba4feb90c2d0dfac19238
Author: Edward Z. Yang <[email protected]>
Date:   Sun Nov 27 19:27:45 2022 -0500

    Add single process version of dynamo distributed hf_Bert tests (#89721)

    It's a lot easier to debug problems in the Dynamo optimization pass if
    you aren't actually triggering a multiprocessing run.  Keep these tests
    around.

    I think the other tests can probably get this treatment too, leaving
    this to future work.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721
    Approved by: https://github.com/voznesenskym

commit 0e7c100c9b7417efb1a8f65778a1e3c9ad10ef3e
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Add debug asserts to AOTAutograd for input consistency with compilation (#89702)

    Fixes https://github.com/pytorch/torchdynamo/issues/1927

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89702
    Approved by: https://github.com/bdhirsh

commit 1f95f24d3003a35568a00b5e5e18439846089b0f
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Factor input deduplication into a separate function (#89701)

    It turns out that instead of having a giant blobby aot_dispatch_autograd
    function, we can factor it into a series of wrapper functions, each
    of which successively guarantees more invariants on the inner
    compilation function until the final inner function is quite trivial.
    How exactly you have to wrap the input user functions and the output
    compiled functions can be expressed concisely in Haskell, so I've
    included the Haskell formulation in code comments.

    This PR shows how to do this for input deduplication.  Dealing with the
    rest of the view handling is left to future work.

    This PR should also be a slight performance improvement as deduplicating
    is skipped entirely when there are no duplicate inputs.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89701
    Approved by: https://github.com/bdhirsh
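The wrapper factoring this commit describes can be sketched in plain Python (illustrative, not the real aot_dispatch_autograd code): the outer wrapper removes duplicate tensor arguments before the inner compiler ever sees them, and re-selects the unique arguments at call time. Duplicates are detected by object identity here, which is an assumption of this sketch.

```python
def dedup_wrapper(inner_compile, example_args):
    """Guarantee the inner compiler a duplicate-free argument list.

    `inner_compile` takes a list of example arguments and returns a
    compiled callable; both names are hypothetical.
    """
    seen = set()        # object ids already kept
    keep_order = []     # positions of the unique arguments
    for i, a in enumerate(example_args):
        if id(a) not in seen:
            seen.add(id(a))
            keep_order.append(i)

    if len(keep_order) == len(example_args):
        # Fast path from the PR: no duplicates, skip dedup entirely.
        return inner_compile(example_args)

    compiled = inner_compile([example_args[i] for i in keep_order])

    def runtime_fn(*args):
        # Re-select the unique arguments in the same positions as above.
        return compiled(*[args[i] for i in keep_order])

    return runtime_fn
```

Each such wrapper adds one invariant, so the innermost compilation function stays trivial.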

commit dcefc8f90fbc86041a7abcce4f227d15c59bd96c
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 14:28:56 2022 -0500

    Implement guard_source on RandomValueSource (#89711)

    I audited the pattern matches on the enum and it didn't
    look like this one should apply there.

    Sorry, no test, I know this matters on symbolic-shapes branch
    but I haven't had time to extract out a minimal reproducer.
    Take my word for it.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89711
    Approved by: https://github.com/jansel

commit 1da633f98a5da000083c0c47d9e192b2689f867b
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 13:57:17 2022 +0000

    Access named parameters/buffers/etc via getattr rather than index (#89625)

    I'm not sure why this never caused problems before.  The error
    manifests as `TypeError: 'MyModule' object is not subscriptable`

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89625
    Approved by: https://github.com/albanD
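The quoted error is what you get when a module object is indexed like a mapping; a minimal illustration with a stand-in class (not a real nn.Module):

```python
class MyModule:
    """Stand-in for a module with a named attribute."""
    def __init__(self):
        self.weight = [1.0, 2.0]

m = MyModule()

try:
    _ = m["weight"]          # the buggy access pattern
except TypeError as e:
    print(e)                 # 'MyModule' object is not subscriptable

print(getattr(m, "weight"))  # attribute access works
```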

commit e36d68af8885f27d8c0b4727ab078bf53e55e7a0
Author: Horace He <[email protected]>
Date:   Thu Nov 24 02:17:37 2022 +0000

    Don't allow recomputing a node that *must* be materialized in the backwards pass (#89171)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89171
    Approved by: https://github.com/ngimel

commit b709078dc673cbd5025a1df3eae7f5c60acc2698
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:21 2022 -0800

    [Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926)

    There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse grained view of the world.)

    Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926
    Approved by: https://github.com/chaekit

commit 143d2881a844934c95c4ada63b38179d97e65af3
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:19 2022 -0800

    [Profiler] Memory profiler part 10: Mark optimizer state (#88925)

    This is also a fairly simple pass, since we're simply collecting values from the python tracer.

    Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925
    Approved by: https://github.com/chaekit

commit ae725d501e33ed6f823997bea03d99cdc8dae5ff
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:18 2022 -0800

    [Profiler] Memory profiler part 9: Mark activations (#88924)

    This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass.

    Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924
    Approved by: https://github.com/chaekit
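The flood fill this commit describes is an ordinary breadth-first traversal over the data flow graph: everything reachable from the inputs before crossing into the backward pass gets the activation category. A toy sketch on an adjacency list (the node names and data structures are illustrative, not the profiler's real ones):

```python
from collections import deque

def mark_activations(edges, inputs, backward_nodes):
    """Flood fill downstream from the inputs, stopping at backward nodes.

    edges: dict mapping node -> list of downstream nodes.
    Returns the set of nodes marked as activations.
    """
    activations = set()
    seen = set(inputs)
    queue = deque(inputs)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt in backward_nodes or nxt in seen:
                continue  # stop at the backward pass; don't revisit
            seen.add(nxt)
            activations.add(nxt)
            queue.append(nxt)
    return activations
```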

commit 56e40fe054ecb7700142ea9ae7fe37e77800a2da
Author: Yuxin Wu <[email protected]>
Date:   Sun Nov 27 05:55:24 2022 +0000

    Let SyncBatchNorm fallback to BN if not using distributed training (#89706)

    Fixes #63662
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706
    Approved by: https://github.com/soumith
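The fallback amounts to a guard at the top of forward: only take the collective sync path when distributed training is actually active. A schematic sketch with stand-in callables (not the real SyncBatchNorm.forward):

```python
def sync_batch_norm_forward(x, dist_initialized, world_size, bn_forward, sync_forward):
    """Schematic of the fallback: use plain BN unless we are genuinely
    running multi-process distributed training.

    bn_forward / sync_forward are hypothetical stand-ins for the plain
    BatchNorm path and the all-reduce-based sync path.
    """
    need_sync = dist_initialized and world_size > 1
    if not need_sync:
        return bn_forward(x)   # plain BatchNorm path
    return sync_forward(x)     # collective statistics path
```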

commit 39449ea61d9a6644731687219282f610cbf7cf54
Author: PyTorch MergeBot <[email protected]>
Date:   Sun Nov 27 02:59:04 2022 +0000

    [vision hash update] update the pinned vision hash (#89692)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89692
    Approved by: https://github.com/pytorchbot

commit 483d3a3d07e6694757c5158bc21f7f757f8c82c3
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:16 2022 -0800

    [Profiler] E2E expecttests for category assignment (#88653)

    Up until now the unit tests for category assignment have been narrowly scoped to specific checks on specific Tensors. However as we start to reach reasonable levels of category assignment it's useful to supplement those tests with higher level summary tests to inspect the larger graph and confirm that it makes sense. (It will also be necessary for some categories like activations where it is tedious to record all relevant Tensors.)

    The general structure of these tests is to capture a model invocation with `__torch_dispatch__` and then cross reference those inputs and outputs with the categories assigned by the memory profiler.

    Differential Revision: [D40868659](https://our.internmc.facebook.com/intern/diff/D40868659/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88653
    Approved by: https://github.com/chaekit

commit 0435894bb3b2d60e5da9f993c2a56d95fb03a971
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:14 2022 -0800

    [Profiler] Memory profiler part 8: Mark parameters. (#87568)

    Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates.

    Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.)

    Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568
    Approved by: https://github.com/chaekit

commit 17fa6bf1f57cbbe84a14566efcf00f21e1abe489
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:13 2022 -0800

    [Profiler] Memory profiler part 7: Mark inputs (#87567)

    It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training, since Autograd lets us differentiate between them; however, I still want the tool to do something reasonable in inference.

    Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements.

    Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567
    Approved by: https://github.com/chaekit

commit 64c5c77cd47212da719eb29c3b0a2b07cebb3705
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:11 2022 -0800

    [Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566)

    Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories.

    We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph, which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined, these two approaches provide very high coverage.

    Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node.

    Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.)

    Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566
    Approved by: https://github.com/chaekit

commit 5f09a6d573a2a07c00c76c3cbdbffe0fafe2436d
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:09 2022 -0800

    [Profiler] Memory profiler part 5: Data flow graph (#87006)

    The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs.

    It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node.

    Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
    Approved by: https://github.com/chaekit
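The versioning idea above can be sketched as a mutation counter: each (tensor identity, version) pair is a distinct node in the data flow graph, and an op that mutates an input bumps its version so downstream consumers see a different node. Toy stand-ins, not the profiler's real classes:

```python
class VersionedTensor:
    """Toy stand-in: a tensor identity plus a mutation counter."""
    def __init__(self, name):
        self.name = name
        self.version = 0

    def key(self):
        # Each (identity, version) pair is a distinct data-flow node.
        return (self.name, self.version)

def record_op(op_name, inputs, mutates, graph):
    """Append one op to the graph; bump the version of mutated inputs
    so their post-mutation state is a new node."""
    in_keys = [t.key() for t in inputs]
    for t in mutates:
        t.version += 1
    out_keys = [t.key() for t in mutates]
    graph.append((op_name, in_keys, out_keys))
```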

commit c3116dd78b294f1bd3f6424dc1bfb7ff86bb0a66
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:08 2022 -0800

    [Profiler] Memory profiler part 4: Select top level torch ops (#86880)

    In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is.

    Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880
    Approved by: https://github.com/chaekit

commit bb77accb4c996e3aab9ae4b665fb8464400c8194
Author: Jiong Gong <[email protected]>
Date:   Sat Nov 26 14:06:44 2022 +0000

    [Inductor] Record cpp kernel in PyTorch Profiler (#89367)

    Add an option `config.cpp.enable_kernel_profile` to record individual cpp kernel time in PyTorch Profiler.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89367
    Approved by: https://github.com/jansel

commit 36018a6ee63f140b95ad644d09920798b0c624f8
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 13:48:35 2022 -0800

    Don't suppress exceptions from backends (#89656)

    Taken from voz's https://github.com/pytorch/pytorch/pull/89392

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89656
    Approved by: https://github.com/voznesenskym

commit 3e20d023b1f442ebe59e76604395cd8d4abed52a
Author: Natalia Gimelshein <[email protected]>
Date:   Sat Nov 26 03:08:23 2022 +0000

    put descriptive kernel names behind config (#89697)

    Per title, generated kernel names are often long and confusing.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89697
    Approved by: https://github.com/Chillee

commit 591dfffa38848de54b7f5f4e49260847024c9281
Author: jlukehubbard <[email protected]>
Date:   Fri Nov 25 21:31:53 2022 +0000

    update docstring for torch.linalg.lstsq (#89383)

    Previous documentation lacked details about the handling of over- and underdetermined systems, and made incorrect mention of MAGMA.

    Fixes #85021

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89383
    Approved by: https://github.com/lezcano

commit c9a0cc86407d7ec20524b0e26305109d0cf2b5c2
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 03:31:20 2022 +0000

    Simplify aot_module_simplified by removing top_args/top_kwargs (#89666)

    This makes good on Chillee's CR comment at
    https://github.com/pytorch/functorch/pull/660/files/af30d351cc93dfafb5a94dbcb32983c5ef65fd6a#r843315222
    which was never done in the original PR.

    There is no logic change, just unpack the args/kwargs at the top
    level and remove the inner function indirection.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89666
    Approved by: https://github.com/voznesenskym

commit 6168f22fae66da5703e087bcd10076921ca157e7
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Don't support kwargs at runtime in aot_module_simplified (#89664)

    The preexisting logic here added in
    https://github.com/pytorch/functorch/pull/970 was very peculiar: if top_kwargs
    was non-empty, then the inner compiled function supports kwargs.  Naively, this
    would leave you to expect that there is some sort of correlation between
    top_kwargs and kwargs.  But in fact, they're completely unrelated!  top_kwargs
    is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but
    kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
    But (1) we don't support this (the function to be compiled only takes a list
    of tensors) and (2) even if we did support it, conditioning on whether or not
    you had passed AOTAutograd configuration kwargs to support kwargs at runtime
    is bonkers.

    So delete it.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89664
    Approved by: https://github.com/voznesenskym

commit b04dda4291f1d30b064572e4521e82fa2573af77
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Delay verify correctness wrapping to call site. (#89662)

    There is only one call site for compiler_fn, so we can safely delay
    wrapping verify correctness to here.  This will help later when we
    change the backend compiler calling convention to pass fake tensors
    (but I need to pass real tensors here.)

    This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392
    but with less changes to the substantive logic.  I only moved the relevant
    inner implementation; there are no changes otherwise.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662
    Approved by: https://github.com/voznesenskym

commit 61a3fe4b6409965223273c1098f9a77ff071efe1
Author: Natalia Gimelshein <[email protected]>
Date:   Fri Nov 25 19:42:38 2022 +0000

    make inductor correctly propagate nans for maximum and minimum (#89612)

    Partially fixes https://github.com/pytorch/torchdynamo/issues/594
    Also, small cleanup for `where` codegen

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89612
    Approved by: https://github.com/soumith, https://github.com/jansel
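The semantics being fixed match torch.maximum/torch.minimum: if either operand is NaN, the result is NaN, unlike a plain comparison-based max, whose result can depend on operand order. A scalar sketch of the required behavior (illustrative only; inductor generates this per-element in its kernels):

```python
import math

def nanprop_maximum(a, b):
    """NaN-propagating maximum: any NaN operand makes the result NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b

def nanprop_minimum(a, b):
    """NaN-propagating minimum: any NaN operand makes the result NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b
```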

commit 70c0a3006ee96b3db1f531109fc383f8159e2d2f
Author: Ikko Ashimine <[email protected]>
Date:   Fri Nov 25 19:26:18 2022 +0000

    Fix typo in segment_reduction_op_gpu.cu (#89647)

    menber -> member

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89647
    Approved by: https://github.com/kit1980

commit 2c0bd85c755043d696452ddab354f3ff6775738b
Author: kshitij12345 <[email protected]>
Date:   Fri Nov 25 14:53:57 2022 +0000

    complex: register c10::complex with py::cast (#89680)

    Fixes #77134

    TODO:
    * [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?)

    ```c++
    // Hypothetical minimal includes (not in the original snippet):
    #include <pybind11/embed.h>   // py::scoped_interpreter
    #include <c10/util/complex.h> // c10::complex

    namespace py = pybind11;

    int main() {
        py::scoped_interpreter guard{}; // start the interpreter
        auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cdouble)));

        auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cfloat)));

        auto casted_chalf = py::cast(c10::complex<at::Half>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_chalf)));
    }

    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680
    Approved by: https://github.com/ezyang

commit a97d0508cb5259951bc48300fb914cebdf322bb9
Merge: 849be586e6 abb446af8c
Author: Jakub Pietrak <[email protected]>
Date:   Fri Nov 25 15:24:54 2022 +0100

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit 849be586e649421ba58182feb9067a4ac65479e3
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <[email protected]>
Date:   Fri Nov 25 14:25:40 2022 +0100

    Merge branch 'gh/mingfeima/85/head' into pyg-36

commit abb446af8c65a49bbc3767e14605a73d244c176b
Author: Alvaro Gaona <[email protected]>
Date:   Fri Nov 25 11:09:28 2022 +0000

    Implement old windows in Python (#87082)

    Relates to #85366

    - Bartlett, Blackman, Hamming, Hann.
    - Except Kaiser, which will be in a different PR.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87082
    Approved by: https://github.com/mruberry, https://github.com/lezcano

commit 059a238619b122f922c569c618919a277420e483
Merge: 26ba2e9751 95ea47ef0c
Author: Jakub Pietrak <[email protected]>
Date:   Fri Nov 25 10:00:53 2022 +0100

    Merge branch 'pytorch:master' into jpietrak/pyg-36

commit 95ea47ef0c1cffe1fe05cc36bdc47c26cc72f13e
Author: Jason Ansel <[email protected]>
Date:   Fri Nov 25 04:28:36 2022 +0000

    torchdynamo to torch._dynamo in aot_autograd.py (#89385)

    Test Plan: Run torchbench models

    Differential Revision: D41429573

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89385
    Approved by: https://github.com/soumith, https://github.com/malfet

commit 69043247819042db18ac9526c2d747fa61fe8880
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 12:00:13 2022 -0800

    Remove fake_tensor_propagation (#89646)

    You always have to run dynamo with fake tensors.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646
    Approved by: https://github.com/soumith

commit 1aa1014b262b75d4269d9a4d8b562c6ee43a0991
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 12:00:12 2022 -0800

    xfail maml test, instead of running it without fake tensor prop (#89645)

    A previous version of this patch graph breaks when torch.tensor fails, but that causes

    ```
    PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32
    ```

    to start failing. Probably another latent bug that needs investigating.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645
    Approved by: https://github.com/albanD

commit a048913e2530442360c36a48420079ca9ebca149
Author: PyTorch MergeBot <[email protected]>
Date:   Fri Nov 25 03:03:41 2022 +0000

    [vision hash update] update the pinned vision hash (#89667)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89667
    Approved by: https://github.com/pytorchbot

commit 3b3ebcd031b68762938806f541d7247a1521bb11
Author: XiaobingSuper <[email protected]>
Date:   Thu Nov 24 02:33:01 2022 -0500

    TorchDynamo: weight prepack for single conv (#89209)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89209
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 0c4f3db7bf24e94125c6802718a1105ee548c953
Author: XiaobingSuper <[email protected]>
Date:   Thu Nov 24 02:32:59 2022 -0500

    TorchDynamo: weight prepack for mkl linear (#89109)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89109
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 07151a6bd62e308b6b32e2e0edfc4d5f0563576e
Author: XiaobingSuper <[email protected]>
Date:   Thu Nov 24 02:32:55 2022 -0500

    TorchDynamo: weight prepack for onednn convolution external call (#88988)

    This PR enables weight prepack using the MKLDNN tensor:
    1. enable fake tensor mode for MKLDNN tensor input.
    2. make the convolution fusion kernel support MKLDNN tensor input.
    3. do the weight prepack at the FX fusion step.

    For better performance, we always use channels_last for the CPU convolution path: our tests show that channels_last outperforms the blocked-input path and avoids the activation's layout conversions (plain to block, block to plain); currently only plain-to-plain format conversion is needed.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988
    Approved by: https://github.com/jgong5, https://github.com/jansel
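channels_last keeps the logical NCHW shape but reorders the strides so channels become the fastest-moving dimension in memory. A stride-calculation sketch (the formulas are standard; the function names are illustrative):

```python
def contiguous_strides(n, c, h, w):
    # NCHW contiguous layout: W fastest, then H, then C, then N.
    return (c * h * w, h * w, w, 1)

def channels_last_strides(n, c, h, w):
    # NHWC in memory while the logical shape stays NCHW:
    # channels are the fastest-moving dimension.
    return (h * w * c, 1, w * c, c)
```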

commit 0884fdaba0280e3f3ad2abc34c0940587f744886
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 14:31:00 2022 -0500

    Revert "Dont clone unmutated args in triton autotuning (#89519)" (#89652)

    This reverts commit f18f0c70ab10c400947e71be30794e04dcc22acf.

    Testing to see if this fixes gmixer_24_224 mixer_b16_224

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89652
    Approved by: https://github.com/eellison

commit 4a16f8cdb26be3561742e86f184e59f65418fe63
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Reenable fake_tensor_propagation on test_cudnn_rnn (#89644)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89644
    Approved by: https://github.com/anjali411

commit fc7dcb684aa38da5b1534fc701657ee63af8909c
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Run optimizer tests with fake tensors (#89643)

    This is a slight regression: RAdam and Adagrad don't appear to
    trace at all under fake tensors.  But I think this is a more accurate
    reflection of the current state of affairs.

    Along the way fix some problems on the fake tensor path.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643
    Approved by: https://github.com/anjali411

commit 9b13508ef3a4e858fbbbf068b3a825f1632e8daa
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Force test_rng_state to run with fake tensor prop (#89641)

    I'm not really sure what desertfire's intended follow-up was
    on https://github.com/pytorch/pytorch/pull/87490, because when I remove
    the unsupported() call, dynamo tests pass.  But the change here is
    conservative and I think strictly better than the current situation.
    The idea is to force fake tensor prop on for the test, and then just
    observe that we are doing a graph break.  Clearly, export doesn't work,
    so I manually xfail it.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641
    Approved by: https://github.com/anjali411

commit c6be06d93ab911a3fbb185451c8cf42bcedad0c1
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Easy: These tests work with fake_tensor_propagation on (#89640)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89640
    Approved by: https://github.com/anjali411, https://github.com/albanD

commit 6fb6eb0a7498839e69302da7bf8c04205c64e0f3
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 08:11:48 2022 -0800

    Support unspecialized integers with dynamic shapes (#89639)

    Previously, we hackily wrapped unspecialized integers into
    tensors and treated them as tensor inputs.  Sometimes, downstream
    operations would not be able to deal with the tensor input.  Now,
    we wrap them into SymInt, so more correct overload selection occurs.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
    Approved by: https://github.com/anjali411

commit 0c96841a20f0ae9380ef26657914276a42c9c9d7
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 08:11:47 2022 -0800

    Cond capture with fake tensors actually works; don't raise in this case (#89638)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89638
    Approved by: https://github.com/anjali411

commit d3c012f409a4e4d5a11070a90b5578da82778030
Author: kshitij12345 <[email protected]>
Date:   Thu Nov 24 21:41:20 2022 +0000

    [test_nn] split pruning tests from test_nn (#89590)

    Ref: https://github.com/pytorch/pytorch/issues/63085

    Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590
    Approved by: https://github.com/albanD

commit 83666f167dcf023d301f16fad82b9afb374ad836
Author: Aleksandar Samardžić <[email protected]>
Date:   Thu Nov 24 14:44:12 2022 +0000

    Added vectorized CPU code for uint8_t datatype. (#89284)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89284
    Approved by: https://github.com/lezcano, https://github.com/peterbell10

commit 9497552771ca59c68509398ab3094e590a3047c5
Author: Howard Huang <[email protected]>
Date:   Thu Nov 24 19:41:17 2022 +0000

    Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521)

    Summary: Fixes https://github.com/pytorch/pytorch/issues/88568

    `_all_gather_base` is deprecated, so we replace its usage with `all_gather_into_tensor`.

    Test Plan: CI

    Differential Revision: D41479983

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521
    Approved by: https://github.com/wz337

commit 94a88b53ed37854379813abf9641d1637fe2688b
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 08:11:46 2022 -0800

    Remove fake_tensors_available (#89637)

    As we are one repo now, they are always available.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637
    Approved by: https://github.com/anjali411

commit 1c8b0779de76d0c76d34835047106ab37b41790b
Author: Emilio Castillo <[email protected]>
Date:   Thu Nov 24 18:25:26 2022 +0000

    Fix segfault when swapping custom allocator (#89613)

    Just screwed it up before merging ...

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89613
    Approved by: https://github.com/albanD

commit fd279fe85b8f5a8e74c615436f0b180621b6ef52
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:23:05 2022 -0500

    Make pytest work again on test/dynamo (#89631)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89631
    Approved by: https://github.com/anjali411

commit c3e85d879cdbd3973754760c6767c75276b1dca8
Author: albanD <[email protected]>
Date:   Thu Nov 24 17:11:42 2022 +0000

    Mention discrepency between original impl and our impl of RAdam (#89575)

    Fixes https://github.com/pytorch/pytorch/issues/88836

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575
    Approved by: https://github.com/mruberry

commit 860bae49e4925868a0221ec4345d08407280bac7
Author: Edward Z. Yang <[email protected]>
Date:   Wed Nov 23 08:04:31 2022 -0800

    Suppress guards on as_strided call only. (#89569)

    See comment in meta_utils.py for the whole story.

    This doesn't have a substantive impact yet, but will in the next
    PR on the stack.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569
    Approved by: https://github.com/albanD

commit 1588ea0dbf16f37ce14cfc8764666985c16ccbf9
Author: mfkasim1 <[email protected]>
Date:   Thu Nov 24 11:11:51 2022 +0000

    Added log1p for complex in c10 (#89214)

    One PR towards #89205.
    The content is mostly from PR #38465, but slightly changed the expression to make it faster.

    Here are some benchmarking code:
    ```c++

    // main.cc

    template<typename T> inline std::complex<T> log1p_v0(const std::complex<T> &z) {
        // this PR
        T x = z.real();
        T y = z.imag();
        T theta = std::atan2(y, x + T(1));
        T r = x * (x + T(2)) + y * y;
        return {T(0.5) * std::log1p(r), theta};
    }

    template<typename T> inline std::complex<T> log1p_v1(const std::complex<T> &z) {
        // PR #38465
        T x = z.real();
        T y = z.imag();
        std::complex<T> p1 = z + T(1);
        T r = std::abs(p1);
        T a = std::arg(p1);
        T rm1 = (x * x + y * y + x * T(2)) / (r + 1);
        return {std::log1p(rm1), a};
    }

    template<typename T>
    inline std::complex<T> log1p_v2(const std::complex<T> &z) {
        // naive, but numerically inaccurate
        return std::log(T(1) + z);
    }

    int main() {
        int n = 1000000;
        std::complex<float> res(0.0, 0.0);
        std::complex<float> input(0.5, 2.0);
        auto start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v0(input);
        }
        auto end = std::chrono::system_clock::now();
        auto elapsed = end - start;
        std::cout << "time for v0: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v1(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v1: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v2(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v2: " << elapsed.count() << '\n';
        std::cout << res << '\n';
    }
    ```

    Compiling the script with command `g++ main.cc` produces the following results:
    ```
    time for v0: 237812271
    time for v1: 414524941
    time for v2: 360585994
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214
    Approved by: https://github.com/lezcano
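
    The v0 formula above can be sanity-checked outside of C++. Below is a plain-Python transcription (an illustrative sketch, not PyTorch code; `log1p_complex` is a hypothetical name) compared against the naive `log(1 + z)`:

    ```python
    import cmath
    import math

    def log1p_complex(z: complex) -> complex:
        # Mirrors log1p_v0: accurate for small |z| because it avoids the
        # cancellation in computing |1 + z| - 1 directly.
        x, y = z.real, z.imag
        theta = math.atan2(y, x + 1.0)
        r = x * (x + 2.0) + y * y  # equals |1 + z|^2 - 1
        return complex(0.5 * math.log1p(r), theta)

    # For moderate z the naive formula is accurate, so the two should agree:
    z = complex(0.5, 2.0)
    assert abs(log1p_complex(z) - cmath.log(1 + z)) < 1e-12
    ```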

commit 4f5c4c022a8365d06ac401582958bbf0fd3f8337
Author: Jiewen Tan <[email protected]>
Date:   Thu Nov 24 10:57:01 2022 +0000

    [LTC] Refine MetricsArena::Reset (#89608)

    Summary:
    After counters are reset, getters' behaviors are inconsistent. To improve that, here I 1) move the validation of CounterData into CounterData::IsValid such that it's better encapsulated, 2) divide getters into two groups: a) MetricsArena::GetCounter() and b) MetricsArena::ForEachCounter(), and route MetricsArena::GetCounterNames() and CreateMetricReport() to use b.

    This is paired with pytorch/xla#4217.

    Test Plan:
    PJRT_DEVICE=CPU python xla/test/test_metrics.py

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89608
    Approved by: https://github.com/JackCaoG

commit a8629a1c18fd13300ce69c1d6042004038885cf0
Author: Jithun Nair <[email protected]>
Date:   Thu Nov 24 10:53:20 2022 +0000

    Upgrade nightly wheels to ROCm5.3 (#89101)

    Dependent on PR https://github.com/pytorch/builder/pull/1193

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89101
    Approved by: https://github.com/kit1980

commit c0d81aa70ce45a0c2e7ced6c9f42a92d15523188
Author: Ivan Yashchuk <[email protected]>
Date:   Thu Nov 24 09:37:10 2022 +0000

    Use fx.replace_pattern for removing empty_like+fill in nvFuser+PrimTorch execution (#89132)

    I learned about `torch.fx.replace_pattern` and it's a cleaner way of removing unnecessary tensor materialization from the graph coming from tracing  C++ code `1 - tensor`.

    Test:
    ```
    python -m pytest test/test_prims.py -k "test_silu_backward_no_filled_tensor"
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89132
    Approved by: https://github.com/mruberry, https://github.com/jjsjann123

commit b515c1d96082214e81cc57ce2a1de9164b50206f
Author: Hao Guan <[email protected]>
Date:   Thu Nov 24 08:14:24 2022 +0000

    [QAT] Check the value of numel to avoid segfault (#81547)

    Fixes #78123

    Segmentation fault

    RuntimeError: numel is out of the bound of input tensor
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/81547
    Approved by: https://github.com/kit1980

commit 22a1b5e243e852e1c423c697e51975d1545d2a1b
Author: Vasiliy Kuznetsov <[email protected]>
Date:   Wed Nov 23 13:01:15 2022 -0800

    quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)

    Summary:

    This PR deprecates the `compute_dtype` field on observers, and replaces
    it with the `is_dynamic` field on observers.  This is better aligned
    with the reference model spec.

    Test plan:

    ```
    python test/test_quantization.py TestQuantizeFx
    python test/test_quantization.py TestQuantizeFxOps
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
    Approved by: https://github.com/jerryzh168
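
    The deprecation pattern described above can be sketched in plain Python (an illustrative sketch, NOT the real torch.ao.quantization observer API; the class name and the `"float"` mapping are assumptions for the example):

    ```python
    import warnings

    class MinMaxObserverSketch:
        """Sketch of deprecating a compute_dtype field in favor of an
        is_dynamic flag while keeping backward-compatible reads."""

        def __init__(self, is_dynamic: bool = False):
            # New field, better aligned with the reference model spec.
            self.is_dynamic = is_dynamic

        @property
        def compute_dtype(self):
            # Old field: reading it still works, but warns.
            warnings.warn(
                "compute_dtype is deprecated; use is_dynamic instead",
                DeprecationWarning,
            )
            # Hypothetical mapping: dynamic observers used a float compute dtype.
            return "float" if self.is_dynamic else None
    ```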

commit e4ccec6ecab9b48e804d58f60135f0950fca864f
Author: Yanbo Liang <[email protected]>
Date:   Thu Nov 24 05:28:58 2022 +0000

    [Dynamo] Fix bug of using customized torch.autograd.Function (#89397)

    Fixes https://github.com/pytorch/torchdynamo/issues/1899

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89397
    Approved by: https://github.com/jansel

commit 903ae4570e401e5c4e42dc4a44cae37f805044a4
Author: Michael Lazos <[email protected]>
Date:   Thu Nov 24 04:15:34 2022 +0000

    Disable optimizer tracing, enable for tests only (#89500)

    Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500
    Approved by: https://github.com/anijain2305

commit c79489c8e69f965f3e5af8f3f39df78e7d4732ba
Author: albanD <[email protected]>
Date:   Thu Nov 24 03:39:55 2022 +0000

    Expose to python the backward AD view_func (#89586)

    This will be useful for other systems (AOTAutograd) that want to replay autograd views.

    FYI @bdhirsh
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586
    Approved by: https://github.com/soulitzer

commit 4cb6bbbe27162c7b0835879131991d2155329718
Author: Nikita Karetnikov <[email protected]>
Date:   Thu Nov 24 01:02:28 2022 +0100

    Symintify `embedding` (#89327)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
    Approved by: https://github.com/ezyang

commit 9c867eae1a7fffb6f893717073150cff04a923a4
Author: Wu, Chunyuan <[email protected]>
Date:   Wed Nov 23 20:10:41 2022 +0000

    nnc: fix Store if value is fp32 while buf is bf16 (#86788)

    Fixes https://github.com/pytorch/pytorch/issues/86533.
    For the below graph:
    ```bash
    [DUMP kernel.cpp:1690] TensorExprKernel graph:
    [DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)):
    [DUMP kernel.cpp:1690]   %1 : int = prim::Constant[value=0]()
    [DUMP kernel.cpp:1690]   %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29
    [DUMP kernel.cpp:1690]   %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19
    [DUMP kernel.cpp:1690]   return (%3)
    ```

    **Loop stmt before the fix:**
    The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16.
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8);
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f;
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```

    **Loop stmt after the fix:**
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8));
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f);
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788
    Approved by: https://github.com/EikanWang, https://github.com/kit1980
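
    The before/after dumps boil down to inserting a cast when the stored value's dtype differs from the buffer's scalar type. A minimal sketch of that rewrite rule (illustrative; not NNC's actual IR API, and `rewrite_store` is a hypothetical name):

    ```python
    def rewrite_store(buf_dtype: str, value_dtype: str, value_expr: str) -> str:
        # HalfRewriter-style fix: if the value being stored has a different
        # dtype than the destination buf, wrap it in a cast to the buf dtype.
        if buf_dtype != value_dtype:
            return f"{buf_dtype}({value_expr})"
        return value_expr

    # Matches the fixed stmt above: a float constant stored into a bf16 buf.
    assert rewrite_store("bfloat16", "float", "0.8414709568023682f") == \
        "bfloat16(0.8414709568023682f)"
    ```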

commit f0e5bc4b9f231b438f76ddd13b2c21b7cb8a09ac
Author: Zhijing Li (Accelerator Enablement) <[email protected]>
Date:   Thu Nov 24 02:18:32 2022 +0000

    Symintified layer_norm (#89466)

    Summary: As titled.

    Test Plan:
    ```
    buck2 run mode/opt scripts/wwei6:test_executorch
    ```

    Differential Revision: D41451390

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89466
    Approved by: https://github.com/frank-wei, https://github.com/ezyang

commit fdb2dd113d3aec0acb2a473de6be49940ab6a115
Author: Alexander Grund <[email protected]>
Date:   Thu Nov 24 01:52:11 2022 +0000

    Install missing VSX headers (POWER) (#85547)

    E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h`, which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non-MSVC compilers.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547
    Approved by: https://github.com/kit1980

commit e922bd4e523b0a30f6607f6497ac458571e00131
Author: Wei-Sheng Chin <[email protected]>
Date:   Thu Nov 24 01:30:09 2022 +0000

    [ONNX] Move two headers from .h to .cc (#86852)

    As title. Header dependency should be as small as possible.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852
    Approved by: https://github.com/titaiwangms, https://github.com/BowenBao

commit 23fe2ff910fd1577281a2210d1184aff705191b8
Author: Shunting Zhang <[email protected]>
Date:   Thu Nov 24 01:28:10 2022 +0000

    verify the number of outputs of xla graph (#89536)

    This PR adds tests to verify the number of outputs returned by an XLA graph. The understanding from these tests will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and eventually enable training for the dynamo/torchxla integration. Sending this PR separately so Jack can help verify whether the behavior is expected and play with it.

    List some code snippets here since their behavior is not straightforward at a first glance:
    ```
        def forward(self, a, b, c):
            """
            The XLA graph will only return the first 2 items
            """
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            Inplace update on b cause it to be returned in XLA graph
            """
            b.zero_()
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            Even if we return b twice, the XLA graph only return b once.
            """
            b.zero_()
            return a + b, a + c, b, b
    ```

    Here are what observed by the added tests:

    1. XLA does not return outputs that are also inputs, as long as the tensor is not updated in place. At first glance one may wonder why we should consider this kind of 'unrealistic' corner case, but such graphs do show up in AOTAutograd. The main reason is that AOTAutograd lifts all model parameters/buffers as graph inputs and may return some of them.  Check ***test_direct_return***
    2. If a tensor is updated in place, XLA will still return it as a graph output even if it's also an input. The only difference compared to item 1 is that the in-place update causes the tensor to be returned. This happens for BatchNorm2d, since the running_mean/variance tensors are updated in place during training. Check ***test_direct_return_with_inplace_update***

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536
    Approved by: https://github.com/jansel
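
    The pruning behavior observed by the tests can be sketched in plain Python (illustrative only, not the torch_xla implementation; `prune_graph_outputs` is a hypothetical name, with tensors modeled as hashable ids):

    ```python
    def prune_graph_outputs(returned, inputs, mutated):
        # returned: ordered list of tensors the traced function returns.
        # inputs / mutated: sets of graph-input tensors and in-place-updated
        # tensors. Models the two observations above: un-mutated inputs are
        # not re-emitted as outputs, and duplicates are returned only once.
        outputs, seen = [], set()
        for t in returned:
            if t in seen:
                continue  # each tensor appears at most once in the outputs
            if t in inputs and t not in mutated:
                continue  # un-mutated inputs are not graph outputs
            seen.add(t)
            outputs.append(t)
        return outputs

    # forward(a, b, c): b.zero_(); return a+b, a+c, b, b  -> b returned once
    assert prune_graph_outputs(["ab", "ac", "b", "b"], {"a", "b", "c"}, {"b"}) \
        == ["ab", "ac", "b"]
    ```

    Without the in-place update, `b` would be dropped entirely, matching the first snippet's "only return the first 2 items" behavior.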

commit 0bde5149819e9854bca1363aa6c9f52f7db2496e
Author: Nikita Shulga <[email protected]>
Date:   Thu Nov 24 00:57:17 2022 +0000

    Add `c10::` namespace in front of `optional` (#89605)

    Prep change for moving the codebase to C++17 standard
    Was part of https://github.com/pytorch/pytorch/pull/85969

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89605
    Approved by: https://github.com/weiwangmeta, https://github.com/kit1980

commit e19a7165fd1a9a35fcac42706c20e658776c10ab
Author: foram-chandra <[email protected]>
Date:   Thu Nov 24 00:34:26 2022 +0000

    [nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905)

    Fixes #65909

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit a00bd6f686d7a485f7bea5f971b7e793118842b8
Author: clee2000 <[email protected]>
Date:   Wed Nov 23 23:48:32 2022 +0000

    Don't run auto request review on forked PRs (#89583)

    tested on https://github.com/pytorch/pytorch/pull/89581
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89583
    Approved by: https://github.com/albanD, https://github.com/malfet

commit 0a1a53083e331b3648ad4cb6f750d130e3530731
Author: Nikita Karetnikov <[email protected]>
Date:   Wed Nov 23 20:42:55 2022 +0000

    [primTorch] Enable regex error testing for some refs (#87765)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765
    Approved by: https://github.com/mruberry

commit 3ad2a032f4924d58c556b80840f6d51aa8a4472b
Author: Nikita Shulga <[email protected]>
Date:   Wed Nov 23 23:23:24 2022 +0000

    Update default cmake to 3.18 (#89570)

    Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh `
    Prep change for raising compiler standard to C++17: cmake-3.18 is the first one to support CUDA17 language

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570
    Approved by: https://github.com/atalman

commit 8695f0cced016d43298b43a4baf30315061fdacd
Author: Jane Xu <[email protected]>
Date:   Wed Nov 23 23:23:17 2022 +0000

    Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697)

    Using the same repro from the issue (but with BatchNorm2D)

    Rectifies native_batch_norm schema by splitting the schema into 2:
    1. one will have NON-optional alias-able running_mean and running_var inputs
    2. the other will just not have those parameters at all (no_stats variation)

    **Calling for name suggestions!**
    I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
    CI should pass.
    Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
    Approved by: https://github.com/albanD
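
    Why the running stats are alias-able (mutated) inputs in the stats-tracking schema can be seen from a plain-Python sketch of batch norm over a single feature (illustrative only, not PyTorch's implementation; `batch_norm_1d` is a hypothetical name):

    ```python
    import math

    def batch_norm_1d(xs, running_mean, running_var,
                      momentum=0.1, eps=1e-5, training=True):
        if training:
            n = len(xs)
            mean = sum(xs) / n
            var = sum((x - mean) ** 2 for x in xs) / n  # biased, for normalization
            # These updates are why the stats-tracking schema needs alias-able
            # running_mean/running_var; the no-stats variant skips this state.
            running_mean = (1 - momentum) * running_mean + momentum * mean
            running_var = (1 - momentum) * running_var + momentum * var * n / (n - 1)
        else:
            mean, var = running_mean, running_var
        out = [(x - mean) / math.sqrt(var + eps) for x in xs]
        return out, running_mean, running_var

    out, rm, rv = batch_norm_1d([0.0, 2.0], 0.0, 1.0)
    assert abs(rm - 0.1) < 1e-12 and abs(rv - 1.1) < 1e-12
    ```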

commit a00efe55c3790789b967facf10c3f426faa98155
Author: Everton Constantino <[email protected]>
Date:   Wed Nov 23 22:46:29 2022 +0000

    Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722)

    `JIT_LOG` checks whether logging was enabled for that particular file, and when it isn't, it outputs nothing. Since the test checks the size of `test_stream`, it fails. I believe forcing the file to have logging enabled just to see if the stream is correctly set during the test makes no sense, so this patch simply forces output and checks that it worked.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722
    Approved by: https://github.com/davidberard98

commit b8d3afd88665de5f01f696333d0ff291bd94a57b
Author: Huy Do <[email protected]>
Date:   Wed Nov 23 22:39:36 2022 +0000

    Skip upload test stats for test reports from rerun disabled tests workflow (#89548)

    I have found the reason why uploading tests stats fails for rerun disabled workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699.  The problem is that the pytest XML file is now too big to be processed quickly (x50 bigger). Unlike unittest, `pytest-flakefinder` used by rerun disabled tests for test_ops includes skipped messages multiple times (50 times by default, retrying and skipping).  This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages.

    This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats.

    I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but it balloons into a much bigger XML file after extraction, going from a dozen to a few hundred MB of text. The size of the zipped file is not a big immediate problem.

    [3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check.  The script can now finish when running locally:

    * `upload_test_stats` finishes around 3+ minutes
    ```
    time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master
    ...
    Writing 8925 documents to S3
    Done!
    Writing 1760 documents to S3
    Done!
    Writing 1675249 documents to S3
    Done!
    python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954  1    185.69s user 12.89s system 75% cpu 4:22.82 total
    ```

    * `check_disabled_tests` finishes within 3 minutes
    ```
    time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch
    ...
    python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954  1    154.19s user 4.17s system 97% cpu 2:42.50 total
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548
    Approved by: https://github.com/clee2000

commit f18f0c70ab10c400947e71be30794e04dcc22acf
Author: Elias Ellison <[email protected]>
Date:   Wed Nov 23 19:02:51 2022 +0000

    Dont clone unmutated args in triton autotuning (#89519)

    Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning. Any other pointers on where the overhead is coming from in autotuning would be great.

    Edit: i think it's just the triton cache clearing https://github.com/openai/triton/blob/44f577984d28ee979f704e2c28a1dcbac9639840/python/triton/testing.py#L159

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519
    Approved by: https://github.com/ngimel, https://github.com/jansel

commit ac19c5be82febc2140d4601c98daf45646a399ab
Author: Peter Bell <[email protected]>
Date:   Tue Nov 22 22:26:21 2022 +0000

    FFT: disable dimension wrapping for scalar tensors (#89234)

    Fixes #88985

    By default, `maybe_wrap_dim` allows through `dim=0` or `dim=-1`
    for scalar tensors which leads to an invalid dimension being used to
    index into `tensor.sizes()` as in the code sample from the issue.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89234
    Approved by: https://github.com/mruberry
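
    The historical wrapping behavior and the fix can be sketched in Python (an illustrative transcription of ATen's dim wrapping, not the C++ API):

    ```python
    def maybe_wrap_dim(dim: int, ndim: int, wrap_scalar: bool = True) -> int:
        # Historically a 0-d tensor was treated as 1-d when wrap_scalar=True,
        # letting dim=0 / dim=-1 through and later indexing an empty sizes().
        # The FFT fix is, in effect, to call this with wrap_scalar=False.
        if ndim == 0:
            if not wrap_scalar:
                raise IndexError(
                    f"dimension {dim} specified but tensor has no dimensions")
            ndim = 1
        if dim < -ndim or dim >= ndim:
            raise IndexError(f"dim {dim} out of range for a {ndim}-d tensor")
        return dim % ndim

    assert maybe_wrap_dim(-1, 3) == 2   # normal negative-dim wrapping
    assert maybe_wrap_dim(0, 0) == 0    # scalar slips through by default
    ```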

commit 50e2e4faf38c6ebafacc43b72c40333f1f7b401e
Author: Pearu Peterson <[email protected]>
Date:   Wed Nov 23 12:05:37 2022 +0200

    Sparse CSC/BSR/BSC serialization and pickle support (#89553)

    Fixes https://github.com/pytorch/pytorch/issues/89497

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
    Approved by: https://github.com/cpuhrsch

commit a8d6b82167ef417e21c807cb29d7eabea15014da
Author: Elias Ellison <[email protected]>
Date:   Wed Nov 23 16:47:43 2022 +0000

    Fix norm decomp when dtype is passed in (#89508)

    Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508
    Approved by: https://github.com/anijain2305

commit 72110d783344c4121730b032ca0d269896604dcf
Author: Elias Ellison <[email protected]>
Date:   Wed Nov 23 17:03:09 2022 +0000

    Fix Upsample Decomp Striding For Small Channels (#89528)

    Fix for https://github.com/pytorch/torchdynamo/issues/623.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528
    Approved by: https://github.com/ngimel, https://github.com/anijain2305

commit b7483be06afe8d4242adeb559cfbe6e0e89419d0
Author: Jerry Zhang <[email protected]>
Date:   Wed Nov 23 11:03:45 2022 -0800

    [quant][docs] Add docstrings for operators defined in torch.ops.quantized_decomposed namespace (#89547)

    Summary:
    no functionality changes

    Test Plan:
    NA

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89547
    Approved by: https://github.com/vkuzo

commit a188f05e8c1788d393c072868421991dfcb55b02
Author: Natalia Gimelshein <[email protected]>
Date:   Wed Nov 23 20:18:54 2022 +0000

    Reland #89031 Added conv constraint that infers layouts (#89530)

    Relands #89031
    Per title. We now set strides from the fx graph only for convolutions and mm, which is a hack; but bmm in some cases caused an extra copy, and there is no obvious way to fix that. We should rethink the strides anyway.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530
    Approved by: https://github.com/Chillee

commit e800d27b10137727c68cb71bccabe3a93cf38e9e
Author: William Wen <[email protected]>
Date:   Wed Nov 23 20:11:39 2022 +0000

    [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580)

    Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580
    Approved by: https://github.com/davidberard98

commit 953f39578a7019c4c34bc1dbd6cb0facb554af79
Author: Charlie West-Taylor <[email protected]>
Date:   Wed Nov 23 19:51:50 2022 +0000

    Mark IPU device as not supports_as_strided (#89130)

    Currently causes issues in calls to `.to`.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89130
    Approved by: https://github.com/albanD

commit 37e46a503502cdeda791cf684522ef83b5655328
Author: Yanbo Liang <[email protected]>
Date:   Wed Nov 23 19:44:46 2022 +0000

    [Dynamo] Fix several bugs & code refactor in RangeVariable (#89322)

    Fix bug in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py
    ```
    E       TypeError: 'list' object cannot be interpreted as an integer
    E
    E       from user code:
    E          File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward
    E           idx = torch.LongTensor(range(y.size(0)))
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322
    Approved by: https://github.com/jansel

commit 91dcef41ae96ede3f07375c2d38cb28d534e97f8
Author: Xilun Wu <[email protected]>
Date:   Wed Nov 23 19:43:28 2022 +0000

    Thread PG: add allreduce to threaded pg (#89043)

    Summary:
    Goal
    Add `all_reduce` collective  to multi-threaded ProcessGroup added in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a).

    Code Motion
    Added `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup).

    What's Next
    Add a DDP test utilizing the new allreduce op.
    Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`.

    Test Plan:
    cd fbcode/caffe2
    buck2 test mode/dev //caffe2/test/distributed:multi_threaded

    Differential Revision: D41046606

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043
    Approved by: https://github.com/wanchaol
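A toy single-process version of the threaded `all_reduce` (SUM only) can be sketched with plain threads and a barrier; names here are illustrative, not the real c10d API:

```python
import threading

def all_reduce_sum(local_values):
    """Each 'rank' publishes its contribution, waits at a barrier, then
    every rank reduces the same set of inputs -- the shape of the threaded
    ProcessGroup collective described above."""
    world_size = len(local_values)
    contrib = [None] * world_size
    out = [None] * world_size
    barrier = threading.Barrier(world_size)

    def worker(rank):
        contrib[rank] = local_values[rank]  # publish local value
        barrier.wait()                      # wait until all ranks published
        out[rank] = sum(contrib)            # every rank computes the SUM

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

print(all_reduce_sum([1, 2, 3, 4]))  # [10, 10, 10, 10]
```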

commit 27db806888c36b029f51197a40e5196cc10792db
Author: Charlie West-Taylor <[email protected]>
Date:   Wed Nov 23 19:41:07 2022 +0000

    Handle Tensor.__deepcopy__ via clone(), on IPU (#89129)

    Currently it falls through to a call to `storage()`, which the IPU doesn't support.

    I've made the minimal change here for ease of merging (this'd help us if it was in for 1.13.1), however...

    **QUESTION**: Is there any reason why `not torch._C._has_storage(self)` needs to *also* be guarded on `self.device.type == privateuseone`? In other words, could the condition for using `clone` not be this?


    ```python
    self.is_sparse
    or self.device.type
    in ["lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"]
    or not torch._C._has_storage(self)
    or (type(self) is not Tensor and self.data_ptr() == 0)
    ```

    If the condition fails, the very next thing is a call to `self._typed_storage()` which will fail, so it feels to me like *any* case without storage shouldn't fall through to the `storage()` call.

    The original PR for adding the 'no storage and device is `PrivateUse1`' condition ([86557](https://github.com/pytorch/pytorch/pull/86557)) doesn't discuss whether this could be broadened.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89129
    Approved by: https://github.com/albanD
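The dispatch question above boils down to: any tensor without storage should take the `clone()` path. A hedged sketch of that predicate (the device list and helper name are illustrative stand-ins, not the real `Tensor.__deepcopy__` code):

```python
# Devices quoted in the condition above as lacking storage support.
NO_STORAGE_DEVICES = {"lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"}

def should_deepcopy_via_clone(device_type, is_sparse, has_storage):
    # Broadened form discussed in the PR: any storage-less tensor uses
    # clone() rather than round-tripping through storage().
    return is_sparse or device_type in NO_STORAGE_DEVICES or not has_storage
```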

commit fa7a963f6536dd05c381fbf23270f4f009f9f113
Author: Sergii Dymchenko <[email protected]>
Date:   Wed Nov 23 19:39:47 2022 +0000

    Remove BaseException TODO (#89540)

    After discussion in https://github.com/pytorch/pytorch/pull/88461#issuecomment-1318965664
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89540
    Approved by: https://github.com/H-Huang

commit 9eed6b7f9aa4f5fc65075de3189acc9add221660
Author: Yanbo Liang <[email protected]>
Date:   Wed Nov 23 19:39:43 2022 +0000

    [Dynamo] Several fixes on TensorVariable & TorchVariable (#89486)

    This is a group of bug fixes for [7k github models](https://github.com/pytorch/torchdynamo/issues/1884), it would fix 30+ model tests.
    * Support ```tensor.type()```.
    * Support ```tensor.get_device()```.
    * Support ```torch.nn.functional._Reduction.get_enum```.
    * Support ```torch._utils._get_device_index()```.
    * Fallback ```tensor.data_ptr()```.
      * ```FakeTensor``` always returns 0
      * For no fake tensor propagation, we ```clone``` the input tensor, so it makes no sense to track the original ```data_ptr```. And I don't think this is a very popular API.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89486
    Approved by: https://github.com/jansel

commit f03e6672fb6a694d6f03980e3f34d8181c7cc663
Author: Iris <[email protected]>
Date:   Wed Nov 23 19:39:01 2022 +0000

    [Checkpoint][2D] Minor update for dedup_tensors.py (#89542)

    Rename variables for better readability.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89542
    Approved by: https://github.com/H-Huang

commit 74703eb50299b26082bc2a357770739a68460199
Author: Iris <[email protected]>
Date:   Wed Nov 23 19:36:01 2022 +0000

    [Checkpoint] Add a logger to dedup_tensors (#89503)

    Add a logger to dedup_tensors to log the duplicate keys to remove in global plan (List of SavePlan).

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89503
    Approved by: https://github.com/fduwjj
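The dedup step the two checkpoint commits above touch can be sketched as dropping duplicate write-items across ranks' save plans, keeping the first writer; names (`SavePlan`, item keys) are illustrative, not the real torch.distributed checkpoint API:

```python
def dedup_save_plans(all_plans):
    # all_plans: one list of item keys per rank (a simplified "global plan").
    seen = set()
    deduped = []
    for plan in all_plans:
        kept = []
        for key in plan:
            if key not in seen:   # first rank to write a key keeps it
                seen.add(key)
                kept.append(key)
        deduped.append(kept)      # duplicate keys are the ones a logger would report
    return deduped

print(dedup_save_plans([["w1", "w2"], ["w2", "w3"]]))  # [['w1', 'w2'], ['w3']]
```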

commit 57353c9608263df98156a73aaa6ed35a2a2306ad
Author: Brian Hirsh <[email protected]>
Date:   Wed Nov 23 08:29:08 2022 -0800

    first draft of input mutation handling for aot autograd (#88817)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88817
    Approved by: https://github.com/ezyang, https://github.com/wconstab

commit 902e4e3926a9333178510f032580e4acd56c40da
Author: PyTorch MergeBot <[email protected]>
Date:   Wed Nov 23 19:05:13 2022 +0000

    Revert "Fix the kineto daemon build condition (#89174)"

    This reverts commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd.

    Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil.

commit 049a0f2cd5916c8392c6bd1adc41c709de892f3a
Author: Bin Bao <[email protected]>
Date:   Wed Nov 23 02:00:44 2022 +0000

    [inductor] Update CI model tests (#89499)

    Summary:
    1) Add model inference test
    2) Switch model training test to use AMP

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499
    Approved by: https://github.com/bertmaher

commit 95474e00a9477b1333e13fa95887a2ce05c4a6a6
Author: Jerry Zhang <[email protected]>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Remove unused util code (#89272)

    Summary:
    att

    Test Plan:
    python test/test_quantization.py TestQuantizeFx

    Reviewers:

    Subscribers:

    Tasks:

    Tags:

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272
    Approved by: https://github.com/andrewor14

commit 128faf2b69f62b55d3ae1b4cb3e24ec594af0009
Author: Jerry Zhang <[email protected]>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Refactor the error checking code for quantize_per_channel op (#89271)

    Summary:
    at

    Test Plan:
    make sure it compiles

    Reviewers:

    Subscribers:

    Tasks:

    Tags:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89271
    Approve…
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
jithunnair-amd added a commit to ROCm/builder that referenced this pull request Apr 11, 2023
* Make sure package_type is set (pytorch#1139)

* Update check_binary.sh

* Update check_binary.sh

* Modifying smoke test to add more advanced validation as requested (pytorch#1124)

* Modify smoke test matrix

More vision smoke tests

Temporary pointing to my repo for testing

Try 2 use atalman builder

Modify path

Fixing commits

Testing

Testing

Smoke test modifications

Refactor test code

Fix typo

Fixing image read

A little more refactoring

Addressing comments

Testing

* Add same test for windows and macos

* Addressing comments

* Add manywheel special build for including pypi package (pytorch#1142)

* Add manywheel special build

Testing

Builder change

Testing

Adding manywheel cuda workflow

Simplify

Fix expr

* address comments

* checking for general setting

* Pass correct parameters for macos validations (pytorch#1143)

* Revert "Update check_binary.sh"

This reverts commit 6850bed.

* Revert "Update check_binary.sh"

This reverts commit 051b9d1.

* setup periodic test to run binary verification  pytorch/pytorch#84764: (pytorch#1144)

* add a reusable workflow to run all smoke tests/or smoke tests for a specific os/channel
* add workflows to schedule the periodic smoke tests for nightly and release channels

* Update aarch64 script to latest one (pytorch#1146)

* minor: fix the typo job name for windows binaries validation workflow (pytorch#1147)

* fix the typo in the the job name for the release binaries validation workflow (pytorch#1148)

issue was introduced in pytorch#1144

* Move to rc2 of 3.11 python (pytorch#1149)

Need it to get several convenience functions

* Integrates CUDA pip wheels (pytorch#1136)

* Refactors rpath to externally set var. Adds mechanism to add metadata

* Sets RUNPATH when using cudnn and cublas wheels

* Escapes dollar sign

* Fix rpath for cpu builds

Co-authored-by: atalman <[email protected]>

* Uses RPATH instead of RUNPATH so that user strictly uses pypi libs (pytorch#1150)

* Binary Validation Workflow - Adding check binary script (pytorch#1127)

* Update action.yml

* Update validate-macos-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Fix check binary for arm64 (pytorch#1155)

* Fix check binary for arm64

* Update check_binary.sh

Co-authored-by: Nikita Shulga <[email protected]>

Co-authored-by: Nikita Shulga <[email protected]>

* Fix for including nvtx dll and cudart (pytorch#1156)

* Fix for including nvtx dll and cudart

* Fix for include nvtx

* Fix spaces

* Back out inclusion of cudart (pytorch#1157)

* Add cuda and date check to smoke test (pytorch#1145)

* shorten binary validation workflow names, so they are more readable in the HUD and GH job view (pytorch#1159)

* Fix anaconda torchaudio smoke test (pytorch#1161)

* Fix anaconda torchaudio smoke test

* Format using ufmt

* Fix whels tests for torchaudio (pytorch#1162)

* Pin condaforge version

Most recent version fails with  invalid cert error when trying to update
python

* Option to run resnet classifier on specific device

* Fix typo

`.test/smoke_test` -> `test/smoke_test`

Noticed when pushed pytorch@3b93537 and no tests were run

* Test resnet classifier on CUDA (pytorch#1163)

* [ROCm] support for rocm5.3 wheel builds (pytorch#1160)

* Updates to support rocm5.3 wheel builds (#6)

* Changes to support ROCm 5.3

* Updated as per comments

* Installing python before magma build

- In ROCm 5.3 libtorch builds are failing during magma build due
  to missing python binary, so added install statement

* Move python install to libtorch/Dockerfile (#8)

* Updating the condition for noRCCL build (#9)

* Updating the condition for noRCCL build

* Updated changes as per comments

* Use MIOpen branch for ROCm5.3; Change all conditions to -eq

* Use staging branch of MIOpen for ROCm5.3

* Fix merge conflict

Fix merge conflict

Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>

* Validate python 3.11 (pytorch#1165)

* Validate python 3.11

* Validate linux binaries change

Add options

Import torchvision

Adding python 3.11 install

pass package to check nightly binaries date

Test

test

Add python 3.11 code

testing

Adding python 3.11 test

Add python 3.11 validation

Adding zlib develop install

Install zlib etc..

Adding zlib1g as well

testing

testing

Adding validate windows binary

Trying to workaround

testing

Refactor smoke test

Add import statement

fix datetime call

* Fix stripping dev

* fix import

* Strip pypi-cudnn from the version.py (pytorch#1167)

* Strip pypi-cudnn from the version.py

* small fix

* Regenerates RECORD file to reflect hash changes caused by sed'ing the version suffix (pytorch#1164)

* Add pypi cudnn package to tests (pytorch#1168)

* Add pypi cudnn package to tests

* Fix pypi installation check

* Fix pypi instructions setting

* Update DEVELOPER_DIR in build_pytorch.sh

Not sure why we are still expecting Xcode9 to be present there; update it to the same folder as wheel builds

Maybe fixes pytorch/pytorch#87637

* Fix to not use sccache if it's not setup properly (pytorch#1171)

* Revert "Fix to not use sccache if it's not setup properly (pytorch#1171)" (pytorch#1172)

This reverts commit 377efea.

* Remove cuda102 and cuda115 docker builds and regenerate manylinux docker (pytorch#1173)

* Rebuild manywheel

* Remove cuda102 and cuda115

* [aarch64] add mkldnn acl backend build support for pytorch cpu libary (pytorch#1104)

* Only push to Docker and Anaconda repo from main (pytorch#1175)

We currently allow push from any branch to go to Docker (and Anaconda) prod. This is a dangerous practice because it allows unfinished work to jump to prod and be used by other workflows

* Release 1.13 script changes (pytorch#1177)

* Test ResNet on MPS (pytorch#1176)

After pytorch/pytorch#86954 is fixed, we should be able to test resnet on MPS

* Revert "Test ResNet on MPS (pytorch#1176)" (pytorch#1180)

This reverts commit efa1bc7.

* Add v1.13 versions

* Update CMake to 3.18, needed for C++17 compilation (pytorch#1178)

* release: separate out version suffixes for torch pypi promotion (pytorch#1179)

* Fixup wheel published to PyPI (pytorch#1181)

* Fixup wheel published to PyPI

* Update prep_binary_for_pypi.sh

* Fix folder deletion for pypi prep

Co-authored-by: Andrey Talman <[email protected]>

* Update cmake version to 3.18 for libtorch docker

* Pins cuda runtime to 111.7.99 (pytorch#1182)

* Fixes cuda pypi rpaths and libnvrtc name (pytorch#1183)

* Allow ROCm minor releases to use the same MIOpen branch as the major release (pytorch#1170)

* Allow ROCm minor releases to use the same MIOpen branch as the major release

* correct logic to ensure rocm5.4 doesn't fall in wrong condition

* add 11.8 workflow for docker image build (pytorch#1186)

* Using windows runners from test-infra for validation workflows (pytorch#1188)

* Testing new windows runners

test

Testing

Testing

testing

testing

test

Test

Testing

testing

Testing

Testing

test

Test

test

testing

testing

Test

testing

test

testing

testing

testing

testing

testing

testing

test

test

testing

testing

testing

testing

Test

test

test

testing

testing

testing

testing

testing

testing

testing

testing

testing

Refactor code

* Adding details for the test-infra issue

* Update current CUDA supported matrix

* add magma build for CUDA11.8 (pytorch#1189)

* Test setting job name (pytorch#1191)

* Use official Python-3.11 tag (pytorch#1195)

* remove CUDA 10.2-11.5 builds (pytorch#1194)

* remove CUDA 10.2-11.5 builds

* remove 11.5 and 11.3 builds

* build libtorch and manywheel for 11.8 (pytorch#1190)

* build libtorch and manywheel for 11.8

* Update common/install_magma.sh

* use magma-cuda build-1 by default; remove CUDA 10.2-11.5 builds

Co-authored-by: Andrey Talman <[email protected]>

* [Validation] Pass ref:main to general worker (pytorch#1197)

* Pass ref:main to general worker

* Try to pass reference to workflow

* Pass ref:main to general worker

* Test

* Pass reference as input parameter

* Make new variable not required

* Fix typo

* Add workflow for manywheel cpu-cxx11-abi (pytorch#1198)

* [Validation] Use linux_job for linux workers (pytorch#1199)

* Use linux_job for linux workers

Test

Testing

Test

testing

Testing

testing

Change linux binary action

test

Simplify version check

* Fix if statement

* Fix typo

* Fix cuda version check

Fix Audio and Vision version check

Add check binary to libtorch

test

test

testing

testing

testing

Testing

Testing

testing

* Use macos generic workers (pytorch#1201)

* Use macos generic workers

fix workflow

testing

Add arm64 builds

test

Remove validate binary action

* add check binary step

* fix ld_library path

* add package type

* Adding ref to validate binaries (pytorch#1204)

* ROCm5.3 nightly wheels (pytorch#1193)

* Enable ROCm5.3 nightly wheels

* Enable ROCm5.3 docker builds

* Update amdgpu repo url for ROCm5.3

* ROCm5.3 not supported on Ubuntu 18.04

* empty

* Another empty commit

* Try disabling MLIR build to shorten docker build time

* Clean up disk space

* MLIR project changed names from ROCm5.4

* Retrigger CI to get around flaky magma git access error

* One more cmake-3.18.4 update

* Use cmake-3.18 for ROCM builds

* More cmake ROCM tweaks

* cmake-3.18 installation on ROCM (take 3)

* add conda builds for CUDA 11.8 (pytorch#1205)

* Enable nightly CUDA 11.8 builds (pytorch#1206)

* enable nightly builds for CUDA 11.8

* add CUDA 11.8 version to manywheel, remove 11.3 and 11.5

* Windows CUDA 11.8 changes (pytorch#1207)

* Add continue on error to validation jobs (pytorch#1209)

* Add continue on error to validation jobs

* test

* Delete unmaintaned torchvision build scripts (pytorch#1210)

All build logic has long moved to torchvision repo and now is executed
by reusable workflow from https://github.com/pytorch/test-infra/tree/main/.github/workflows

* build_pytorch.sh replace tabs with spaces (pytorch#1211)

* Make PyTorch depend on TorchTrition (pytorch#1213)

Remove me when Triton is properly released elsewhere

* Remove smoke test script that is no longer used (pytorch#1212)

* Another tabs-to-spaces change

`s/\t/\ \ \ \ \ \ \ \ /`

* Disable continue on error (pytorch#1214)

* Add torchtrition dependency for wheels

* Make PyTorchConda depend on Triton (Take 2)

Multi-line environment variables are hard, so lets do it traditional way

* Revert "Add torchtrition dependency for wheels"

This reverts commit 475100b.

* Add TorchTrition dependency for wheels (take 2)

Now tests should be green thanks to pytorch/pytorch#90017

* Add sympy to pytorch linux dependencies

* Mitigate windows nightly build regressions

By pinning conda to 22.9.0

Fixes pytorch/pytorch#90059

* Consolidating validation scripts (pytorch#1219)

* Consolidating validation scripts

* Fix validate script name

* Correct script path

* Correct script path

* test

* testing

* testing

* testing

* testing

* test

* test

* test

* testing

* testc

* test hook

* adding wondows use case

* windows use case

* test

* testing

* Windows fixes

* more fixes

* Add package type

* testing more

* Truncate RECORD instead of delete (pytorch#1215)

* Refactor and fix windows smoke tests (pytorch#1218)

* Fix windows smoke test

* Fix first if statement

* Refactor not to call install nightly package

* Revert "Refactor not to call install nightly package"

This reverts commit ac580c8.

* Fix pip install command remove cu102

* Refactor the conda installation

* Add cuda profiler apu to cuda install 11.8 (pytorch#1221)

* Update CUDA upgrade runbook to mention subpackages changes

As per following doc: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

* conda: Add CUDA_HOME, cuda binaries to path (pytorch#1224)

* Refactor macos-arm64 into separate group (pytorch#1226)

* Adding libcufft constraint (pytorch#1227)

* Adding libcufft constraint

* Adding rest of the dependencies

* Advance build number in pytorch-cuda (pytorch#1229)

* Make sympy mandatory dependency of PyTorch

Should fix 
https://github.com/pytorch/audio/actions/runs/3684598046/jobs/6234531675

* Revert me later: Fix conda package smoke tests

* Install `sympy` via pip rather than conda

Needs to be reverted as well

* Refactor smoke tests to configure module included in the release (pytorch#1223)

* Changes to prep for pypi script for release 1.13.1 (pytorch#1231)

* PyPi binary validation and size check (pytorch#1230)

* Validate binary size

* Validate binary size linux_job

* evaluate the fix from pytorch#1231

* Add an optional artifact upload, consolidate fixes to `prep_binary_for_pypi.sh`

* Adding new workflow to call from domain libraries to validate on domain libraries such as text (pytorch#1234)

* Testing new workflow

Fix naming

fix input

* Changed comments

* Ad ability to call validate domain library manually (pytorch#1235)

* Adding test for validate dm workflow and fixing dm validation workflow (pytorch#1236)

* Test manywheel packages (pytorch#1239)

Change only docker file

* Bump scripts in release (pytorch#1241)

* release: Strip whitespace from version_with_suffix (pytorch#1242)

* Cuda 11.8 and removal of dev packages (pytorch#1243)

* Adding more OS's to validate domain library workflow (pytorch#1238)

* Adding more OS's to validate domain library workflow

* conda and wheel together

* add macos workflows

* fix workflow

* Add target os variable to windows validation (pytorch#1244)

* Update MKL to 2022.1 (pytorch#1245)

As previous one occasionally crashes on AMD CPUs

May be addresses pytorch/pytorch#89817

Please note that, in order to get maximum perf on AMD CPUs, one needs to compile and LD_PRELOAD the following library:
```
int mkl_serv_intel_cpu_true() {
	return 1;
}
```

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Remove invalid git option (pytorch#1246)

* Revert "Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)" (pytorch#1247)

This reverts commit ee59264.

* Add with_cuda flag (pytorch#1249)

* Add GPU architecture env variables (pytorch#1250)

* Add cuda to jobname for validate domain library (pytorch#1252)

* Remove pylief dependency (pytorch#1255)

* Fix PEP503 for packages with dashes
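PEP 503 defines one normalization rule a simple index must apply so that dashed, underscored, and dotted spellings resolve to the same project; a minimal sketch:

```python
import re

def normalize(name):
    # PEP 503: runs of -, _, . collapse to a single dash, lowercased,
    # so e.g. pytorch-triton and pytorch_triton map to the same project.
    return re.sub(r"[-_.]+", "-", name).lower()

print(normalize("pytorch_triton"))   # pytorch-triton
print(normalize("Pytorch.Triton"))   # pytorch-triton
```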

* Rename `torchtriton` to `pytorch-triton`

Companion change for pytorch/pytorch#91539

* s3_management: Hide specific packages between dates (pytorch#1256)

* s3_management: Pin requirements.txt

Packaging got updated and that's not what we want

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: except ValueError

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: Use the correct format for strptime

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: Bump bad dates to october 17th (pytorch#1257)

* s3_management: hide torchtriton (pytorch#1258)

* s3_management: Add PACKAGE_ALLOW_LIST for indices (pytorch#1259)

* s3_management: Bump bad date end to 12/30 (pytorch#1260)

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1248)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Fixes logic for 11.8 and adds missing names for DEPS_SONAME

* s3_management: Account for underscore packages

pytorch-triton is listed as pytorch_triton

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: simplify allowlist, correct underscores

Signed-off-by: Eli Uriegas <[email protected]>

* Fix cuda version in nightly (pytorch#1261)

* Adding py311 validations (pytorch#1262)

* Use MATRIX_* variables instead of redeefining new var each time (pytorch#1265)

* Fix validation domain library (pytorch#1266)

remove ref main

fix workflow

more refactor

* Nightly: do test install with the dependencies better and skip CUDA tests on cpu only box (pytorch#1264)

* Refactor PyTorch wheel and libtorch build scripts for ROCm (pytorch#1232)

* Refactor wheel and libtorch build scripts (#7)

* Update to so patching for ROCm

Wildcard used in grep to grab the actual numbered so file referenced
in patchelf. This allows the removal of specifying the so number in
DEPS_LIST & DEPS_SONAME

This commit also adds the functionality for trimming so names to
build_libtorch.sh from build_common.sh

* Refactor to remove switch statement in build_rocm.sh

This commit refactors build_rocm.sh and brings in a few major updates:
 - No longer required to specify the full .so name (with number) for ROCm libraries
       - The .so versions are copied and the patching code will fix the links to point to this version
 - No longer required to specify paths for ROCm libraries allowing the removal of the large switch
       - Paths are acquired programmatically with find
 - No longer required to specify both the path and filename for the OS specific libraries
       - Programmatically extract file name from the path
 - Automatically extract Tensile/Kernels files for the architectures specified in PYTORCH_ROCM_ARCH
   and any non-arch specific files e.g. TensileLibrary.dat

* rocfft/hipfft link to libhiprtc.so in ROCm5.4 (#15)

Co-authored-by: Jack Taylor <[email protected]>

* add sm_90 to CUDA11.8 builds (pytorch#1263)

* add sm_90 to CUDA11.8 builds

* Manually invoke bash for Miniconda

* Revert "add sm_90 to CUDA11.8 builds (pytorch#1263)" (pytorch#1275)

This reverts commit e1453a4.

* Set ubuntu distribution correctly for ROCm5.3 and above (pytorch#1268)

* Fix unbound variable error (pytorch#1276)

Regression introduced (and ignored) by pytorch#1262
Test plan:
```
% bash -c 'set -u; if [[ -z "${FOO}" ]]; then echo "bar"; fi' 
bash: FOO: unbound variable
(base) nshulga@nshulga-mbp builder % bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'
bar
(base) nshulga@nshulga-mbp builder % FOO=1 bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'

```

* Manually invoke bash for miniconda (pytorch#1277)

Fixes build issues failing with:
```
./Miniconda3-latest-Linux-x86_64.sh: 438: ./Miniconda3-latest-Linux-x86_64.sh: [[: not found
```
as seen in e.g.: pytorch#1271

* Fix perm

Which somehow got changed by pytorch@62103bf

* add sm_90 to CUDA11.8 builds (pytorch#1278)

* libtinfo.so version update and logic fix for ROCm libtorch (pytorch#1270)

* Use libtinfo.so.6 for Ubuntu 2004

* Fix to origname grep

* Condition on ROCM_VERSION for libtinfo6

* Looks like it is not used anywhere. (pytorch#1273)

* Build Windows binaries with Visual Studio 2022 Build Tools (pytorch#1240)

* Build Windows binaries with Visual Studio 2022 Build Tools

* Unify casing in Batch files, remove VS 2017 installation

* Remove VS 2017 Conda scripts, unify casing in conda Batch scripts, minor Conda scripts tweaks

* Slim down `pytorch-cuda`

It should only contain runtime dependencies that PyTorch+domain
libraries depend on, namely:
 - cudart
 - cublas
 - cusparse
 - cufft
 - curand
 - nvtx
 - nvrtc
 - nvjpeg (for TorchVision)

This removes dependencies on NVCC, build/debug tools, etc., which are not
needed for running PyTorch

Test Plan:
  `conda create -n tmp -c nvidia -c malfet cuda-toolkit==11.7` and
observe that only relevant packages are installed

Fixes pytorch/pytorch#91334

* [BE] Delete `unicode-flags` build options (pytorch#1284)

There were relevant only for Python<=3.3

* [BE] Define `openssl_flags` (pytorch#1285)

Rather than have two invocations of `./configure`

* Build with `--enabled-shared` if `patchelf` is found (pytorch#1283)

This is needed to make `manylinux-wheel` images usable for building new Triton binaries.

Test plan: Build docker and verify that following `CMakeLists.txt` finishes successfully:
```
cmake_minimum_required(VERSION 3.6)
find_package(Python3 REQUIRED COMPONENTS Interpreter Development)
message(WARNING Executable ${Python3_EXECUTABLE})
message(WARNING IncludeDirs ${Python3_INCLUDE_DIRS})
message(WARNING Libraries ${Python3_LIBRARIES})
```

* Update cudnn to 8.7.0.84 for CUDA 11.8 builds (pytorch#1271)

* update cudnn to 8.7.0.84 for CUDA 11.8 builds

* workaround for pytorch#1272

* Revert "workaround for pytorch#1272"

This reverts commit c0b10d8.

* update cudnn==8.7.0.84 for windows

* [BE] Remove references to Python<3.6 (pytorch#1287)

* Upgrade desired python version to 3.8

For libtorch builds

* Fix how libtorch picks the python version

* Tweak conda builds to support 3.11

Add `-c malfet` when building for 3.11 (though perhaps it's better to
move numpy to pytorch channel)

Tweak some build time dependencies

* Fix typo

* Skip triton dependency for 3.11 CUDA builds

* Update build-number to 3

* Add ability to override cuda archs for conda (pytorch#1282)

* [ROCm] reduce disk space used in image (pytorch#1288)

Fixes pytorch#1286

* Extend MacOS/Windows builds to 3.11

By installing dependencies from pip
Should be a no-op for <=3.10

* ci: Migrate to checkout@v3 (pytorch#1290)

checkout@v2 is deprecated moving to checkout@v3

Signed-off-by: Eli Uriegas <[email protected]>

* Fix typo

* Add 3.11 option for Windows builds

* Add python-3.11 download location for windows

* Add pypi with cudnn package test (pytorch#1289)

* Add pypi with cudnn package test

* Add pypi with cudnn package test

* test

* test

* More pypi cudnn changes

* test

* Fix pipy smoke test

* Remove debug comments

* Delete some ancient checks for MacOS builds

As we no longer build for Python-2.7 or 3.5

* Add libnvjpeg-dev package as fallback (pytorch#1294)

* Add libnvjpeg-dev package as fallback

* Move libnvjpeg and libnvjpeg-dev to required packages

* Update conda/pytorch-cuda/meta.yaml

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Upgrade nightly wheels to rocm5.4.2 (pytorch#1225)

* Upgrade nightly wheels to rocm5.4

* Adding graphic architectures for ROCm 5.4

* Updated to use ROCm5.4.1

* Updated to use ROCm5.4.2

* Fixed syntax error

* Perform build on image with magma and miopen preinstalled

* Add dev packages for windows pytorch-cuda dependencies (pytorch#1295)

* Add dev packages for windows dependencies

* Adding architecture dependent builds

* Add notes around windows

* fix typo

* Bumping version to v3

* rocm libtorch prebuild magma; fix manylinux cmake version (pytorch#1296)

* Add manywheel:cpu-cxx11-abi checkup for check_binary.sh (pytorch#1251)

* Remove with_py311 flag (pytorch#1301)

* rocm manylinux now uses devtoolset 9 (pytorch#1300)

* fix ACL_ROOT_DIR setting and upgrade the ACL version to 22.11 (pytorch#1291)

* Add `-c malfet` for Windows builds as well

* Set torch._C._PYBIND11_BUILD_ABI version check only for GLIBCXX_USE_CXX11_ABI=0 (pytorch#1303)

* Adding limit windows builds logic (pytorch#1297)

* Adding limit windows builds logic

* Remove empty space

* Simplify mkl build dependencies (pytorch#1305)

On Linux and Mac, PyTorch must be built against `mkl=2020.x` in order to be compatible with both `mkl-2021` and `mkl-2022`, which added `.so.1` and `.so.2` files respectively; a binary linked against one of those versions would be incompatible with the newer/older toolchains.

This is not an issue on Windows, as all mkl binaries there end with simple `.dll`

* "Fix" PyTorch CPU conda testing

It's still horribly broken, but make it a bit better by not installing
pytorch from the default anaconda channel (which installs 1.12.1, missing
the dependencies the 2.0 dev package is supposed to have)

For example, see this runlog
https://github.com/pytorch/pytorch/actions/runs/4155371267/jobs/7189101147

* Update torch._C._PYBIND11_BUILD_ABI version check (pytorch#1306)

* Skip tests for manywheel built with _GLIBCXX_USE_CXX11_ABI=1

* Put back smoke test label (pytorch#1310)

* [aarch64] add support for torchdata wheel building (pytorch#1309)

* Python 3.11 validation workflow tests (pytorch#1304)

* Test windows py311

* Nightly binaries

* Fix py311 tests

* fix python calling

* Revert "Nightly binaries"

This reverts commit cbf80ca.

* add a scheduled workflow for the nightly pypi binary size validation (compliments pytorch/test-infra#2681) (pytorch#1312)

* Add regression test for pytorch/pytorch#94751

* Add 3.11 and `--pytorch-only` options

* Add `lit` to list of allowed packages

As it is now a mandatory (albeit spurious) dependency of pytorch-triton

See https://pypi.org/project/lit/ for more details

* s3: Allow tar.gz as an accepted file extension (pytorch#1317)

* Changes for Python 3.11 and smoke Test RC cut (pytorch#1316)

* Smoke Test RC cut

* Validate binaries 3.11

* test

* Smoke test binaries

* Fix pytorch-cuda chan download

* Remove temp change

* Make sure we don't use GPU runners for any of libtorch validations (pytorch#1319)

* Make sure we don't use GPU runners for any of libtorch

* Make sure we don't use GPU runners for any of libtorch

* s3: Add pytorch_triton_rocm to index (pytorch#1323)

Signed-off-by: Eli Uriegas <[email protected]>

* s3: Add tqdm package req for text (pytorch#1324)

* Add `--analyze-stacks` option

Using `git rev-base`, this prints the total number of stacks per author,
along with their average, mean, and max depth

At the time of submission, here are the top 10 ghstack users of pytorch:
```
ezyang has 462 stacks max depth is 15 avg depth is 1.70 mean is 1
awgu has 240 stacks max depth is 28 avg depth is 4.30 mean is 1
peterbell10 has 146 stacks max depth is 7 avg depth is 1.84 mean is 1
zou3519 has 128 stacks max depth is 7 avg depth is 1.98 mean is 1
jerryzh168 has 113 stacks max depth is 16 avg depth is 1.45 mean is 1
bdhirsh has 111 stacks max depth is 7 avg depth is 1.85 mean is 2
wconstab has 108 stacks max depth is 7 avg depth is 2.15 mean is 1
SherlockNoMad has 99 stacks max depth is 4 avg depth is 1.24 mean is 1
zasdfgbnm has 80 stacks max depth is 11 avg depth is 2.52 mean is 6
desertfire has 73 stacks max depth is 3 avg depth is 1.14 mean is 1
```
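
A minimal sketch of the kind of per-author summary `--analyze-stacks` reports (the function name and exact statistics here are illustrative assumptions, not the tool's real interface):

```python
# Hypothetical sketch: summarize ghstack stack depths for one author,
# mirroring the "N stacks / max depth / avg depth / mean" lines above.
from statistics import mean, median

def stack_stats(depths):
    """depths: one entry per stack, giving that stack's depth (in PRs)."""
    return {
        "stacks": len(depths),
        "max": max(depths),
        "avg": round(mean(depths), 2),
        "median": median(depths),  # illustrative; the tool's "mean" column may differ
    }

print(stack_stats([1, 2, 15, 1, 1, 3]))
# {'stacks': 6, 'max': 15, 'avg': 3.83, 'median': 1.5}
```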

* Add filelock and networkx deps (pytorch#1327)

To match dependencies for wheel files defined in https://github.com/pytorch/pytorch/blob/ed1957dc1989417cb978d3070a4e3d20520674b4/setup.py#L1021-L1024

* Remove building magma from source

* Revert

* Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)

* Upgrade cmake version to 3.22.1 to build triton

* Pin patchelf version

* Fix comment typo

* Smoke test for cuda runtime errors (pytorch#1315)

* Add test for cuda runtime errors

* Add cuda exception smoke test

* Move cuda runtime error to end

* Move cuda runtime error to end

* Address comments

* Address comments

* Add Jinja2 Dependency (pytorch#1332)

As part of the effort to fix pytorch/pytorch#95986

* Add MarkupSafe to S3 Index (pytorch#1335)

* Remove rocm5.1 rocm5.2 from libtorch Dockerfile

* [aarch64] Adding CI Scripts to build aarch64 wheels (pytorch#1302)

* add aarch64 ci scripts

* added readme. get branch from /pytorch

* Add smoke tests conv,linalg,compile. And better version check. (pytorch#1333)

* Add smoke tests conv,linalg,compile

* Add version check

* Fix typo

Fix version check

Add not

* Add exception for python 3.11

* fix typo

* Try to exit after CUDA Runtime exception

* Restrict crash test only to conda

* Restrict crash test only to conda

* Fix tests

* Turn off cuda runtime issue

* tests

* more tests

* test

* remove compile step

* test

* disable some of the tests

* testing

* Remove extra index url

* test

* Fix tests

* Additional smoke tests

Remove release blocking changes
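
One way the "better version check" mentioned above could work is to distinguish release-style versions from nightly dev builds. A hedged sketch — the regex and function are assumptions for illustration, not the actual smoke-test code:

```python
# Hypothetical version check: accept release-style versions such as
# "2.0.0" or "2.0.0+cu118", reject nightly/dev builds like
# "2.1.0.dev20230321". Not the real smoke-test code.
import re

def looks_like_release(version: str) -> bool:
    return re.match(r"^\d+\.\d+\.\d+(\+\w+)?$", version) is not None

print(looks_like_release("2.0.0+cu118"))        # True
print(looks_like_release("2.1.0.dev20230321"))  # False
```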

* Aarch64 changes for PyTorch release 2.0 (pytorch#1336)

* Aarch64 changes for PyTorch release 2.0

* Fix spacing

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <[email protected]>

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <[email protected]>

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Aarch64 build py3.11 fix (pytorch#1341)

* Fix nightly smoke test (pytorch#1340)

* Fix nightly smoke test

* Fix nightly builds

* Release 2.0 release scripts changes (pytorch#1342)

* Release 2.0 release scripts changes

* Release script modifications

* Add more packages to allow list (pytorch#1344)

* Add `jinja2` dependency to conda package

To be consistent with wheels, see
https://github.com/pytorch/pytorch/95961

* Restrict jinja to py 3.10 or less (pytorch#1345)

* Update `torchtriton` version to 2.1.0

* And update triton version here as well

* added smoke test for max-autotune (pytorch#1349)

Co-authored-by: agunapal <[email protected]>

* Refactor conda backup script (pytorch#1350)

* Refactor conda backup

* Fix space

* Minor style

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)" (pytorch#1351)

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)"

This reverts commit 18c5017.

* Selective revert

* Get cmake from pip

* Use 3.18.2 from conda

* Release script changes, add more release dependencies, bump version for aarch64 builds (pytorch#1352)

* Release script changes

* Add Jinja2 dependency

* Fix typo

* Add pytorch conda dependencies (pytorch#1353)

* Add latest dependencies for pytorch 2.0 release (pytorch#1357)

* Fix typo

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit d7f2a7c.

* [aarch64] update readme with the "--enable-mkldnn" option (pytorch#1362)

This needs to be enabled for official wheel building.

* Replace `--enable-mkldnn` with `--disable-mkldnn`

Also, change default to ubuntu-20.04

* Update AMIs

Using following images:
```
% aws ec2 describe-images --image-ids ami-078eece1d8119409f ami-052eac90edaa9d08f ami-0c6c29c5125214c77 --query "Images[].[ImageId, Description]"
[
    [
        "ami-078eece1d8119409f",
        "Canonical, Ubuntu, 18.04 LTS, arm64 bionic image build on 2023-03-02"
    ],
    [
        "ami-0c6c29c5125214c77",
        "Canonical, Ubuntu, 22.04 LTS, arm64 jammy image build on 2023-03-03"
    ],
    [
        "ami-052eac90edaa9d08f",
        "Canonical, Ubuntu, 20.04 LTS, arm64 focal image build on 2023-03-01"
    ]
]
```

* Update tags for domain libraries

* Add PyTorch version pinning to release wheels

* Fix flake8

* [BE] Introduce `build_domains` function

And call it to rebuild only domains if torch wheel is available

* Switch deprecated ubuntu-18.04 runner to ubuntu-latest (pytorch#1334)

* Switch deprecated ubuntu-18.04 runner to self-hosted 2xlarge

* Leave build-nvidia-docker for now

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <[email protected]>

* Use ephemeral runners

* Use ubuntu-latest

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <[email protected]>

* Switch from latest to 22.04 to pin the version

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Introduce optional --build-number parameter

* Revert me later: Fix conda package smoke tests

(cherry picked from commit d7f2a7c)

Alas, it's still used and causes nightly build failures

* Fix aarch64 torchvision build (pytorch#1363)

* Fix torchvision image extension compilation

* Fix torchvision image extension compilation

* Set enable_mkldnn to pypi build

* Remove unused `enable_mkldnn` for configure_system

* [aarch64] Try to link statically with png/jpeg

Also, add testing (which is currently broken)

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit ce427de.

* [AARCH64] Fix image.so wheel

By adding explicit libz dependency

* [AARCH64] Pass `BUILD_S3` to torchdata

To make build consistent with Linux-x86_64

* Revert "[AARCH64] Pass `BUILD_S3` to torchdata"

This reverts commit ae8e825.

As it fails to build on aarch64

* Add portalocker (pytorch#1364)

* [BE] Error handling in build_aarch64_wheel

I've noticed that build errors in `build_ArmComputeLibrary` would be
ignored as semicolon is used between the commands, instead of &&
Also, replace the nightly version evaluation that relied on torch with
evaluation based on the individual libraries
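
The failure mode described above — a `;` between commands letting a failed build step go unnoticed — can be demonstrated in a few lines. This Python snippet just shells out to illustrate the shell semantics; it is not part of the build scripts:

```python
import subprocess

# With ';' the second command runs even though the first failed, and the
# chain's exit status is that of the *last* command, hiding the error.
semi = subprocess.run("false; echo built", shell=True,
                      capture_output=True, text=True)

# With '&&' the failure short-circuits the chain and the non-zero
# status propagates, so the build error is not silently ignored.
andand = subprocess.run("false && echo built", shell=True,
                        capture_output=True, text=True)

print(semi.returncode, semi.stdout.strip())    # 0 built
print(andand.returncode, andand.stdout.strip())  # 1 (no output)
```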

* [AArch64] Pass `args.instance_type` to `start_instance`

* use c++17 when building windows smoke tests (pytorch#1365)

Summary:
We are seeing failures during CI dealing with some headers that have
nested namespaces. This is expected to remedy them.

One such example:
https://github.com/pytorch/pytorch/actions/runs/4510336715/jobs/7942660912

Test Plan: Test this with CI.

---------

Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Co-authored-by: Andrey Talman <[email protected]>
Co-authored-by: andysamfb <[email protected]>
Co-authored-by: izaitsevfb <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Syed Tousif Ahmed <[email protected]>
Co-authored-by: Syed Tousif Ahmed <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Wei Wang <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
Co-authored-by: ptrblck <[email protected]>
Co-authored-by: zhuhong61 <[email protected]>
Co-authored-by: Greg Roodt <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
Co-authored-by: Dmytro Dzhulgakov <[email protected]>
Co-authored-by: albanD <[email protected]>
Co-authored-by: Radek Bartoň <[email protected]>
Co-authored-by: divchenko <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Mike Schneider <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: agunapal <[email protected]>
Co-authored-by: dagitses <[email protected]>