
ROCm5.3 nightly wheels #1193

Merged
merged 10 commits into pytorch:main from upgrade_nightly_wheels_to_rocm5.3
Nov 23, 2022

Conversation

jithunnair-amd
Collaborator

No description provided.

@jithunnair-amd
Collaborator Author

https://github.com/pytorch/builder/actions/runs/3507244988/jobs/5874816804 failed with a disk space error:

```
#67 11429.5 Disk Requirements:
#67 11429.5   At least 613MB more space needed on the / filesystem.
```

Rerunning to see if the error is flaky.

@jithunnair-amd jithunnair-amd force-pushed the upgrade_nightly_wheels_to_rocm5.3 branch from 78d1b28 to 2e96468 Compare November 22, 2022 19:48
@jithunnair-amd
Collaborator Author

jithunnair-amd commented Nov 23, 2022

The manywheel jobs succeeded, but the libtorch job failed: https://github.com/pytorch/builder/actions/runs/3526778914/jobs/5915095539

```
#18 59.07 remote: unable to authorize current user, internal server error
#18 59.07 fatal: unable to access 'https://bitbucket.org/icl/magma.git/': The requested URL returned error: 500
#18 ERROR: executor failed running [/bin/sh -c bash ./install_rocm_magma.sh && rm install_rocm_magma.sh]: exit code: 128
```

@jithunnair-amd
Collaborator Author

jithunnair-amd commented Nov 23, 2022

Finally the CI gods have smiled on me :) @seemethere @atalman @malfet Could we please merge this as a priority? We have a bunch of dependent PRs for nightly wheel upgrades on pytorch/vision, pytorch/audio, etc.

```diff
@@ -27,7 +27,7 @@ case ${GPU_ARCH_TYPE} in
 rocm)
 BASE_TARGET=rocm${GPU_ARCH_VERSION}
 DOCKER_TAG=rocm${GPU_ARCH_VERSION}
-GPU_IMAGE=rocm/dev-ubuntu-18.04:${GPU_ARCH_VERSION}
+GPU_IMAGE=rocm/dev-ubuntu-20.04:${GPU_ARCH_VERSION}
```
jithunnair-amd
Collaborator Author

ROCm 5.3 doesn't support Ubuntu 18.04

@jithunnair-amd jithunnair-amd marked this pull request as ready for review November 23, 2022 17:07
@malfet
Contributor

malfet commented Nov 23, 2022

@jithunnair-amd one cannot merge a Draft PR, can he?

@jithunnair-amd
Collaborator Author

Sorry, I just realized :) Moved it out of Draft.

@malfet malfet merged commit 1342fb5 into pytorch:main Nov 23, 2022
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Nov 24, 2022
JakubPietrakIntel added a commit to JakubPietrakIntel/pytorch that referenced this pull request Dec 7, 2022
commit 63ebc8d6a000199e963d29b6c8a0f54d3150872b
Author: Jakub Pietrak <[email protected]>
Date:   Thu Dec 1 13:32:03 2022 +0100

    rm print

commit 2c8ffeaf1b2168ed9ad4ca6b192a1231fb036760
Author: Jakub Pietrak <[email protected]>
Date:   Thu Dec 1 11:35:02 2022 +0100

    pytorch_sparse.matmul to torch.sparse.matmul

commit ee0e184a1ce5dc6ad7005a67621fac19d6fdbb0b
Merge: 4562359b9f 3a858ba8e3
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 14:09:42 2022 +0100

    Merge branch 'gh/mingfeima/85/head' of https://github.com/pytorch/pytorch into pyg-36

commit 4562359b9fb3de301690334a892d44911eda45c8
Merge: deba083400 b5616cd5f4
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 12:22:11 2022 +0000

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit deba0834008ad95af7e3a6603223a0f8a5555967
Merge: 0e1a8522bb a97d0508cb
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 12:19:25 2022 +0000

    Merge branch 'pyg-36' of https://github.com/JakubPietrakIntel/pytorch into pyg-36

commit 0e1a8522bb695387816a29bbfcf182962429b3ab
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <[email protected]>
Date:   Mon Nov 28 12:16:35 2022 +0000

    Merge remote-tracking branch 'origin/gh/mingfeima/85/head' into pyg-36

commit b5616cd5f4fc150138b79d3396a603eda6a7a8a8
Author: Michael Voznesensky <[email protected]>
Date:   Mon Nov 28 05:12:37 2022 +0000

    Add simple assert to detect fake tensors on modules (#89723)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723
    Approved by: https://github.com/ezyang

commit db1f1144f1303db45e0b9d96e4bb6bdd87c80e5a
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 13:52:28 2022 -0800

    Beef up AOTAutograd logging with aot_id and input descriptions (#89710)

    A few things in this PR, that I found useful while debugging some
    recent issues:

    - We now allocate an aot_id to each aot_function/aot_module invocation,
      and print it whenever we report error messages and graph output
      logging.  Check the comment for why this sort of thing is useful,
      and also why it's different from nth_graph.  This number is now
      incorporated into aot_graph_name

    - I noticed that nth_graph only gets incremented when backwards is
      compiled.  Because backwards is compiled lazily, this means that
      multiple forward graphs would have gotten the same ID!  I change
      nth_graph to always increment to avoid confusion here.

    - I added a simple describe_input function, which makes use of
      num_params_buffers to tell the user if the input index they're
      looking at is a param/buffer or an input.  With the help of
      https://github.com/pytorch/pytorch/pull/89709 we could give
      even more detailed information about inputs  (we could also
      easily give detailed information about parameters if we stored
      a mapping of index to parameter name, but I didn't need this
      when debugging so I'll let someone else add it if they need
      it.)

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89710
    Approved by: https://github.com/bdhirsh
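The aot_id scheme this commit describes amounts to a process-wide monotonic counter captured once per compilation entry point, so every later log line can be correlated back to one invocation. A minimal sketch (illustrative names, not the actual AOTAutograd code):

```python
import itertools

# Hypothetical counter; stands in for AOTAutograd's internal id allocator.
AOT_COUNTER = itertools.count()

def aot_function(fn):
    """Sketch: stamp each compiled entry point with a unique aot_id so
    error messages and graph-output logging can name the invocation."""
    aot_id = next(AOT_COUNTER)

    def compiled(*args, **kwargs):
        # Every log line for this invocation carries its aot_id.
        print(f"[aot{aot_id}] running {fn.__name__}")
        return fn(*args, **kwargs)

    compiled.aot_id = aot_id
    return compiled
```

Unlike nth_graph, the id is allocated eagerly at wrap time, so two forward graphs can never share one.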

commit 5f8848f32901e35cead64d520885f718679c2bbe
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 15:26:55 2022 -0500

    Don't suppress log messages for dynamo CI config (#89653)

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89653
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit 1a2dd6b15e0089a9e45ba4feb90c2d0dfac19238
Author: Edward Z. Yang <[email protected]>
Date:   Sun Nov 27 19:27:45 2022 -0500

    Add single process version of dynamo distributed hf_Bert tests (#89721)

    It's a lot easier to debug problems in the Dynamo optimization pass if
    you aren't actually triggering a multiprocessing run.  Keep these tests
    around.

    I think the other tests can probably get this treatment too, leaving
    this to future work.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721
    Approved by: https://github.com/voznesenskym

commit 0e7c100c9b7417efb1a8f65778a1e3c9ad10ef3e
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Add debug asserts to AOTAutograd for input consistency with compilation (#89702)

    Fixes https://github.com/pytorch/torchdynamo/issues/1927

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89702
    Approved by: https://github.com/bdhirsh

commit 1f95f24d3003a35568a00b5e5e18439846089b0f
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Factor input deduplication into a separate function (#89701)

    It turns out that instead of having a giant blobby aot_dispatch_autograd
    function, we can factor it into a series of wrapper functions, each
    of which successively guarantees more invariants on the inner
    compilation function until the final inner function is quite trivial.
    How exactly you have to wrap the input user functions and the output
    compiled functions can be expressed concisely in Haskell, so I've
    included the Haskell formulation in code comments.

    This PR shows how to do this for input deduplication.  Dealing with the
    rest of the view handling is left to future work.

    This PR should also be a slight performance improvement as deduplicating
    is skipped entirely when there are no duplicate inputs.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89701
    Approved by: https://github.com/bdhirsh
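The wrapper factoring this commit describes can be sketched in plain Python (illustrative, not the real aot_dispatch_autograd code): the outer wrapper removes duplicate tensor arguments before the inner compiler ever sees them, and re-selects the unique arguments at call time. Duplicates are detected by object identity here, which is an assumption of this sketch.

```python
def dedup_wrapper(inner_compile, example_args):
    """Guarantee the inner compiler a duplicate-free argument list.

    `inner_compile` takes a list of example arguments and returns a
    compiled callable; both names are hypothetical.
    """
    seen = set()        # object ids already kept
    keep_order = []     # positions of the unique arguments
    for i, a in enumerate(example_args):
        if id(a) not in seen:
            seen.add(id(a))
            keep_order.append(i)

    if len(keep_order) == len(example_args):
        # Fast path from the PR: no duplicates, skip dedup entirely.
        return inner_compile(example_args)

    compiled = inner_compile([example_args[i] for i in keep_order])

    def runtime_fn(*args):
        # Re-select the unique arguments in the same positions as above.
        return compiled(*[args[i] for i in keep_order])

    return runtime_fn
```

Each such wrapper adds one invariant, so the innermost compilation function stays trivial.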

commit dcefc8f90fbc86041a7abcce4f227d15c59bd96c
Author: Edward Z. Yang <[email protected]>
Date:   Sat Nov 26 14:28:56 2022 -0500

    Implement guard_source on RandomValueSource (#89711)

    I audited the pattern matches on the enum and it didn't
    look like this one should apply there.

    Sorry, no test, I know this matters on symbolic-shapes branch
    but I haven't had time to extract out a minimal reproducer.
    Take my word for it.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89711
    Approved by: https://github.com/jansel

commit 1da633f98a5da000083c0c47d9e192b2689f867b
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 13:57:17 2022 +0000

    Access named parameters/buffers/etc via getattr rather than index (#89625)

    I'm not sure why this never caused problems before.  The error
    manifests as `TypeError: 'MyModule' object is not subscriptable`

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89625
    Approved by: https://github.com/albanD
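The quoted error is what you get when a module object is indexed like a mapping; a minimal illustration with a stand-in class (not a real nn.Module):

```python
class MyModule:
    """Stand-in for a module with a named attribute."""
    def __init__(self):
        self.weight = [1.0, 2.0]

m = MyModule()

try:
    _ = m["weight"]          # the buggy access pattern
except TypeError as e:
    print(e)                 # 'MyModule' object is not subscriptable

print(getattr(m, "weight"))  # attribute access works
```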

commit e36d68af8885f27d8c0b4727ab078bf53e55e7a0
Author: Horace He <[email protected]>
Date:   Thu Nov 24 02:17:37 2022 +0000

    Don't allow recomputing a node that *must* be materialized in the backwards pass (#89171)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89171
    Approved by: https://github.com/ngimel

commit b709078dc673cbd5025a1df3eae7f5c60acc2698
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:21 2022 -0800

    [Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926)

    There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse grained view of the world.)

    Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926
    Approved by: https://github.com/chaekit

commit 143d2881a844934c95c4ada63b38179d97e65af3
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:19 2022 -0800

    [Profiler] Memory profiler part 10: Mark optimizer state (#88925)

    This is also a fairly simple pass, since we're simply collecting values from the python tracer.

    Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925
    Approved by: https://github.com/chaekit

commit ae725d501e33ed6f823997bea03d99cdc8dae5ff
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:18 2022 -0800

    [Profiler] Memory profiler part 9: Mark activations (#88924)

    This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass.

    Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924
    Approved by: https://github.com/chaekit
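The flood fill this commit describes is an ordinary breadth-first traversal over the data flow graph: everything reachable from the inputs before crossing into the backward pass gets the activation category. A toy sketch on an adjacency list (the node names and data structures are illustrative, not the profiler's real ones):

```python
from collections import deque

def mark_activations(edges, inputs, backward_nodes):
    """Flood fill downstream from the inputs, stopping at backward nodes.

    edges: dict mapping node -> list of downstream nodes.
    Returns the set of nodes marked as activations.
    """
    activations = set()
    seen = set(inputs)
    queue = deque(inputs)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt in backward_nodes or nxt in seen:
                continue  # stop at the backward pass; don't revisit
            seen.add(nxt)
            activations.add(nxt)
            queue.append(nxt)
    return activations
```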

commit 56e40fe054ecb7700142ea9ae7fe37e77800a2da
Author: Yuxin Wu <[email protected]>
Date:   Sun Nov 27 05:55:24 2022 +0000

    Let SyncBatchNorm fallback to BN if not using distributed training (#89706)

    Fixes #63662
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706
    Approved by: https://github.com/soumith
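The fallback amounts to a guard at the top of forward: only take the collective sync path when distributed training is actually active. A schematic sketch with stand-in callables (not the real SyncBatchNorm.forward):

```python
def sync_batch_norm_forward(x, dist_initialized, world_size, bn_forward, sync_forward):
    """Schematic of the fallback: use plain BN unless we are genuinely
    running multi-process distributed training.

    bn_forward / sync_forward are hypothetical stand-ins for the plain
    BatchNorm path and the all-reduce-based sync path.
    """
    need_sync = dist_initialized and world_size > 1
    if not need_sync:
        return bn_forward(x)   # plain BatchNorm path
    return sync_forward(x)     # collective statistics path
```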

commit 39449ea61d9a6644731687219282f610cbf7cf54
Author: PyTorch MergeBot <[email protected]>
Date:   Sun Nov 27 02:59:04 2022 +0000

    [vision hash update] update the pinned vision hash (#89692)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89692
    Approved by: https://github.com/pytorchbot

commit 483d3a3d07e6694757c5158bc21f7f757f8c82c3
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:16 2022 -0800

    [Profiler] E2E expecttests for category assignment (#88653)

    Up until now the unit tests for category assignment have been narrowly scoped to specific checks on specific Tensors. However as we start to reach reasonable levels of category assignment it's useful to supplement those tests with higher level summary tests to inspect the larger graph and confirm that it makes sense. (It will also be necessary for some categories like activations where it is tedious to record all relevant Tensors.)

    The general structure of these tests is to capture a model invocation with `__torch_dispatch__` and then cross reference those inputs and outputs with the categories assigned by the memory profiler.

    Differential Revision: [D40868659](https://our.internmc.facebook.com/intern/diff/D40868659/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88653
    Approved by: https://github.com/chaekit

commit 0435894bb3b2d60e5da9f993c2a56d95fb03a971
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:14 2022 -0800

    [Profiler] Memory profiler part 8: Mark parameters. (#87568)

    Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates.

    Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.)

    Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568
    Approved by: https://github.com/chaekit

commit 17fa6bf1f57cbbe84a14566efcf00f21e1abe489
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:13 2022 -0800

    [Profiler] Memory profiler part 7: Mark inputs (#87567)

    It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training, since Autograd lets us differentiate between them; however, I still want the tool to do something reasonable in inference.

    Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements.

    Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567
    Approved by: https://github.com/chaekit

commit 64c5c77cd47212da719eb29c3b0a2b07cebb3705
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:11 2022 -0800

    [Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566)

    Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories.

    We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph, which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined, these two approaches provide very high coverage.

    Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node.

    Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.)

    Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566
    Approved by: https://github.com/chaekit

commit 5f09a6d573a2a07c00c76c3cbdbffe0fafe2436d
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:09 2022 -0800

    [Profiler] Memory profiler part 5: Data flow graph (#87006)

    The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs.

    It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node.

    Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
    Approved by: https://github.com/chaekit
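The versioning idea above can be sketched as a mutation counter: each (tensor identity, version) pair is a distinct node in the data flow graph, and an op that mutates an input bumps its version so downstream consumers see a different node. Toy stand-ins, not the profiler's real classes:

```python
class VersionedTensor:
    """Toy stand-in: a tensor identity plus a mutation counter."""
    def __init__(self, name):
        self.name = name
        self.version = 0

    def key(self):
        # Each (identity, version) pair is a distinct data-flow node.
        return (self.name, self.version)

def record_op(op_name, inputs, mutates, graph):
    """Append one op to the graph; bump the version of mutated inputs
    so their post-mutation state is a new node."""
    in_keys = [t.key() for t in inputs]
    for t in mutates:
        t.version += 1
    out_keys = [t.key() for t in mutates]
    graph.append((op_name, in_keys, out_keys))
```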

commit c3116dd78b294f1bd3f6424dc1bfb7ff86bb0a66
Author: Taylor Robie <[email protected]>
Date:   Sat Nov 26 10:33:08 2022 -0800

    [Profiler] Memory profiler part 4: Select top level torch ops (#86880)

    In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is.

    Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880
    Approved by: https://github.com/chaekit

commit bb77accb4c996e3aab9ae4b665fb8464400c8194
Author: Jiong Gong <[email protected]>
Date:   Sat Nov 26 14:06:44 2022 +0000

    [Inductor] Record cpp kernel in PyTorch Profiler (#89367)

    Add an option `config.cpp.enable_kernel_profile` to record individual cpp kernel time in PyTorch Profiler.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89367
    Approved by: https://github.com/jansel

commit 36018a6ee63f140b95ad644d09920798b0c624f8
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 13:48:35 2022 -0800

    Don't suppress exceptions from backends (#89656)

    Taken from voz's https://github.com/pytorch/pytorch/pull/89392

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89656
    Approved by: https://github.com/voznesenskym

commit 3e20d023b1f442ebe59e76604395cd8d4abed52a
Author: Natalia Gimelshein <[email protected]>
Date:   Sat Nov 26 03:08:23 2022 +0000

    put descriptive kernel names behind config (#89697)

    Per title, generated kernel names are often long and confusing.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89697
    Approved by: https://github.com/Chillee

commit 591dfffa38848de54b7f5f4e49260847024c9281
Author: jlukehubbard <[email protected]>
Date:   Fri Nov 25 21:31:53 2022 +0000

    update docstring for torch.linalg.lstsq (#89383)

    Previous documentation lacked details about the handling of over- and underdetermined systems, and made incorrect mention of MAGMA.

    Fixes #85021

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89383
    Approved by: https://github.com/lezcano

commit c9a0cc86407d7ec20524b0e26305109d0cf2b5c2
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 03:31:20 2022 +0000

    Simplify aot_module_simplified by removing top_args/top_kwargs (#89666)

    This makes good on Chillee's CR comment at
    https://github.com/pytorch/functorch/pull/660/files/af30d351cc93dfafb5a94dbcb32983c5ef65fd6a#r843315222
    which was never done in the original PR.

    There is no logic change, just unpack the args/kwargs at the top
    level and remove the inner function indirection.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89666
    Approved by: https://github.com/voznesenskym

commit 6168f22fae66da5703e087bcd10076921ca157e7
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Don't support kwargs at runtime in aot_module_simplified (#89664)

    The preexisting logic here added in
    https://github.com/pytorch/functorch/pull/970 was very peculiar: if top_kwargs
    was non-empty, then the inner compiled function supports kwargs.  Naively, this
    would leave you to expect that there is some sort of correlation between
    top_kwargs and kwargs.  But in fact, they're completely unrelated!  top_kwargs
    is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but
    kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
    But (1) we don't support this (the function to be compiled only takes a list
    of tensors) and (2) even if we did support it, conditioning on whether or not
    you had passed AOTAutograd configuration kwargs to support kwargs at runtime
    is bonkers.

    So delete it.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89664
    Approved by: https://github.com/voznesenskym

commit b04dda4291f1d30b064572e4521e82fa2573af77
Author: Edward Z. Yang <[email protected]>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Delay verify correctness wrapping to call site. (#89662)

    There is only one call site for compiler_fn, so we can safely delay
    wrapping verify correctness to here.  This will help later when we
    change the backend compiler calling convention to pass fake tensors
    (but I need to pass real tensors here.)

    This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392
    but with less changes to the substantive logic.  I only moved the relevant
    inner implementation; there are no changes otherwise.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662
    Approved by: https://github.com/voznesenskym

commit 61a3fe4b6409965223273c1098f9a77ff071efe1
Author: Natalia Gimelshein <[email protected]>
Date:   Fri Nov 25 19:42:38 2022 +0000

    make inductor correctly propagate nans for maximum and minimum (#89612)

    Partially fixes https://github.com/pytorch/torchdynamo/issues/594
    Also, small cleanup for `where` codegen

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89612
    Approved by: https://github.com/soumith, https://github.com/jansel
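The semantics being fixed match torch.maximum/torch.minimum: if either operand is NaN, the result is NaN, unlike a plain comparison-based max, whose result can depend on operand order. A scalar sketch of the required behavior (illustrative only; inductor generates this per-element in its kernels):

```python
import math

def nanprop_maximum(a, b):
    """NaN-propagating maximum: any NaN operand makes the result NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b

def nanprop_minimum(a, b):
    """NaN-propagating minimum: any NaN operand makes the result NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b
```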

commit 70c0a3006ee96b3db1f531109fc383f8159e2d2f
Author: Ikko Ashimine <[email protected]>
Date:   Fri Nov 25 19:26:18 2022 +0000

    Fix typo in segment_reduction_op_gpu.cu (#89647)

    menber -> member

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89647
    Approved by: https://github.com/kit1980

commit 2c0bd85c755043d696452ddab354f3ff6775738b
Author: kshitij12345 <[email protected]>
Date:   Fri Nov 25 14:53:57 2022 +0000

    complex: register c10::complex with py::cast (#89680)

    Fixes #77134

    TODO:
    * [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?)

    ```c++
    // Hypothetical minimal includes (not in the original snippet):
    #include <pybind11/embed.h>   // py::scoped_interpreter
    #include <c10/util/complex.h> // c10::complex

    namespace py = pybind11;

    int main() {
        py::scoped_interpreter guard{}; // start the interpreter
        auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cdouble)));

        auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cfloat)));

        auto casted_chalf = py::cast(c10::complex<at::Half>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_chalf)));
    }

    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680
    Approved by: https://github.com/ezyang

commit a97d0508cb5259951bc48300fb914cebdf322bb9
Merge: 849be586e6 abb446af8c
Author: Jakub Pietrak <[email protected]>
Date:   Fri Nov 25 15:24:54 2022 +0100

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit 849be586e649421ba58182feb9067a4ac65479e3
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <[email protected]>
Date:   Fri Nov 25 14:25:40 2022 +0100

    Merge branch 'gh/mingfeima/85/head' into pyg-36

commit abb446af8c65a49bbc3767e14605a73d244c176b
Author: Alvaro Gaona <[email protected]>
Date:   Fri Nov 25 11:09:28 2022 +0000

    Implement old windows in Python (#87082)

    Relates to #85366

    - Bartlett, Blackman, Hamming, Hann.
    - Except Kaiser, which will be in a different PR.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87082
    Approved by: https://github.com/mruberry, https://github.com/lezcano

commit 059a238619b122f922c569c618919a277420e483
Merge: 26ba2e9751 95ea47ef0c
Author: Jakub Pietrak <[email protected]>
Date:   Fri Nov 25 10:00:53 2022 +0100

    Merge branch 'pytorch:master' into jpietrak/pyg-36

commit 95ea47ef0c1cffe1fe05cc36bdc47c26cc72f13e
Author: Jason Ansel <[email protected]>
Date:   Fri Nov 25 04:28:36 2022 +0000

    torchdynamo to torch._dynamo in aot_autograd.py (#89385)

    Test Plan: Run torchbench models

    Differential Revision: D41429573

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89385
    Approved by: https://github.com/soumith, https://github.com/malfet

commit 69043247819042db18ac9526c2d747fa61fe8880
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 12:00:13 2022 -0800

    Remove fake_tensor_propagation (#89646)

    You always have to run dynamo with fake tensors.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646
    Approved by: https://github.com/soumith

commit 1aa1014b262b75d4269d9a4d8b562c6ee43a0991
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 12:00:12 2022 -0800

    xfail maml test, instead of running it without fake tensor prop (#89645)

    A previous version of this patch graph breaks when torch.tensor fails, but that causes

    ```
    PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32
    ```

    to start failing. Probably another latent bug that needs investigating.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645
    Approved by: https://github.com/albanD

commit a048913e2530442360c36a48420079ca9ebca149
Author: PyTorch MergeBot <[email protected]>
Date:   Fri Nov 25 03:03:41 2022 +0000

    [vision hash update] update the pinned vision hash (#89667)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89667
    Approved by: https://github.com/pytorchbot

commit 3b3ebcd031b68762938806f541d7247a1521bb11
Author: XiaobingSuper <[email protected]>
Date:   Thu Nov 24 02:33:01 2022 -0500

    TorchDynamo: weight prepack for single conv (#89209)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89209
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 0c4f3db7bf24e94125c6802718a1105ee548c953
Author: XiaobingSuper <[email protected]>
Date:   Thu Nov 24 02:32:59 2022 -0500

    TorchDynamo: weight prepack for mkl linear (#89109)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89109
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 07151a6bd62e308b6b32e2e0edfc4d5f0563576e
Author: XiaobingSuper <[email protected]>
Date:   Thu Nov 24 02:32:55 2022 -0500

    TorchDynamo: weight prepack for onednn convolution external call (#88988)

    This PR enables weight prepack using the MKLDNN tensor:
    1. enable fake tensor mode for MKLDNN tensor input.
    2. make the convolution fusion kernel support MKLDNN tensor input.
    3. do the weight prepack at the FX fusion step.

    For better performance, we always use channels_last for the CPU convolution path: our tests show that channels_last outperforms the blocked-input path and avoids the activation's layout conversions (plain to block, block to plain); currently only plain-to-plain format conversion is needed.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988
    Approved by: https://github.com/jgong5, https://github.com/jansel
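channels_last keeps the logical NCHW shape but reorders the strides so channels become the fastest-moving dimension in memory. A stride-calculation sketch (the formulas are standard; the function names are illustrative):

```python
def contiguous_strides(n, c, h, w):
    # NCHW contiguous layout: W fastest, then H, then C, then N.
    return (c * h * w, h * w, w, 1)

def channels_last_strides(n, c, h, w):
    # NHWC in memory while the logical shape stays NCHW:
    # channels are the fastest-moving dimension.
    return (h * w * c, 1, w * c, c)
```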

commit 0884fdaba0280e3f3ad2abc34c0940587f744886
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 14:31:00 2022 -0500

    Revert "Dont clone unmutated args in triton autotuning (#89519)" (#89652)

    This reverts commit f18f0c70ab10c400947e71be30794e04dcc22acf.

    Testing to see if this fixes gmixer_24_224 mixer_b16_224

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89652
    Approved by: https://github.com/eellison

commit 4a16f8cdb26be3561742e86f184e59f65418fe63
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Reenable fake_tensor_propagation on test_cudnn_rnn (#89644)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89644
    Approved by: https://github.com/anjali411

commit fc7dcb684aa38da5b1534fc701657ee63af8909c
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Run optimizer tests with fake tensors (#89643)

    This is a slight regression: RAdam and Adagrad don't appear to
    trace at all under fake tensors.  But I think this is a more accurate
    reflection of the current state of affairs.

    Along the way fix some problems on the fake tensor path.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643
    Approved by: https://github.com/anjali411

commit 9b13508ef3a4e858fbbbf068b3a825f1632e8daa
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Force test_rng_state to run with fake tensor prop (#89641)

    I'm not really sure what desertfire's intended follow-up was
    on https://github.com/pytorch/pytorch/pull/87490, because when I remove
    the unsupported() call, dynamo tests pass.  But the change here is
    conservative and I think strictly better than the current situation.
    The idea is to force fake tensor prop on for the test, and then just
    observe that we are doing a graph break.  Clearly, export doesn't work,
    so I manually xfail it.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641
    Approved by: https://github.com/anjali411

commit c6be06d93ab911a3fbb185451c8cf42bcedad0c1
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Easy: These tests work with fake_tensor_propagation on (#89640)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89640
    Approved by: https://github.com/anjali411, https://github.com/albanD

commit 6fb6eb0a7498839e69302da7bf8c04205c64e0f3
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 08:11:48 2022 -0800

    Support unspecialized integers with dynamic shapes (#89639)

    Previously, we hackily wrapped unspecialized integers into
    tensors and treated them as tensor inputs.  Sometimes, downstream
    operations would not be able to deal with the tensor input.  Now,
    we wrap them into SymInt, so more correct overload selection occurs.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
    Approved by: https://github.com/anjali411

commit 0c96841a20f0ae9380ef26657914276a42c9c9d7
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 08:11:47 2022 -0800

    Cond capture with fake tensors actually works; don't raise in this case (#89638)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89638
    Approved by: https://github.com/anjali411

commit d3c012f409a4e4d5a11070a90b5578da82778030
Author: kshitij12345 <[email protected]>
Date:   Thu Nov 24 21:41:20 2022 +0000

    [test_nn] split pruning tests from test_nn (#89590)

    Ref: https://github.com/pytorch/pytorch/issues/63085

    Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590
    Approved by: https://github.com/albanD

commit 83666f167dcf023d301f16fad82b9afb374ad836
Author: Aleksandar Samardžić <[email protected]>
Date:   Thu Nov 24 14:44:12 2022 +0000

    Added vectorized CPU code for uint8_t datatype. (#89284)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89284
    Approved by: https://github.com/lezcano, https://github.com/peterbell10

commit 9497552771ca59c68509398ab3094e590a3047c5
Author: Howard Huang <[email protected]>
Date:   Thu Nov 24 19:41:17 2022 +0000

    Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521)

    Summary: Fixes https://github.com/pytorch/pytorch/issues/88568

    `_all_gather_base` is deprecated, so we replace its usage with `all_gather_into_tensor`.

    Test Plan: CI

    Differential Revision: D41479983

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521
    Approved by: https://github.com/wz337

commit 94a88b53ed37854379813abf9641d1637fe2688b
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 08:11:46 2022 -0800

    Remove fake_tensors_available (#89637)

    As we are one repo now, they are always available.

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637
    Approved by: https://github.com/anjali411

commit 1c8b0779de76d0c76d34835047106ab37b41790b
Author: Emilio Castillo <[email protected]>
Date:   Thu Nov 24 18:25:26 2022 +0000

    Fix segfault when swapping custom allocator (#89613)

    Just screwed it up before merging ...

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89613
    Approved by: https://github.com/albanD

commit fd279fe85b8f5a8e74c615436f0b180621b6ef52
Author: Edward Z. Yang <[email protected]>
Date:   Thu Nov 24 09:23:05 2022 -0500

    Make pytest work again on test/dynamo (#89631)

    Signed-off-by: Edward Z. Yang <[email protected]>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89631
    Approved by: https://github.com/anjali411

commit c3e85d879cdbd3973754760c6767c75276b1dca8
Author: albanD <[email protected]>
Date:   Thu Nov 24 17:11:42 2022 +0000

    Mention discrepency between original impl and our impl of RAdam (#89575)

    Fixes https://github.com/pytorch/pytorch/issues/88836

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575
    Approved by: https://github.com/mruberry

commit 860bae49e4925868a0221ec4345d08407280bac7
Author: Edward Z. Yang <[email protected]>
Date:   Wed Nov 23 08:04:31 2022 -0800

    Suppress guards on as_strided call only. (#89569)

    See comment in meta_utils.py for the whole story.

    This doesn't have a substantive impact yet, but will in the next
    PR on the stack.

    Signed-off-by: Edward Z. Yang <[email protected]>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569
    Approved by: https://github.com/albanD

commit 1588ea0dbf16f37ce14cfc8764666985c16ccbf9
Author: mfkasim1 <[email protected]>
Date:   Thu Nov 24 11:11:51 2022 +0000

    Added log1p for complex in c10 (#89214)

    One PR towards #89205.
    The content is mostly from PR #38465, but slightly changed the expression to make it faster.

    Here are some benchmarking code:
    ```c++

    // main.cc

    template<typename T> inline std::complex<T> log1p_v0(const std::complex<T> &z) {
        // this PR
        T x = z.real();
        T y = z.imag();
        T theta = std::atan2(y, x + T(1));
        T r = x * (x + T(2)) + y * y;
        return {T(0.5) * std::log1p(r), theta};
    }

    template<typename T> inline std::complex<T> log1p_v1(const std::complex<T> &z) {
        // PR #38465
        T x = z.real();
        T y = z.imag();
        std::complex<T> p1 = z + T(1);
        T r = std::abs(p1);
        T a = std::arg(p1);
        T rm1 = (x * x + y * y + x * T(2)) / (r + 1);
        return {std::log1p(rm1), a};
    }

    template<typename T>
    inline std::complex<T> log1p_v2(const std::complex<T> &z) {
        // naive, but numerically inaccurate
        return std::log(T(1) + z);
    }

    int main() {
        int n = 1000000;
        std::complex<float> res(0.0, 0.0);
        std::complex<float> input(0.5, 2.0);
        auto start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v0(input);
        }
        auto end = std::chrono::system_clock::now();
        auto elapsed = end - start;
        std::cout << "time for v0: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v1(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v1: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v2(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v2: " << elapsed.count() << '\n';
        std::cout << res << '\n';
    }
    ```

    Compiling the script with command `g++ main.cc` produces the following results:
    ```
    time for v0: 237812271
    time for v1: 414524941
    time for v2: 360585994
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214
    Approved by: https://github.com/lezcano
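
    The v0 formula above can be sanity-checked outside of C++. Below is a plain-Python transcription (an illustrative sketch, not PyTorch code; `log1p_complex` is a hypothetical name) compared against the naive `log(1 + z)`:

    ```python
    import cmath
    import math

    def log1p_complex(z: complex) -> complex:
        # Mirrors log1p_v0: accurate for small |z| because it avoids the
        # cancellation in computing |1 + z| - 1 directly.
        x, y = z.real, z.imag
        theta = math.atan2(y, x + 1.0)
        r = x * (x + 2.0) + y * y  # equals |1 + z|^2 - 1
        return complex(0.5 * math.log1p(r), theta)

    # For moderate z the naive formula is accurate, so the two should agree:
    z = complex(0.5, 2.0)
    assert abs(log1p_complex(z) - cmath.log(1 + z)) < 1e-12
    ```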

commit 4f5c4c022a8365d06ac401582958bbf0fd3f8337
Author: Jiewen Tan <[email protected]>
Date:   Thu Nov 24 10:57:01 2022 +0000

    [LTC] Refine MetricsArena::Reset (#89608)

    Summary:
    After counters are reset, getters' behaviors are inconsistent. To improve that, here I 1) move the validation of CounterData into CounterData::IsValid such that it's better encapsulated, 2) divide getters into two groups: a) MetricsArena::GetCounter() and b) MetricsArena::ForEachCounter(), and route MetricsArena::GetCounterNames() and CreateMetricReport() to use b.

    This is paired with pytorch/xla#4217.

    Test Plan:
    PJRT_DEVICE=CPU python xla/test/test_metrics.py

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89608
    Approved by: https://github.com/JackCaoG

commit a8629a1c18fd13300ce69c1d6042004038885cf0
Author: Jithun Nair <[email protected]>
Date:   Thu Nov 24 10:53:20 2022 +0000

    Upgrade nightly wheels to ROCm5.3 (#89101)

    Dependent on PR https://github.com/pytorch/builder/pull/1193

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89101
    Approved by: https://github.com/kit1980

commit c0d81aa70ce45a0c2e7ced6c9f42a92d15523188
Author: Ivan Yashchuk <[email protected]>
Date:   Thu Nov 24 09:37:10 2022 +0000

    Use fx.replace_pattern for removing empty_like+fill in nvFuser+PrimTorch execution (#89132)

    I learned about `torch.fx.replace_pattern` and it's a cleaner way of removing unnecessary tensor materialization from the graph coming from tracing  C++ code `1 - tensor`.

    Test:
    ```
    python -m pytest test/test_prims.py -k "test_silu_backward_no_filled_tensor"
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89132
    Approved by: https://github.com/mruberry, https://github.com/jjsjann123

commit b515c1d96082214e81cc57ce2a1de9164b50206f
Author: Hao Guan <[email protected]>
Date:   Thu Nov 24 08:14:24 2022 +0000

    [QAT] Check the value of numel to avoid segfault (#81547)

    Fixes #78123

    Segmentation fault

    RuntimeError: numel is out of the bound of input tensor
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/81547
    Approved by: https://github.com/kit1980

commit 22a1b5e243e852e1c423c697e51975d1545d2a1b
Author: Vasiliy Kuznetsov <[email protected]>
Date:   Wed Nov 23 13:01:15 2022 -0800

    quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)

    Summary:

    This PR deprecates the `compute_dtype` field on observers, and replaces
    it with the `is_dynamic` field on observers.  This is better aligned
    with the reference model spec.

    Test plan:

    ```
    python test/test_quantization.py TestQuantizeFx
    python test/test_quantization.py TestQuantizeFxOps
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
    Approved by: https://github.com/jerryzh168
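
    The deprecation pattern described above can be sketched in plain Python (an illustrative sketch, NOT the real torch.ao.quantization observer API; the class name and the `"float"` mapping are assumptions for the example):

    ```python
    import warnings

    class MinMaxObserverSketch:
        """Sketch of deprecating a compute_dtype field in favor of an
        is_dynamic flag while keeping backward-compatible reads."""

        def __init__(self, is_dynamic: bool = False):
            # New field, better aligned with the reference model spec.
            self.is_dynamic = is_dynamic

        @property
        def compute_dtype(self):
            # Old field: reading it still works, but warns.
            warnings.warn(
                "compute_dtype is deprecated; use is_dynamic instead",
                DeprecationWarning,
            )
            # Hypothetical mapping: dynamic observers used a float compute dtype.
            return "float" if self.is_dynamic else None
    ```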

commit e4ccec6ecab9b48e804d58f60135f0950fca864f
Author: Yanbo Liang <[email protected]>
Date:   Thu Nov 24 05:28:58 2022 +0000

    [Dynamo] Fix bug of using customized torch.autograd.Function (#89397)

    Fixes https://github.com/pytorch/torchdynamo/issues/1899

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89397
    Approved by: https://github.com/jansel

commit 903ae4570e401e5c4e42dc4a44cae37f805044a4
Author: Michael Lazos <[email protected]>
Date:   Thu Nov 24 04:15:34 2022 +0000

    Disable optimizer tracing, enable for tests only (#89500)

    Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500
    Approved by: https://github.com/anijain2305

commit c79489c8e69f965f3e5af8f3f39df78e7d4732ba
Author: albanD <[email protected]>
Date:   Thu Nov 24 03:39:55 2022 +0000

    Expose to python the backward AD view_func (#89586)

    This will be useful for other systems (AOTAutograd) that want to replay autograd views.

    FYI @bdhirsh
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586
    Approved by: https://github.com/soulitzer

commit 4cb6bbbe27162c7b0835879131991d2155329718
Author: Nikita Karetnikov <[email protected]>
Date:   Thu Nov 24 01:02:28 2022 +0100

    Symintify `embedding` (#89327)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
    Approved by: https://github.com/ezyang

commit 9c867eae1a7fffb6f893717073150cff04a923a4
Author: Wu, Chunyuan <[email protected]>
Date:   Wed Nov 23 20:10:41 2022 +0000

    nnc: fix Store if value is fp32 while buf is bf16 (#86788)

    Fixes https://github.com/pytorch/pytorch/issues/86533.
    For the below graph:
    ```bash
    [DUMP kernel.cpp:1690] TensorExprKernel graph:
    [DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)):
    [DUMP kernel.cpp:1690]   %1 : int = prim::Constant[value=0]()
    [DUMP kernel.cpp:1690]   %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29
    [DUMP kernel.cpp:1690]   %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19
    [DUMP kernel.cpp:1690]   return (%3)
    ```

    **Loop stmt before the fix:**
    The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16.
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8);
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f;
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```

    **Loop stmt after the fix:**
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8));
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f);
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788
    Approved by: https://github.com/EikanWang, https://github.com/kit1980
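
    The before/after dumps boil down to inserting a cast when the stored value's dtype differs from the buffer's scalar type. A minimal sketch of that rewrite rule (illustrative; not NNC's actual IR API, and `rewrite_store` is a hypothetical name):

    ```python
    def rewrite_store(buf_dtype: str, value_dtype: str, value_expr: str) -> str:
        # HalfRewriter-style fix: if the value being stored has a different
        # dtype than the destination buf, wrap it in a cast to the buf dtype.
        if buf_dtype != value_dtype:
            return f"{buf_dtype}({value_expr})"
        return value_expr

    # Matches the fixed stmt above: a float constant stored into a bf16 buf.
    assert rewrite_store("bfloat16", "float", "0.8414709568023682f") == \
        "bfloat16(0.8414709568023682f)"
    ```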

commit f0e5bc4b9f231b438f76ddd13b2c21b7cb8a09ac
Author: Zhijing Li (Accelerator Enablement) <[email protected]>
Date:   Thu Nov 24 02:18:32 2022 +0000

    Symintified layer_norm (#89466)

    Summary: As titled.

    Test Plan:
    ```
    buck2 run mode/opt scripts/wwei6:test_executorch
    ```

    Differential Revision: D41451390

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89466
    Approved by: https://github.com/frank-wei, https://github.com/ezyang

commit fdb2dd113d3aec0acb2a473de6be49940ab6a115
Author: Alexander Grund <[email protected]>
Date:   Thu Nov 24 01:52:11 2022 +0000

    Install missing VSX headers (POWER) (#85547)

    E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h`, which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non-MSVC compilers.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547
    Approved by: https://github.com/kit1980

commit e922bd4e523b0a30f6607f6497ac458571e00131
Author: Wei-Sheng Chin <[email protected]>
Date:   Thu Nov 24 01:30:09 2022 +0000

    [ONNX] Move two headers from .h to .cc (#86852)

    As title. Header dependency should be as small as possible.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852
    Approved by: https://github.com/titaiwangms, https://github.com/BowenBao

commit 23fe2ff910fd1577281a2210d1184aff705191b8
Author: Shunting Zhang <[email protected]>
Date:   Thu Nov 24 01:28:10 2022 +0000

    verify the number of outputs of xla graph (#89536)

    This PR adds tests to verify the number of outputs returned by an XLA graph. The understanding from these tests will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and eventually enable training for the dynamo/torchxla integration. Sending this PR separately so Jack can help verify whether the behavior is expected and play with it.

    List some code snippets here since their behavior is not straightforward at a first glance:
    ```
        def forward(self, a, b, c):
            """
            The XLA graph will only return the first 2 items
            """
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            Inplace update on b cause it to be returned in XLA graph
            """
            b.zero_()
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            Even if we return b twice, the XLA graph only return b once.
            """
            b.zero_()
            return a + b, a + c, b, b
    ```

    Here are what observed by the added tests:

    1. XLA does not return outputs that are also inputs, as long as the tensor is not updated in place. At first glance one may wonder why we should consider this kind of 'unrealistic' corner case, but such graphs do show up in AOTAutograd. The main reason is that AOTAutograd lifts all model parameters/buffers as graph inputs and may return some of them.  Check ***test_direct_return***
    2. If a tensor is updated in place, XLA will still return it as a graph output even if it's also an input. The only difference compared to item 1 is that the in-place update causes the tensor to be returned. This happens for BatchNorm2d, since the running_mean/variance tensors are updated in place during training. Check ***test_direct_return_with_inplace_update***

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536
    Approved by: https://github.com/jansel
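
    The pruning behavior observed by the tests can be sketched in plain Python (illustrative only, not the torch_xla implementation; `prune_graph_outputs` is a hypothetical name, with tensors modeled as hashable ids):

    ```python
    def prune_graph_outputs(returned, inputs, mutated):
        # returned: ordered list of tensors the traced function returns.
        # inputs / mutated: sets of graph-input tensors and in-place-updated
        # tensors. Models the two observations above: un-mutated inputs are
        # not re-emitted as outputs, and duplicates are returned only once.
        outputs, seen = [], set()
        for t in returned:
            if t in seen:
                continue  # each tensor appears at most once in the outputs
            if t in inputs and t not in mutated:
                continue  # un-mutated inputs are not graph outputs
            seen.add(t)
            outputs.append(t)
        return outputs

    # forward(a, b, c): b.zero_(); return a+b, a+c, b, b  -> b returned once
    assert prune_graph_outputs(["ab", "ac", "b", "b"], {"a", "b", "c"}, {"b"}) \
        == ["ab", "ac", "b"]
    ```

    Without the in-place update, `b` would be dropped entirely, matching the first snippet's "only return the first 2 items" behavior.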

commit 0bde5149819e9854bca1363aa6c9f52f7db2496e
Author: Nikita Shulga <[email protected]>
Date:   Thu Nov 24 00:57:17 2022 +0000

    Add `c10::` namespace in front of `optional` (#89605)

    Prep change for moving the codebase to C++17 standard
    Was part of https://github.com/pytorch/pytorch/pull/85969

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89605
    Approved by: https://github.com/weiwangmeta, https://github.com/kit1980

commit e19a7165fd1a9a35fcac42706c20e658776c10ab
Author: foram-chandra <[email protected]>
Date:   Thu Nov 24 00:34:26 2022 +0000

    [nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905)

    Fixes #65909

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit a00bd6f686d7a485f7bea5f971b7e793118842b8
Author: clee2000 <[email protected]>
Date:   Wed Nov 23 23:48:32 2022 +0000

    Don't run auto request review on forked PRs (#89583)

    tested on https://github.com/pytorch/pytorch/pull/89581
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89583
    Approved by: https://github.com/albanD, https://github.com/malfet

commit 0a1a53083e331b3648ad4cb6f750d130e3530731
Author: Nikita Karetnikov <[email protected]>
Date:   Wed Nov 23 20:42:55 2022 +0000

    [primTorch] Enable regex error testing for some refs (#87765)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765
    Approved by: https://github.com/mruberry

commit 3ad2a032f4924d58c556b80840f6d51aa8a4472b
Author: Nikita Shulga <[email protected]>
Date:   Wed Nov 23 23:23:24 2022 +0000

    Update default cmake to 3.18 (#89570)

    Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh `
    Prep change for raising compiler standard to C++17: cmake-3.18 is the first one to support CUDA17 language

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570
    Approved by: https://github.com/atalman

commit 8695f0cced016d43298b43a4baf30315061fdacd
Author: Jane Xu <[email protected]>
Date:   Wed Nov 23 23:23:17 2022 +0000

    Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697)

    Using the same repro from the issue (but with BatchNorm2D)

    Rectifies native_batch_norm schema by splitting the schema into 2:
    1. one will have NON-optional alias-able running_mean and running_var inputs
    2. the other will just not have those parameters at all (no_stats variation)

    **Calling for name suggestions!**
    I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
    CI should pass.
    Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
    Approved by: https://github.com/albanD
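
    Why the running stats are alias-able (mutated) inputs in the stats-tracking schema can be seen from a plain-Python sketch of batch norm over a single feature (illustrative only, not PyTorch's implementation; `batch_norm_1d` is a hypothetical name):

    ```python
    import math

    def batch_norm_1d(xs, running_mean, running_var,
                      momentum=0.1, eps=1e-5, training=True):
        if training:
            n = len(xs)
            mean = sum(xs) / n
            var = sum((x - mean) ** 2 for x in xs) / n  # biased, for normalization
            # These updates are why the stats-tracking schema needs alias-able
            # running_mean/running_var; the no-stats variant skips this state.
            running_mean = (1 - momentum) * running_mean + momentum * mean
            running_var = (1 - momentum) * running_var + momentum * var * n / (n - 1)
        else:
            mean, var = running_mean, running_var
        out = [(x - mean) / math.sqrt(var + eps) for x in xs]
        return out, running_mean, running_var

    out, rm, rv = batch_norm_1d([0.0, 2.0], 0.0, 1.0)
    assert abs(rm - 0.1) < 1e-12 and abs(rv - 1.1) < 1e-12
    ```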

commit a00efe55c3790789b967facf10c3f426faa98155
Author: Everton Constantino <[email protected]>
Date:   Wed Nov 23 22:46:29 2022 +0000

    Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722)

    `JIT_LOG` checks whether logging was enabled for that particular file, and when it isn't, it outputs nothing. Since the test checks the size of `test_stream`, it fails. I believe forcing the file to have logging enabled just to see if the stream is correctly set during the test makes no sense, so this patch simply forces output and checks that it worked.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722
    Approved by: https://github.com/davidberard98

commit b8d3afd88665de5f01f696333d0ff291bd94a57b
Author: Huy Do <[email protected]>
Date:   Wed Nov 23 22:39:36 2022 +0000

    Skip upload test stats for test reports from rerun disabled tests workflow (#89548)

    I have found the reason why uploading tests stats fails for rerun disabled workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699.  The problem is that the pytest XML file is now too big to be processed quickly (x50 bigger). Unlike unittest, `pytest-flakefinder` used by rerun disabled tests for test_ops includes skipped messages multiple times (50 times by default, retrying and skipping).  This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages.

    This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats.

    I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but it balloons into a much bigger XML file after extraction, going from a dozen to a few hundred MB of text. The size of the zipped file is not a big immediate problem.

    [3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check.  The script can now finish when running locally:

    * `upload_test_stats` finishes around 3+ minutes
    ```
    time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master
    ...
    Writing 8925 documents to S3
    Done!
    Writing 1760 documents to S3
    Done!
    Writing 1675249 documents to S3
    Done!
    python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954  1    185.69s user 12.89s system 75% cpu 4:22.82 total
    ```

    * `check_disabled_tests` finishes within 3 minutes
    ```
    time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch
    ...
    python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954  1    154.19s user 4.17s system 97% cpu 2:42.50 total
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548
    Approved by: https://github.com/clee2000

commit f18f0c70ab10c400947e71be30794e04dcc22acf
Author: Elias Ellison <[email protected]>
Date:   Wed Nov 23 19:02:51 2022 +0000

    Dont clone unmutated args in triton autotuning (#89519)

    Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning. Any other pointers on where the overhead is coming from in autotuning would be great.

    Edit: i think it's just the triton cache clearing https://github.com/openai/triton/blob/44f577984d28ee979f704e2c28a1dcbac9639840/python/triton/testing.py#L159

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519
    Approved by: https://github.com/ngimel, https://github.com/jansel

commit ac19c5be82febc2140d4601c98daf45646a399ab
Author: Peter Bell <[email protected]>
Date:   Tue Nov 22 22:26:21 2022 +0000

    FFT: disable dimension wrapping for scalar tensors (#89234)

    Fixes #88985

    By default, `maybe_wrap_dim` allows through `dim=0` or `dim=-1`
    for scalar tensors which leads to an invalid dimension being used to
    index into `tensor.sizes()` as in the code sample from the issue.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89234
    Approved by: https://github.com/mruberry
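
    The historical wrapping behavior and the fix can be sketched in Python (an illustrative transcription of ATen's dim wrapping, not the C++ API):

    ```python
    def maybe_wrap_dim(dim: int, ndim: int, wrap_scalar: bool = True) -> int:
        # Historically a 0-d tensor was treated as 1-d when wrap_scalar=True,
        # letting dim=0 / dim=-1 through and later indexing an empty sizes().
        # The FFT fix is, in effect, to call this with wrap_scalar=False.
        if ndim == 0:
            if not wrap_scalar:
                raise IndexError(
                    f"dimension {dim} specified but tensor has no dimensions")
            ndim = 1
        if dim < -ndim or dim >= ndim:
            raise IndexError(f"dim {dim} out of range for a {ndim}-d tensor")
        return dim % ndim

    assert maybe_wrap_dim(-1, 3) == 2   # normal negative-dim wrapping
    assert maybe_wrap_dim(0, 0) == 0    # scalar slips through by default
    ```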

commit 50e2e4faf38c6ebafacc43b72c40333f1f7b401e
Author: Pearu Peterson <[email protected]>
Date:   Wed Nov 23 12:05:37 2022 +0200

    Sparse CSC/BSR/BSC serialization and pickle support (#89553)

    Fixes https://github.com/pytorch/pytorch/issues/89497

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
    Approved by: https://github.com/cpuhrsch

commit a8d6b82167ef417e21c807cb29d7eabea15014da
Author: Elias Ellison <[email protected]>
Date:   Wed Nov 23 16:47:43 2022 +0000

    Fix norm decomp when dtype is passed in (#89508)

    Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508
    Approved by: https://github.com/anijain2305

commit 72110d783344c4121730b032ca0d269896604dcf
Author: Elias Ellison <[email protected]>
Date:   Wed Nov 23 17:03:09 2022 +0000

    Fix Upsample Decomp Striding For Small Channels (#89528)

    Fix for https://github.com/pytorch/torchdynamo/issues/623.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528
    Approved by: https://github.com/ngimel, https://github.com/anijain2305

commit b7483be06afe8d4242adeb559cfbe6e0e89419d0
Author: Jerry Zhang <[email protected]>
Date:   Wed Nov 23 11:03:45 2022 -0800

    [quant][docs] Add docstrings for operators defined in torch.ops.quantized_decomposed namespace (#89547)

    Summary:
    no functionality changes

    Test Plan:
    NA

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89547
    Approved by: https://github.com/vkuzo

commit a188f05e8c1788d393c072868421991dfcb55b02
Author: Natalia Gimelshein <[email protected]>
Date:   Wed Nov 23 20:18:54 2022 +0000

    Reland #89031 Added conv constraint that infers layouts (#89530)

    Relands #89031
    Per title. We now set strides from the fx graph only for convolutions and mm, which is a hack; but bmm in some cases caused an extra copy, and there is no obvious way to fix that. We should rethink the strides anyway.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530
    Approved by: https://github.com/Chillee

commit e800d27b10137727c68cb71bccabe3a93cf38e9e
Author: William Wen <[email protected]>
Date:   Wed Nov 23 20:11:39 2022 +0000

    [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580)

    Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580
    Approved by: https://github.com/davidberard98

commit 953f39578a7019c4c34bc1dbd6cb0facb554af79
Author: Charlie West-Taylor <[email protected]>
Date:   Wed Nov 23 19:51:50 2022 +0000

    Mark IPU device as not supports_as_strided (#89130)

    Currently causes issues in calls to `.to`.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89130
    Approved by: https://github.com/albanD

commit 37e46a503502cdeda791cf684522ef83b5655328
Author: Yanbo Liang <[email protected]>
Date:   Wed Nov 23 19:44:46 2022 +0000

    [Dynamo] Fix several bugs & code refactor in RangeVariable (#89322)

    Fix bug in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py
    ```
    E       TypeError: 'list' object cannot be interpreted as an integer
    E
    E       from user code:
    E          File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward
    E           idx = torch.LongTensor(range(y.size(0)))
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322
    Approved by: https://github.com/jansel

commit 91dcef41ae96ede3f07375c2d38cb28d534e97f8
Author: Xilun Wu <[email protected]>
Date:   Wed Nov 23 19:43:28 2022 +0000

    Thread PG: add allreduce to threaded pg (#89043)

    Summary:
    Goal
    Add `all_reduce` collective  to multi-threaded ProcessGroup added in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a).

    Code Motion
    Added `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup).

    What's Next
    Add a DDP test utilizing the new allreduce op.
    Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`.

    Test Plan:
    cd fbcode/caffe2
    buck2 test mode/dev //caffe2/test/distributed:multi_threaded

    Differential Revision: D41046606

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043
    Approved by: https://github.com/wanchaol
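A toy single-process version of the threaded `all_reduce` (SUM only) can be sketched with plain threads and a barrier; names here are illustrative, not the real c10d API:

```python
import threading

def all_reduce_sum(local_values):
    """Each 'rank' publishes its contribution, waits at a barrier, then
    every rank reduces the same set of inputs -- the shape of the threaded
    ProcessGroup collective described above."""
    world_size = len(local_values)
    contrib = [None] * world_size
    out = [None] * world_size
    barrier = threading.Barrier(world_size)

    def worker(rank):
        contrib[rank] = local_values[rank]  # publish local value
        barrier.wait()                      # wait until all ranks published
        out[rank] = sum(contrib)            # every rank computes the SUM

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

print(all_reduce_sum([1, 2, 3, 4]))  # [10, 10, 10, 10]
```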

commit 27db806888c36b029f51197a40e5196cc10792db
Author: Charlie West-Taylor <[email protected]>
Date:   Wed Nov 23 19:41:07 2022 +0000

    Handle Tensor.__deepcopy__ via clone(), on IPU (#89129)

    Currently it falls through to a call to `storage()`, which the IPU doesn't support.

    I've made the minimal change here for ease of merging (this'd help us if it was in for 1.13.1), however...

    **QUESTION**: Is there any reason why `not torch._C._has_storage(self)` needs to *also* be guarded on `self.device.type == privateuseone`? In other words, could the condition for using `clone` not be this?


    ```python
    self.is_sparse
    or self.device.type
    in ["lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"]
    or not torch._C._has_storage(self)
    or (type(self) is not Tensor and self.data_ptr() == 0)
    ```

    If the condition fails, the very next thing is a call to `self._typed_storage()` which will fail, so it feels to me like *any* case without storage shouldn't fall through to the `storage()` call.

    The original PR for adding the 'no storage and device is `PrivateUse1`' condition ([86557](https://github.com/pytorch/pytorch/pull/86557)) doesn't discuss whether this could be broadened.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89129
    Approved by: https://github.com/albanD
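The dispatch question above boils down to: any tensor without storage should take the `clone()` path. A hedged sketch of that predicate (the device list and helper name are illustrative stand-ins, not the real `Tensor.__deepcopy__` code):

```python
# Devices quoted in the condition above as lacking storage support.
NO_STORAGE_DEVICES = {"lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"}

def should_deepcopy_via_clone(device_type, is_sparse, has_storage):
    # Broadened form discussed in the PR: any storage-less tensor uses
    # clone() rather than round-tripping through storage().
    return is_sparse or device_type in NO_STORAGE_DEVICES or not has_storage
```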

commit fa7a963f6536dd05c381fbf23270f4f009f9f113
Author: Sergii Dymchenko <[email protected]>
Date:   Wed Nov 23 19:39:47 2022 +0000

    Remove BaseException TODO (#89540)

    After discussion in https://github.com/pytorch/pytorch/pull/88461#issuecomment-1318965664
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89540
    Approved by: https://github.com/H-Huang

commit 9eed6b7f9aa4f5fc65075de3189acc9add221660
Author: Yanbo Liang <[email protected]>
Date:   Wed Nov 23 19:39:43 2022 +0000

    [Dynamo] Several fixes on TensorVariable & TorchVariable (#89486)

    This is a group of bug fixes for [7k github models](https://github.com/pytorch/torchdynamo/issues/1884), it would fix 30+ model tests.
    * Support ```tensor.type()```.
    * Support ```tensor.get_device()```.
    * Support ```torch.nn.functional._Reduction.get_enum```.
    * Support ```torch._utils._get_device_index()```.
    * Fallback ```tensor.data_ptr()```.
      * ```FakeTensor``` always returns 0
      * For no fake tensor propagation, we ```clone``` the input tensor, so it makes no sense to track the original ```data_ptr```. And I don't think this is a very popular API.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89486
    Approved by: https://github.com/jansel

commit f03e6672fb6a694d6f03980e3f34d8181c7cc663
Author: Iris <[email protected]>
Date:   Wed Nov 23 19:39:01 2022 +0000

    [Checkpoint][2D] Minor update for dedup_tensors.py (#89542)

    Rename variables for better readability.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89542
    Approved by: https://github.com/H-Huang

commit 74703eb50299b26082bc2a357770739a68460199
Author: Iris <[email protected]>
Date:   Wed Nov 23 19:36:01 2022 +0000

    [Checkpoint] Add a logger to dedup_tensors (#89503)

    Add a logger to dedup_tensors to log the duplicate keys to remove in global plan (List of SavePlan).

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89503
    Approved by: https://github.com/fduwjj
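The dedup step the two checkpoint commits above touch can be sketched as dropping duplicate write-items across ranks' save plans, keeping the first writer; names (`SavePlan`, item keys) are illustrative, not the real torch.distributed checkpoint API:

```python
def dedup_save_plans(all_plans):
    # all_plans: one list of item keys per rank (a simplified "global plan").
    seen = set()
    deduped = []
    for plan in all_plans:
        kept = []
        for key in plan:
            if key not in seen:   # first rank to write a key keeps it
                seen.add(key)
                kept.append(key)
        deduped.append(kept)      # duplicate keys are the ones a logger would report
    return deduped

print(dedup_save_plans([["w1", "w2"], ["w2", "w3"]]))  # [['w1', 'w2'], ['w3']]
```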

commit 57353c9608263df98156a73aaa6ed35a2a2306ad
Author: Brian Hirsh <[email protected]>
Date:   Wed Nov 23 08:29:08 2022 -0800

    first draft of input mutation handling for aot autograd (#88817)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88817
    Approved by: https://github.com/ezyang, https://github.com/wconstab

commit 902e4e3926a9333178510f032580e4acd56c40da
Author: PyTorch MergeBot <[email protected]>
Date:   Wed Nov 23 19:05:13 2022 +0000

    Revert "Fix the kineto daemon build condition (#89174)"

    This reverts commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd.

    Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil.

commit 049a0f2cd5916c8392c6bd1adc41c709de892f3a
Author: Bin Bao <[email protected]>
Date:   Wed Nov 23 02:00:44 2022 +0000

    [inductor] Update CI model tests (#89499)

    Summary:
    1) Add model inference test
    2) Switch model training test to use AMP

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499
    Approved by: https://github.com/bertmaher

commit 95474e00a9477b1333e13fa95887a2ce05c4a6a6
Author: Jerry Zhang <[email protected]>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Remove unused util code (#89272)

    Summary:
    att

    Test Plan:
    python test/test_quantization.py TestQuantizeFx

    Reviewers:

    Subscribers:

    Tasks:

    Tags:

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272
    Approved by: https://github.com/andrewor14

commit 128faf2b69f62b55d3ae1b4cb3e24ec594af0009
Author: Jerry Zhang <[email protected]>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Refactor the error checking code for quantize_per_channel op (#89271)

    Summary:
    at

    Test Plan:
    make sure it compiles

    Reviewers:

    Subscribers:

    Tasks:

    Tags:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89271
    Approve…
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
jithunnair-amd added a commit to ROCm/builder that referenced this pull request Apr 11, 2023
* Make sure package_type is set (pytorch#1139)

* Update check_binary.sh

* Update check_binary.sh

* Modifying smoke test to add more advanced validation as requested (pytorch#1124)

* Modify smoke test matrix

More vision smoke tests

Temporary pointing to my repo for testing

Try 2 use atalman builder

Modify path

Fixing commits

Testing

Testing

Smoke test modifications

Refactor test code

Fix typo

Fixing image read

A little more refactoring

Addressing comments

Testing

* Add same test for windows and macos

* Addressing comments

* Add manywheel special build for including pypi package (pytorch#1142)

* Add manywheel special build

Testing

Builder change

Testing

Adding manywheel cuda workflow

Simplify

Fix expr

* address comments

* checking for general setting

* Pass correct parameters for macos validations (pytorch#1143)

* Revert "Update check_binary.sh"

This reverts commit 6850bed.

* Revert "Update check_binary.sh"

This reverts commit 051b9d1.

* setup periodic test to run binary verification  pytorch/pytorch#84764: (pytorch#1144)

* add a reusable workflow to run all smoke tests/or smoke tests for a specific os/channel
* add workflows to schedule the periodic smoke tests for nightly and release channels

* Update aarch64 script to latest one (pytorch#1146)

* minor: fix the typo job name for windows binaries validation workflow (pytorch#1147)

* fix the typo in the the job name for the release binaries validation workflow (pytorch#1148)

issue was introduced in pytorch#1144

* Move to rc2 of 3.11 python (pytorch#1149)

Need it to get several convenience functions

* Integrates CUDA pip wheels (pytorch#1136)

* Refactors rpath to externally set var. Adds mechanism to add metadata

* Sets RUNPATH when using cudnn and cublas wheels

* Escapes dollar sign

* Fix rpath for cpu builds

Co-authored-by: atalman <[email protected]>

* Uses RPATH instead of RUNPATH so that user strictly uses pypi libs (pytorch#1150)

* Binary Validation Workflow - Adding check binary script (pytorch#1127)

* Update action.yml

* Update validate-macos-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Fix check binary for arm64 (pytorch#1155)

* Fix check binary for arm64

* Update check_binary.sh

Co-authored-by: Nikita Shulga <[email protected]>

Co-authored-by: Nikita Shulga <[email protected]>

* Fix for including nvtx dll and cudart (pytorch#1156)

* Fix for including nvtx dll and cudart

* Fix for include nvtx

* Fix spaces

* Back out inclusion of cudart (pytorch#1157)

* Add cuda and date check to smoke test (pytorch#1145)

* shorten binary validation workflow names, so they are more readable in the HUD and GH job view (pytorch#1159)

* Fix anaconda torchaudio smoke test (pytorch#1161)

* Fix anaconda torchaudio smoke test

* Format using ufmt

* Fix whels tests for torchaudio (pytorch#1162)

* Pin condaforge version

Most recent version fails with  invalid cert error when trying to update
python

* Option to run resnet classifier on specific device

* Fix typo

`.test/smoke_test` -> `test/smoke_test`

Noticed when pushed pytorch@3b93537 and no tests were run

* Test resnet classifier on CUDA (pytorch#1163)

* [ROCm] support for rocm5.3 wheel builds (pytorch#1160)

* Updates to support rocm5.3 wheel builds (#6)

* Changes to support ROCm 5.3

* Updated as per comments

* Installing python before magma build

- In ROCm 5.3 libtorch builds are failing during magma build due
  to missing python binary, so added install statement

* Move python install to libtorch/Dockerfile (#8)

* Updating the condition for noRCCL build (#9)

* Updating the condition for noRCCL build

* Updated changes as per comments

* Use MIOpen branch for ROCm5.3; Change all conditions to -eq

* Use staging branch of MIOpen for ROCm5.3

* Fix merge conflict

Fix merge conflict

Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>

* Validate python 3.11 (pytorch#1165)

* Validate python 3.11

* Validate linux binaries change

Add options

Import torchvision

Adding python 3.11 install

pass package to check nightly binaries date

Test

test

Add python 3.11 code

testing

Adding python 3.11 test

Add python 3.11 validation

Adding zlib develop install

Install zlib etc..

Adding zlib1g as well

testing

testing

Adding validate windows binary

Trying to workaround

testing

Refactor smoke test

Add import statement

fix datetime call

* Fix stripping dev

* fix import

* Strip pypi-cudnn from the version.py (pytorch#1167)

* Strip pypi-cudnn from the version.py

* small fix

* Regenerates RECORD file to reflect hash changes caused by sed'ing the version suffix (pytorch#1164)

* Add pypi cudnn package to tests (pytorch#1168)

* Add pypi cudnn package to tests

* Fix pypi installation check

* Fix pypi instructions setting

* Update DEVELOPER_DIR in build_pytorch.sh

Not sure why we are still expecting Xcode9 to be present there; update it to the same folder as wheel builds

Maybe fixes pytorch/pytorch#87637

* Fix to not use sccache if it's not setup properly (pytorch#1171)

* Revert "Fix to not use sccache if it's not setup properly (pytorch#1171)" (pytorch#1172)

This reverts commit 377efea.

* Remove cuda102 and cuda115 docker builds and regenerate manylinux docker (pytorch#1173)

* Rebuild manywheel

* Remove cuda102 and cuda115

* [aarch64] add mkldnn acl backend build support for pytorch cpu libary (pytorch#1104)

* Only push to Docker and Anaconda repo from main (pytorch#1175)

We currently allow push from any branch to go to Docker (and Anaconda) prod. This is a dangerous practice because it allows unfinished work to jump to prod and be used by other workflows

* Release 1.13 script changes (pytorch#1177)

* Test ResNet on MPS (pytorch#1176)

After pytorch/pytorch#86954 is fixed, we should be able to test resnet on MPS

* Revert "Test ResNet on MPS (pytorch#1176)" (pytorch#1180)

This reverts commit efa1bc7.

* Add v1.13 versions

* Update CMake to 3.18, needed for C++17 compilation (pytorch#1178)

* release: separate out version suffixes for torch pypi promotion (pytorch#1179)

* Fixup wheel published to PyPI (pytorch#1181)

* Fixup wheel published to PyPI

* Update prep_binary_for_pypi.sh

* Fix folder deletion for pypi prep

Co-authored-by: Andrey Talman <[email protected]>

* Update cmake version to 3.18 for libtorch docker

* Pins cuda runtime to 111.7.99 (pytorch#1182)

* Fixes cuda pypi rpaths and libnvrtc name (pytorch#1183)

* Allow ROCm minor releases to use the same MIOpen branch as the major release (pytorch#1170)

* Allow ROCm minor releases to use the same MIOpen branch as the major release

* correct logic to ensure rocm5.4 doesn't fall in wrong condition

* add 11.8 workflow for docker image build (pytorch#1186)

* Using windows runners from test-infra for validation workflows (pytorch#1188)

* Testing new windows runners

test

Testing

Testing

testing

testing

test

Test

Testing

testing

Testing

Testing

test

Test

test

testing

testing

Test

testing

test

testing

testing

testing

testing

testing

testing

test

test

testing

testing

testing

testing

Test

test

test

testing

testing

testing

testing

testing

testing

testing

testing

testing

Refactor code

* Adding details for the test-infra issue

* Update current CUDA supported matrix

* add magma build for CUDA11.8 (pytorch#1189)

* Test setting job name (pytorch#1191)

* Use official Python-3.11 tag (pytorch#1195)

* remove CUDA 10.2-11.5 builds (pytorch#1194)

* remove CUDA 10.2-11.5 builds

* remove 11.5 and 11.3 builds

* build libtorch and manywheel for 11.8 (pytorch#1190)

* build libtorch and manywheel for 11.8

* Update common/install_magma.sh

* use magma-cuda build-1 by default; remove CUDA 10.2-11.5 builds

Co-authored-by: Andrey Talman <[email protected]>

* [Validation] Pass ref:main to general worker (pytorch#1197)

* Pass ref:main to general worker

* Try to pass reference to workflow

* Pass ref:main to general worker

* Test

* Pass reference as input parameter

* Make new variable not required

* Fix typo

* Add workflow for manywheel cpu-cxx11-abi (pytorch#1198)

* [Validation] Use linux_job for linux workers (pytorch#1199)

* Use linux_job for linux workers

Test

Testing

Test

testing

Testing

testing

Change linux binary action

test

Simplify version check

* Fix if statement

* Fix typo

* Fix cuda version check

Fix Audio and Vision version check

Add check binary to libtorch

test

test

testing

testing

testing

Testing

Testing

testing

* Use macos generic workers (pytorch#1201)

* Use macos generic workers

fix workflow

testing

Add arm64 builds

test

Remove validate binary action

* add check binary step

* fix ld_library path

* add package type

* Adding ref to validate binaries (pytorch#1204)

* ROCm5.3 nightly wheels (pytorch#1193)

* Enable ROCm5.3 nightly wheels

* Enable ROCm5.3 docker builds

* Update amdgpu repo url for ROCm5.3

* ROCm5.3 not supported on Ubuntu 18.04

* empty

* Another empty commit

* Try disabling MLIR build to shorten docker build time

* Clean up disk space

* MLIR project changed names from ROCm5.4

* Retrigger CI to get around flaky magma git access error

* One more cmake-3.18.4 update

* Use cmake-3.18 for ROCM builds

* More cmake ROCM tweaks

* cmake-3.18 installation on ROCM (take 3)

* add conda builds for CUDA 11.8 (pytorch#1205)

* Enable nightly CUDA 11.8 builds (pytorch#1206)

* enable nightly builds for CUDA 11.8

* add CUDA 11.8 version to manywheel, remove 11.3 and 11.5

* Windows CUDA 11.8 changes (pytorch#1207)

* Add continue on error to validation jobs (pytorch#1209)

* Add continue on error to validation jobs

* test

* Delete unmaintaned torchvision build scripts (pytorch#1210)

All build logic has long moved to torchvision repo and now is executed
by reusable workflow from https://github.com/pytorch/test-infra/tree/main/.github/workflows

* build_pytorch.sh replace tabs with spaces (pytorch#1211)

* Make PyTorch depend on TorchTrition (pytorch#1213)

Remove me when Triton is properly released elsewhere

* Remove smoke test script that is no longer used (pytorch#1212)

* Another tabs-to-spaces change

`s/\t/\ \ \ \ \ \ \ \ /`

* Disable continue on error (pytorch#1214)

* Add torchtrition dependency for wheels

* Make PyTorchConda depend on Triton (Take 2)

Multi-line environment variables are hard, so lets do it traditional way

* Revert "Add torchtrition dependency for wheels"

This reverts commit 475100b.

* Add TorchTrition dependency for wheels (take 2)

Now tests should be green thanks to pytorch/pytorch#90017

* Add sympy to pytorch linux dependencies

* Mitigate windows nightly build regressions

By pinning conda to 22.9.0

Fixes pytorch/pytorch#90059

* Consolidating validation scripts (pytorch#1219)

* Consolidating validation scripts

* Fix validate script name

* Correct script path

* Correct script path

* test

* testing

* testing

* testing

* testing

* test

* test

* test

* testing

* testc

* test hook

* adding wondows use case

* windows use case

* test

* testing

* Windows fixes

* more fixes

* Add package type

* testing more

* Truncate RECORD instead of delete (pytorch#1215)

* Refactor and fix windows smoke tests (pytorch#1218)

* Fix windows smoke test

* Fix first if statement

* Refactor not to call install nightly package

* Revert "Refactor not to call install nightly package"

This reverts commit ac580c8.

* Fix pip install command remove cu102

* Refactor the conda installation

* Add cuda profiler apu to cuda install 11.8 (pytorch#1221)

* Update CUDA upgrade runbook to mention subpackages changes

As per following doc: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

* conda: Add CUDA_HOME, cuda binaries to path (pytorch#1224)

* Refactor macos-arm64 into separate group (pytorch#1226)

* Adding libcufft constraint (pytorch#1227)

* Adding libcufft constraint

* Adding rest of the dependencies

* Advance build number in pytorch-cuda (pytorch#1229)

* Make sympy mandatory dependency of PyTorch

Should fix 
https://github.com/pytorch/audio/actions/runs/3684598046/jobs/6234531675

* Revert me later: Fix conda package smoke tests

* Install `sympy` via pip rather than conda

Needs to be reverted as well

* Refactor smoke tests to configure module included in the release (pytorch#1223)

* Changes to prep for pypi script for release 1.13.1 (pytorch#1231)

* PyPi binary validation and size check (pytorch#1230)

* Validate binary size

* Validate binary size linux_job

* evaluate the fix from pytorch#1231

* Add an optional artifact upload, consolidate fixes to `prep_binary_for_pypi.sh`

* Adding new workflow to call from domain libraries to validate on domain libraries such as text (pytorch#1234)

* Testing new workflow

Fix naming

fix input

* Changed comments

* Ad ability to call validate domain library manually (pytorch#1235)

* Adding test for validate dm workflow and fixing dm validation workflow (pytorch#1236)

* Test manywheel packages (pytorch#1239)

Change only docker file

* Bump scripts in release (pytorch#1241)

* release: Strip whitespace from version_with_suffix (pytorch#1242)

* Cuda 11.8 and removal of dev packages (pytorch#1243)

* Adding more OS's to validate domain library workflow (pytorch#1238)

* Adding more OS's to validate domain library workflow

* conda and wheel together

* add macos workflows

* fix workflow

* Add target os variable to windows validation (pytorch#1244)

* Update MKL to 2022.1 (pytorch#1245)

As previous one occasionally crashes on AMD CPUs

May be addresses pytorch/pytorch#89817

Please note that, in order to get maximum perf on AMD CPUs, one needs to compile and LD_PRELOAD the following library:
```
int mkl_serv_intel_cpu_true() {
	return 1;
}
```

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Remove invalid git option (pytorch#1246)

* Revert "Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)" (pytorch#1247)

This reverts commit ee59264.

* Add with_cuda flag (pytorch#1249)

* Add GPU architecture env variables (pytorch#1250)

* Add cuda to jobname for validate domain library (pytorch#1252)

* Remove pylief dependency (pytorch#1255)

* Fix PEP503 for packages with dashes
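PEP 503 defines one normalization rule a simple index must apply so that dashed, underscored, and dotted spellings resolve to the same project; a minimal sketch:

```python
import re

def normalize(name):
    # PEP 503: runs of -, _, . collapse to a single dash, lowercased,
    # so e.g. pytorch-triton and pytorch_triton map to the same project.
    return re.sub(r"[-_.]+", "-", name).lower()

print(normalize("pytorch_triton"))   # pytorch-triton
print(normalize("Pytorch.Triton"))   # pytorch-triton
```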

* Rename `torchtriton` to `pytorch-triton`

Companion change for pytorch/pytorch#91539

* s3_management: Hide specific packages between dates (pytorch#1256)

* s3_management: Pin requirements.txt

Packaging got updated and that's not what we want

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: except ValueError

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: Use the correct format for strptime

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: Bump bad dates to october 17th (pytorch#1257)

* s3_management: hide torchtriton (pytorch#1258)

* s3_management: Add PACKAGE_ALLOW_LIST for indices (pytorch#1259)

* s3_management: Bump bad date end to 12/30 (pytorch#1260)

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1248)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Fixes logic for 11.8 and adds missing names for DEPS_SONAME

* s3_management: Account for underscore packages

pytorch-triton is listed as pytorch_triton

Signed-off-by: Eli Uriegas <[email protected]>

* s3_management: simplify allowlist, correct underscores

Signed-off-by: Eli Uriegas <[email protected]>

* Fix cuda version in nightly (pytorch#1261)

* Adding py311 validations (pytorch#1262)

* Use MATRIX_* variables instead of redeefining new var each time (pytorch#1265)

* Fix validation domain library (pytorch#1266)

remove ref main

fix workflow

more refactor

* Nightly: do test install with the dependencies better and skip CUDA tests on cpu only box (pytorch#1264)

* Refactor PyTorch wheel and libtorch build scripts for ROCm (pytorch#1232)

* Refactor wheel and libtorch build scripts (#7)

* Update to so patching for ROCm

Wildcard used in grep to grab the actual numbered so file referenced
in patchelf. This allows the removal of specifying the so number in
DEPS_LIST & DEPS_SONAME

This commit also adds the functionality for trimming so names to
build_libtorch.sh from build_common.sh

* Refactor to remove switch statement in build_rocm.sh

This commit refactors build_rocm.sh and brings in a few major updates:
 - No longer required to specify the full .so name (with number) for ROCm libraries
       - The .so versions are copied and the patching code will fix the links to point to this version
 - No longer required to specify paths for ROCm libraries allowing the removal of the large switch
       - Paths are acquired programmatically with find
 - No longer required to specify both the path and filename for the OS specific libraries
       - Programmatically extract file name from the path
 - Automatically extract Tensile/Kernels files for the architectures specified in PYTORCH_ROCM_ARCH
   and any non-arch specific files e.g. TensileLibrary.dat

* rocfft/hipfft link to libhiprtc.so in ROCm5.4 (#15)

Co-authored-by: Jack Taylor <[email protected]>

* add sm_90 to CUDA11.8 builds (pytorch#1263)

* add sm_90 to CUDA11.8 builds

* Manually invoke bash for Miniconda

* Revert "add sm_90 to CUDA11.8 builds (pytorch#1263)" (pytorch#1275)

This reverts commit e1453a4.

* Set ubuntu distribution correctly for ROCm5.3 and above (pytorch#1268)

* Fix unbound variable error (pytorch#1276)

Regression introduced (and ignored) by pytorch#1262
Test plan:
```
% bash -c 'set -u; if [[ -z "${FOO}" ]]; then echo "bar"; fi' 
bash: FOO: unbound variable
(base) nshulga@nshulga-mbp builder % bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'
bar
(base) nshulga@nshulga-mbp builder % FOO=1 bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'

```

* Manually invoke bash for miniconda (pytorch#1277)

Fixes build issues failing with:
```
./Miniconda3-latest-Linux-x86_64.sh: 438: ./Miniconda3-latest-Linux-x86_64.sh: [[: not found
```
as seen in e.g.: pytorch#1271

* Fix perm

Which somehow got changed by pytorch@62103bf

* add sm_90 to CUDA11.8 builds (pytorch#1278)

* libtinfo.so version update and logic fix for ROCm libtorch (pytorch#1270)

* Use libtinfo.so.6 for Ubuntu 2004

* Fix to origname grep

* Condition on ROCM_VERSION for libtinfo6

* Looks like it is not used anywhere. (pytorch#1273)

* Build Windows binaries with Visual Studio 2022 Build Tools (pytorch#1240)

* Build Windows binaries with Visual Studio 2022 Build Tools

* Unify casing in Batch files, remove VS 2017 installation

* Remove VS 2017 Conda scripts, unify casing in conda Batch scripts, minor Conda scripts tweaks

* Slim down `pytorch-cuda`

It should only contain runtime dependencies that PyTorch+domain
libraries depend on, namely:
 - cudart
 - cublas
 - cusparse
 - cufft
 - curand
 - nvtx
 - nvrtc
 - nvjpeg (for TorchVision)

This removes dependencies on NVCC, build/debug tools, etc., which are not
needed for running PyTorch

Test Plan:
  `conda create -n tmp -c nvidia -c malfet cuda-toolkit==11.7` and
observe that only relevant packages are installed

Fixes pytorch/pytorch#91334

* [BE] Delete `unicode-flags` build options (pytorch#1284)

There were relevant only for Python<=3.3

* [BE] Define `openssl_flags` (pytorch#1285)

Rather than have two invocations of `./configure`

* Build with `--enabled-shared` if `patchelf` is found (pytorch#1283)

This is needed to make `manylinux-wheel` images usable for building new Triton binaries.

Test plan: Build docker and verify that following `CMakeLists.txt` finishes successfully:
```
cmake_minimum_required(VERSION 3.6)
find_package(Python3 REQUIRED COMPONENTS Interpreter Development)
message(WARNING Executable ${Python3_EXECUTABLE})
message(WARNING IncludeDirs ${Python3_INCLUDE_DIRS})
message(WARNING Libraries ${Python3_LIBRARIES})
```

* Update cudnn to 8.7.0.84 for CUDA 11.8 builds (pytorch#1271)

* update cudnn to 8.7.0.84 for CUDA 11.8 builds

* workaround for pytorch#1272

* Revert "workaround for pytorch#1272"

This reverts commit c0b10d8.

* update cudnn==8.7.0.84 for windows

* [BE] Remove references to Python<3.6 (pytorch#1287)

* Upgrade desired python version to 3.8

For libtorch builds

* Fix how libtorch picks the python version

* Tweak conda builds to support 3.11

Add `-c malfet` when building for 3.11 (though perhaps it's better to
move numpy to pytorch channel)

Tweak some build time dependencies

* Fix typo

* Skip triton dependency for 3.11 CUDA builds

* Update build-number to 3

* Add ability to override cuda archs for conda (pytorch#1282)

* [ROCm] reduce disk space used in image (pytorch#1288)

Fixes pytorch#1286

* Extend MacOS/Windows builds to 3.11

By installing dependencies from pip
Should be a no-op for <=3.10

* ci: Migrate to checkout@v3 (pytorch#1290)

checkout@v2 is deprecated moving to checkout@v3

Signed-off-by: Eli Uriegas <[email protected]>

* Fix typo

* Add 3.11 option for Windows builds

* Add python-3.11 download location for windows

* Add pypi with cudnn package test (pytorch#1289)

* Add pypi with cudnn package test

* Add pypi with cudnn package test

* test

* test

* More pypi cudnn changes

* test

* Fix pipy smoke test

* Remove debug comments

* Delete some ancient checks for MacOS builds

As we no longer build for Python-2.7 or 3.5

* Add libnvjpeg-dev package as fallback (pytorch#1294)

* Add libnvjpeg-dev package as fallback

* Move libnvjpeg and libnvjpeg-dev to required packages

* Update conda/pytorch-cuda/meta.yaml

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Upgrade nightly wheels to rocm5.4.2 (pytorch#1225)

* Upgrade nightly wheels to rocm5.4

* Adding graphic architectures for ROCm 5.4

* Updated to use ROCm5.4.1

* Updated to use ROCm5.4.2

* Fixed syntax error

* Perform build on image with magma and miopen preinstalled

* Add dev packages for windows pytorch-cuda dependencies (pytorch#1295)

* Add dev packages for windows dependencies

* Adding architecture dependent builds

* Add notes around windows

* fix typo

* Bumping version to v3

* rocm libtorch prebuild magma; fix manylinux cmake version (pytorch#1296)

* Add manywheel:cpu-cxx11-abi checkup for check_binary.sh (pytorch#1251)

* Remove with_py311 flag (pytorch#1301)

* rocm manylinux now uses devtoolset 9 (pytorch#1300)

* fix ACL_ROOT_DIR setting and upgrade the ACL version to 22.11 (pytorch#1291)

* Add `-c malfet` for Windows builds as well

* Set torch._C._PYBIND11_BUILD_ABI version check only for GLIBCXX_USE_CXX11_ABI=0 (pytorch#1303)

* Adding limit windows builds logic (pytorch#1297)

* Adding limit windows builds logic

* Remove empty space

* Simplify mkl build dependencies (pytorch#1305)

On Linux and Mac, PyTorch must be built against `mkl=2020.x` in order to be compatible with both `mkl-2021` and `mkl-2022`, which added `.so.1` and `.so.2` files respectively; a binary linked against one of those versions would be incompatible with the newer/older toolchains.

This is not an issue on Windows, as all mkl binaries there end with simple `.dll`

* "Fix" PyTorch CPU conda testing

It's still horribly broken, but make it a bit better by not installing
pytorch from the default anaconda channel (which installs 1.12.1, missing
the dependencies the 2.0 dev package is supposed to have)

For example, see this runlog
https://github.com/pytorch/pytorch/actions/runs/4155371267/jobs/7189101147

* Update torch._C._PYBIND11_BUILD_ABI version check (pytorch#1306)

* Skip tests for manywheel built with _GLIBCXX_USE_CXX11_ABI=1

* Put back smoke test label (pytorch#1310)

* [aarch64] add support for torchdata wheel building (pytorch#1309)

* Python 3.11 validation workflow tests (pytorch#1304)

* Test windows py311

* Nightly binaries

* Fix py311 tests

* fix python calling

* Revert "Nightly binaries"

This reverts commit cbf80ca.

* add a scheduled workflow for the nightly pypi binary size validation (compliments pytorch/test-infra#2681) (pytorch#1312)

* Add regression test for pytorch/pytorch#94751

* Add 3.11 and `--pytorch-only` options

* Add `lit` to list of allowed packages

As it is now a mandatory (albeit spurious) dependency of pytorch-triton

See https://pypi.org/project/lit/ for more details

* s3: Allow tar.gz as an accepted file extension (pytorch#1317)

* Changes for Python 3.11 and smoke Test RC cut (pytorch#1316)

* Smoke Test RC cut

* Validate binaries 3.11

* test

* Smoke test binaries

* Fix pytorch-cuda chan download

* Remove temp change

* Make sure we don't use GPU runners for any of libtorch validations (pytorch#1319)

* Make sure we don't use GPU runners for any of libtorch

* Make sure we don't use GPU runners for any of libtorch

* s3: Add pytorch_triton_rocm to index (pytorch#1323)

Signed-off-by: Eli Uriegas <[email protected]>

* s3: Add tqdm package req for text (pytorch#1324)

* Add `--analyze-stacks` option

Using `git rev-base`, this prints the total number of stacks per author,
along with their average, mean, and max depth

At the time of submission, here are the top 10 ghstack users of pytorch:
```
ezyang has 462 stacks max depth is 15 avg depth is 1.70 mean is 1
awgu has 240 stacks max depth is 28 avg depth is 4.30 mean is 1
peterbell10 has 146 stacks max depth is 7 avg depth is 1.84 mean is 1
zou3519 has 128 stacks max depth is 7 avg depth is 1.98 mean is 1
jerryzh168 has 113 stacks max depth is 16 avg depth is 1.45 mean is 1
bdhirsh has 111 stacks max depth is 7 avg depth is 1.85 mean is 2
wconstab has 108 stacks max depth is 7 avg depth is 2.15 mean is 1
SherlockNoMad has 99 stacks max depth is 4 avg depth is 1.24 mean is 1
zasdfgbnm has 80 stacks max depth is 11 avg depth is 2.52 mean is 6
desertfire has 73 stacks max depth is 3 avg depth is 1.14 mean is 1
```
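
A minimal sketch of the kind of per-author summary `--analyze-stacks` reports (the function name and exact statistics here are illustrative assumptions, not the tool's real interface):

```python
# Hypothetical sketch: summarize ghstack stack depths for one author,
# mirroring the "N stacks / max depth / avg depth / mean" lines above.
from statistics import mean, median

def stack_stats(depths):
    """depths: one entry per stack, giving that stack's depth (in PRs)."""
    return {
        "stacks": len(depths),
        "max": max(depths),
        "avg": round(mean(depths), 2),
        "median": median(depths),  # illustrative; the tool's "mean" column may differ
    }

print(stack_stats([1, 2, 15, 1, 1, 3]))
# {'stacks': 6, 'max': 15, 'avg': 3.83, 'median': 1.5}
```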

* Add filelock and networkx deps (pytorch#1327)

To match dependencies for wheel files defined in https://github.com/pytorch/pytorch/blob/ed1957dc1989417cb978d3070a4e3d20520674b4/setup.py#L1021-L1024

* Remove building magma from source

* Revert

* Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)

* Upgrade cmake version to 3.22.1 to build triton

* Pin patchelf version

* Fix comment typo

* Smoke test for cuda runtime errors (pytorch#1315)

* Add test for cuda runtime errors

* Add cuda exception smoke test

* Move cuda runtime error to end

* Move cuda runtime error to end

* Address comments

* Address comments

* Add Jinja2 Dependency (pytorch#1332)

As part of the effort to fix pytorch/pytorch#95986

* Add MarkupSafe to S3 Index (pytorch#1335)

* Remove rocm5.1 rocm5.2 from libtorch Dockerfile

* [aarch64] Adding CI Scripts to build aarch64 wheels (pytorch#1302)

* add aarch64 ci scripts

* added readme. get branch from /pytorch

* Add smoke tests conv,linalg,compile. And better version check. (pytorch#1333)

* Add smoke tests conv,linalg,compile

* Add version check

* Fix typo

Fix version check

Add not

* Add exception for python 3.11

* fix typo

* Try to exit after CUDA Runtime exception

* Restrict crash test only to conda

* Restrict crash test only to conda

* Fix tests

* Turn off cuda runtime issue

* tests

* more tests

* test

* remove compile step

* test

* disable some of the tests

* testing

* Remove extra index url

* test

* Fix tests

* Additional smoke tests

Remove release blocking changes
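
One way the "better version check" mentioned above could work is to distinguish release-style versions from nightly dev builds. A hedged sketch — the regex and function are assumptions for illustration, not the actual smoke-test code:

```python
# Hypothetical version check: accept release-style versions such as
# "2.0.0" or "2.0.0+cu118", reject nightly/dev builds like
# "2.1.0.dev20230321". Not the real smoke-test code.
import re

def looks_like_release(version: str) -> bool:
    return re.match(r"^\d+\.\d+\.\d+(\+\w+)?$", version) is not None

print(looks_like_release("2.0.0+cu118"))        # True
print(looks_like_release("2.1.0.dev20230321"))  # False
```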

* Aarch64 changes for PyTorch release 2.0 (pytorch#1336)

* Aarch64 changes for PyTorch release 2.0

* Fix spacing

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <[email protected]>

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <[email protected]>

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Aarch64 build py3.11 fix (pytorch#1341)

* Fix nightly smoke test (pytorch#1340)

* Fix nightly smoke test

* Fix nightly builds

* Release 2.0 release scripts changes (pytorch#1342)

* Release 2.0 release scripts changes

* Release script modifications

* Add more packages to allow list (pytorch#1344)

* Add `jinja2` dependency to conda package

To be consistent with wheels, see
https://github.com/pytorch/pytorch/95961

* Restrict jinja to py 3.10 or less (pytorch#1345)

* Update `torchtriton` version to 2.1.0

* And update triton version here as well

* added smoke test for max-autotune (pytorch#1349)

Co-authored-by: agunapal <[email protected]>

* Refactor conda backup script (pytorch#1350)

* Refactor conda backup

* Fix space

* Minor style

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)" (pytorch#1351)

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)"

This reverts commit 18c5017.

* Selective revert

* Get cmake from pip

* Use 3.18.2 from conda

* Release script changes, add more release dependencies, bump version for aarch64 builds (pytorch#1352)

* Release script changes

* Add Jinja2 dependency

* Fix typo

* Add pytorch conda dependencies (pytorch#1353)

* Add latest dependencies for pytorch 2.0 release (pytorch#1357)

* Fix typo

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit d7f2a7c.

* [aarch64] update readme with the "--enable-mkldnn" option (pytorch#1362)

This needs to be enabled for official wheel building.

* Replace `--enable-mkldnn` with `--disable-mkldnn`

Also, change default to ubuntu-20.04

* Update AMIs

Using following images:
```
% aws ec2 describe-images --image-ids ami-078eece1d8119409f ami-052eac90edaa9d08f ami-0c6c29c5125214c77 --query "Images[].[ImageId, Description]"
[
    [
        "ami-078eece1d8119409f",
        "Canonical, Ubuntu, 18.04 LTS, arm64 bionic image build on 2023-03-02"
    ],
    [
        "ami-0c6c29c5125214c77",
        "Canonical, Ubuntu, 22.04 LTS, arm64 jammy image build on 2023-03-03"
    ],
    [
        "ami-052eac90edaa9d08f",
        "Canonical, Ubuntu, 20.04 LTS, arm64 focal image build on 2023-03-01"
    ]
]
```

* Update tags for domain libraries

* Add PyTorch version pinning to release wheels

* Fix flake8

* [BE] Introduce `build_domains` function

And call it to rebuild only domains if torch wheel is available

* Switch deprecated ubuntu-18.04 runner to ubuntu-latest (pytorch#1334)

* Switch deprecated ubuntu-18.04 runner to self-hosted 2xlarge

* Leave build-nvidia-docker for now

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <[email protected]>

* Use ephemeral runners

* Use ubuntu-latest

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <[email protected]>

* Switch from latest to 22.04 to pin the version

---------

Co-authored-by: Nikita Shulga <[email protected]>

* Introduce optional --build-number parameter

* Revert me later: Fix conda package smoke tests

(cherry picked from commit d7f2a7c)

Alas, it's still used and causes nightly build failures

* Fix aarch64 torchvision build (pytorch#1363)

* Fix torchvision image extension compilation

* Fix torchvision image extension compilation

* Set enable_mkldnn to pypi build

* Remove unused `enable_mkldnn` for configure_system

* [aarch64] Try to link statically with png/jpeg

Also, add testing (which is currently broken)

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit ce427de.

* [AARCH64] Fix image.so wheel

By adding explicit libz dependency

* [AARCH64] Pass `BUILD_S3` to torchdata

To make build consistent with Linux-x86_64

* Revert "[AARCH64] Pass `BUILD_S3` to torchdata"

This reverts commit ae8e825.

As it fails to build on aarch64

* Add portalocker (pytorch#1364)

* [BE] Error handling in build_aarch64_wheel

I've noticed that build errors in `build_ArmComputeLibrary` would be
ignored as semicolon is used between the commands, instead of &&
Also, replace the nightly version evaluation that relied on torch with
evaluation based on the individual libraries
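
The failure mode described above — a `;` between commands letting a failed build step go unnoticed — can be demonstrated in a few lines. This Python snippet just shells out to illustrate the shell semantics; it is not part of the build scripts:

```python
import subprocess

# With ';' the second command runs even though the first failed, and the
# chain's exit status is that of the *last* command, hiding the error.
semi = subprocess.run("false; echo built", shell=True,
                      capture_output=True, text=True)

# With '&&' the failure short-circuits the chain and the non-zero
# status propagates, so the build error is not silently ignored.
andand = subprocess.run("false && echo built", shell=True,
                        capture_output=True, text=True)

print(semi.returncode, semi.stdout.strip())    # 0 built
print(andand.returncode, andand.stdout.strip())  # 1 (no output)
```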

* [AArch64] Pass `args.instance_type` to `start_instance`

* use c++17 when building windows smoke tests (pytorch#1365)

Summary:
We are seeing failures during CI dealing with some headers that have
nested namespaces. This is expected to remedy them.

One such example:
https://github.com/pytorch/pytorch/actions/runs/4510336715/jobs/7942660912

Test Plan: Test this with CI.

---------

Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Co-authored-by: Andrey Talman <[email protected]>
Co-authored-by: andysamfb <[email protected]>
Co-authored-by: izaitsevfb <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Syed Tousif Ahmed <[email protected]>
Co-authored-by: Syed Tousif Ahmed <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Wei Wang <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Pruthvi Madugundu <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
Co-authored-by: ptrblck <[email protected]>
Co-authored-by: zhuhong61 <[email protected]>
Co-authored-by: Greg Roodt <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
Co-authored-by: Dmytro Dzhulgakov <[email protected]>
Co-authored-by: albanD <[email protected]>
Co-authored-by: Radek Bartoň <[email protected]>
Co-authored-by: divchenko <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Mike Schneider <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
Co-authored-by: agunapal <[email protected]>
Co-authored-by: dagitses <[email protected]>