[MLIR][TORCH] Fix mean and mean.dim op for large-sized inputs #1605
Conversation
LGTM!
This is in terms of f16 support: we shouldn't be upcasting f16 to f64; casting to f32 should be enough.
@vivekkhandelwal1 @pashu123, is this still needed?
Not right now. Since the fp16 tests don't work on the refbackend, this support would be added in a separate PR.
@ramiro050 Is this patch good to merge?
LGTM!
Hi @vivekkhandelwal1, we found a significant performance regression with this commit, since all mean/sum-style reduction computations are now done in f64. I am curious why we force the computation precision to f64. Can we use an option to enable f64 in your scenario and leave f32 computation as the default? Affected issue: alibaba/BladeDISC#799
AFAICT PyTorch uses f32 for this computation instead of f64. Why do we need f64? Since it is causing a major regression, I think we should revert this patch. We don't want an option to control f32 or f64 here; we should just be consistent with PyTorch's behavior.
If we revert this conversion to
I tried reverting the .cpp file changes in this patch locally and all the e2e test cases in the patch continue to pass. We should just revert this patch since it does not appear to contain an e2e test that fails before but passes after.
In the past I referenced this to see how PyTorch was doing the mean and variance computation on CPU when debugging a correctness issue for var.correction. This culminated in the VarCorrectionLargeInputModule_basic test added in #1100, but if the large-input tests pass both with and without double precision, then the tests must not be adequately exercising the precision differences. Also, summation order in the backend matters here: e.g., to sum 1000 elements, we might first sum the elements in groups of 4 and then fully reduce using these partial sums. This can avoid underflow problems with lower precision, but it doesn't mean that summing with single precision in this lowering is correct. My guess is that PyTorch + CUDA won't do double-precision accumulation though.
In short, agreed that PyTorch's behavior should have the final say. If they are no longer doing double-precision accumulation for these ops, then we can happily drop this patch. For context, this precision issue first showed up for aten.var.correction with V-Diffusion; I'm unsure which model had problems for the mean.dim op though. EDIT: I changed the lines I linked to.
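To make the summation-order point concrete, here is a small, self-contained sketch (values and chunk size are arbitrary choices for illustration, not anything the actual lowering does) showing that strictly sequential float32 accumulation saturates at 2**24, while summing in groups of 4 first recovers the exact total:

```python
import numpy as np

# 2**24 ones followed by 1000 more ones.
n = 2**24 + 1000
x = np.ones(n, dtype=np.float32)

# Strictly sequential float32 accumulation: np.add.accumulate walks the array
# in order keeping the running sum in float32, so once it reaches 2**24,
# adding 1.0 no longer changes it.
sequential = np.add.accumulate(x)[-1]

# Group-of-4 accumulation as described above: each partial sum (4.0) is
# exactly representable, and so is every multiple of 4 up to the true total.
partial_sums = x.reshape(-1, 4).sum(axis=1, dtype=np.float32)
chunked = np.add.accumulate(partial_sums)[-1]

print(n)                  # 16778216 (the exact sum)
print(float(sequential))  # 16777216.0 -- stuck at 2**24
print(float(chunked))     # 16778216.0
```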
@qedawkins
I agree @tanyokwok. Or, to put it another way: if an e2e test would fail when comparing different upstream PyTorch backends (e.g. CPU vs CUDA), then we do not want to include it in our e2e test suite. @vivekkhandelwal1 can you revert this patch?
Hi @silvasean, if you run the following test:
without this patch, then it won't pass. If you still want this patch to be reverted, I can do that, or I can add this test for the other ops too.
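(The exact test referenced above is not preserved in this thread. Purely for illustration, a large-input mean e2e test in the style of the torch-mlir test suite might look roughly like the sketch below; the module name, shapes, and import paths are assumptions and may not match the actual test or the suite's layout at the time.)

```python
import torch

# Import paths follow the torch-mlir e2e test suite conventions; the exact
# module layout has changed across versions.
from torch_mlir_e2e_test.framework import TestUtils
from torch_mlir_e2e_test.registry import register_test_case
from torch_mlir_e2e_test.annotations import export, annotate_args


class MeanDimLargeInputModule(torch.nn.Module):
    def __init__(self):
        super().__init__()

    @export
    @annotate_args([
        None,
        ([-1, -1], torch.float32, True),
    ])
    def forward(self, x):
        return torch.mean(x, dim=1)


@register_test_case(module_factory=lambda: MeanDimLargeInputModule())
def MeanDimLargeInputModule_basic(module, tu: TestUtils):
    # Large values over a large reduction dimension are what exposed the
    # accumulation-precision problem.
    module.forward(100 * torch.randn(8, 1024 * 1024))
```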
Yes, I think we want to revert this patch. Does the test pass on CUDA?
How can it be tested on CUDA?
We don't have a pre-built e2e config for CUDA, but you can just write a standalone script that runs the one op on CUDA and CPU and compares the results.
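For what it's worth, such a standalone comparison script (outside the e2e suite) could be as simple as the following sketch; the op, shape, and tolerances here are arbitrary choices for illustration:

```python
import torch

assert torch.cuda.is_available(), "this comparison needs a CUDA device"

# A large input of the kind discussed in this thread.
x = 100.0 * torch.randn(1024, 65536, dtype=torch.float32)

cpu_result = torch.mean(x, dim=1)
cuda_result = torch.mean(x.cuda(), dim=1).cpu()

# If native CPU and CUDA disagree beyond this tolerance, an e2e test built
# around the CPU result is enforcing a precision guarantee that PyTorch's
# own backends do not share.
print(torch.allclose(cpu_result, cuda_result, rtol=1e-5, atol=1e-5))
print((cpu_result - cuda_result).abs().max())
```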
@vivekkhandelwal1 can you please revert this patch?
Is the revert for lack of CUDA e2e tests (and lack of a GPU CI?) or for the earlier regression from moving to f64?
My understanding is that the CUDA and CPU backends provide different precision guarantees, and this test enforces the CPU precision guarantee. We generally do not want e2e tests that would fail if comparing PyTorch's native CPU and CUDA. When we have a GPU CI we can add a native torch CUDA e2e test config which enforces this, but for now the policy is enforced manually. I documented the rationale for removing this here:
@silvasean @tanyokwok #1692 is merged now.
This commit fixes the aten.mean and aten.mean.dim op decomposition for supporting large-sized inputs.
This commit also fixes the formatting for the file stats.py.
Signed-off-by: Vivek Khandelwal <[email protected]>
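Conceptually, the change described in this commit message amounts to computing the reduction in a wider accumulation type and casting the result back, along the lines of the Python sketch below (this is only a model of the idea, not the actual C++ decomposition code):

```python
import torch

def mean_dim_upcast(x: torch.Tensor, dim, keepdim=False):
    # Accumulate in a wider dtype so large reductions do not lose precision,
    # then cast the result back to the input dtype.
    wide = x.to(torch.float64)
    reduced = wide.sum(dim=dim, keepdim=keepdim)
    count = 1
    for d in dim:
        count *= x.shape[d]
    return (reduced / count).to(x.dtype)

x = 100.0 * torch.randn(1 << 20, 8, dtype=torch.float32)
print(mean_dim_upcast(x, dim=[0]))
print(torch.mean(x, dim=0))  # compare against PyTorch's native result
```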