aten::empty_like #2654

Merged
merged 2 commits into main on Apr 16, 2024
Conversation

apbose
Collaborator

@apbose apbose commented Feb 23, 2024

No description provided.

@apbose apbose marked this pull request as draft February 23, 2024 01:23
@github-actions github-actions bot added component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Feb 23, 2024
@github-actions github-actions bot added component: lowering Issues re: The lowering / preprocessing passes and removed component: tests Issues re: Tests component: conversion Issues re: Conversion stage labels Feb 27, 2024
@apbose apbose changed the title aten::empty_like evaluator aten::empty_like Feb 27, 2024
@apbose
Collaborator Author

apbose commented Feb 27, 2024

I had a doubt on this one: does this require a test? Consider the following test:

def test_lowering_empty_like(self):
    class emptyLike(torch.nn.Module):
        def __init__(self, *args, **kwargs) -> None:
            super().__init__(*args, **kwargs)

        def forward(self, x):
            y = torch.ops.aten.empty_like.default(x)
            return y

    # Operations expected to be removed in the traced graph after decompositions
    expected_ops = {}
    unexpected_ops = {torch.ops.aten.empty_like.default}

    inputs = [torch.randn(2, 3).cuda()]

    # inputs = [torch.empty((2, 3), dtype=torch.int32, device="cuda")]

    fx_graph = torch.fx.symbolic_trace(emptyLike())
    unexpected_ops_seen, expected_ops_unseen = lower_graph_testing(
        fx_graph,
        inputs,
        expected_ops=expected_ops,
        unexpected_ops=unexpected_ops,
        min_block_size=1,
    )

    torch._dynamo.reset()

    # Validate that the results between Torch and Torch-TRT are similar
    optimized_model = torch_tensorrt.compile(
        fx_graph,
        "torch_compile",
        inputs,
        min_block_size=1,
        pass_through_build_failures=True,
    )
    optimized_model_results = optimized_model(*inputs).detach().cpu()
    torch_model_results = fx_graph(*inputs).detach().cpu()

    max_diff = float(
        torch.max(torch.abs(optimized_model_results - torch_model_results))
    )
    self.assertAlmostEqual(
        max_diff,
        0,
        DECIMALS_OF_AGREEMENT,
        "empty_like TRT outputs don't match with the original model.",
    )
  1. Is the above required, since both the Torch-TRT compiled optimized_model and fx_graph will have the same lowering pass applied?
  2. Also, when I compile the above I see:
  File "/home/abose/Documents/work/torchTRT_empty_2_26/TensorRT/tests/py/dynamo/testing_utilities.py", line 55, in fx_dynamo_testing_backend
    trt_compiled = custom_backend(
  File "/home/abose/Documents/work/torchTRT_empty_2_26/TensorRT/tests/py/dynamo/testing_utilities.py", line 73, in compile_module_testing
    partitioned_module, _ = partitioning.fast_partition(
  File "/home/abose/Documents/work/torchTRT/torch_trt/lib/python3.8/site-packages/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py", line 280, in
partition
    partitioned_graph = partitioner.partition_graph()
  File "/home/abose/Documents/work/torchTRT/torch_trt/lib/python3.8/site-packages/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py", line 197, in
partition_graph
    subgraphs = self.put_nodes_into_subgraphs()
  File "/home/abose/Documents/work/torchTRT/torch_trt/lib/python3.8/site-packages/torch/fx/passes/splitter_base.py", line 805, in put_nodes_into_subgraphs
    raise FxNetSplitterInternalError("Couldn't create subgraphs")
torch._dynamo.exc.BackendCompilerFailed: backend='functools.partial(<function fx_dynamo_testing_backend at 0x7f5c946045e0>, store_intermediate_graphs=[], min_
block_size=1, torch_executed_ops=set(), use_fast_partitioner=True)' raised:
FxNetSplitterInternalError: Couldn't create subgraphs

Is this expected? Is it something to do with no splits happening for the above graph?

@apbose apbose marked this pull request as ready for review February 27, 2024 07:14
@gs-olive
Collaborator

I'm not sure what empty_like lowers to, but you could potentially add another operation in the nn.Module so that the graph is non-empty. It is likely that the graph is completely empty, which is why the partitioning fails. Since this decomposition is Torch-provided, we shouldn't need a test; however, it is important to verify that whatever the operator lowers to is also supported by Torch-TRT.
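
For reference, one way to see what empty_like lowers to is to trace a small function with the Torch-provided decompositions applied. A minimal sketch (not part of this PR; the input shape is illustrative, and it assumes a decomposition for aten::empty_like is registered):

import torch
from torch._decomp import get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

# Pull the Torch-provided decomposition for aten::empty_like (assumed to be registered)
decomps = get_decompositions([torch.ops.aten.empty_like])

def fn(x):
    return torch.ops.aten.empty_like.default(x)

# Trace with the decomposition table applied and print the resulting aten graph
gm = make_fx(fn, decomposition_table=decomps)(torch.randn(2, 3))
gm.graph.print_tabular()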

@apbose
Collaborator Author

apbose commented Mar 1, 2024

I do not think the graph would be empty, since empty_like should reduce to the lowering operations of aten::size plus the creation of a torch.Tensor of the corresponding size. So the lowered graph should contain those operations, though I need to confirm.
Ok, I will add another operation to the module and verify the lowering.

@apbose
Collaborator Author

apbose commented Mar 6, 2024

I verified the above test with three cases:

  1. Case 1:

class emptyLike(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

    def forward(self, x):
        y = torch.ops.aten.empty_like.default(x)
        return y

Without decomposition of empty_like
a. Before AOT trace

%l_x_ : torch.Tensor [num_users=1] = placeholder[target=L_x_]
%empty_like_default : [num_users=1] = call_function[target=torch.ops.aten.empty_like.default](args = (%l_x,), kwargs = {})
 return (empty_like_default,)

b. After AOT trace

%arg0_1 : [num_users=1] = placeholder[target=arg0_1]
%clone : [num_users=1] = call_function[target=torch.ops.aten.clone.default](args = (%arg0_1,), kwargs = {})
%empty_like : [num_users=1] = call_function[target=torch.ops.aten.empty_like.default](args = (%clone,), kwargs = {})
return (empty_like,)

c. After lowering passes

%arg0_1 : [num_users=1] = placeholder[target=arg0_1]
%empty_like : [num_users=1] = call_function[target=torch.ops.aten.empty_like.default](args = (%arg0_1,), kwargs = {})
return (empty_like,)

This is the graph sent for partitioning.

With the decomposition of empty_like
a. Before AOT trace

%l_x_ : torch.Tensor [num_users=1] = placeholder[target=L_x_]
%empty_like_default : [num_users=1] = call_function[target=torch.ops.aten.empty_like.default](args = (%l_x,), kwargs = {})
 return (empty_like_default,)

b. After AOT trace

%arg0_1 : [num_users=0] = placeholder[target=arg0_1]
%empty_like : [num_users=1] = call_function[target=torch.ops.aten.empty_permuted.default](args = ([2,3],[0,1]), kwargs = {})
return (empty_like,)

c. After lowering passes

%arg0_1 : [num_users=0] = placeholder[target=arg0_1]
%_frozen_param0 : [num_users=1] = get_attr[target=_frozen_param0]
return (_frozen_param0,)

Partitioning of the above graph errors out in put_nodes_into_subgraphs of the FX splitter_base, since only the frozen_param nodes have users (that's my assumption).

  2. Case 2:

class emptyLike(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

    def forward(self, x):
        c = torch.ops.aten.add(x, x)
        y = torch.ops.aten.empty_like.default(c)
        return y

As in the above case, when empty_like is included in the decompositions during compilation, the shape of x is extracted statically before runtime and no subgraphs are created.

  3. Case 3:

class emptyLike(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

    def forward(self, x):
        c = torch.ops.aten.add(x, x)
        y = torch.ops.aten.empty_like.default(c)
        d = y + c
        return d

With the decomposition of empty_like
a. Before AOT trace

   %l_x_ : torch.Tensor [num_users=1] = placeholder[target=L_x_]
   %add : [num_users=2] = call_function[target=torch.ops.aten.add](args = (%l_x_, %l_x_), kwargs = {})
   %empty_like_default : [num_users=1] = call_function[target=torch.ops.aten.empty_like.default](args = (%add,), kwargs = {})
   %add_1 : [num_users=1] = call_function[target=operator.add](args = (%empty_like_default, %add), kwargs = {})
   return (add_1,)

b. After AOT trace

 %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
 %clone : [num_users=1] = call_function[target=torch.ops.aten.clone.default](args = (%arg0_1,), kwargs = {})
 %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%clone, %clone), kwargs = {})
 %empty_permuted : [num_users=1] = call_function[target=torch.ops.aten.empty_permuted.default](args = ([2, 3], [0, 1]), kwargs = {dtype: torch.float32, layout: torch.strided, device: cuda:0, pin_memory: False})
 %add_1 : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%empty_permuted, %add), kwargs = {})
 return (add_1,)

c. After lowering passes

    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg0_1, %arg0_1), kwargs = {})
    %_frozen_param0 : [num_users=1] = get_attr[target=_frozen_param0]
    %add_1 : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%_frozen_param0, %add), kwargs = {})
    return (add_1,)

In the above case, since there are additional add nodes alongside the frozen_param node, the subgraph is created.

Studying the above cases, it seems that the aten decomposition happens during the AOT trace. As discussed, ideally a test case should not be required. I do not believe empty_permuted is supported, though.

@gs-olive
Collaborator

gs-olive commented Mar 9, 2024

Thanks for the analysis @apbose - this is very helpful. It looks like the constant_folding lowering pass is freezing the memory for the empty_like operator and storing it as an attribute of the model.

Regarding empty_permuted - it seems like it would be necessary in the dynamic shape case, since we would not be able to freeze the parameter in that case. It seems based on the Core ATen IR that prims.empty_permuted is a core op, so I do think the conversion/evaluation of that would be helpful here, but it could go in a separate PR.
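
For reference, a minimal sketch of the dynamic-shape case (not part of this PR; it assumes the emptyLike module from the test above and an illustrative dynamic batch dimension). With a runtime-dependent input shape, constant folding cannot freeze the empty_like output, so the decomposed empty_permuted op would need converter/evaluator support:

import torch
import torch_tensorrt

model = emptyLike().eval().cuda()  # module from the test above (assumed in scope)

# Dynamic batch dimension: the output shape of empty_like now depends on the
# runtime input, so it cannot be folded into a frozen constant at compile time.
dynamic_input = torch_tensorrt.Input(
    min_shape=(1, 3),
    opt_shape=(2, 3),
    max_shape=(8, 3),
    dtype=torch.float32,
)

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[dynamic_input],
    min_block_size=1,
)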

@apbose
Collaborator Author

apbose commented Mar 12, 2024

Ok, I will go ahead and make a separate PR for empty_permuted. Can this PR be merged for now, then?

Collaborator

@gs-olive gs-olive left a comment

Looks good to me

@apbose apbose merged commit c5b8909 into main Apr 16, 2024
16 of 21 checks passed
peri044 pushed a commit that referenced this pull request Apr 19, 2024
laikhtewari pushed a commit that referenced this pull request May 24, 2024