TorchToStablehlo lit test failure with scatter.mlir #1962

sjain-stanford · 2024-01-28T22:15:09Z

What happened?

Hi ⁠stablehlo, we need some help on the torch-mlir side of things with a lit test error we hit when bumping stablehlo that hits an unexpected error: Expects non-empty reduction block for type inference

PR: llvm/torch-mlir#2821
Failing CI: https://github.com/llvm/torch-mlir/actions/runs/7687361450/job/20947291128?pr=2821

Here's the error:

bazel run @torch-mlir//test/Conversion:TorchToStablehlo/scatter.mlir.test 

...external/torch-mlir/test/Conversion/TorchToStablehlo/scatter.mlir
within split at <stdin>:1 offset :33:8: error: unexpected error: Expects non-empty reduction block for type inference                                                                               
  %0 = torch.aten.scatter.src %arg0, %int0, %arg1, %arg2 : !torch.vtensor<[?,?],si64>, !torch.int, !torch.vtensor<[?,?],si64>, !torch.vtensor<[?,?],si64> -> !torch.vtensor<[?,?],si64>             
       ^                                                                                                             
LLVM ERROR: Failed to infer result type(s).

Seems like this check was added in #1918 but I lack the historic context to know what needs fixing.

Steps to reproduce your issue

Clone https://github.com/sjain-stanford/torch-mlir and checkout sambhav/bazel_fix branch
Fetch git submodules: git submodule update --init --progress
Launch docker: ./utils/bazel/docker/run_docker.sh
Test: bazel run @torch-mlir//test/Conversion:TorchToStablehlo/scatter.mlir.test

Version information

No response

The text was updated successfully, but these errors were encountered:

sdasgup3 · 2024-01-29T17:52:46Z

Hi @sjain-stanford
Thanks for posting the issue!

#1869 allows the block arguments of reduce op to have different element types than that of the input arguments of reduce op and the output type of the reduce op has to equal to those block arguments. As a consequence the output type of reduce op can no longer be inferred from the operand types: The type inference needs to somehow have the information regarding the block-arguments's element type. The check that was added at #1918 is a guard against inferring reduce op type with empty blocks, which is what is happening at https://sourcegraph.com/github.com/llvm/torch-mlir/-/blob/lib/Conversion/TorchToStablehlo/GatherScatter.cpp?L336.

We encountered a similar problem with creating mhlo reduce op with empty reduction block and the solution we proposed was to create a custom builder here and here.

In StableHLO we do not have such builder as we do not see a use-case for that until now with the issue you posted. IMO the solution will be along the similar lines and I will be working on that.

I will keep you posted on updates.

sjain-stanford · 2024-01-29T18:12:54Z

Thanks @sdasgup3 for looking into this. PLMK if/when you have a patch-set you'd like me to test on the torch-mlir side of things, happy to help.

sdasgup3 · 2024-01-29T20:54:13Z

@sjain-stanford

Please find the patch at #1965.

On the torch-mlir side, we need the following patch to make sure that the newly supported builder is invoked.

--- a/lib/Conversion/TorchToStablehlo/GatherScatter.cpp
+++ b/lib/Conversion/TorchToStablehlo/GatherScatter.cpp
@@ -334,7 +334,8 @@ LogicalResult ConvertAtenOp<AtenEmbeddingBagPaddingIdxOp>::matchAndRewrite(
     return failure();
 
   auto stablehloReduceOp = rewriter.create<stablehlo::ReduceOp>(
-      op.getLoc(), gatherOutput, initValue, rewriter.getI64TensorAttr({0}));
+      op.getLoc(), gatherOutput, initValue, rewriter.getDenseI64ArrayAttr({0}),
+      elementTy);
 
   Region &region = stablehloReduceOp.getBody();
   Block &block = region.emplaceBlock();
@@ -666,7 +667,8 @@ LogicalResult ConvertAtenOp<AtenScatterSrcOp>::matchAndRewrite(
       /*indexVectorDim=*/indexVecDim);
 
   auto stablehloScatterOp = rewriter.create<stablehlo::ScatterOp>(
-      loc, input, scatterIndicies, src, scatterDimensionNumbers, false, false);
+      loc, inputType, input, scatterIndicies, src, scatterDimensionNumbers,
+      false, false);
 
   // config update computation function: just return the element from src.
   Block &block = stablehloScatterOp.getUpdateComputation().emplaceBlock();

Please let us know if this works at your end.

sjain-stanford · 2024-01-29T21:44:31Z

@sdasgup3 Thanks for the quick fix. Confirming the test passes locally with your patches 👍🏻 (cherry-picked the stablehlo changes from your fork but will re-apply once it lands on main)

sdasgup3 · 2024-01-30T23:27:59Z

@sjain-stanford Please let us know if the merged #1965 worked at your end.

sjain-stanford · 2024-01-31T00:04:33Z

@sdasgup3 any idea how this piece of code needs to be refactored?

  DenseIntElementsAttr bcast_attr = DenseIntElementsAttr::get(
      RankedTensorType::get({static_cast<long int>(bcastDims.size())},
                            rewriter.getI64Type()),
      bcastDims);
  auto bcast_op = rewriter.create<stablehlo::BroadcastInDimOp>(
      op->getLoc(), outType, input, bcast_attr);

After bumping to stablehlo@HEAD, I hit some new build errors like so:

external/llvm-project/mlir/include/mlir/IR/Builders.h:508:11: error: no matching member function for call to 'build'                                                                                
    OpTy::build(*this, state, std::forward<Args>(args)...);                                                                                                                                         
    ~~~~~~^~~~~                                                                                                                                                                                     
external/torch-mlir/lib/Conversion/TorchToStablehlo/StablehloLegalizeUtils.cpp:248:28: note: in instantiation of function template specialization 'mlir::OpBuilder::create<mlir::stablehlo::BroadcastInDimOp, mlir::TensorType &, mlir::Value &, mlir::DenseIntElementsAttr &>' requested here                                                                                                          
  auto bcast_op = rewriter.create<stablehlo::BroadcastInDimOp>(                                                                                                                                     
                           ^                                                                                                                                                                        
bazel-out/k8-fastbuild/bin/external/stablehlo/stablehlo/dialect/StablehloOps.h.inc:2438:15: note: candidate function not viable: no known conversion from 'mlir::DenseIntElementsAttr' to '::mlir::DenseI64ArrayAttr' (aka 'DenseArrayAttrImpl<long>') for 5th argument                                                                                                                                 
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type resultType0, ::mlir::Value operand, ::mlir::DenseI64ArrayAttr broadcast_dimensions);              
              ^                                                                                                                                                                                     
bazel-out/k8-fastbuild/bin/external/stablehlo/stablehlo/dialect/StablehloOps.h.inc:2440:15: note: candidate function not viable: no known conversion from 'mlir::DenseIntElementsAttr' to '::llvm::ArrayRef<int64_t>' (aka 'ArrayRef<long>') for 5th argument                                                                                                                                           
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::Type resultType0, ::mlir::Value operand, ::llvm::ArrayRef<int64_t> broadcast_dimensions);              
              ^                                                                                                                                                                                     
bazel-out/k8-fastbuild/bin/external/stablehlo/stablehlo/dialect/StablehloOps.h.inc:2439:15: note: candidate function not viable: no known conversion from 'mlir::DenseIntElementsAttr' to '::mlir::DenseI64ArrayAttr' (aka 'DenseArrayAttrImpl<long>') for 5th argument                                                                                                                                 
  static void build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState, ::mlir::TypeRange resultTypes, ::mlir::Value operand, ::mlir::DenseI64ArrayAttr broadcast_dimensions);         
              ^

sdasgup3 · 2024-01-31T00:45:00Z

@sjain-stanford
Can you try this?

 auto bcast_op = rewriter.create<stablehlo::BroadcastInDimOp>(
      op->getLoc(), outType, input,
      rewriter.getI64TensorAttr(bcastDims));

cc @mlevesquedion

sjain-stanford · 2024-01-31T15:48:03Z

rewriter.getDenseI64ArrayAttr(bcastDims) seems to fix it, and realized the error message was pretty specific/helpful. A lot more conversions needed an update, I think this is all of it: llvm/torch-mlir@a0ad9a8, please take a look to verify it looks good. The build works, I have one lit test that needs an update.

sjain-stanford · 2024-01-31T17:30:07Z

The CMake build now fails with setup_sanitizers. Probably related to #1959 (cc: @fzakaria , @mlevesquedion ). I see the default for STABLEHLO_ENABLE_SANITIZER being OFF, but this code probably needs to be enclosed in an if-endif:

include(SetupSanitizers)
setup_sanitizers()

Error:

  -- Building StableHLO as an external LLVM project
  -- Building with -fPIC
  CMake Error at /_work/torch-mlir/torch-mlir/externals/stablehlo/CMakeLists.txt:147 (include):
    include could not find requested file:
  
      SetupSanitizers
  
  
  CMake Error at /_work/torch-mlir/torch-mlir/externals/stablehlo/CMakeLists.txt:148 (setup_sanitizers):
    Unknown CMake command "setup_sanitizers".
  
  
  -- Configuring incomplete, errors occurred!

fzakaria · 2024-01-31T18:16:44Z

I'll take a look.

sjain-stanford · 2024-01-31T18:19:47Z

This fixes it: #1972

@fzakaria

Fixes #1962 (comment) Context: This breaks torch-mlir's CMake build when bumping stablehlo. The CMake build [fails](https://github.com/llvm/torch-mlir/actions/runs/7729517677/job/21072878412?pr=2821) with setup_sanitizers. Probably related to #1959 (cc: @fzakaria , @mlevesquedion ). I see the default for `STABLEHLO_ENABLE_SANITIZER` being `OFF`, but this code probably needs to be enclosed in an `if-endif`: ``` include(SetupSanitizers) setup_sanitizers() ``` Error: ``` -- Building StableHLO as an external LLVM project -- Building with -fPIC CMake Error at /_work/torch-mlir/torch-mlir/externals/stablehlo/CMakeLists.txt:147 (include): include could not find requested file: SetupSanitizers CMake Error at /_work/torch-mlir/torch-mlir/externals/stablehlo/CMakeLists.txt:148 (setup_sanitizers): Unknown CMake command "setup_sanitizers". -- Configuring incomplete, errors occurred! ```

sdasgup3 self-assigned this Jan 29, 2024

sdasgup3 mentioned this issue Jan 29, 2024

Bump stablehlo to openxla/stablehlo@fd52182f76cadb82f2064fe5fc49a4fb4347a826 llvm/torch-mlir#2821

Merged

sjain-stanford mentioned this issue Jan 31, 2024

Make setup_sanitizer include optional in CMakeLists.txt #1972

Merged

GleasonK closed this as completed in #1972 Jan 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TorchToStablehlo lit test failure with scatter.mlir #1962

TorchToStablehlo lit test failure with scatter.mlir #1962

sjain-stanford commented Jan 28, 2024 •

edited

Loading

sdasgup3 commented Jan 29, 2024

sjain-stanford commented Jan 29, 2024

sdasgup3 commented Jan 29, 2024 •

edited

Loading

sjain-stanford commented Jan 29, 2024

sdasgup3 commented Jan 30, 2024

sjain-stanford commented Jan 31, 2024

sdasgup3 commented Jan 31, 2024 •

edited

Loading

sjain-stanford commented Jan 31, 2024

sjain-stanford commented Jan 31, 2024

fzakaria commented Jan 31, 2024

sjain-stanford commented Jan 31, 2024

TorchToStablehlo lit test failure with scatter.mlir #1962

TorchToStablehlo lit test failure with scatter.mlir #1962

Comments

sjain-stanford commented Jan 28, 2024 • edited Loading

What happened?

Steps to reproduce your issue

Version information

sdasgup3 commented Jan 29, 2024

sjain-stanford commented Jan 29, 2024

sdasgup3 commented Jan 29, 2024 • edited Loading

sjain-stanford commented Jan 29, 2024

sdasgup3 commented Jan 30, 2024

sjain-stanford commented Jan 31, 2024

sdasgup3 commented Jan 31, 2024 • edited Loading

sjain-stanford commented Jan 31, 2024

sjain-stanford commented Jan 31, 2024

fzakaria commented Jan 31, 2024

sjain-stanford commented Jan 31, 2024

sjain-stanford commented Jan 28, 2024 •

edited

Loading

sdasgup3 commented Jan 29, 2024 •

edited

Loading

sdasgup3 commented Jan 31, 2024 •

edited

Loading