
Mark all TTIR and TTNN ops as pure #1481

Merged
@LPanosTT merged 1 commit into main from lpanos/mark_ops_pure on Dec 4, 2024
Conversation

@LPanosTT (Contributor) commented Dec 3, 2024

  • Added MLIR's Pure trait to all TTIR and TTNN ops, since they are pure SSA ops.
  • MLIR's built-in RemoveDeadValues pass can now effectively eliminate dead code from TTIR and TTNN modules (see the sketch below).
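
As a rough illustration of the effect (a hypothetical module; remove-dead-values is the upstream MLIR pass name, and it is assumed here that the pass is exposed through ttmlir-opt):

// Input: the result %1 of ttir.transpose is never used.
module {
  func.func @forward(%arg0: tensor<64x128xbf16>) -> tensor<64x128xbf16> {
    %0 = tensor.empty() : tensor<128x64xbf16>
    %1 = "ttir.transpose"(%arg0, %0) <{dim0 = 0 : si32, dim1 = 1 : si32}> : (tensor<64x128xbf16>, tensor<128x64xbf16>) -> tensor<128x64xbf16>
    return %arg0 : tensor<64x128xbf16>
  }
}

// ttmlir-opt --remove-dead-values input.mlir
// With the ops marked Pure, the dead transpose (and then the dead tensor.empty) can be erased:
module {
  func.func @forward(%arg0: tensor<64x128xbf16>) -> tensor<64x128xbf16> {
    return %arg0 : tensor<64x128xbf16>
  }
}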

@svuckovicTT (Contributor) commented:
Can you update TTNN_EmptyOp in TTNNOps.td?

// Note: NoMemoryEffect is used to indicate that operation can be removed if it is not used.
// Removal of this operation is done by the dead code elimination pass (RemoveDeadValuesPass).
def TTNN_EmptyOp : TTNN_Op<"empty", [NoMemoryEffect]> {

->

def TTNN_EmptyOp : TTNN_Op<"empty"> {

@LPanosTT force-pushed the lpanos/mark_ops_pure branch from 704c7dc to 4b52f8e on December 4, 2024 15:04
@LPanosTT (Contributor, Author) commented Dec 4, 2024

@sasha, I've removed the comment and redundant NoMemoryEffect trait 👍

@LPanosTT enabled auto-merge (squash) December 4, 2024 15:20
@LPanosTT merged commit 7f6046e into main on Dec 4, 2024
21 checks passed
@azecevicTT (Contributor) left a comment

Almost all of our ops are Pure, but not all. In my opinion the best solution would be to introduce a new class just below TTIR_Op, something like TTIR_PureOp, which all the other op classes would extend; ops that aren't pure would then be defined directly on TTIR_Op. Analogously for the TTNN dialect.
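
A minimal sketch of what that split might look like (TTIR_PureOp is the suggested name; the exact shape here is illustrative, not the actual definition):

// Base class: no implicit traits; non-pure ops (e.g. alloc/dealloc) would use this directly.
class TTIR_Op<string mnemonic, list<Trait> traits = []> :
    Op<TTIR_Dialect, mnemonic, traits>;

// Pure ops would go through this class, which appends Pure for them.
class TTIR_PureOp<string mnemonic, list<Trait> traits = []> :
    TTIR_Op<mnemonic, !listconcat(traits, [Pure])>;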

@@ -38,6 +39,6 @@ def TTIR_Dialect : Dialect {
 //===----------------------------------------------------------------------===//

 class TTIR_Op<string mnemonic, list<Trait> traits = []> :
-    Op<TTIR_Dialect, mnemonic, traits>;
+    Op<TTIR_Dialect, mnemonic, !listconcat(traits, [Pure])>;
Contributor commented:

Should the Pure trait be added at the top level? I'm thinking about alloc and dealloc. Every dealloc would trivially be removed, since it doesn't produce a result.

Contributor replied:

Good point, we should probably adopt the MemAlloc and MemFree memory effects for those, respectively.
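
A sketch of that direction (op names and trait spellings here are assumptions to be checked against the dialect and MLIR's SideEffectInterfaces.td; it also presumes the TTIR_PureOp split above so that TTIR_Op itself does not force Pure):

// Allocation/deallocation ops stay on the base class and declare their memory effects explicitly,
// so dead-value removal cannot silently drop a dealloc just because it has no results.
def TTIR_AllocOp : TTIR_Op<"alloc", [MemoryEffects<[MemAlloc]>]> {
  // ...
}

def TTIR_DeallocOp : TTIR_Op<"dealloc", [MemoryEffects<[MemFree]>]> {
  // ...
}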

@@ -45,6 +46,6 @@ def TTNN_Dialect : Dialect {
 //===----------------------------------------------------------------------===//

 class TTNN_Op<string mnemonic, list<Trait> traits = []> :
-    Op<TTNN_Dialect, mnemonic, !listconcat(traits, [TTNN_OpModelInterface, TTNN_WorkaroundInterface])>;
+    Op<TTNN_Dialect, mnemonic, !listconcat(traits, [Pure, TTNN_OpModelInterface, TTNN_WorkaroundInterface])>;
Contributor commented:

Same as above.

@sdjordjevicTT (Contributor) commented:
@LPanosTT, could you please address @azecevicTT's comments? I know the comment came after the merge, but it is still a valid concern, and we should address it sooner rather than later.

@LPanosTT (Contributor, Author) commented Dec 9, 2024

@azecevicTT, apologies for not responding earlier, but I agree with both you and Nick. We may as well be more precise in modelling memory effects if we're going to model them at all.

@azecevicTT (Contributor) commented:
@nsmithtt @LPanosTT I think we have an even bigger problem because of DPS. Take this example:

module {
  func.func @forward(%arg0: tensor<64x128xbf16>) -> tensor<128x64xbf16> {
    %0 = tensor.empty() : tensor<128x64xbf16>
    %1 = "ttir.transpose"(%arg0, %0) <{dim0 = 0 : si32, dim1 = 1 : si32, operand_constraints = [#any_device, #any_device]}> : (tensor<64x128xbf16>, tensor<128x64xbf16>) -> tensor<128x64xbf16>
    return %0 : tensor<128x64xbf16>
  }
}

Semantically, %1 should be an alias of %0 if I understand the idea behind DPS correctly, so return %0 should return %arg0 transposed. But because we are treating ttir.transpose as Pure and %1 is an unused SSA value, it will be removed, so return %0 will return the result of tensor.empty, which is incorrect.
We need some kind of alias analysis before we can remove values with no uses.

@sdjordjevicTT (Contributor) commented:
(Quoting @azecevicTT's DPS example and concern above.)

I believe we need to revert this and consider the proper solution. DPS ops are not considered pure by default.

@nsmithtt (Contributor) commented Dec 10, 2024

(Quoting @azecevicTT's DPS example and concern above.)

This is an ill-formed program if the intention is to do what you stated. DPS uses SSA form for exactly this reason: if you don't use the result of the op, it should rightfully be erased.

The correct way to express this semantically in ttir (i.e. DPS) is:

return %1 : tensor<128x64xbf16>
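
In full, the well-formed version of the example (the same module as above, changing only the returned value) would be:

module {
  func.func @forward(%arg0: tensor<64x128xbf16>) -> tensor<128x64xbf16> {
    %0 = tensor.empty() : tensor<128x64xbf16>
    %1 = "ttir.transpose"(%arg0, %0) <{dim0 = 0 : si32, dim1 = 1 : si32, operand_constraints = [#any_device, #any_device]}> : (tensor<64x128xbf16>, tensor<128x64xbf16>) -> tensor<128x64xbf16>
    return %1 : tensor<128x64xbf16>
  }
}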

See this section of the bufferization doc https://mlir.llvm.org/docs/Bufferization/#destination-passing-style.

Relevant paragraph:

DPS exists in itself independently of bufferization and is tied to SSA semantics: many ops are “updating” a part of their input SSA variables. For example the LLVM instruction insertelement is inserting an element inside a vector. Since SSA values are immutable, the operation returns a copy of the input vector with the element inserted.

So it follows that, in your example, returning %0 returns the tensor before it was "copied" into the result. SSA form is a very nice convenience for generically applying various graph passes, which is why DPS still tries to stay in SSA form, despite looking a little funky.

@azecevicTT (Contributor) commented:
@nsmithtt Thanks for the clarification, it makes more sense now. There are even some examples of DPS + Pure in https://mlir.llvm.org/docs/Dialects/TensorOps. We still need to address the cases of alloc and dealloc (and maybe some other ops), but the majority of ops should still stay Pure.
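
For instance (an illustrative snippet, not copied from those docs), upstream tensor.insert_slice is a DPS op that is nevertheless marked Pure: it "updates" its destination operand by returning the updated tensor as a fresh SSA result.

// Inserts %src into the top-left corner of %dest; the result is the updated copy.
%updated = tensor.insert_slice %src into %dest[0, 0] [64, 128] [1, 1]
    : tensor<64x128xbf16> into tensor<128x128xbf16>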
