-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the deallocation problem for sequence op #1673
Conversation
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
@chentong319 ready for reviewing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand the scheme, which looks solid, but left me with a few questions. Will review this until I have a better overall understanding.
src/Dialect/Krnl/Krnl.td
Outdated
}]; | ||
let arguments = (ins AnyMemRef:$seq, | ||
Index:$index); | ||
Index:$index, | ||
DefaultValuedAttr<SI64Attr, "0">:$copy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy feels like a boolean attribute, see
DefaultValuedAttr<BoolAttr, "true">:$simdize,
for example
src/Dialect/Krnl/Krnl.td
Outdated
When the attribute 'copy' is 0: the extracted element is directly returned | ||
When the attribute 'copy' is 1: the extracted element is copied and then return. | ||
The returned element is regarded as allocated by this Op because the | ||
element is created (copied) when inserted into the sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sentence above is hard to parse.
src/Dialect/Krnl/Krnl.td
Outdated
When the attribute 'copy' is 1: the extracted element is copied and then return. | ||
The returned element is regarded as allocated by this Op because the | ||
element is created (copied) when inserted into the sequence. | ||
Attribute 'copy' can be 0 when analysis or code generation guarantees |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expand a bit more on this. "If an optimization pass can determine that each element is only extracted once, then the copy=false can be used"
Is it required that each element be extracted once? If one is never extracted, should it still work? If so, "each element is extracted at most once"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every element is extracted exactly once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, if an element is not used out of a list, why would it disable the copy=0
. IMO, the correct statement is "at most once"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The element needs to be extracted or erased exactly once from the sequence. Otherwise, the memory for the element will be leaked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an element is not used, you imply that copy cannot be zero which makes no sense either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, if a value is used exactly twice, would it make sense to have the first use with copy=1 but the second/last use with copy=0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an element is not used, you imply that copy cannot be zero which makes no sense either.
An element can be accessed only through KrnlSeqExtract. If there is no KrnlSeqExtract, there is no copy attribute.
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
Signed-off-by: chentong319 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better, suggested you use a method in our krnl create class, otherwise its all good after a spell check
@@ -311,8 +311,9 @@ struct ONNXLoopOpLowering : public ConversionPattern { | |||
rewriter.setInsertionPointToStart( | |||
®ionOp.bodyRegion().front()); | |||
Value origIV = loopInd[0]; | |||
auto src = rewriter.create<KrnlSeqExtractOp>( | |||
loc, seqElementType, output, origIV); | |||
auto src = rewriter.create<KrnlSeqExtractOp>(loc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add a createKrnl method to generate a KrnlSeqExtract op where you can then skip the loc
, and could accept an int for the attribute (and hide the IntegerAttr::get(rewriter.getIntegerType(1, false), 0))
inside that method.
just make the code a bit cleaner, obviously not mandatory but it would be nice.
@@ -347,8 +347,9 @@ void FrontendToKrnlLoweringPass::runOnOperation() { | |||
|
|||
// If `emitDealloc` is turned off, make sure we don't have buffer deallocation | |||
// at this level. Will use MLIR buffer-deallocation for this purpose instead. | |||
if (!emitDealloc) | |||
target.addIllegalOp<mlir::memref::DeallocOp>(); | |||
// However, since the SequenceErase needs to emit memref dealloc, the previous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now, but I think we are pretty sure by now not to use the old buffer scheme. At sometimes, we could do a cleanup of the code, and eliminating all that is not needed anymore. Definitely not for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR needs to comment out the code because dealloc will be generated by SequenceErase.
IndexExpr fixedPosition = positionIE + boundIE; | ||
positionIE = IndexExpr::select(condIE, fixedPosition, positionIE); | ||
|
||
Value outputVal = rewriter.create<KrnlSeqExtractOp>(loc, outputMemRefType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you created a method to add this op, use it here too with create.krnl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big fan of the dialect builder, especially when there is nothing abstracted out by builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we can abstract the handling of the int attribute, which I personally don't like to remember this awkward sequence... but if you don't want to do it, it is fine.
@@ -66,7 +66,7 @@ struct ONNXSequenceEraseOpLowering : public ConversionPattern { | |||
positionIE = IndexExpr::select(positionIE < 0, correctionIE, positionIE); | |||
} | |||
|
|||
// Copy before the insert | |||
// Copy the elements before the position |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great usage of consistent language, big thanks
rewriter.replaceOp(op, output); | ||
return success(); | ||
} else { | ||
if (!output.getType().isa<MemRefType>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels like an assert :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In future, we will handle the non MemRefType case. I usually use unreachable for that purpose: not a semantic error but an implementation limitation. Not sure about the convention in llvm.
// is needed. | ||
|
||
// Copy elements before the insertion position | ||
rewriter.create<scf::ForOp>(loc, create.math.constantIndex(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just add in the comment above why we use scf for instead of affine?
To me, it feels like it could be an affine as it looks pretty regular; is it called after lowering of affine to scf?
A small justification would explain your choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the limitation of our current driver: lowering affine is before deallocation pass. I will add a comment.
: ConversionPattern( | ||
typeConverter, KrnlSeqInsertOp::getOperationName(), 1, context) {} | ||
|
||
LogicalResult matchAndRewrite(Operation *op, ArrayRef<Value> operands, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if you could just explain in one sentence what the code pattern you generate is.
it seems to be:
Take the original list, grow the list by one element, copy all the elements before the insertion point, add the new one, and then copy all the elements after the insertion point.
src/Dialect/Krnl/Krnl.td
Outdated
// a sequence. 'copy' here means to allocate a new memref and copy the content | ||
// of the memref. The copy is a costly op to maintain the SSA requirement. | ||
// To reduce the overhead of copy, elements are not copied when | ||
// a new sequence is created from an exist sequence. Hower, when an element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hower
spelling. FYI, the "Code Spell Checker" from "Street Side Software" extension in VSCode works pretty well for me. It knows how to detect words in variables, comments,... I find it useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will try them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending spelling check
Signed-off-by: chentong319 <[email protected]>
Jenkins Linux ppc64le Build #6653 [push] Fix the deallocation pro... started at 14:12 |
Jenkins Linux amd64 Build #7588 [push] Fix the deallocation pro... started at 13:10 |
Jenkins Linux s390x Build #7604 [push] Fix the deallocation pro... started at 14:11 |
Jenkins Linux amd64 Build #7588 [push] Fix the deallocation pro... passed after 1 hr 9 min |
Jenkins Linux ppc64le Build #6653 [push] Fix the deallocation pro... passed after 1 hr 33 min |
Jenkins Linux s390x Build #7604 [push] Fix the deallocation pro... passed after 1 hr 37 min |
* compiled and run Signed-off-by: chentong319 <[email protected]> * restore CompilerPasses.cpp * handle copy Signed-off-by: chentong319 <[email protected]> * use attr Signed-off-by: chentong319 <[email protected]> * lit test Signed-off-by: chentong319 <[email protected]> * dealloc test Signed-off-by: chentong319 <[email protected]> * doc Signed-off-by: chentong319 <[email protected]> * format Signed-off-by: chentong319 <[email protected]> * ordering Signed-off-by: chentong319 <[email protected]> * format Signed-off-by: chentong319 <[email protected]> * change attr Signed-off-by: chentong319 <[email protected]> * doc Signed-off-by: chentong319 <[email protected]> * fix erase Signed-off-by: chentong319 <[email protected]> * doc Signed-off-by: chentong319 <[email protected]> * format Signed-off-by: chentong319 <[email protected]> * comment Signed-off-by: chentong319 <[email protected]> * response Signed-off-by: chentong319 <[email protected]> Signed-off-by: chentong319 <[email protected]> Signed-off-by: Stella Stamenova <[email protected]>
Since the sequence op is implemented as store pointers of element in memref, the buffer deallocation pass, which assumes no pointer, can not work correctly with previous sequence op lowering.
In this PR, the load or store of the pointer of sequence element is encapsulated in krnl sequence ops. These Ops are lowered after bufferization::deallocation pass. These krnl sequence ops defines the interface for bufferazation so that deallocation can be generated correctly.
A lit test to check deallocation result is added.