Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the deallocation problem for sequence op #1673

Merged
merged 27 commits into from
Sep 15, 2022

Conversation

chentong319
Copy link
Collaborator

Since the sequence op is implemented as store pointers of element in memref, the buffer deallocation pass, which assumes no pointer, can not work correctly with previous sequence op lowering.
In this PR, the load or store of the pointer of sequence element is encapsulated in krnl sequence ops. These Ops are lowered after bufferization::deallocation pass. These krnl sequence ops defines the interface for bufferazation so that deallocation can be generated correctly.
A lit test to check deallocation result is added.

@AlexandreEichenberger
Copy link
Collaborator

@chentong319 ready for reviewing?

Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand the scheme, which looks solid, but left me with a few questions. Will review this until I have a better overall understanding.

}];
let arguments = (ins AnyMemRef:$seq,
Index:$index);
Index:$index,
DefaultValuedAttr<SI64Attr, "0">:$copy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy feels like a boolean attribute, see

    DefaultValuedAttr<BoolAttr, "true">:$simdize,

for example

When the attribute 'copy' is 0: the extracted element is directly returned
When the attribute 'copy' is 1: the extracted element is copied and then return.
The returned element is regarded as allocated by this Op because the
element is created (copied) when inserted into the sequence.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentence above is hard to parse.

When the attribute 'copy' is 1: the extracted element is copied and then return.
The returned element is regarded as allocated by this Op because the
element is created (copied) when inserted into the sequence.
Attribute 'copy' can be 0 when analysis or code generation guarantees
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expand a bit more on this. "If an optimization pass can determine that each element is only extracted once, then the copy=false can be used"

Is it required that each element be extracted once? If one is never extracted, should it still work? If so, "each element is extracted at most once"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every element is extracted exactly once

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, if an element is not used out of a list, why would it disable the copy=0. IMO, the correct statement is "at most once"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The element needs to be extracted or erased exactly once from the sequence. Otherwise, the memory for the element will be leaked.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an element is not used, you imply that copy cannot be zero which makes no sense either.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, if a value is used exactly twice, would it make sense to have the first use with copy=1 but the second/last use with copy=0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an element is not used, you imply that copy cannot be zero which makes no sense either.

An element can be accessed only through KrnlSeqExtract. If there is no KrnlSeqExtract, there is no copy attribute.

src/Dialect/Krnl/Krnl.td Show resolved Hide resolved
src/Dialect/Krnl/Krnl.td Show resolved Hide resolved
Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better, suggested you use a method in our krnl create class, otherwise its all good after a spell check

@@ -311,8 +311,9 @@ struct ONNXLoopOpLowering : public ConversionPattern {
rewriter.setInsertionPointToStart(
&regionOp.bodyRegion().front());
Value origIV = loopInd[0];
auto src = rewriter.create<KrnlSeqExtractOp>(
loc, seqElementType, output, origIV);
auto src = rewriter.create<KrnlSeqExtractOp>(loc,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could add a createKrnl method to generate a KrnlSeqExtract op where you can then skip the loc, and could accept an int for the attribute (and hide the IntegerAttr::get(rewriter.getIntegerType(1, false), 0)) inside that method.

just make the code a bit cleaner, obviously not mandatory but it would be nice.

@@ -347,8 +347,9 @@ void FrontendToKrnlLoweringPass::runOnOperation() {

// If `emitDealloc` is turned off, make sure we don't have buffer deallocation
// at this level. Will use MLIR buffer-deallocation for this purpose instead.
if (!emitDealloc)
target.addIllegalOp<mlir::memref::DeallocOp>();
// However, since the SequenceErase needs to emit memref dealloc, the previous
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for now, but I think we are pretty sure by now not to use the old buffer scheme. At sometimes, we could do a cleanup of the code, and eliminating all that is not needed anymore. Definitely not for this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR needs to comment out the code because dealloc will be generated by SequenceErase.

IndexExpr fixedPosition = positionIE + boundIE;
positionIE = IndexExpr::select(condIE, fixedPosition, positionIE);

Value outputVal = rewriter.create<KrnlSeqExtractOp>(loc, outputMemRefType,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you created a method to add this op, use it here too with create.krnl

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of the dialect builder, especially when there is nothing abstracted out by builder.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we can abstract the handling of the int attribute, which I personally don't like to remember this awkward sequence... but if you don't want to do it, it is fine.

@@ -66,7 +66,7 @@ struct ONNXSequenceEraseOpLowering : public ConversionPattern {
positionIE = IndexExpr::select(positionIE < 0, correctionIE, positionIE);
}

// Copy before the insert
// Copy the elements before the position
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great usage of consistent language, big thanks

rewriter.replaceOp(op, output);
return success();
} else {
if (!output.getType().isa<MemRefType>())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like an assert :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In future, we will handle the non MemRefType case. I usually use unreachable for that purpose: not a semantic error but an implementation limitation. Not sure about the convention in llvm.

// is needed.

// Copy elements before the insertion position
rewriter.create<scf::ForOp>(loc, create.math.constantIndex(0),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just add in the comment above why we use scf for instead of affine?
To me, it feels like it could be an affine as it looks pretty regular; is it called after lowering of affine to scf?
A small justification would explain your choice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to the limitation of our current driver: lowering affine is before deallocation pass. I will add a comment.

: ConversionPattern(
typeConverter, KrnlSeqInsertOp::getOperationName(), 1, context) {}

LogicalResult matchAndRewrite(Operation *op, ArrayRef<Value> operands,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if you could just explain in one sentence what the code pattern you generate is.

it seems to be:
Take the original list, grow the list by one element, copy all the elements before the insertion point, add the new one, and then copy all the elements after the insertion point.

// a sequence. 'copy' here means to allocate a new memref and copy the content
// of the memref. The copy is a costly op to maintain the SSA requirement.
// To reduce the overhead of copy, elements are not copied when
// a new sequence is created from an exist sequence. Hower, when an element
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hower spelling. FYI, the "Code Spell Checker" from "Street Side Software" extension in VSCode works pretty well for me. It knows how to detect words in variables, comments,... I find it useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will try them.

Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending spelling check

@chentong319 chentong319 merged commit c438b12 into onnx:main Sep 15, 2022
@chentong319 chentong319 deleted the seq-krnl-2 branch September 15, 2022 18:09
@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #6653 [push] Fix the deallocation pro... started at 14:12

@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #7588 [push] Fix the deallocation pro... started at 13:10

@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #7604 [push] Fix the deallocation pro... started at 14:11

@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #7588 [push] Fix the deallocation pro... passed after 1 hr 9 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #6653 [push] Fix the deallocation pro... passed after 1 hr 33 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #7604 [push] Fix the deallocation pro... passed after 1 hr 37 min

sstamenova pushed a commit to sstamenova/onnx-mlir that referenced this pull request Sep 21, 2022
* compiled and run

Signed-off-by: chentong319 <[email protected]>

* restore CompilerPasses.cpp

* handle copy

Signed-off-by: chentong319 <[email protected]>

* use attr

Signed-off-by: chentong319 <[email protected]>

* lit test

Signed-off-by: chentong319 <[email protected]>

* dealloc test

Signed-off-by: chentong319 <[email protected]>

* doc

Signed-off-by: chentong319 <[email protected]>

* format

Signed-off-by: chentong319 <[email protected]>

* ordering

Signed-off-by: chentong319 <[email protected]>

* format

Signed-off-by: chentong319 <[email protected]>

* change attr

Signed-off-by: chentong319 <[email protected]>

* doc

Signed-off-by: chentong319 <[email protected]>

* fix erase

Signed-off-by: chentong319 <[email protected]>

* doc

Signed-off-by: chentong319 <[email protected]>

* format

Signed-off-by: chentong319 <[email protected]>

* comment

Signed-off-by: chentong319 <[email protected]>

* response

Signed-off-by: chentong319 <[email protected]>

Signed-off-by: chentong319 <[email protected]>
Signed-off-by: Stella Stamenova <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants