Add segment sum Op to relay and 7 corresponding TF Ops , fix scatter_add dynamic bug #7562

Merged: 26 commits, Mar 4, 2021

codeislife99
Contributor

@codeislife99 codeislife99 commented Mar 2, 2021

This PR adds the Segment Sum op, which will serve as a generic op for multiple framework-specific ops:

  1. TensorFlow -- tf.math.segment_sum, tf.sparse.segment_sum

  2. Caffe -- sparse length sum

  3. PyTorch -- EmbeddingBag

Since this PR uses scatter_add, it also makes some small changes so that scatter_add works for dynamic inputs.
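
For readers unfamiliar with the op, here is a minimal pure-Python sketch of segment-sum semantics expressed through a scatter-add into a zero buffer. It is not the Relay or TOPI implementation from this PR. The helper names and the optional `num_segments` argument are illustrative, and the sketch assumes the output has `max(segment_ids) + 1` rows, as in tf.math.segment_sum.

```python
from typing import List, Optional

Matrix = List[List[float]]


def scatter_add(out: Matrix, indices: List[int], updates: Matrix) -> Matrix:
    """Reference semantics along axis 0: result[indices[i]] += updates[i]."""
    result = [row[:] for row in out]  # copy so the input buffer is untouched
    for i, target in enumerate(indices):
        for j, value in enumerate(updates[i]):
            result[target][j] += value
    return result


def segment_sum(data: Matrix, segment_ids: List[int],
                num_segments: Optional[int] = None) -> Matrix:
    """Sum the rows of `data` that share the same segment id."""
    if num_segments is None:
        # Assumption: the row count depends on the values in segment_ids,
        # so the output shape is only known at run time.
        num_segments = max(segment_ids) + 1
    zeros = [[0.0] * len(data[0]) for _ in range(num_segments)]
    return scatter_add(zeros, segment_ids, data)


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
segment_ids = [0, 0, 1]
print(segment_sum(data, segment_ids))  # [[4.0, 6.0], [5.0, 6.0]]
```

Because the number of output rows is derived from the contents of `segment_ids`, the scatter target has a data-dependent shape, which is presumably why scatter_add needed to handle dynamic inputs.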

@codeislife99
Contributor Author

codeislife99 commented Mar 2, 2021

@masahi @tkonolige @mbrookhart @ymwangg PTAL.

@masahi
Member

masahi commented Mar 2, 2021

Nice, are you going to add frontend?

@codeislife99
Contributor Author

Yes. Do you prefer that I add it in this PR or the next one? I want to add frontends for multiple framework ops based on this Relay op.

@masahi
Member

masahi commented Mar 2, 2021

Yes, I think it's better to add frontends (TF, PT) to make sure they are supported by this op.

@codeislife99
Contributor Author

@masahi I have added 3 TF Ops to the frontend, all of which use this op. Let me know if that's enough.

@codeislife99 codeislife99 changed the title Add segment sum Op Add segment sum Op to relay and corresponding TF Ops , fix scatter_add dynamic bug Mar 2, 2021
@masahi
Member

masahi commented Mar 2, 2021

Can you also try PT EmbeddingBag?

@codeislife99
Contributor Author

codeislife99 commented Mar 2, 2021

Hey @masahi, after reading the EmbeddingBag documentation closely (and referencing the tf.sparse.segment_sum documentation), it seems that:

  1. When inputs is 1D and offsets is given, we simply have to convert offsets into segment_ids, and inputs can be used directly as indices. To convert offsets into segment_ids, we have to use a combination of adjacent_difference, arange and repeat. For example, offsets of [0, 4] with a size of 10 would translate to [0, 0, 0, 0, 1, 1, 1, 1, 1, 1], and relay.segment_sum could be called on it.
  2. When inputs is 2D, it's easier: we convert the input size [B, N] to [0, 0, ... (N times), 1, 1, ... (N times), ..., B-1, B-1, ... (N times)]. This only requires arange and repeat. We then use the flattened inputs as indices and the converted input size (now 1D) as segment_ids, and relay.segment_sum could be called on it.

Now, all of these ops exist except adjacent_difference, although @ymwangg wrote an IR for it. Is it possible to call it in any form? If not, do you think it's worthwhile to make it an op? (Numpy equivalent)

Let me know your thoughts on the best way to reuse existing code. After that, the implementation would be only a few lines.
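
The two conversions described in the list above can be sketched in plain Python as follows. This only illustrates the index arithmetic and is not code from the PR. The function names are hypothetical, and `adjacent_difference` here is a stand-in for the IR mentioned above.

```python
from typing import List


def adjacent_difference(values: List[int]) -> List[int]:
    """Differences between neighbours: [v1 - v0, v2 - v1, ...]."""
    return [b - a for a, b in zip(values, values[1:])]


def offsets_to_segment_ids(offsets: List[int], size: int) -> List[int]:
    """1D case: turn bag start offsets into one segment id per index."""
    # Append the total size so that the last bag also gets a length.
    boundaries = list(offsets) + [size]
    lengths = adjacent_difference(boundaries)
    segment_ids: List[int] = []
    # enumerate plays the role of arange; extend plays the role of repeat.
    for bag, length in enumerate(lengths):
        segment_ids.extend([bag] * length)
    return segment_ids


def batch_to_segment_ids(batch: int, bag_size: int) -> List[int]:
    """2D case: an input of shape [B, N] maps row b to N copies of b."""
    return [b for b in range(batch) for _ in range(bag_size)]


print(offsets_to_segment_ids([0, 4], 10))  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(batch_to_segment_ids(2, 3))          # [0, 0, 0, 1, 1, 1]
```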

@masahi
Member

masahi commented Mar 2, 2021

OK, let's do EmbeddingBag later, then.

Contributor

@tkonolige tkonolige left a comment

Looks pretty good. A couple documentation improvements would be nice though.

Review threads: python/tvm/relay/op/transform.py (4), tests/python/relay/test_op_level3.py (2)
@masahi masahi self-assigned this Mar 2, 2021
@codeislife99
Contributor Author

@tkonolige I have finished addressing your comments, please re-review

@codeislife99
Contributor Author

Actually I would like to add another related op in this PR. I will ping you after I am done with that.

@codeislife99 codeislife99 changed the title Add segment sum Op to relay and corresponding TF Ops , fix scatter_add dynamic bug Add segment sum Op to relay and 5 corresponding TF Ops , fix scatter_add dynamic bug Mar 3, 2021
@codeislife99 codeislife99 changed the title Add segment sum Op to relay and 5 corresponding TF Ops , fix scatter_add dynamic bug Add segment sum Op to relay and 7 corresponding TF Ops , fix scatter_add dynamic bug Mar 3, 2021
@codeislife99
Contributor Author

@tkonolige @masahi I am done with the PR. Please review/re-review.

Contributor

@tkonolige tkonolige left a comment

A couple minor comments

Review thread: python/tvm/relay/op/transform.py
Contributor

@mbrookhart mbrookhart left a comment

Overall LGTM.

Could you add a direct test for scatter_add with dynamic inputs? That would help identify problems in the future.

@codeislife99
Contributor Author

codeislife99 commented Mar 3, 2021

@tkonolige int64 is not allowed with the TF sparse ops, so I added int64 to the Relay op tests and the tf.math op tests instead.
@mbrookhart Yes, thanks, added them now.
Please re-review.

assert len(inputs) == 3, "There should be 3 input tensors"
data = _op.take(inputs[0], inputs[1], axis=0)
return _op.segment_sum(data, inputs[2])

Member

This is OK for now, but we definitely want a fused implementation here, just like TF/PT/C2 do. I don't expect this to work for the huge embedding tables people want to use in practice.

Contributor Author

I agree. When you say a "fused implementation", do you mean that all of it happens in a single IR?

Contributor Author

Do you have any examples of what a "fused implementation" is? Does this mean that in a fused implementation, the frontend will always be just a one-liner?

Contributor Author

@codeislife99 codeislife99 Mar 4, 2021

In this case, I understand that we must do the take and the addition from segment_sum simultaneously for performance. So would a fused implementation in that case be a new op?

Member

By "fused" I meant we shouldn't materialize the result of take, which can be huge. In a fused implementation, we need to look up indices and accumulate the sum on the fly. This is why PT has an EmbeddingBag op; see their doc: https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html

Yes, a complicated op like this is unlikely to be feasible if we rely only on Relay-level op fusion. We need a dedicated sparse_segment_sum TOPI/Relay op.
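
To make the distinction concrete, here is a small pure-Python sketch that contrasts the take-then-segment_sum approach with a fused loop that reads table rows in place and accumulates as it goes. It is only an illustration of the idea under discussion, not the TOPI op proposed here. Both function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sparse_segment_sum_unfused(table: Matrix, indices: List[int],
                               segment_ids: List[int]) -> Matrix:
    """take followed by segment_sum: materializes len(indices) rows first."""
    taken = [table[i][:] for i in indices]  # intermediate copy, can be huge
    out = [[0.0] * len(table[0]) for _ in range(max(segment_ids) + 1)]
    for row, seg in zip(taken, segment_ids):
        for j, value in enumerate(row):
            out[seg][j] += value
    return out


def sparse_segment_sum_fused(table: Matrix, indices: List[int],
                             segment_ids: List[int]) -> Matrix:
    """Look up each row and accumulate on the fly, with no intermediate."""
    out = [[0.0] * len(table[0]) for _ in range(max(segment_ids) + 1)]
    for idx, seg in zip(indices, segment_ids):
        row = table[idx]  # read in place, nothing is copied
        for j, value in enumerate(row):
            out[seg][j] += value
    return out


table = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
indices = [0, 3, 1, 3]
segment_ids = [0, 0, 1, 1]
print(sparse_segment_sum_fused(table, indices, segment_ids))
# [[5.0, 5.0], [6.0, 6.0]]
```

Both functions return the same result. The difference is that the unfused version allocates one row per lookup before summing, while the fused version only ever holds the output buffer.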

Member

@masahi masahi Mar 4, 2021

I think he meant that scatter_nd exactly realizes fused take and segment_sum above. I haven't put deep thought into this but it made sense to me. But I remember parallelizing scatter_nd looked harder than scatter_add.

Contributor Author

Yes, I am having a bit of a mental block understanding how take and segment_sum are essentially scatter_nd. Would either of you mind writing a small piece of pseudocode?

Contributor

Thinking about this more, I believe the take is necessary if we are using scatter_nd. We could make a more generic version of scatter_nd and gather_nd that has indices in both the input and output buffers. That would cover this case.
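
One possible reading of that generalization, sketched in pure Python for the 2D case: every update names a coordinate in the input buffer and a coordinate in the output buffer, so the gather and the scatter happen in one pass. The function name `gather_scatter_add` and its signature are hypothetical and do not correspond to any existing TVM op.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Coord = Tuple[int, int]


def gather_scatter_add(data: Matrix, in_indices: List[Coord],
                       out_indices: List[Coord],
                       out_shape: Tuple[int, int]) -> Matrix:
    """For every k: out[out_indices[k]] += data[in_indices[k]]."""
    rows, cols = out_shape
    out = [[0.0] * cols for _ in range(rows)]
    for (src_i, src_j), (dst_i, dst_j) in zip(in_indices, out_indices):
        out[dst_i][dst_j] += data[src_i][src_j]
    return out


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Sum input rows 0 and 2 into output row 0, and copy input row 1 to output row 1.
in_indices = [(0, 0), (0, 1), (2, 0), (2, 1), (1, 0), (1, 1)]
out_indices = [(0, 0), (0, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
print(gather_scatter_add(data, in_indices, out_indices, (2, 2)))
# [[6.0, 8.0], [3.0, 4.0]]
```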

Member

OK, I'll merge this as it is, then.

@masahi masahi merged commit 83ab234 into apache:main Mar 4, 2021
@masahi
Member

masahi commented Mar 4, 2021

Thanks @codeislife99 @tkonolige @mbrookhart

@codeislife99 codeislife99 deleted the segment_sum branch March 4, 2021 19:45
@codeislife99 codeislife99 restored the segment_sum branch March 4, 2021 19:45
@codeislife99 codeislife99 deleted the segment_sum branch March 4, 2021 20:40
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
…add dynamic bug (apache#7562)

* Add segment sum Op

* Remove unnecessary

* Documentation

* Black

* Add GPU

* Uncomment

* Add documentation

* Add dynamic tests

* Add TF Op

* Add Sparse Segment Sum

* Add test coverage

* PR Comments

* Int64 tests

* Add SparseSegmentSqrtN

* Add SparseSegmentSqrtNOp

* Deduplicate code

* Add SparseSegmentMean

* Parametrize Tests

* Remove

* Modularize

* Black

* Modularize Code

* Pylint

* PR Comments

* Add scatter add tests

* Remove Test

Co-authored-by: Ubuntu <[email protected]>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
…add dynamic bug (apache#7562)