Add specification for SegmentMax-16 #28103

Open
wants to merge 3 commits into
base: master

p-wysocki
Contributor

Details:

Tickets:

Signed-off-by: p-wysocki <[email protected]>
Signed-off-by: p-wysocki <[email protected]>
@p-wysocki p-wysocki requested a review from a team as a code owner December 17, 2024 15:08
@p-wysocki p-wysocki requested review from zKulesza and removed request for a team December 17, 2024 15:08
@github-actions github-actions bot added the category: docs OpenVINO documentation label Dec 17, 2024
Member

@rkazants rkazants left a comment

Let us have EmbeddingSegmentsMax similar to EmbeddingSegmentsSum.
It should also have a default index (defining the default value for an empty segment).

* Segment_4: ``[]``
* Segment_5: ``[data[6], data[7]]``

When there are no values in a segment, ``output[segment]`` is set to 0.
Member

We should have a default value for empty segments; otherwise, we will need an additional, non-trivial computation graph to detect empty segments and replace the zero value.

Contributor Author

@p-wysocki p-wysocki Dec 18, 2024

The default value seems to be 0, according to https://www.tensorflow.org/api_docs/python/tf/raw_ops/SegmentMax. I don't think we should expand the op on our own, especially since we only expect it to come from TF FE.

Contributor

There is also a V2, https://www.tensorflow.org/api_docs/python/tf/raw_ops/SegmentMaxV2, where the default has been changed to numeric_limits<T>::lowest(). Adding an attribute for the default value seems to be a simple way to support both cases, but to enable V2 at once we would also need to consider the "num_segments" input.
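
To make the two empty-segment conventions discussed in this thread concrete, here is a minimal pure-Python sketch for 1D data. The function name `segment_max` and the `empty_value` parameter are hypothetical illustrations, not part of the proposed spec or of any TensorFlow or OpenVINO API. The segment IDs are chosen to match the Segment_4/Segment_5 example quoted above, and `-sys.float_info.max` stands in for `numeric_limits<double>::lowest()`.

```python
import sys


def segment_max(data, segment_ids, empty_value=0):
    """Per-segment maximum of 1D `data`; empty segments get `empty_value`.

    Hypothetical reference sketch, not an OpenVINO or TensorFlow API.
    """
    if len(data) != len(segment_ids):
        raise ValueError("segment_ids must have one entry per data element")
    if not segment_ids:
        return []
    num_segments = max(segment_ids) + 1
    output = [None] * num_segments  # None marks "no value seen yet"
    for value, seg in zip(data, segment_ids):
        if output[seg] is None or value > output[seg]:
            output[seg] = value
    # Replace untouched slots with the chosen default for empty segments.
    return [empty_value if v is None else v for v in output]


data = [5, -1, 7, 2, 9, 3, 4, 8]
segment_ids = [0, 0, 1, 2, 2, 3, 5, 5]  # segment 4 is empty

# SegmentMax-style convention: empty segment -> 0
v1_style = segment_max(data, segment_ids)
# SegmentMaxV2-style convention: empty segment -> lowest representable value
v2_style = segment_max(data, segment_ids, empty_value=-sys.float_info.max)
```

With the 0 convention, an empty segment cannot be told apart from a segment whose maximum happens to be 0, whereas a configurable default leaves that choice to the caller.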


* **1**: *data*

* **Description**: The numerical data on which SegmentMax operation will be performed. **Required.**
Member

Please define the shape of each input and of the output, and describe which dimensions must be equal.

Contributor Author

@p-wysocki p-wysocki Dec 18, 2024

``data`` can have any rank and dimensions, so it's described as an ND tensor of any numerical type. ``segment_ids`` is specified as a 1D tensor of non-negative, sorted integer numbers whose size is equal to the size of the first dimension of the ``data`` input tensor.

Could you please specify what's missing? I think the shapes are covered, but I may be missing something.
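
The shape relationships described in this reply can be written down as a small shape-inference sketch. The function name `infer_segment_max_shape` is hypothetical, and the first output dimension uses the ``max(segment_ids) + 1`` rule proposed elsewhere in this review, which is an assumption until the spec wording settles.

```python
def infer_segment_max_shape(data_shape, segment_ids):
    """Return the output shape for a dense SegmentMax-like op.

    Hypothetical helper: it checks the constraints stated in the review
    (1D, non-negative, sorted segment_ids, one ID per row of data).
    """
    data_shape = tuple(data_shape)
    # Assumption: data needs rank >= 1 so that a first dimension exists.
    if len(data_shape) < 1:
        raise ValueError("data must have rank >= 1")
    if len(segment_ids) != data_shape[0]:
        raise ValueError("len(segment_ids) must equal data_shape[0]")
    if any(i < 0 for i in segment_ids):
        raise ValueError("segment_ids must be non-negative")
    if any(a > b for a, b in zip(segment_ids, segment_ids[1:])):
        raise ValueError("segment_ids must be sorted in ascending order")
    first_dim = max(segment_ids) + 1 if segment_ids else 0
    # All dimensions except the first are carried over from data.
    return (first_dim,) + data_shape[1:]


out_shape = infer_segment_max_shape((8, 3, 4), [0, 0, 1, 2, 2, 3, 5, 5])
```

Here only the first dimension changes; the trailing dimensions of ``data`` are carried over unchanged.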

@p-wysocki
Contributor Author

> Let us have EmbeddingSegmentsMax similar to EmbeddingSegmentsSum.

EmbeddingSegmentX is for sparse inputs, while the SegmentMax I'm implementing (to unlock some models) accepts dense inputs, so I don't think it should be added as EmbeddingSegmentMax. I'm moving the discussion to internal channels; if it results in changes, I'll apply them to the PR.

@p-wysocki p-wysocki requested a review from rkazants December 18, 2024 12:34
Signed-off-by: p-wysocki <[email protected]>

**Outputs**

* **1**: The output tensor of type *T* and the same shape as the ``input`` tensor with the exception for the first dimension, which is equal to the count of unique segment IDs.
Contributor

Suggested change
* **1**: The output tensor of type *T* and the same shape as the ``input`` tensor with the exception for the first dimension, which is equal to the count of unique segment IDs.
* **1**: The output tensor of type *T* and almost the same shape as the ``data`` input tensor with the exception for the first dimension, which is equal to the count of unique segment IDs (calculated as ``max(segment_ids) + 1``).

Contributor

Maybe, instead of "almost", use the following?

The output tensor has the same rank and dimensions as the ``data`` input tensor, except for the first dimension, which is calculated as ``max(segment_ids) + 1``.
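
Note that "count of unique segment IDs" and ``max(segment_ids) + 1`` are not equivalent once a segment is empty. A quick check, using a segment ID list chosen to match the Segment_4/Segment_5 example from the spec (the concrete list is an assumption):

```python
# Segment 4 has no entries, mirroring the empty Segment_4 in the spec example.
segment_ids = [0, 0, 1, 2, 2, 3, 5, 5]

unique_count = len(set(segment_ids))  # counts only IDs that actually occur
max_plus_one = max(segment_ids) + 1   # also counts the skipped ID 4
```

For this input the two values differ by one, so only the ``max(segment_ids) + 1`` wording leaves room for an output row for the empty segment.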

Comment on lines +43 to +51
* **2**: *segment_ids*

* **Description**: Controls how the data is divided into segments. **Required.**
* **Range of values**: 1D tensor of non-negative, sorted integer numbers. Its size is equal to the size of the first dimension of the input tensor.
* **Type**: *T_IDX*

**Outputs**

* **1**: The output tensor of type *T* and the same shape as the ``input`` tensor with the exception for the first dimension, which is equal to the count of unique segment IDs.
Contributor

The "Inputs" section is written in the style normally used for "Attributes".
Consider aligning it with other spec documents.

* **2**: *segment_ids*

* **Description**: Controls how the data is divided into segments. **Required.**
* **Range of values**: 1D tensor of non-negative, sorted integer numbers. Its size is equal to the size of the first dimension of the input tensor.
Contributor

I can see that unsorted segment_ids may lead to an error or undefined behavior (implementation-specific, depending on the hardware).
Should we specify a common behavior for the OV op?
This can be clarified at the plugin implementation stage.
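
As an illustration of why unsorted IDs can be implementation-specific, here is a sketch with two hypothetical strategies (neither is taken from an actual OpenVINO plugin): a single-pass scan that relies on sorted IDs, and a scatter-style reduction that does not. They agree on sorted input and can disagree on unsorted input.

```python
def segment_max_scan(data, segment_ids, empty_value=0):
    """Single pass that assumes sorted IDs: a new run starts when the ID changes."""
    if not segment_ids:
        return []
    output = [empty_value] * (max(segment_ids) + 1)
    prev = None
    for value, seg in zip(data, segment_ids):
        if seg != prev:
            output[seg] = value  # start of a run overwrites any earlier result
            prev = seg
        else:
            output[seg] = max(output[seg], value)
    return output


def segment_max_scatter(data, segment_ids, empty_value=0):
    """Order-independent reduction: every element is compared with its slot."""
    if not segment_ids:
        return []
    output = [None] * (max(segment_ids) + 1)
    for value, seg in zip(data, segment_ids):
        output[seg] = value if output[seg] is None else max(output[seg], value)
    return [empty_value if v is None else v for v in output]


def is_sorted(ids):
    """True if ids are in non-decreasing order."""
    return all(a <= b for a, b in zip(ids, ids[1:]))


data = [9, 1, 2]
sorted_ids = [0, 0, 1]
unsorted_ids = [0, 1, 0]

same_on_sorted = (
    segment_max_scan(data, sorted_ids) == segment_max_scatter(data, sorted_ids)
)
scan_unsorted = segment_max_scan(data, unsorted_ids)
scatter_unsorted = segment_max_scatter(data, unsorted_ids)
```

A check such as `is_sorted` at validation time would be one way to give all implementations a common, well-defined failure mode.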

* **2**: *segment_ids*

* **Description**: Controls how the data is divided into segments. **Required.**
* **Range of values**: 1D tensor of non-negative, sorted integer numbers. Its size is equal to the size of the first dimension of the input tensor.
Contributor

Suggested change
* **Range of values**: 1D tensor of non-negative, sorted integer numbers. Its size is equal to the size of the first dimension of the input tensor.
* **Range of values**: 1D tensor of non-negative, sorted integer numbers. Its size is equal to the size of the first dimension of the ``data`` input tensor.

Comment on lines +1 to +2
SegmentMax
===================
Contributor

Suggested change
SegmentMax
===================
SegmentMax
==========

Should the number of ``=`` characters be the same as the heading length?


4 participants