Add cross encoder support #1615
Codecov Report
Attention:
Additional details and impacted files
@@ Coverage Diff @@
## main #1615 +/- ##
============================================
+ Coverage 80.83% 80.98% +0.15%
- Complexity 4215 4246 +31
============================================
Files 404 408 +4
Lines 16977 17122 +145
Branches 1818 1835 +17
============================================
+ Hits 13723 13867 +144
+ Misses 2539 2534 -5
- Partials 715 721 +6
Flags with carried forward coverage won't be shown.
One minor question, but overall looks great!
common/src/main/java/org/opensearch/ml/common/dataset/TextSimilarityInputDataSet.java
Signed-off-by: HenryL27 <[email protected]>
Signed-off-by: HenryL27 <[email protected]>
Thanks for working on this. Approved.
LGTM (with one minor question. You can answer and resolve.)
common/src/main/java/org/opensearch/ml/common/dataset/TextSimilarityInputDataSet.java
* add text similarity inputs and function name
* add text similarity cross encoder model
* add text similarity unit tests
* add text similarity input unittests
* add text similarity dataset unittests
* add function name annotation
* refactor API to use single query
* omit private from class vars (Co-authored-by: Navneet Verma <[email protected]>)
* change output name from logits to similarity
* hashify isDLModel
* add error message for non-torchscript cross encoders
* allow onnx, actually.
* apply spotless after rebase
* add unittest for new mlinput toXcontent clause
* static DLModels
* add tests and error message tweaks
* name test models w framework
* change pt->torch_script
---------
Signed-off-by: HenryL27 <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>
(cherry picked from commit 2761d7d)
@HenryL27 can you please share details of meta config for
error response:
Description
Adds support for (Hugging Face) cross encoders to ml-commons. Uses a new function name (TEXT_SIMILARITY) which takes as input a list of text pairs and returns 1-dimensional tensors representing the similarity of the items in each pair. E.g.
yields
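To make the input/output contract concrete, here is a minimal sketch in plain Java. It is not the ml-commons implementation and not the example from this PR: the class and method names are hypothetical, and a toy token-overlap score stands in for the cross encoder. The only thing it mirrors from the description is the shape: a list of text pairs goes in, and one single-element tensor per pair comes out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: every name here is hypothetical and none of it is ml-commons code.
final class TextSimilaritySketch {

    // Lowercased whitespace-separated tokens of a text, with empty tokens dropped.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
        result.remove("");
        return result;
    }

    // Toy stand-in for the cross encoder: Jaccard overlap of the two token sets.
    // A real cross encoder would feed both texts through the model together.
    static float[] scorePair(String first, String second) {
        Set<String> a = tokens(first);
        Set<String> b = tokens(second);
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        float score = union.isEmpty() ? 0f : (float) intersection.size() / union.size();
        return new float[] { score }; // 1-dimensional tensor holding a single value
    }

    // Returns one output tensor per input pair, in the same order as the input.
    static List<float[]> similarity(List<String[]> pairs) {
        List<float[]> out = new ArrayList<>();
        for (String[] pair : pairs) {
            out.add(scorePair(pair[0], pair[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
            new String[] { "today is sunny", "today is sunny" },
            new String[] { "today is sunny", "it is winter" });
        for (float[] tensor : similarity(pairs)) {
            System.out.println(Arrays.toString(tensor));
        }
    }
}
```

Because each pair is scored independently, the outputs can be sorted by their single value to rerank candidate documents against a query, which is the usual use of a cross encoder.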
This was using the model cross-encoder/ms-marco-TinyBERT-L-2-v2 - the config I used to upload it looked like
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.