[Backport to 2.x]Add tokenizer and sparse encoding (#1301) (#1393) #1398
Closed
zane-neo wants to merge 1 commit into opensearch-project:2.x from zane-neo:backport/backport-1393-to-2.x
…ch-project#1393)

* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* remove special token
* add filter
* try empty model
* remove warm up
* try empty model
* add block
* add log
* add log
* add log
* remove log
* remove pt file detect
* add log
* add functionName pipeline
* remove verify log
* skip special token in sparse encoding
* skip omit tokenize config
* skip omit tokenize config-change warm up logic
* reArch
* deduplicate
* omit ml config in sparse encoding
* add null config in warm up
* fix original test
* add tokenize ut half
* fix sparse encoding bug
* add UT for sparse encoding and tokenize
* remove useless framwork type
* common/src/test/java/org/opensearch/ml/common/input/MLInputTest.java
* change key for tokenize
* reArch DLModel
* reArch DLModel again
* response format
* tokenize only one output
* clean sparse output
* clean sparse output
* change UT number
* remove useless predict code
* remove useless part
* change tokenize way
* reArch add textEmbedding model
* add tokenize logic
* add abstract
* clear code
* fix it class
* fix it class
* add IT file
* reformulate
* reformulate remote inference
* reformulate remote inference
* reformulate remote inference json and array
* verify
* undo string utils
* skip dummy model
* skip dummy model
* skip dummy model
* skip dummy model
* skip dummy model
* skip dummy model
* add inner load Model
* rename variable
* add default for idf
* add ut for sparse encoding and tokenizer
* add close model
* change mock class
* remove buffer for sparse encoding output
* change tokenize model ready logic
* rewrite input functionName
* deduplicate
* change UT usage
* fix downloadAndSplit test
* fix Helper test
* remove meaningless change
* remove complie change
* rename
* fix typo error and simplify wrap code
* add comment
* using gson and remove useless close logic
* update comment and import problem
* add static idf name
* fix format problem
* extract an abstract model for sparse and dense sentence transformer translator
* fix typo error
* remove duplicate tokenizer file, fix import problem and add comment for tokenizer model

---------

Signed-off-by: xinyual <[email protected]>
(cherry picked from commit 31a4e25)
Co-authored-by: xinyual <[email protected]>
(cherry picked from commit 44946da)
zane-neo had a problem deploying to ml-commons-cicd-env with GitHub Actions, September 27, 2023 05:09: Failure
zane-neo had a problem deploying to ml-commons-cicd-env with GitHub Actions, September 27, 2023 05:09: Error
zane-neo had a problem deploying to ml-commons-cicd-env with GitHub Actions, September 27, 2023 05:09: Error
zane-neo had a problem deploying to ml-commons-cicd-env with GitHub Actions, September 27, 2023 05:09: Failure
model-collapse approved these changes, Sep 27, 2023
zane-neo temporarily deployed to ml-commons-cicd-env with GitHub Actions, September 27, 2023 06:58: Inactive
zane-neo had a problem deploying to ml-commons-cicd-env with GitHub Actions, September 27, 2023 06:58: Error
zane-neo temporarily deployed to ml-commons-cicd-env with GitHub Actions, September 27, 2023 06:58: Inactive
zane-neo had a problem deploying to ml-commons-cicd-env with GitHub Actions, September 27, 2023 06:58: Failure
Codecov Report
@@             Coverage Diff              @@
##                2.x    #1398      +/-   ##
============================================
- Coverage     78.28%   78.24%   -0.04%
- Complexity     2272     2302      +30
============================================
  Files           190      195       +5
  Lines          9283     9370      +87
  Branches        909      917       +8
============================================
+ Hits           7267     7332      +65
- Misses         1608     1624      +16
- Partials        408      414       +6
Backport will be done in this PR: #1399
Signed-off-by: xinyual [email protected]
(cherry picked from commit 31a4e25)
Co-authored-by: xinyual [email protected]
(cherry picked from commit 44946da)
Description
[Describe what this change achieves]
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.