[Backport main] Add tokenizer and sparse encoding #1393

Merged on Sep 27, 2023 (1 commit into main from backport/backport-1301-to-main)

opensearch-trigger-bot[bot]
Contributor

Backport 31a4e25 from #1301

* add tokenizer and sparse encoding

Signed-off-by: xinyual <[email protected]>

* add tokenizer and sparse encoding

Signed-off-by: xinyual <[email protected]>

* add tokenizer and sparse encoding

Signed-off-by: xinyual <[email protected]>

* add tokenizer and sparse encoding

Signed-off-by: xinyual <[email protected]>

* add tokenizer and sparse encoding

Signed-off-by: xinyual <[email protected]>

* remove special token

Signed-off-by: xinyual <[email protected]>

* add filter

Signed-off-by: xinyual <[email protected]>

* try empty model

Signed-off-by: xinyual <[email protected]>

* remove warm up

Signed-off-by: xinyual <[email protected]>

* try empty model

Signed-off-by: xinyual <[email protected]>

* add block

Signed-off-by: xinyual <[email protected]>

* add log

Signed-off-by: xinyual <[email protected]>

* add log

Signed-off-by: xinyual <[email protected]>

* add log

Signed-off-by: xinyual <[email protected]>

* remove log

Signed-off-by: xinyual <[email protected]>

* remove pt file detect

Signed-off-by: xinyual <[email protected]>

* add log

Signed-off-by: xinyual <[email protected]>

* add functionName pipeline

Signed-off-by: xinyual <[email protected]>

* remove verify log

Signed-off-by: xinyual <[email protected]>

* skip special token in sparse encoding

Signed-off-by: xinyual <[email protected]>

* skip omit tokenize config

Signed-off-by: xinyual <[email protected]>

* skip omit tokenize config-change warm up logic

Signed-off-by: xinyual <[email protected]>

* reArch

Signed-off-by: xinyual <[email protected]>

* deduplicate

Signed-off-by: xinyual <[email protected]>

* omit ml config in sparse encoding

Signed-off-by: xinyual <[email protected]>

* add null config in warm up

Signed-off-by: xinyual <[email protected]>

* fix original test

Signed-off-by: xinyual <[email protected]>

* add tokenize ut half

Signed-off-by: xinyual <[email protected]>

* fix sparse encoding bug

Signed-off-by: xinyual <[email protected]>

* add UT for sparse encoding and tokenize

Signed-off-by: xinyual <[email protected]>

* remove useless framwork type

Signed-off-by: xinyual <[email protected]>

* common/src/test/java/org/opensearch/ml/common/input/MLInputTest.java

Signed-off-by: xinyual <[email protected]>

* change key for tokenize

Signed-off-by: xinyual <[email protected]>

* reArch DLModel

Signed-off-by: xinyual <[email protected]>

* reArch DLModel again

Signed-off-by: xinyual <[email protected]>

* response format

Signed-off-by: xinyual <[email protected]>

* tokenize only one output

Signed-off-by: xinyual <[email protected]>

* clean sparse output

Signed-off-by: xinyual <[email protected]>

* clean sparse output

Signed-off-by: xinyual <[email protected]>

* change UT number

Signed-off-by: xinyual <[email protected]>

* remove useless predict code

Signed-off-by: xinyual <[email protected]>

* remove useless part

Signed-off-by: xinyual <[email protected]>

* change tokenize way

Signed-off-by: xinyual <[email protected]>

* reArch add textEmbedding model

Signed-off-by: xinyual <[email protected]>

* add tokenize logic

Signed-off-by: xinyual <[email protected]>

* add abstract

Signed-off-by: xinyual <[email protected]>

* clear code

Signed-off-by: xinyual <[email protected]>

* fix it class

Signed-off-by: xinyual <[email protected]>

* fix it class

Signed-off-by: xinyual <[email protected]>

* add IT file

Signed-off-by: xinyual <[email protected]>

* reformulate

Signed-off-by: xinyual <[email protected]>

* reformulate remote inference

Signed-off-by: xinyual <[email protected]>

* reformulate remote inference

Signed-off-by: xinyual <[email protected]>

* reformulate remote inference json and array

Signed-off-by: xinyual <[email protected]>

* verify

Signed-off-by: xinyual <[email protected]>

* undo string utils

Signed-off-by: xinyual <[email protected]>

* skip dummy model

Signed-off-by: xinyual <[email protected]>

* skip dummy model

Signed-off-by: xinyual <[email protected]>

* skip dummy model

Signed-off-by: xinyual <[email protected]>

* skip dummy model

Signed-off-by: xinyual <[email protected]>

* skip dummy model

Signed-off-by: xinyual <[email protected]>

* skip dummy model

Signed-off-by: xinyual <[email protected]>

* add inner load Model

Signed-off-by: xinyual <[email protected]>

* rename variable

Signed-off-by: xinyual <[email protected]>

* add default for idf

Signed-off-by: xinyual <[email protected]>

* add ut for sparse encoding and tokenizer

Signed-off-by: xinyual <[email protected]>

* add close model

Signed-off-by: xinyual <[email protected]>

* change mock class

Signed-off-by: xinyual <[email protected]>

* remove buffer for sparse encoding output

Signed-off-by: xinyual <[email protected]>

* change tokenize model ready logic

Signed-off-by: xinyual <[email protected]>

* rewrite input functionName

Signed-off-by: xinyual <[email protected]>

* deduplicate

Signed-off-by: xinyual <[email protected]>

* change UT usage

Signed-off-by: xinyual <[email protected]>

* fix downloadAndSplit test

Signed-off-by: xinyual <[email protected]>

* fix Helper  test

Signed-off-by: xinyual <[email protected]>

* remove meaningless change

Signed-off-by: xinyual <[email protected]>

* remove complie change

Signed-off-by: xinyual <[email protected]>

* rename

Signed-off-by: xinyual <[email protected]>

* fix typo error and simplify wrap code

Signed-off-by: xinyual <[email protected]>

* add comment

Signed-off-by: xinyual <[email protected]>

* using gson and remove useless close logic

Signed-off-by: xinyual <[email protected]>

* update comment and import problem

Signed-off-by: xinyual <[email protected]>

* add static idf name

Signed-off-by: xinyual <[email protected]>

* fix format problem

Signed-off-by: xinyual <[email protected]>

* extract an abstract model for sparse and dense sentence transformer translator

Signed-off-by: xinyual <[email protected]>

* fix typo error

Signed-off-by: xinyual <[email protected]>

* remove duplicate tokenizer file, fix import problem and add comment for tokenizer model

Signed-off-by: xinyual <[email protected]>

---------

Signed-off-by: xinyual <[email protected]>
(cherry picked from commit 31a4e25)
@zane-neo zane-neo merged commit 44946da into main Sep 27, 2023
3 of 9 checks passed
@github-actions github-actions bot deleted the backport/backport-1301-to-main branch September 27, 2023 02:35
@opensearch-trigger-bot
Contributor Author

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1393-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 44946dae7cb573921902757fff3173bb63e43a02
# Push it to GitHub
git push --set-upstream origin backport/backport-1393-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1393-to-2.x.
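
If the cherry-pick step stops with conflicts, the usual flow is to resolve the files, stage them, and continue. The script below is only a rough sketch of that flow, reproduced in a throwaway repository: the branch names `release` and `backport/demo`, the file `file.txt`, and its contents are hypothetical stand-ins and are not taken from ml-commons. It omits `--mainline 1` because the demo commit is an ordinary (non-merge) commit.

```shell
set -eu

# Throwaway repository; nothing here touches a real checkout.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

# Common ancestor, then a hypothetical release branch (stand-in for 2.x).
echo "v1" > file.txt
git add file.txt
git commit -q -m "base"
git branch release

# The commit we later want to backport.
echo "main change" > file.txt
git commit -q -am "change on main"
picked=$(git rev-parse HEAD)

# A diverging change on the release branch, so the pick will conflict.
git switch -q release
echo "release change" > file.txt
git commit -q -am "change on release"

# Backport branch, cherry-pick, and conflict resolution.
git switch -q --create backport/demo
if ! git cherry-pick -x "$picked" >/dev/null 2>&1; then
  git status --short                  # lists the conflicted paths
  echo "resolved" > file.txt          # stand-in for a real manual resolution
  git add file.txt
  GIT_EDITOR=true git cherry-pick --continue >/dev/null
fi

result=$(cat file.txt)
subject=$(git log -1 --format=%s)
echo "$result / $subject"
```

In a real backport the resolution step is a manual edit of each conflicted file; `git cherry-pick --abort` returns the branch to its state before the pick if the conflicts turn out to be too involved.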

zane-neo pushed a commit to zane-neo/ml-commons that referenced this pull request Sep 27, 2023
…ch-project#1393)

(cherry picked from commit 31a4e25)

Co-authored-by: xinyual <[email protected]>
(cherry picked from commit 44946da)
xinyual pushed a commit to xinyual/ml-commons that referenced this pull request Sep 27, 2023
…ch-project#1393)

(cherry picked from commit 31a4e25)

Co-authored-by: xinyual <[email protected]>
(cherry picked from commit 44946da)
zane-neo pushed a commit that referenced this pull request Sep 27, 2023
(cherry picked from commit 31a4e25)

Co-authored-by: xinyual <[email protected]>
(cherry picked from commit 44946da)

Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com>