forked from opensearch-project/ml-commons
Commit
Add tokenizer and sparse encoding (opensearch-project#1301)
* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* add tokenizer and sparse encoding
* remove special token
* add filter
* try empty model
* remove warm up
* try empty model
* add block
* add log
* add log
* add log
* remove log
* remove pt file detect
* add log
* add functionName pipeline
* remove verify log
* skip special token in sparse encoding
* skip omit tokenize config
* skip omit tokenize config-change warm up logic
* reArch
* deduplicate
* omit ml config in sparse encoding
* add null config in warm up
* fix original test
* add tokenize ut half
* fix sparse encoding bug
* add UT for sparse encoding and tokenize
* remove useless framwork type
* common/src/test/java/org/opensearch/ml/common/input/MLInputTest.java
* change key for tokenize
* reArch DLModel
* reArch DLModel again
* response format
* tokenize only one output
* clean sparse output
* clean sparse output
* change UT number
* remove useless predict code
* remove useless part
* change tokenize way
* reArch add textEmbedding model
* add tokenize logic
* add abstract
* clear code
* fix it class
* fix it class
* add IT file
* reformulate
* reformulate remote inference
* reformulate remote inference
* reformulate remote inference json and array
* verify
* undo string utils
* skip dummy model
* skip dummy model
* skip dummy model
* skip dummy model
* skip dummy model
* skip dummy model
* add inner load Model
* rename variable
* add default for idf
* add ut for sparse encoding and tokenizer
* add close model
* change mock class
* remove buffer for sparse encoding output
* change tokenize model ready logic
* rewrite input functionName
* deduplicate
* change UT usage
* fix downloadAndSplit test
* fix Helper test
* remove meaningless change
* remove complie change
* rename
* fix typo error and simplify wrap code
* add comment
* using gson and remove useless close logic
* update comment and import problem
* add static idf name
* fix format problem
* extract an abstract model for sparse and dense sentence transformer translator
* fix typo error
* remove duplicate tokenizer file, fix import problem and add comment for tokenizer model

---------

Signed-off-by: xinyual <[email protected]>
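Several of the squashed commits concern the tokenize path of sparse encoding ("skip special token in sparse encoding", "add default for idf", "add static idf name"). The Java sketch below shows one way those ideas could fit together. It makes assumptions that this commit does not confirm: every non-special token is mapped to an IDF weight, and a default weight is used when the token has no IDF entry. The class name, method name, special-token set, and the default value 1.0 are all hypothetical. None of them is the ml-commons implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the ml-commons code: builds a sparse token -> weight map.
final class SparseEncodingSketch {
    // Assumed special tokens (BERT-style); a real tokenizer exposes its own list.
    static final Set<String> SPECIAL_TOKENS = Set.of("[CLS]", "[SEP]", "[PAD]", "[UNK]");

    // Assumed fallback weight for tokens that have no IDF entry.
    static final float DEFAULT_IDF = 1.0f;

    // Skips special tokens and looks up each remaining token's IDF weight,
    // falling back to DEFAULT_IDF. Repeated tokens collapse into one map key.
    static Map<String, Float> encode(List<String> tokens, Map<String, Float> idf) {
        Map<String, Float> result = new HashMap<>();
        for (String token : tokens) {
            if (SPECIAL_TOKENS.contains(token)) {
                continue;
            }
            result.put(token, idf.getOrDefault(token, DEFAULT_IDF));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> idf = Map.of("hello", 2.5f);
        Map<String, Float> out = encode(List.of("[CLS]", "hello", "world", "[SEP]"), idf);
        // Prints "2.5 1.0 2": "hello" takes its IDF weight, "world" takes the default,
        // and [CLS]/[SEP] are dropped.
        System.out.println(out.get("hello") + " " + out.get("world") + " " + out.size());
    }
}
```

A map keyed by token keeps the output sparse, because only tokens that actually appear get an entry. A fully neural sparse encoder would take its weights from model output instead of an IDF table.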
Showing 34 changed files with 1,101 additions and 233 deletions.