[Backport 2.x] Supporting sparse semantic retrieval in neural search #343

Merged
zane-neo merged 2 commits into 2.x from backport/backport-333-to-2.x
Sep 27, 2023

Conversation

opensearch-trigger-bot[bot]
Contributor

Backport 7bef7a0 from #333

* sparse mapper field and query builder

Signed-off-by: zhichao-aws <[email protected]>

* fix typo

Signed-off-by: zhichao-aws <[email protected]>

* Add map result support in neural search for non text embedding models

Signed-off-by: zane-neo <[email protected]>

* Fix compilation failure issue

Signed-off-by: zane-neo <[email protected]>

* Add more UTs

Signed-off-by: zane-neo <[email protected]>

* add sparse encoding processor

Signed-off-by: xinyual <[email protected]>

* add sparse encoding processor

Signed-off-by: xinyual <[email protected]>

* remove guava in gradle

Signed-off-by: xinyual <[email protected]>

* modify access control

Signed-off-by: xinyual <[email protected]>

* Add map result support in neural search for non text embedding models

Signed-off-by: zane-neo <[email protected]>

* Fix compilation failure issue

Signed-off-by: zane-neo <[email protected]>

* change output logic

Signed-off-by: xinyual <[email protected]>

* create abstract

Signed-off-by: xinyual <[email protected]>

* create abstract proccesor

Signed-off-by: xinyual <[email protected]>

* add abstract class

Signed-off-by: xinyual <[email protected]>

* remove duplicate code

Signed-off-by: xinyual <[email protected]>

* remove duplicate code

Signed-off-by: xinyual <[email protected]>

* remove dl process

Signed-off-by: xinyual <[email protected]>

* move static to abstract class

Signed-off-by: xinyual <[email protected]>

* update query rewrite logic

Signed-off-by: zhichao-aws <[email protected]>

* modify header

Signed-off-by: zhichao-aws <[email protected]>

* merge conflict

Signed-off-by: xinyual <[email protected]>

* delete index mapper, change to rank_features

Signed-off-by: zhichao-aws <[email protected]>

* remove unused import

Signed-off-by: zhichao-aws <[email protected]>

* list return result

Signed-off-by: zhichao-aws <[email protected]>

* refactor type and listTypeNestedMapKey, tidy

Signed-off-by: zhichao-aws <[email protected]>

* forbid nested input. tidy.

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* enable nested

Signed-off-by: zhichao-aws <[email protected]>

* fix test

Signed-off-by: zhichao-aws <[email protected]>

* Add ut it to sparse encoding processor (#6)

* fix original UT problem

Signed-off-by: xinyual <[email protected]>

* add UT IT

Signed-off-by: xinyual <[email protected]>

* add more UT

Signed-off-by: xinyual <[email protected]>

* add more ut

Signed-off-by: xinyual <[email protected]>

* fix typo error

Signed-off-by: xinyual <[email protected]>

---------

Signed-off-by: xinyual <[email protected]>

* utils, tidy

Signed-off-by: zhichao-aws <[email protected]>

* rename to sparse_encoding query

Signed-off-by: zhichao-aws <[email protected]>

* add validation and ut

Signed-off-by: zhichao-aws <[email protected]>

* sparse encoding query builder ut

Signed-off-by: zhichao-aws <[email protected]>

* rename

Signed-off-by: zhichao-aws <[email protected]>

* UT for utils

Signed-off-by: zhichao-aws <[email protected]>

* enrich sparse encoding IT mappings

Signed-off-by: zhichao-aws <[email protected]>

* add it

Signed-off-by: zhichao-aws <[email protected]>

* add it

Signed-off-by: zhichao-aws <[email protected]>

* add integ test

Signed-off-by: zhichao-aws <[email protected]>

* rename resource file

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* remove BoundedLinearQuery and TokenScoreUpperBound

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add delta to loose the equal

Signed-off-by: zhichao-aws <[email protected]>

* move SparseEncodingQueryBuilder to upper level path

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add it

Signed-off-by: zhichao-aws <[email protected]>

* Update src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java

Co-authored-by: zane-neo <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>

* Update src/main/java/org/opensearch/neuralsearch/util/TokenWeightUtil.java

Co-authored-by: zane-neo <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>

* restore gradle.propeties

Signed-off-by: zhichao-aws <[email protected]>

* add release notes

Signed-off-by: zhichao-aws <[email protected]>

* change field modifier to private for NLPProcessor

Signed-off-by: zhichao-aws <[email protected]>

* add comments

Signed-off-by: zhichao-aws <[email protected]>

* use StringUtils to check

Signed-off-by: zhichao-aws <[email protected]>

* null check

Signed-off-by: zhichao-aws <[email protected]>

* modify changelog

Signed-off-by: zhichao-aws <[email protected]>

* nit

Signed-off-by: zhichao-aws <[email protected]>

* nit

Signed-off-by: zhichao-aws <[email protected]>

* remove query tokens from user interface

Signed-off-by: zhichao-aws <[email protected]>

* fix test

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* update function name

Signed-off-by: zhichao-aws <[email protected]>

* add javadoc

Signed-off-by: zhichao-aws <[email protected]>

* remove debug log including inference result

Signed-off-by: zhichao-aws <[email protected]>

* make query text and model id required

Signed-off-by: zhichao-aws <[email protected]>

* minor changes based on comments

Signed-off-by: zhichao-aws <[email protected]>

* add locale to String.format

Signed-off-by: zhichao-aws <[email protected]>

* update mock model url

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zane-neo <[email protected]>
Signed-off-by: xinyual <[email protected]>
Co-authored-by: zane-neo <[email protected]>
Co-authored-by: xinyual <[email protected]>
(cherry picked from commit 7bef7a0)
* fix apache http version

Signed-off-by: zhichao-aws <[email protected]>

* add import

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
@zhichao-aws
Member

Waiting for opensearch-project/ml-commons#1398 to be merged before running the sparse encoding integ test.

      listener.onResponse(vector);
  }, e -> {
      if (RetryUtil.shouldRetry(e, retryTime)) {
          final int retryTimeAdd = retryTime + 1;
-         inferenceSentencesWithRetry(targetResponseFilters, modelId, inputText, retryTimeAdd, listener);
+         retryableInferenceSentencesWithVectorResult(targetResponseFilters, modelId, inputText, retryTimeAdd, listener);
Collaborator

Why did you name the function like that? A function name should be a verb phrase.
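
For readers without the full diff, the retry pattern in the fragment above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the plugin's code: `Listener`, `InferenceClient`, `MAX_RETRIES`, the body of `shouldRetry`, and the `else` branch are all assumptions made for the example, since the fragment shows only the retry branch and does not show what `RetryUtil.shouldRetry` checks. The method name `inferSentencesWithRetry` is likewise hypothetical; it is written as a verb phrase only to show what that style looks like.

```java
import java.util.List;

final class RetryingInferenceSketch {
    // Hypothetical retry budget; the real limit used by RetryUtil is not shown in this PR page.
    static final int MAX_RETRIES = 3;

    // Simplified stand-in for an asynchronous response listener.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Hypothetical client abstraction for one asynchronous inference call.
    interface InferenceClient {
        void infer(String modelId, List<String> inputText, Listener<List<Float>> listener);
    }

    // Assumption: retry only on IllegalStateException while the budget lasts.
    static boolean shouldRetry(Exception e, int retryTime) {
        return e instanceof IllegalStateException && retryTime < MAX_RETRIES;
    }

    // Calls the client and, on a retryable failure, calls itself again with retryTime + 1.
    static void inferSentencesWithRetry(
        final InferenceClient client,
        final String modelId,
        final List<String> inputText,
        final int retryTime,
        final Listener<List<Float>> listener
    ) {
        client.infer(modelId, inputText, new Listener<List<Float>>() {
            @Override
            public void onResponse(List<Float> vector) {
                listener.onResponse(vector);
            }

            @Override
            public void onFailure(Exception e) {
                if (shouldRetry(e, retryTime)) {
                    final int retryTimeAdd = retryTime + 1;
                    inferSentencesWithRetry(client, modelId, inputText, retryTimeAdd, listener);
                } else {
                    // Assumed behavior: hand the last error to the caller once retries are exhausted.
                    listener.onFailure(e);
                }
            }
        });
    }
}
```

A caller would start with `retryTime` set to 0, so under these assumptions a client that always fails is invoked at most `1 + MAX_RETRIES` times before the listener's `onFailure` receives the error.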

@zane-neo zane-neo merged commit 415082e into 2.x Sep 27, 2023
12 checks passed
@github-actions github-actions bot deleted the backport/backport-333-to-2.x branch September 27, 2023 09:39
3 participants