Support analyzer-based neural sparse query & build BERT tokenizer as pre-defined tokenizer #1061

Closed
wants to merge 38 commits into from

zhichao-aws
Member

Description

Support analyzer-based neural sparse query & build BERT tokenizer as pre-defined tokenizer

Related Issues

Resolves #1052

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

zhichao-aws and others added 30 commits September 9, 2024 15:40
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
This is required to enable rescoring for on disk mode indices

Signed-off-by: Tejas Shah <[email protected]>
…) (opensearch-project#894)

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 22f36c5)

Co-authored-by: Martin Gaievski <[email protected]>
* Add coverage for NeuralSearch class

Signed-off-by: Daniel Widdis <[email protected]>
* chore: bump guide java version

Signed-off-by: Ian Menendez <[email protected]>

* chore: java 21 on title and other parts

Signed-off-by: Ian Menendez <[email protected]>

---------

Signed-off-by: Ian Menendez <[email protected]>
* Adding 2.18 snapshot version to bwc workflow

Signed-off-by: Martin Gaievski <[email protected]>

---------

Signed-off-by: Martin Gaievski <[email protected]>
Co-authored-by: Varun Jain <[email protected]>
…search-project#907)

* feat: add ignore missing field to text chunking processor

Signed-off-by: Ian Menendez <[email protected]>
Co-authored-by: Ian Menendez <[email protected]>
* Initial version for rescorer

Signed-off-by: Martin Gaievski <[email protected]>
* Adding non empty check before filling in result

Signed-off-by: wangdongyu.danny <[email protected]>
…rs (opensearch-project#928)

* Adds documentation to help fix local Lucene code errors

Signed-off-by: Brian Flores <[email protected]>
…ject#932)

* Implements initial By Field re rank

Signed-off-by: Brian Flores <[email protected]>
* update maintainers.md

Signed-off-by: Varun Jain <[email protected]>

* update codeowners file

Signed-off-by: Varun Jain <[email protected]>

---------

Signed-off-by: Varun Jain <[email protected]>
…es (opensearch-project#977)

* Refactored HybridQueryPhaseSearcherTests to remove knn specific classes

Signed-off-by: Owais <[email protected]>

* Refactored HybridQueryTests

Signed-off-by: Owais <[email protected]>

---------

Signed-off-by: Owais <[email protected]>
* Fixed flaky hybrid collector test

Signed-off-by: Owais <[email protected]>

* Removed explicit exception

Signed-off-by: Owais <[email protected]>

---------

Signed-off-by: Owais <[email protected]>
* Update codecov to check during PR

Signed-off-by: Peter Zhu <[email protected]>
…t#1002)

* Make test security github action also support AL2 on node20

Signed-off-by: Peter Zhu <[email protected]>
ryanbogan and others added 8 commits January 6, 2025 16:09
* Change CI to use pull_request

Signed-off-by: Ryan Bogan <[email protected]>
…roject#1014)

* Added Explainability support for hybrid query

Signed-off-by: Martin Gaievski <[email protected]>
…project#988)

* add impl

Signed-off-by: zhichao-aws <[email protected]>

* add UT

Signed-off-by: zhichao-aws <[email protected]>

* rename pruneType; UT

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* ut

Signed-off-by: zhichao-aws <[email protected]>

* add it

Signed-off-by: zhichao-aws <[email protected]>

* change on 2-phase

Signed-off-by: zhichao-aws <[email protected]>

* UT

Signed-off-by: zhichao-aws <[email protected]>

* it

Signed-off-by: zhichao-aws <[email protected]>

* rename

Signed-off-by: zhichao-aws <[email protected]>

* enhance: more detailed error message

Signed-off-by: zhichao-aws <[email protected]>

* refactor to prune and split

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* fix UT cov

Signed-off-by: zhichao-aws <[email protected]>

* address review comments

Signed-off-by: zhichao-aws <[email protected]>

* enlarge score diff range

Signed-off-by: zhichao-aws <[email protected]>

* address comments: check lowScores non null instead of flag

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
…ch-project#1041)

* Allow empty string for field in field map

Signed-off-by: Yizhe Liu <[email protected]>

* Allow empty string when validation

Signed-off-by: Yizhe Liu <[email protected]>

* Add to change log

Signed-off-by: Yizhe Liu <[email protected]>

* Update CHANGELOG to: Support empty string for fields in text embedding processor

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
@github-actions github-actions bot added the RFC label Jan 6, 2025
@zhichao-aws
Member Author

Closing because the wrong source branch was selected.

@zhichao-aws zhichao-aws closed this Jan 6, 2025
Development

Successfully merging this pull request may close these issues.

[Proposal][RFC] Support analyzer-based neural sparse query & build BERT tokenizer as pre-defined tokenizer