Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for post filter in hybrid query #633

Merged
merged 9 commits into from
Mar 14, 2024

Conversation

vibrantvarun
Copy link
Member

@vibrantvarun vibrantvarun commented Mar 13, 2024

Description

This PR adds Support for Post filter in hybrid query.

The crux of the logic implemented in this PR is when the search request has post filter query in that, then instead of returning plain HybridTopDocCollector Object we would be returning FilteredCollectorObject which will have HybridTopDocCollector object wrapped underneath it.

Also This PR has unit tests and Integration tests for the post_filter scenario.

Testing example on local machine

  1. Create index
http://localhost:9200/movies

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "index.search.concurrent_segment_search.enabled": false
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "stock":{
        "type": "integer"
      }
    }
  }
}
  1. Create pipeline
http://localhost:9200/_search/pipeline/nlp-search-pipeline

{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.6,
              0.1
            ]
          }
        }
      }
    }
  ]
}

  1. Add documents
{
  "name": "Dunes part 1",
  "stock":2
}
{
  "name": "Dunes part 2",
  "stock":26
}
{
  "name": "Mission Impossible 1",
  "stock":234
}
{
  "name": "Mission Impossible 2",
  "stock":238
}
{
  "name": "Avengers",
  "stock":300
}
  1. search without post_filter
http://localhost:9200/movies/_search?search_pipeline=nlp-search-pipeline

{
   "query": {
      "hybrid":{
          "queries":[
              { 
                "match":{
                    "name": "mission"
                }
              },
              {
                "term":{
                    "name":{
                        "value":"part"
                    }
                }
              },
              {
              "range": {
                 "stock": {
                      "gte": 200,
                      "lte": 400
                  }
                }
              }
          ]
      }

  }
}


{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 0.70000005,
        "hits": [
            {
                "_index": "movies",
                "_id": "2",
                "_score": 0.70000005,
                "_source": {
                    "name": "Mission Impossible 1",
                    "stock": 234
                }
            },
            {
                "_index": "movies",
                "_id": "3",
                "_score": 0.70000005,
                "_source": {
                    "name": "Mission Impossible 2",
                    "stock": 238
                }
            },
            {
                "_index": "movies",
                "_id": "4",
                "_score": 0.4,
                "_source": {
                    "name": "Avengers",
                    "stock": 300
                }
            },
            {
                "_index": "movies",
                "_id": "1",
                "_score": 0.3,
                "_source": {
                    "name": "Dunes part 1",
                    "stock": 2
                }
            },
            {
                "_index": "movies",
                "_id": "5",
                "_score": 0.3,
                "_source": {
                    "name": "Dunes part 2",
                    "stock": 26
                }
            }
        ]
    }
}
  1. Search with post_filter
{
   "query": {
      "hybrid":{
          "queries":[
              { 
                "match":{
                    "name": "mission"
                }
              },
              {
                "term":{
                    "name":{
                        "value":"part"
                    }
                }
              },
              {
              "range": {
                 "stock": {
                      "gte": 200,
                      "lte": 400
                  }
                }
              }
          ]
      }

  },
  "post_filter":{
      "range": {
                 "stock": {
                      "gte": 230,
                      "lte": 400
                  }
       }
  }
}

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.70000005,
        "hits": [
            {
                "_index": "movies",
                "_id": "2",
                "_score": 0.70000005,
                "_source": {
                    "name": "Mission Impossible 1",
                    "stock": 234
                }
            },
            {
                "_index": "movies",
                "_id": "3",
                "_score": 0.70000005,
                "_source": {
                    "name": "Mission Impossible 2",
                    "stock": 238
                }
            },
            {
                "_index": "movies",
                "_id": "4",
                "_score": 0.4,
                "_source": {
                    "name": "Avengers",
                    "stock": 300
                }
            }
        ]
    }
}

Issues Resolved

508

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Varun Jain <[email protected]>
@vibrantvarun vibrantvarun changed the title Post Filter for hybrid query Support for post filter in hybrid query Mar 13, 2024
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code looks good, left just few comments

Signed-off-by: Varun Jain <[email protected]>
@@ -216,9 +247,10 @@ public HybridCollectorNonConcurrentManager(
HitsThresholdChecker hitsThresholdChecker,
boolean isSingleShard,
int trackTotalHitsUpTo,
SortAndFormats sortAndFormats
SortAndFormats sortAndFormats,
Weight filteringWeight
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make this parameter as Optional optionalFilterWeight.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i do that then in newCollector method

 if (optionalFilterWeight.isEmpty()) {
            return hybridcollector;
        }

This code breaks with below error

{
    "error": {
        "root_cause": [
            {
                "type": "null_pointer_exception",
                "reason": "Cannot invoke \"java.util.Optional.isEmpty()\" because \"this.optionalFilterWeight\" is null"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "movies",
                "node": "MpUK-BFmRRiigVIII84uIQ",
                "reason": {
                    "type": "null_pointer_exception",
                    "reason": "Cannot invoke \"java.util.Optional.isEmpty()\" because \"this.optionalFilterWeight\" is null"
                }
            }
        ],
        "caused_by": {
            "type": "null_pointer_exception",
            "reason": "Cannot invoke \"java.util.Optional.isEmpty()\" because \"this.optionalFilterWeight\" is null",
            "caused_by": {
                "type": "null_pointer_exception",
                "reason": "Cannot invoke \"java.util.Optional.isEmpty()\" because \"this.optionalFilterWeight\" is null"
            }
        }
    },
    "status": 500
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vibrantvarun did you construct optional with ofNullable method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. @navneet1v and me hopped on call and determined that it is kind of potiential bug and not the right practice to use Optional in class variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder what is the potential bug, it's not a blocker for this PR but using optional definitely isn't a bug

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navneet1v would you like to answer martin question

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not a bug. The implementation was not correct due to which above exception happened.

but overall when we used optional on class member the IDE was adding warnings. Hence I suggested we can skip that. It was more of a coding practice and not a serious issue

Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code looks good to me. Just resolve the comments and then its ready to be shipped.

@navneet1v
Copy link
Collaborator

@vibrantvarun lets ensure the GH actions are successful.

Signed-off-by: Varun Jain <[email protected]>
Copy link

codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 76.92308% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 82.54%. Comparing base (759a971) to head (82e204f).

Files Patch % Lines
...ralsearch/search/query/HybridCollectorManager.java 76.92% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #633      +/-   ##
============================================
- Coverage     82.57%   82.54%   -0.03%     
- Complexity      656      659       +3     
============================================
  Files            52       52              
  Lines          2055     2063       +8     
  Branches        328      332       +4     
============================================
+ Hits           1697     1703       +6     
  Misses          212      212              
- Partials        146      148       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vibrantvarun vibrantvarun requested a review from navneet1v March 14, 2024 00:59
@vibrantvarun vibrantvarun requested a review from navneet1v March 14, 2024 01:03
Signed-off-by: Varun Jain <[email protected]>
@martin-gaievski martin-gaievski added backport 2.x Label will add auto workflow to backport PR to 2.x branch v2.13.0 labels Mar 14, 2024
@vibrantvarun vibrantvarun merged commit d2d4cc6 into opensearch-project:main Mar 14, 2024
63 of 64 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 14, 2024
* Post Filter for hybrid query

Signed-off-by: Varun Jain <[email protected]>

* Add changelog

Signed-off-by: Varun Jain <[email protected]>

* Addressing martin comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing martin comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing navneet comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing navneet comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing navneet comments

Signed-off-by: Varun Jain <[email protected]>

* Adding Coverage

Signed-off-by: Varun Jain <[email protected]>

---------

Signed-off-by: Varun Jain <[email protected]>
(cherry picked from commit d2d4cc6)
vibrantvarun added a commit that referenced this pull request Mar 14, 2024
* Post Filter for hybrid query

Signed-off-by: Varun Jain <[email protected]>

* Add changelog

Signed-off-by: Varun Jain <[email protected]>

* Addressing martin comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing martin comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing navneet comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing navneet comments

Signed-off-by: Varun Jain <[email protected]>

* Addressing navneet comments

Signed-off-by: Varun Jain <[email protected]>

* Adding Coverage

Signed-off-by: Varun Jain <[email protected]>

---------

Signed-off-by: Varun Jain <[email protected]>
(cherry picked from commit d2d4cc6)

Co-authored-by: Varun Jain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch v2.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants