Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine #934

Conversation

navneet1v
Copy link
Collaborator

@navneet1v navneet1v commented Jun 13, 2023

Description

-> Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine.
-> Removed some TODOs and fixed an edge case in the code which failed during Unit tests.

Issues Resolved

#903

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


addKnnDocWithAttributes(
DOC_ID,
new float[] { 6.0f, 7.9f, 3.1f },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can generate vector values using random, seems we're not testing exact doc ids in result. This way it's easy to generate more docs and then vary between ann and exact searches

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use random generator. But I don't see any point of using random generator. Atleast this way we know what we are ingesting and in future we can compare scores also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can have both exact vector values and random generated, first can be used in case we're checking score values and second one in cases when we do need more massive corpus. I think we doing random generation for other test in k-NN. Although not a blocker for this PR.

@@ -280,4 +293,106 @@ public void testEndToEnd_fromModel() throws IOException, InterruptedException {
assertEquals(numDocs - i - 1, Integer.parseInt(results.get(i).getDocId()));
}
}

@SneakyThrows
public void testQueryWithFilter_withDifferentCombination_thenSuccess() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add different tests for exact and ann searchers? I on a fence if we need it in IT as we already have unit test for this and it might be too low level for IT.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we should have integration tests. Even when we have Unit tests. The reason why I am combining both is, the setup for the tests are same. Hence I combined them at 1 place. Given that we already have so many integration tests and setting up each integration tests add more time. Hence, using a single setup.

@@ -44,6 +49,14 @@

public class FaissIT extends KNNRestTestCase {

private static final String INDEX_NAME = "test-index-1";
private static final String FIELD_NAME = "test-field-1";
private static final String DOC_ID = "doc1";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DOC_ID_1 for consistency

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final List<KNNResult> knnResultsKLimitsFilterResult = parseSearchResponse(responseBodyKLimitsFilterResult, FIELD_NAME);

assertEquals(expectedDocIdsKLimitsFilterResult.size(), knnResultsKLimitsFilterResult.size());
assertTrue(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dup of 313 - should we just make this its own statement?

@navneet1v navneet1v force-pushed the feature/faiss-filtering branch from 1bde71e to ffcbd77 Compare June 13, 2023 18:01
@navneet1v navneet1v merged commit 76ac15a into opensearch-project:feature/faiss-filtering Jun 13, 2023
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
…es include

 * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
…es include

 * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
…es include

 * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
…es include

 * Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit that referenced this pull request Jun 14, 2023
…es include (#936)

* Enabled the efficient filtering support for Faiss Engine (#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jun 14, 2023
…es include (opensearch-project#936)

* Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Jul 10, 2023
…es include (opensearch-project#936)

* Enabled the efficient filtering support for Faiss Engine (opensearch-project#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (opensearch-project#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (opensearch-project#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (opensearch-project#934)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v added a commit that referenced this pull request Jul 10, 2023
…es include (#936)

* Enabled the efficient filtering support for Faiss Engine (#907)
 * Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (#926)
 * Added exact search for cases when filteredIds < k to improve the recall for exact search (#928)
 * Improved Exact Search to return only K results and added client side latency metric for query Benchmarks (#933)
 * Added Integration Tests and Unit test for Efficient Filtering for Faiss Engine (#934)

Signed-off-by: Navneet Verma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants