Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Vector Search Nested benchmark #584

Merged
merged 7 commits into from
Aug 22, 2024

Conversation

finnroblin
Copy link
Contributor

Description

This change supports nested field benchmarks for vector search. Previously k-NN maintainers would need to use the deprecated perf tool to benchmark nested fields.

Issues Resolved

This PR is part of the effort to port perf tool benchmarks to OSB.

Testing

  • New functionality includes testing

Unit tests + tested E2E with modified nested sift euclidian dataset (workload PR forthcoming)


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -884,6 +884,7 @@ class VectorDataSetPartitionParamSource(ParamSource):
def __init__(self, workload, params, context: Context, **kwargs):
super().__init__(workload, params, **kwargs)
self.field_name: str = parse_string_parameter("field", params)
self.is_nested = "." in self.field_name # in base class because used for both bulk ingest and queries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make "." as constant

@@ -1155,6 +1156,16 @@ def _build_vector_search_query_body(self, vector, efficient_filter=None) -> dict
query.update({
"filter": efficient_filter,
})

if self.is_nested:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if more than one level nested is identified, lets raise an exception and add TODO to fix that later

@@ -884,6 +884,8 @@ class VectorDataSetPartitionParamSource(ParamSource):
def __init__(self, workload, params, context: Context, **kwargs):
super().__init__(workload, params, **kwargs)
self.field_name: str = parse_string_parameter("field", params)
self.NESTED_FIELD_SEPARATOR = "."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it constant

"knn": {
self.field_name: query,
},
}

if self.is_nested:
outer_field_name, _inner_field_name = self.get_split_fields()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
outer_field_name, _inner_field_name = self.get_split_fields()
outer_field_name, _ = self.get_split_fields()

@VijayanB
Copy link
Member

@finnroblin looks good, please resolve conflict and rebase your change.

@VijayanB
Copy link
Member

LGTM. Thanks

@VijayanB VijayanB merged commit 2532d77 into opensearch-project:main Aug 22, 2024
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants