Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulate dimension, vector data type validation/processing inside Library #1957

Merged
merged 15 commits into from
Aug 15, 2024

Conversation

jmazanec15
Copy link
Member

@jmazanec15 jmazanec15 commented Aug 14, 2024

Description

As part of #1779 , we need to have the ability to take a user config for a float based space type (i.e. l2) and configure the quantization framework to go from float to bit and build a faiss binary index. In the current implementation, the data type validation is outside of the KNNLibrary/engine abstractions. This will make it difficult do the complex configuration around the quantization framework index building and search. The main theme of this PR is to move towards handling Library specific configuration to the library.

This PR does the following:

  1. Adds KNNMethodConfigContext that is passed to KNNLibrary for validation as well as processing. The KNNMethodConfigContext object has information outside of what is passed by the user for configuring the method. This will allow the KNNLibrary specific components to do more complex validation/processing logic such as configuring a binary index, or figuring out how to validate/process incoming vectors. This wasnt added as a part of KNNMethodContext because it would bloat the class further. This is somewhat of an extension to the VectorSpaceInfo abstraction
  2. For per-dimension processing/validation, faiss fp16, we moved the different validators/processors to the Library so we dont have to extract encoder information from the mapping.
  3. Dimension check's and version created were added to KNNMethodConfigContext

Check List

  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jmazanec15 jmazanec15 added Refactoring Improve the design, structure, and implementation while preserving its functionality backport 2.x labels Aug 14, 2024
Introduces a new configuration object for KNNMethodContext,
KNNMethodConfigContext. The KNNMethodContext contains the user provided
information for they want their index to be built like. However, it is
missing a few pieces that are defined outside of it. However, these
pieces are needed to validate the config and actually build the config.

This new object gives the code a chance to pass that along and handle
branching inside the engine definition.

Signed-off-by: John Mazanec <[email protected]>
Integrates validation of KNNMethodContexts with the optional values of
KNNMethodConfigContext. This simplifies the code in the mapper pretty
significantly.

Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
@jmazanec15 jmazanec15 changed the title [WIP] Encapsulate dimension, vector data type validation/processing inside Library Encapsulate dimension, vector data type validation/processing inside Library Aug 14, 2024
Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. Just 1 comment on the BWC at the codec level

@@ -243,18 +242,6 @@ private void createKNNIndexFromScratch(FieldInfo fieldInfo, KNNCodecUtil.Pair pa
);
}

// Update index description of Faiss for binary data type
if (KNNEngine.FAISS == knnEngine
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen if segment is an old segment. I think we will still need this update for old segments. We might need to find a way to check if the index description is correct for binary index and if it not correct then update it otherwise don't update the index description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point - forgot about this. Fixed

@jmazanec15
Copy link
Member Author

@navneet1v ah I forgot about that - good catch. Let me fix

@jmazanec15 jmazanec15 force-pushed the method-backpack branch 5 times, most recently from 7adbb2b to 4b2871d Compare August 15, 2024 02:02
Comment on lines 77 to 79
if (getBWCVersion().isPresent() && Version.fromString(maybeRemoveSnapshotSuffix(getBWCVersion().get())).before(Version.V_2_16_0)) {
return;
}
Copy link
Collaborator

@navneet1v navneet1v Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another way to do this skipping through brazil.gradle. wondering why we choose this way?

Ref on how to skip via brazil.gradle

https://github.com/opensearch-project/neural-search/blob/main/qa/restart-upgrade/build.gradle#L134-L142

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it

@jmazanec15 jmazanec15 force-pushed the method-backpack branch 4 times, most recently from c49de85 to a95577e Compare August 15, 2024 02:33
Signed-off-by: John Mazanec <[email protected]>
@navneet1v navneet1v self-requested a review August 15, 2024 03:58
@navneet1v
Copy link
Collaborator

@jmazanec15 seems like a build failure.. can you check once whats happening

@jmazanec15
Copy link
Member Author

@navneet1v I dont see anything interesting in the logs and am not able to reproduce it locally. Im guessing its unrelated/flaky. Captured the logs: https://gist.github.com/jmazanec15/ec985b4aeeae3325d5159974cfb375ab. Ill retry

@jmazanec15 jmazanec15 merged commit f42e86e into opensearch-project:main Aug 15, 2024
29 of 30 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1957-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f42e86eaaa8d4c7b6e0da30326a1215312560b0a
# Push it to GitHub
git push --set-upstream origin backport/backport-1957-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1957-to-2.x.

akashsha1 pushed a commit to akashsha1/k-NN that referenced this pull request Sep 16, 2024
…Library (opensearch-project#1957)

Introduces a new configuration object KNNMethodConfigContext.
The KNNMethodContext contains the user provided
information for want their index to be built like. However, it is
missing a few pieces that are defined outside of it. These
pieces are needed to validate the config and actually build the config.

This change Integrates validation of KNNMethodContexts with
KNNMethodConfigContext and better encapsulates KNNLibrary
config info in the engine package

Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: Akash Shankaran <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Refactoring Improve the design, structure, and implementation while preserving its functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants