[FEATURE] Move Lucene Vector field and HNSW KNN Search as a first class feature in core #1467
Why?
I thought the k-NN plugin APIs applied to both the Lucene engine and the native engines, but they apply only to the native engines. So my previous comment is not correct. If Lucene ever supports a model training feature, for example by introducing a new algorithm like IVF, then there will be two different APIs.
++ sounds good. So then I propose a
If there's no opposition, I'll move this issue to core, because it'll essentially be a no-op for kNN. kNN can decide in the future whether to simplify the vector field type API for native implementations.
The index settings and mappings are currently shared between the Lucene engine and the native engines. The
Sure you can. FieldMapper implementations are independent of the underlying Also, it looks like I misunderstood your comment. I thought you said this plugin didn't end up implementing Lucene's HNSW, but it looks like it does through the
As a reference, this is similar to how I supported the QuadTree, GeoHashPrefixTree, and BKD implementations of It just moves the implementation around to promote Lucene's HNSW vector field and KNN search to a first-class core field type and feature.
@nknize Having a vector field as a first-class citizen in OpenSearch is quite exciting, but I have a few questions about this:
There are users of the OpenSearch min distribution who need vector types and cannot use any plugins.
Correct. Is this a problem?
See my comment above: "You may want a deprecation path for BWC, of course". This isn't difficult to do. My question here is whether there is interest or not. If not, I may end up just adding a new field mapper to core anyway that exposes Lucene's
I would say the interest is there, at least from my side.
If this is done, there will certainly be a lot of confusion, and that won't be good for OpenSearch as a product.
I don't see this as a problem; my only concern is backward compatibility. Customers should not need to migrate their workloads because of the above-mentioned problems.
I agree 💯
+1, I would like to avoid this. Overall, I think this makes sense. Moving base vector functionality to core has the following pros that I can think of:
Also, philosophically, I think core should keep the field types and query types that provide fundamental functionality. For instance, we wouldn't have a plugin that implements numeric types (maybe an over-exaggeration). I think vectors have definitely moved beyond being a niche type and are now a pretty common one. That being said, I'm not sure about introducing a new field type
+1 to Jack's comment above. It's high time we treat the vector data type as a first-class citizen (part of core) in OpenSearch, similar to what Lucene has done. I would support moving the knn_vector field type to core rather than creating another field type that mimics the same behavior. Anyway, this is going to be a big lift-and-shift effort that involves re-architecting some of the pieces, so it would be good to make it part of a major version release. Do we actually see a backward compatibility issue?
For the most part you're not missing anything. I would suggest renaming the mapped field type from
@nknize I would prefer to keep the field name the same.
Is your feature request related to a problem?
Core OpenSearch does not support vector types as a first-class field. The correlation engine has a CorrelationVectorFieldMapper that uses Lucene's KnnFloatVectorField, but it lives in the events-correlation-engine plugin. We could move that field mapper to the core library, but we don't want to fragment between different vector field implementations. So why not move the Lucene HNSW-backed vector field and KNN search into the core library as a first-class field?
What solution would you like?
A discussion around making the vector field type a first-class citizen in core. We've discussed this before in person, but I don't know if we have an issue for it. I don't see a reason not to have Lucene vector fields and HNSW-backed KNN search as a core feature, while leveraging the OpenSearch kNN plugin as an optional accelerator that uses alternative native options like FAISS or nmslib.
What alternatives have you considered?
Leave as is if there is a compelling reason to keep this base Lucene capability integration in a separate downstream plugin.
Do you have any additional context?