Track segment information in (mixed) dense vector search #106914

Closed
tteofili opened this issue Mar 29, 2024 · 4 comments
Labels: >enhancement, :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)

Comments

@tteofili
Contributor

Related to #106591, a good point was raised: if there are bugs or concerns about a given KNN query running against a "mixed" set of segments (e.g. partly flat and partly hnsw), it would be hard to debug where the problem comes from.
To this end, it would be useful to have some way to track segment info in this context and, for example, relate failures, warnings, or slowness to specific segments.

@tteofili tteofili added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Mar 29, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Mar 29, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@tteofili tteofili self-assigned this Apr 4, 2024
@tteofili
Contributor Author

tteofili commented Apr 5, 2024

One thing we could do is start by adding information from Lucene's SegmentInfo#codec in the ES Engine class, so that the Index Segments API exposes which kinds of underlying data structures (including the KnnVectorsFormat) are used within each segment.

@tteofili
Contributor Author

tteofili commented Apr 22, 2024

Another option is to enable tracking of vector formats in AbstractKnnVectorQuery#explain, so that the Explanation also contains the per-doc vector format. This would help in situations where mappings have been updated (e.g. from hnsw to int8_hnsw) but most of the knn query results still come from segments with pre-update formats.
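
To make the scenario concrete, here is a minimal, self-contained sketch of what per-doc format information would allow once it is available. It does not use Lucene or Elasticsearch: `VectorFormatTally`, `Hit`, `countByFormat`, and `staleFraction` are hypothetical names invented for illustration, and the format strings are simply the mapping values mentioned above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types exist in Lucene or Elasticsearch.
final class VectorFormatTally {

    // One kNN hit, annotated with its source segment and that segment's vector format.
    record Hit(int docId, String segment, String vectorFormat, float score) {}

    // Count how many hits each vector format contributed (sorted by format name).
    static Map<String, Long> countByFormat(List<Hit> hits) {
        return hits.stream()
            .collect(Collectors.groupingBy(Hit::vectorFormat, TreeMap::new, Collectors.counting()));
    }

    // Fraction of hits whose segment format differs from the format in the current mapping.
    static double staleFraction(List<Hit> hits, String mappedFormat) {
        if (hits.isEmpty()) {
            return 0.0;
        }
        long stale = hits.stream()
            .filter(h -> !h.vectorFormat().equals(mappedFormat))
            .count();
        return (double) stale / hits.size();
    }

    public static void main(String[] args) {
        // Assumed example data: mapping was updated to int8_hnsw, older segments are still hnsw.
        List<Hit> hits = List.of(
            new Hit(12, "_0", "hnsw", 0.97f),
            new Hit(40, "_0", "hnsw", 0.95f),
            new Hit(3, "_1", "hnsw", 0.91f),
            new Hit(7, "_2", "int8_hnsw", 0.90f));
        System.out.println(countByFormat(hits));
        System.out.println(staleFraction(hits, "int8_hnsw"));
    }
}
```

With this kind of tally, a large stale fraction would point at pre-update segments as the likely source of unexpected scores or latency.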

@tteofili
Contributor Author

tteofili commented May 14, 2024

In addition to the per-field KnnVectorsFormat information recorded on the ES side (from mappings), Lucene can provide the actual per-segment, per-field KnnVectorsFormat (read from the segments); see PR.

Update: this PR supersedes the Lucene one, as what we need is already available in FieldInfo.
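
The comparison implied here, between the format recorded in the mappings and the format read back from each segment, could look roughly like the sketch below. It is plain Java with no Lucene dependency: the nested maps stand in for whatever would be read from FieldInfo per segment, and `SegmentFormatCheck`, `Mismatch`, and `findMismatches` are hypothetical names, not part of any existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for per-segment field metadata.
final class SegmentFormatCheck {

    // A (segment, field) pair whose on-disk format differs from the mapped format.
    record Mismatch(String segment, String field, String segmentFormat, String mappedFormat) {}

    // mappedFormats:  field name -> format from the current mappings
    // segmentFormats: segment name -> (field name -> format read from that segment)
    static List<Mismatch> findMismatches(Map<String, String> mappedFormats,
                                         Map<String, Map<String, String>> segmentFormats) {
        List<Mismatch> mismatches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> seg : segmentFormats.entrySet()) {
            for (Map.Entry<String, String> field : seg.getValue().entrySet()) {
                String mapped = mappedFormats.get(field.getKey());
                // Fields absent from the mappings are skipped rather than reported.
                if (mapped != null && !mapped.equals(field.getValue())) {
                    mismatches.add(new Mismatch(seg.getKey(), field.getKey(), field.getValue(), mapped));
                }
            }
        }
        return mismatches;
    }
}
```

Each reported mismatch identifies a segment that still holds vectors in a pre-update format, which is the information needed to relate a failure or slowdown to specific segments.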
