Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement][Build-TimeReduction V1] Remove conversion of KNN Field to string if stored Fields not required #1598

Closed
navneet1v opened this issue Apr 8, 2024 · 4 comments
Assignees
Labels
Enhancements Increases software capabilities beyond original client specifications indexing-improvements This label should be attached to all the github issues which will help improving the indexing time.

Comments

@navneet1v
Copy link
Collaborator

navneet1v commented Apr 8, 2024

Description

While doing some benchmarks with Opensearch k-NN. I see that there is time going in to convert the kNN field to String even when storedFields is marked as false. On doing deep-dive I found out that, we convert the field to the string and then check if stored field is set as true or false.

Code ref:
https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapper.java#L575

Here point.toString will convert the object to string which is expensive operation in high dimensionality use cases.

Refer the Flame graph for the same for Vector dimension: 1536 ingestion

Screenshot 2024-04-08 at 12 58 58 PM

@navneet1v navneet1v moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Apr 8, 2024
@navneet1v navneet1v moved this from Backlog (Hot) to 2.14.0 in Vector Search RoadMap Apr 8, 2024
@navneet1v navneet1v added Enhancements Increases software capabilities beyond original client specifications and removed untriaged labels Apr 8, 2024
@navneet1v navneet1v self-assigned this Apr 8, 2024
@navneet1v
Copy link
Collaborator Author

The fix for the issue is pretty simple rather than doing toString we just need to pass the VectorField to the underline function.

@navneet1v navneet1v added the indexing-improvements This label should be attached to all the github issues which will help improving the indexing time. label Apr 8, 2024
@navneet1v navneet1v moved this from 2.14.0 to Backlog (Hot) in Vector Search RoadMap Apr 8, 2024
@navneet1v navneet1v changed the title Remove conversion of KNN Field to string if stored Fields not required [Enhancement][Build-Time] Remove conversion of KNN Field to string if stored Fields not required Apr 9, 2024
@navneet1v
Copy link
Collaborator Author

Resolving the issue as the PR is merged.

@github-project-automation github-project-automation bot moved this from Backlog (Hot) to ✅ Done in Vector Search RoadMap Apr 12, 2024
@jmazanec15
Copy link
Member

@navneet1v for the flame graph above, what was the setup? Trying to understand impact on ingestion

@navneet1v
Copy link
Collaborator Author

navneet1v commented Apr 15, 2024

I created a 2.14 cluster and did the ingestion. I removed few more things from indexing path like adding recovery_source and parsing of vector field as an array. But you don't need all these changes. Just pick up a 1536D dataset and run it with and without this change

and I used a 1536D dataset with 5M records

@navneet1v navneet1v changed the title [Enhancement][Build-Time] Remove conversion of KNN Field to string if stored Fields not required [Enhancement][Build-TimeReduction V1] Remove conversion of KNN Field to string if stored Fields not required Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancements Increases software capabilities beyond original client specifications indexing-improvements This label should be attached to all the github issues which will help improving the indexing time.
Projects
Status: Done
Development

No branches or pull requests

2 participants