Reduce heap usage for knn index writers #13538

Merged


benwtrent
Member

This slightly reduces heap utilization for KnnIndex writers by:

  • Only constructing one DocsWithFieldSet instead of 3 for Scalar quantized HNSW formats
  • Only having one List<T> instead of 3 for Scalar quantized HNSW formats

This does change an experimental API (FlatVectorFormats) slightly to allow reuse.
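The bullets above describe buffering a field's doc IDs and vectors once and letting every component of the scalar quantized HNSW writer read that single copy. Below is a minimal Java sketch of that sharing pattern. It assumes simplified, hypothetical classes (`SharedFieldBuffer`, `QuantizingConsumer`, `GraphConsumer`) and uses `java.util.BitSet` as a stand-in for Lucene's `DocsWithFieldSet`. It is not Lucene's actual writer code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's classes): one per-field buffer owns the
 * doc-ID set and the vector list, and downstream consumers hold a reference
 * to it instead of building their own copies.
 */
public class SharedFieldBufferSketch {

  /** Buffers the docs that have a value, and their vectors, exactly once. */
  static final class SharedFieldBuffer<T> {
    final BitSet docsWithField = new BitSet(); // stand-in for DocsWithFieldSet
    final List<T> vectors = new ArrayList<>();
    private int lastDocId = -1;

    void addValue(int docId, T vector) {
      if (docId <= lastDocId) {
        throw new IllegalArgumentException("doc IDs must be strictly increasing");
      }
      docsWithField.set(docId);
      vectors.add(vector);
      lastDocId = docId;
    }
  }

  /** Hypothetical quantization step that reads the shared buffer. */
  static final class QuantizingConsumer {
    private final SharedFieldBuffer<float[]> buffer;

    QuantizingConsumer(SharedFieldBuffer<float[]> buffer) { this.buffer = buffer; }

    /** Example statistic a quantizer might need: max absolute component. */
    float maxAbs() {
      float max = 0f;
      for (float[] v : buffer.vectors) {
        for (float x : v) {
          max = Math.max(max, Math.abs(x));
        }
      }
      return max;
    }
  }

  /** Hypothetical graph builder that reads the same shared buffer. */
  static final class GraphConsumer {
    private final SharedFieldBuffer<float[]> buffer;

    GraphConsumer(SharedFieldBuffer<float[]> buffer) { this.buffer = buffer; }

    int nodeCount() { return buffer.vectors.size(); }
  }

  public static void main(String[] args) {
    SharedFieldBuffer<float[]> buffer = new SharedFieldBuffer<>();
    QuantizingConsumer quantizer = new QuantizingConsumer(buffer);
    GraphConsumer graph = new GraphConsumer(buffer);

    buffer.addValue(0, new float[] {0.5f, -1.5f});
    buffer.addValue(3, new float[] {2.0f, 0.25f});

    // Both consumers see the same two vectors; nothing was duplicated.
    System.out.println(graph.nodeCount());                  // 2
    System.out.println(quantizer.maxAbs());                 // 2.0
    System.out.println(buffer.docsWithField.cardinality()); // 2
  }
}
```

In this sketch, the per-document bookkeeping (one set bit and one list slot) is paid once, no matter how many consumers read the buffer.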

@gautamworah96
Contributor

Thanks for the quick turnaround on this, Ben. I am not too familiar with this code and am taking some time off this week. I'll try to review the code by Friday.

In the meantime, I am also trying to understand some core IndexWriter concepts. Tuning indexWriterBufferSizeMB to a lower value seems to help, which makes sense because the total length of the float[] arrays across the segments being flushed is then smaller. I am now trying to see whether I can get larger segments by keeping merges enabled (in the email thread I mentioned that my merges had been disabled).
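The reasoning about the buffer size can be made concrete with rough arithmetic. Below is a back-of-the-envelope Java sketch. It assumes each buffered vector is held on heap as a `float[]` of `dims` 4-byte floats, and it ignores array headers, list references, HNSW graph memory, and quantization state. The 768 dimensions and 16 MB buffer are illustrative values, not figures from this thread.

```java
/** Rough estimate of the raw float[] payload buffered before a flush. */
public class VectorBufferEstimate {
  static final int BYTES_PER_FLOAT = Float.BYTES; // 4

  /** Bytes for the raw vector payload only (no headers, references, or graph). */
  static long rawVectorBytes(long numVectors, int dims) {
    return numVectors * dims * (long) BYTES_PER_FLOAT;
  }

  /** How many vectors' raw payload fits in a buffer of the given size in MB. */
  static long vectorsPerBuffer(double bufferMb, int dims) {
    long bufferBytes = (long) (bufferMb * 1024 * 1024);
    return bufferBytes / ((long) dims * BYTES_PER_FLOAT);
  }

  public static void main(String[] args) {
    System.out.println(rawVectorBytes(100_000, 768)); // 307200000
    System.out.println(vectorsPerBuffer(16, 768));    // 5461
  }
}
```

Under these assumptions, lowering the buffer size shrinks the peak raw-vector payload per flush proportionally. The cost is more frequent flushes of smaller segments, which is why keeping merges enabled matters for the final segment size.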

@benwtrent
Member Author

Related: #13553

@benwtrent benwtrent added this to the 9.12.0 milestone Jul 9, 2024
@benwtrent benwtrent merged commit 428fdb5 into apache:main Jul 10, 2024
3 checks passed
@benwtrent benwtrent deleted the feature/reduce-heap-usage-on-vector-index branch July 10, 2024 14:28
benwtrent added a commit that referenced this pull request Jul 10, 2024
* Reduce heap usage for knn index writers

* iter

* fixing heap usage & adding changes

* javadocs
3 participants