Iterative Vector Insertion (opensearch-project#1840)
* Rebased with new version of k-NN

Signed-off-by: Andrew Klepchick <[email protected]>

* Optimized faiss insertion

Signed-off-by: Andrew Klepchick <[email protected]>

* Optimized threadCount logic

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed IDEA files

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary cmake file

Signed-off-by: Andrew Klepchick <[email protected]>

* Added comments to new functions

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed createIndex and fixed test cases that use it

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unused code

Signed-off-by: Andrew Klepchick <[email protected]>

* Explained zero initialization for vector transfer

Signed-off-by: Andrew Klepchick <[email protected]>

* Added locale

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless Apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Account for zero documents in finished batch

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed where we check for zero docs

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed tip for return

Signed-off-by: Andrew Klepchick <[email protected]>

* Use unique pointers to make sure resources are released on exception

Signed-off-by: Andrew Klepchick <[email protected]>

* Moved createIndex to testUtils

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed memory management so that the underlying index is not deleted after initialized

Signed-off-by: Andrew Klepchick <[email protected]>

* Created new KNNIndexBuilder graph to make index building more modular

Signed-off-by: Andrew Klepchick <[email protected]>

* Streamlined logic in KNNIndexBuilder.

Signed-off-by: Andrew Klepchick <[email protected]>

* Cleaned up unnecessary code in KNN80DocValuesConsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed memory management process

Signed-off-by: Andrew Klepchick <[email protected]>

* Added note about index initialization in faiss_index_service

Signed-off-by: Andrew Klepchick <[email protected]>

* Accounted for case where the exception happens after the indexWriter is released.

Signed-off-by: Andrew Klepchick <[email protected]>

* Delete jni/src/.idea/modules.xml

Signed-off-by: Andrew Klepchick <[email protected]>

* Delete jni/src/.idea/vcs.xml

Signed-off-by: Andrew Klepchick <[email protected]>

* Delete jni/src/.idea/workspace.xml

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply and free iterative index on exception

Signed-off-by: Andrew Klepchick <[email protected]>

* Undid hack for checking first document metrics

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed print statements

Signed-off-by: Andrew Klepchick <[email protected]>

* Free Vector Transfer on batch ingestion

Signed-off-by: Andrew Klepchick <[email protected]>

* Undid free

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed check for transfer ready

Signed-off-by: Andrew Klepchick <[email protected]>

* Don't crash when zero vectors inserted?

Signed-off-by: Andrew Klepchick <[email protected]>

* Reverted to old insertion process?

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back createOutput

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed prior createOutput

Signed-off-by: Andrew Klepchick <[email protected]>

* Test remaking vectorTransfer

Signed-off-by: Andrew Klepchick <[email protected]>

* Test restructuring of insertion

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed case where vector address is immediately discarded

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Split Index Builder into multiple classes

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed descriptions of functions in faiss_index_service

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back copyright files

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unused builder names

Signed-off-by: Andrew Klepchick <[email protected]>

* Modified tests to work with new insertion methods

Signed-off-by: Andrew Klepchick <[email protected]>

* Track index insertions

Signed-off-by: Andrew Klepchick <[email protected]>

* Tracked insertions for binary indices

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back insertIds

Signed-off-by: Andrew Klepchick <[email protected]>

* Added check for freeVectorData to see if it works with an already deleted address

Signed-off-by: Andrew Klepchick <[email protected]>

* Cleaned up logs and comments in KNNIndexBuilder

Signed-off-by: Andrew Klepchick <[email protected]>

* Restructured the logic for KNNIndexBuilder

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed package name of KNNIndexBuilder

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed all package names and deleted unnecessary headers

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed for loop

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed createIndex methods for faiss index service

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed package to fit naming conventions

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed name of index builder

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Added comments to NativeIndexBuilder and restructured

Signed-off-by: Andrew Klepchick <[email protected]>

* Added deletion for memoryAddress

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed naming of classes to Writer and changed package name to fit conventions

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed NativeIndexInfo and NativeVectorInfo to follow builder pattern

Signed-off-by: Andrew Klepchick <[email protected]>

* Added feature to changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Added class descriptions to each NativeIndexWriter

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed name to getBytesPerVector

Signed-off-by: Andrew Klepchick <[email protected]>

* Added == false instead of ! for readability

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed naming in docvaluesconsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* SpotlessApply

Signed-off-by: Andrew Klepchick <[email protected]>

* Made it so that we don't reuse testValues and removed a foot gun

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed another foot gun in getIndexInfo

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed javadoc

Signed-off-by: Andrew Klepchick <[email protected]>

* Added deletion on exception cases

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary delete (NativeIndexWriter will handle deletion of vectors on exception)

Signed-off-by: Andrew Klepchick <[email protected]>

* Added correct logger and getWriter method to NativeIndexWriter

Signed-off-by: Andrew Klepchick <[email protected]>

* Ensured memory safety on JNI layer so that Java doesn't have to wrap everything in a try catch loop.

Signed-off-by: Andrew Klepchick <[email protected]>

* Refactored NativeIndexWriter and added comments to FaissService

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed free in the JNIExport since index will always be freed in writeIndex.

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed getVectorTransfer back to accept VectorDataType

Signed-off-by: Andrew Klepchick <[email protected]>

* Reverted free since not guaranteed to be IDMap.

Signed-off-by: Andrew Klepchick <[email protected]>

* Added all processes in addKNNBinaryField to NativeIndexWriter.createKNNIndex

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed javadoc

Signed-off-by: Andrew Klepchick <[email protected]>

* Applied spotless

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back writeFooter

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed threadCount from writeIndex

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed redundancies in KNN80DocValuesConsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed serializationMode

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed double free test as we don't have to worry about that anymore

Signed-off-by: Andrew Klepchick <[email protected]>

* Accounted for HNSWSQ in index service

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed delete in catch

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed faiss tests to work with writeIndex

Signed-off-by: Andrew Klepchick <[email protected]>

---------

Signed-off-by: Andrew Klepchick <[email protected]>
MrFlap authored and shatejas committed Aug 14, 2024
1 parent a4697f4 commit 047becb
Showing 27 changed files with 1,323 additions and 513 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
### Features
* Integrate Lucene Vector field with native engines to use KNNVectorFormat during segment creation [#1945](https://github.com/opensearch-project/k-NN/pull/1945)
### Enhancements
* Add functionality to iteratively insert vectors into a faiss index to improve the memory footprint during indexing. [#1840](https://github.com/opensearch-project/k-NN/pull/1840)
### Bug Fixes
* Corrected search logic for scenario with non-existent fields in filter [#1874](https://github.com/opensearch-project/k-NN/pull/1874)
* Add script_fields context to KNNAllowlist [#1917] (https://github.com/opensearch-project/k-NN/pull/1917)
82 changes: 49 additions & 33 deletions jni/include/faiss_index_service.h
@@ -31,36 +31,38 @@ namespace faiss_wrapper {
class IndexService {
public:
IndexService(std::unique_ptr<FaissMethods> faissMethods);
//TODO Remove dependency on JNIUtilInterface and JNIEnv
//TODO Reduce the number of parameters

/**
* Create index
* Initialize index
*
* @param jniUtil jni util
* @param env jni environment
* @param metric space type for distance calculation
* @param indexDescription index description to be used by faiss index factory
* @param dim dimension of vectors
* @param numVectors number of vectors
* @param threadCount number of thread count to be used while adding data
* @param parameters parameters to be applied to faiss index
* @return memory address of the native index object
*/
virtual jlong initIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, faiss::MetricType metric, std::string indexDescription, int dim, int numVectors, int threadCount, std::unordered_map<std::string, jobject> parameters);
/**
* Add vectors to index
*
* @param dim dimension of vectors
* @param numIds number of vectors
* @param threadCount number of thread count to be used while adding data
* @param vectorsAddress memory address which is holding vector data
* @param ids a list of document ids for corresponding vectors
* @param idMapAddress memory address of the native index object
*/
virtual void insertToIndex(int dim, int numIds, int threadCount, int64_t vectorsAddress, std::vector<int64_t> &ids, jlong idMapAddress);
/**
* Write index to disk
*
* @param threadCount number of thread count to be used while adding data
* @param indexPath path to write index
* @param parameters parameters to be applied to faiss index
* @param idMap memory address of the native index object
*/
virtual void createIndex(
knn_jni::JNIUtilInterface * jniUtil,
JNIEnv * env,
faiss::MetricType metric,
std::string indexDescription,
int dim,
int numIds,
int threadCount,
int64_t vectorsAddress,
std::vector<int64_t> ids,
std::string indexPath,
std::unordered_map<std::string, jobject> parameters);
virtual void writeIndex(std::string indexPath, jlong idMapAddress);
virtual ~IndexService() = default;
protected:
std::unique_ptr<FaissMethods> faissMethods;
@@ -76,7 +78,21 @@ class BinaryIndexService : public IndexService {
//TODO Reduce the number of parameters
BinaryIndexService(std::unique_ptr<FaissMethods> faissMethods);
/**
* Create binary index
* Initialize index
*
* @param jniUtil jni util
* @param env jni environment
* @param metric space type for distance calculation
* @param indexDescription index description to be used by faiss index factory
* @param dim dimension of vectors
* @param numVectors number of vectors
* @param threadCount number of thread count to be used while adding data
* @param parameters parameters to be applied to faiss index
* @return memory address of the native index object
*/
virtual jlong initIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, faiss::MetricType metric, std::string indexDescription, int dim, int numVectors, int threadCount, std::unordered_map<std::string, jobject> parameters) override;
/**
* Add vectors to index
*
* @param jniUtil jni util
* @param env jni environment
@@ -86,23 +102,23 @@ class BinaryIndexService : public IndexService {
* @param numIds number of vectors
* @param threadCount number of thread count to be used while adding data
* @param vectorsAddress memory address which is holding vector data
* @param ids a list of document ids for corresponding vectors
* @param idMap a map of document id and vector id
* @param parameters parameters to be applied to faiss index
*/
virtual void insertToIndex(int dim, int numIds, int threadCount, int64_t vectorsAddress, std::vector<int64_t> &ids, jlong idMapAddress) override;
/**
* Write index to disk
*
* @param jniUtil jni util
* @param env jni environment
* @param metric space type for distance calculation
* @param indexDescription index description to be used by faiss index factory
* @param threadCount number of thread count to be used while adding data
* @param indexPath path to write index
* @param idMap a map of document id and vector id
* @param parameters parameters to be applied to faiss index
*/
virtual void createIndex(
knn_jni::JNIUtilInterface * jniUtil,
JNIEnv * env,
faiss::MetricType metric,
std::string indexDescription,
int dim,
int numIds,
int threadCount,
int64_t vectorsAddress,
std::vector<int64_t> ids,
std::string indexPath,
std::unordered_map<std::string, jobject> parameters
) override;
virtual void writeIndex(std::string indexPath, jlong idMapAddress) override;
virtual ~BinaryIndexService() = default;
};
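
The three declarations above split index construction into initialize, insert, and write phases. The following is a minimal sketch of that lifecycle, not the plugin's implementation: `ToyIndex` is a hypothetical stand-in for the native faiss index, the functions are simplified free functions rather than `IndexService` members, and the "write" step only reports the vector count instead of serializing to `indexPath`. It illustrates two ideas named in the commit message: vectors arrive in batches, and ownership is held in a `std::unique_ptr` so the index is released if an exception occurs.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a native index; this is NOT the faiss API.
struct ToyIndex {
    int dim = 0;
    std::vector<float> data;   // flattened vectors, dim floats each
    std::vector<int64_t> ids;  // one document id per vector
};

// Phase 1: allocate the index and return its address as an integer handle.
// The unique_ptr gives up ownership only after setup has succeeded.
int64_t initIndex(int dim, int numVectors) {
    if (dim <= 0 || numVectors < 0) throw std::invalid_argument("bad shape");
    auto index = std::make_unique<ToyIndex>();
    index->dim = dim;
    index->data.reserve(static_cast<std::size_t>(dim) * static_cast<std::size_t>(numVectors));
    index->ids.reserve(static_cast<std::size_t>(numVectors));
    return reinterpret_cast<int64_t>(index.release());
}

// Phase 2: append one batch. The caller can free the batch buffer afterwards,
// so only one batch of raw vectors needs to be resident at a time.
void insertToIndex(int dim, const std::vector<float> &batch,
                   const std::vector<int64_t> &ids, int64_t indexAddress) {
    auto *index = reinterpret_cast<ToyIndex *>(indexAddress);
    if (index == nullptr) throw std::invalid_argument("null index");
    if (dim != index->dim || batch.size() != ids.size() * static_cast<std::size_t>(dim)) {
        throw std::invalid_argument("batch shape mismatch");
    }
    index->data.insert(index->data.end(), batch.begin(), batch.end());
    index->ids.insert(index->ids.end(), ids.begin(), ids.end());
}

// Phase 3: take ownership back so the index is freed on every exit path,
// then "write" it. Here that just means returning the number of vectors.
std::size_t writeIndex(int64_t indexAddress) {
    std::unique_ptr<ToyIndex> index(reinterpret_cast<ToyIndex *>(indexAddress));
    if (!index) throw std::invalid_argument("null index");
    return index->ids.size();
}
```

A caller would invoke `initIndex` once, `insertToIndex` once per batch, and `writeIndex` once at the end; after `writeIndex` returns, the handle must not be reused because the index has been destroyed.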

9 changes: 5 additions & 4 deletions jni/include/faiss_wrapper.h
@@ -18,10 +18,11 @@

namespace knn_jni {
namespace faiss_wrapper {
// Create an index with ids and vectors. The configuration is defined by values in the Java map, parametersJ.
// The index is serialized to indexPathJ.
void CreateIndex(knn_jni::JNIUtilInterface * jniUtil, JNIEnv * env, jintArray idsJ, jlong vectorsAddressJ, jint dimJ,
jstring indexPathJ, jobject parametersJ, IndexService* indexService);
jlong InitIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jlong numDocs, jint dimJ, jobject parametersJ, IndexService *indexService);

void InsertToIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jintArray idsJ, jlong vectorsAddressJ, jint dimJ, jlong indexAddr, jint threadCount, IndexService *indexService);

void WriteIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jstring indexPathJ, jlong indexAddr, IndexService *indexService);

// Create an index with ids and vectors. Instead of creating a new index, this function creates the index
// based off of the template index passed in. The index is serialized to indexPathJ.
49 changes: 40 additions & 9 deletions jni/include/org_opensearch_knn_jni_FaissService.h
@@ -18,23 +18,54 @@
#ifdef __cplusplus
extern "C" {
#endif

/*
* Class: org_opensearch_knn_jni_FaissService
* Method: createIndex
* Method: initIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_createIndex
(JNIEnv *, jclass, jintArray, jlong, jint, jstring, jobject);

JNIEXPORT jlong JNICALL Java_org_opensearch_knn_jni_FaissService_initIndex(JNIEnv * env, jclass cls,
jlong numDocs, jint dimJ,
jobject parametersJ);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: createBinaryIndex
* Method: initBinaryIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_createBinaryIndex
(JNIEnv *, jclass, jintArray, jlong, jint, jstring, jobject);

JNIEXPORT jlong JNICALL Java_org_opensearch_knn_jni_FaissService_initBinaryIndex(JNIEnv * env, jclass cls,
jlong numDocs, jint dimJ,
jobject parametersJ);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: insertToIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_insertToIndex(JNIEnv * env, jclass cls, jintArray idsJ,
jlong vectorsAddressJ, jint dimJ,
jlong indexAddress, jint threadCount);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: insertToBinaryIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_insertToBinaryIndex(JNIEnv * env, jclass cls, jintArray idsJ,
jlong vectorsAddressJ, jint dimJ,
jlong indexAddress, jint threadCount);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: writeIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_writeIndex(JNIEnv * env, jclass cls,
jlong indexAddress,
jstring indexPathJ);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: writeBinaryIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_writeBinaryIndex(JNIEnv * env, jclass cls,
jlong indexAddress,
jstring indexPathJ);
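
Note for readers of this header: the `Signature:` comments on the new methods repeat the descriptor of the removed `createIndex` method, which does not match the prototypes shown. The sketch below derives the descriptors implied by the C prototypes using the standard JNI type codes (`I` for int, `J` for long, `V` for void, `[I` for int[], `L...;` for object types). It assumes `parametersJ` is a `java.util.Map`, as in the old signature comment; `jniDescriptor` and the three wrapper functions are hypothetical helpers written for this illustration and are not part of the plugin.

```cpp
#include <string>
#include <vector>

// Concatenate already-encoded JNI type codes into a method descriptor:
// "(" + parameter codes + ")" + return code.
std::string jniDescriptor(const std::vector<std::string> &params, const std::string &ret) {
    std::string out = "(";
    for (const auto &p : params) {
        out += p;
    }
    out += ")";
    out += ret;
    return out;
}

// initIndex / initBinaryIndex: (jlong numDocs, jint dimJ, jobject parametersJ) -> jlong
// Assumption: parametersJ is a java.util.Map.
std::string initIndexDescriptor() {
    return jniDescriptor({"J", "I", "Ljava/util/Map;"}, "J");  // "(JILjava/util/Map;)J"
}

// insertToIndex / insertToBinaryIndex:
// (jintArray idsJ, jlong vectorsAddressJ, jint dimJ, jlong indexAddress, jint threadCount) -> void
std::string insertToIndexDescriptor() {
    return jniDescriptor({"[I", "J", "I", "J", "I"}, "V");  // "([IJIJI)V"
}

// writeIndex / writeBinaryIndex: (jlong indexAddress, jstring indexPathJ) -> void
std::string writeIndexDescriptor() {
    return jniDescriptor({"J", "Ljava/lang/String;"}, "V");  // "(JLjava/lang/String;)V"
}
```

Running `javap -s` on the compiled `FaissService` class would confirm the actual descriptors.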
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: createIndexFromTemplate