PGVectorStore: Allow insertion of rows with set ids #3594
Comments
🤖 Hey @MJDeligan! Back at it again, huh? Always good to see familiar faces around these parts 👾
Yes, it is possible to modify the PGVectorStore in the langchainjs repository to allow the insertion of rows with set ids, similar to the Python API, for the implementation of the Indexing API. To achieve this, you would need to modify the methods that build and run the insert query. Here is how you could modify these methods:
/**
 * Constructs the SQL query for inserting rows into the specified table.
 *
 * @param rows - The rows of data to be inserted, consisting of values and records.
 * @returns The complete SQL INSERT INTO query string.
 */
private async buildInsertQuery(rows: (string | Record<string, unknown>)[][]) {
  let collectionId;
  if (this.collectionTableName) {
    collectionId = await this.getOrCreateCollection();
  }
  // The id column is listed first so that caller-supplied ids are inserted.
  const columns = [
    this.idColumnName,
    this.contentColumnName,
    this.vectorColumnName,
    this.metadataColumnName,
  ];
  if (collectionId) {
    columns.push("collection_id");
  }
  // One "($n, $n+1, ...)" group per row, numbered across the whole chunk.
  const valuesPlaceholders = rows
    .map((_, j) => this.generatePlaceholderForRowAt(j, columns.length))
    .join(", ");
  const text = `
    INSERT INTO ${this.tableName}(
      ${columns.join(", ")}
    )
    VALUES ${valuesPlaceholders}
  `;
  return text;
}
/**
 * Method to add vectors to the vector store. It converts the vectors into
 * rows and inserts them into the database.
 *
 * @param vectors - Array of vectors.
 * @param documents - Array of `Document` instances.
 * @param ids - Array of ids, one per vector.
 * @returns Promise that resolves when the vectors have been added.
 */
async addVectors(vectors: number[][], documents: Document[], ids: string[]): Promise<void> {
  // Every vector needs a matching id; otherwise `ids[i]` would be undefined.
  if (ids.length !== vectors.length) {
    throw new Error("The number of ids must match the number of vectors.");
  }
  const rows = [];
  let collectionId;
  if (this.collectionTableName) {
    collectionId = await this.getOrCreateCollection();
  }
  for (let i = 0; i < vectors.length; i += 1) {
    const values = [];
    const embedding = vectors[i];
    // pgvector accepts vectors as a bracketed, comma-separated string.
    const embeddingString = `[${embedding.join(",")}]`;
    // The order of values must match the column order in buildInsertQuery.
    values.push(
      ids[i],
      documents[i].pageContent,
      embeddingString,
      documents[i].metadata
    );
    if (collectionId) {
      values.push(collectionId);
    }
    rows.push(values);
  }
  // Insert in chunks to keep each statement's parameter count bounded.
  for (let i = 0; i < rows.length; i += this.chunkSize) {
    const chunk = rows.slice(i, i + this.chunkSize);
    const insertQuery = await this.buildInsertQuery(chunk);
    const flatValues = chunk.flat();
    try {
      await this.pool.query(insertQuery, flatValues);
    } catch (e) {
      console.error(e);
      throw new Error(`Error inserting: ${(e as Error).message}`);
    }
  }
}
You can find the current implementation of these methods in the langchain/src/vectorstores/pgvector.ts file in the langchainjs repository.
Please note that this modification will require you to manually provide the ids when adding vectors.
I hope this helps! If you have any further questions, feel free to ask.
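The insert query above depends on a placeholder helper that is not shown in the comment. As a rough, self-contained sketch of how numbered placeholders for a multi-row insert could be generated: `placeholderForRowAt` and `buildInsertSql` are hypothetical stand-ins written for illustration, and the table and column names are assumptions, not the store's actual defaults.

```typescript
// Hypothetical stand-in for the store's private placeholder helper;
// the real implementation in pgvector.ts may differ.
function placeholderForRowAt(rowIndex: number, numColumns: number): string {
  const params: string[] = [];
  for (let i = 0; i < numColumns; i += 1) {
    // Parameters are 1-based and numbered across the whole statement.
    params.push(`$${rowIndex * numColumns + i + 1}`);
  }
  return `(${params.join(", ")})`;
}

// Builds a multi-row INSERT with the id column included explicitly.
function buildInsertSql(
  tableName: string,
  columns: string[],
  rowCount: number
): string {
  const placeholders = Array.from({ length: rowCount }, (_, j) =>
    placeholderForRowAt(j, columns.length)
  ).join(", ");
  return `INSERT INTO ${tableName} (${columns.join(", ")}) VALUES ${placeholders}`;
}

// Illustrative table and column names only.
const sql = buildInsertSql(
  "documents",
  ["id", "content", "embedding", "metadata"],
  2
);
console.log(sql);
```

Because the id is now the first column, the flattened values array passed alongside this query has to start each row with the id, in the same order as the column list.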
Currently the implementation of the PGVectorStore only allows setting the page content and metadata for documents when adding them (or vectors directly). The Python API additionally allows passing ids that are used when inserting the rows.
This feature is necessary for an implementation of the Indexing API, as the hashed ids would need to be set manually for the rows of the vector store. See also #3390.
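To illustrate why caller-supplied ids matter here, the following is a minimal sketch of deriving a deterministic id from a document's content and metadata. The `Doc` interface and `hashedId` function are hypothetical names for illustration, and the hashing scheme (SHA-256 over a JSON payload) is an assumption; the Indexing API may hash documents differently.

```typescript
import { createHash } from "node:crypto";

// Minimal document shape assumed for this sketch.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Derives a stable id: the same content and metadata always give the same id.
// Only top-level metadata keys are sorted; nested objects are not normalized.
function hashedId(doc: Doc): string {
  const sortedMetadata = Object.keys(doc.metadata)
    .sort()
    .map((key) => [key, doc.metadata[key]]);
  const payload = JSON.stringify([doc.pageContent, sortedMetadata]);
  return createHash("sha256").update(payload).digest("hex");
}

const a = hashedId({ pageContent: "hello", metadata: { source: "a.txt", page: 1 } });
const b = hashedId({ pageContent: "hello", metadata: { page: 1, source: "a.txt" } });
const c = hashedId({ pageContent: "hello!", metadata: { source: "a.txt", page: 1 } });
console.log(a === b, a === c);
```

With ids like these, re-indexing an unchanged document would produce the same row id, which is what lets an indexer skip or replace existing rows. Whether the id column accepts such a value depends on the table schema; a uuid-typed column would need a UUID-shaped id instead of a raw hex digest.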