Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: PGVectorStore error inserting documents when column names have uppercase letters #3595

Closed
MJDeligan opened this issue Dec 8, 2023 · 2 comments · Fixed by #3692
Closed
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@MJDeligan
Copy link
Contributor

MJDeligan commented Dec 8, 2023

Description

When the vector store is initialized with one of the column names containing uppercase letters, such as contentColumnName = pageContent, then adding documents to the store causes an error 👍

Error inserting: column "pageContent" of relation "testlangchain" does not exist

This can be reproduced by simply creating a pgvectorstore where one of the columns contains uppercase lettering and trying to insert. The pgvector integration can also be used to demonstrate this.

Cause

This is because the creation of the table and the insertion query for rows are incompatible:

The creation query uses "" to maintain uppercase letters in the column names:

 await this.pool.query(`
      CREATE TABLE IF NOT EXISTS ${this.tableName} (
        "${this.idColumnName}" uuid NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
        "${this.contentColumnName}" text,
        "${this.metadataColumnName}" jsonb,
        "${this.vectorColumnName}" vector
      );
    `);

The insertion query however uses a parameterized query to set the column names dynamically:

The insertion query however just inserts the column names as is:

private async buildInsertQuery(rows: (string | Record<string, unknown>)[][]) {
    let collectionId;
    if (this.collectionTableName) {
      collectionId = await this.getOrCreateCollection();
    }

    const columns = [
      this.contentColumnName,
      this.vectorColumnName,
      this.metadataColumnName,
    ];

    if (collectionId) {
      columns.push("collection_id");
    }

    const valuesPlaceholders = rows
      .map((_, j) => this.generatePlaceholderForRowAt(j, columns.length))
      .join(", ");

    const text = `
      INSERT INTO ${this.tableName}(
        ${columns}
      )
      VALUES ${valuesPlaceholders}
    `;
    return text;
  }

This does not preserve the uppercasing. Note that node-pg explicitly states that identifiers are not supported for parametrization.

Options

  1. Create two separate queries: one with the collection_id and one without, both with column names written out in "" and execute only one of them
  2. The column names are put in "" , dynamically joined and inserted into the string (not as parameter)
  3. Consider the pg-format library, which can also handle identifiers. This library would also be good for many of the other queries, since it significantly reduces the effort required to dynamically create insert queries, because it can also handle arrays and objects, eliminating the need to dynamically generate placeholders.
@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Dec 8, 2023
Copy link

dosubot bot commented Dec 8, 2023

🤖

Thank you for bringing this issue to our attention! It's great to see your thorough analysis of the problem and the proposed options for resolving it. Your understanding of the underlying cause and the potential solutions is very valuable.

I would encourage you to go ahead and implement one of the proposed solutions as a pull request. Your contribution will not only help address this issue but also benefit other users who may encounter similar challenges. Keep up the great work, and thank you for considering contributing to langchainjs!


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@MJDeligan
Copy link
Contributor Author

#3692 fixes this bug for the insertion query, but uppercase letters still remain a problem in other queries such as the similaritySearchVectorWithScore, where column names are not quoted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant