fix: Add schema name if dropping index in pgvector store #1277
base: main
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,9 +2,13 @@ | |
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
import os | ||
import random | ||
import string | ||
from unittest.mock import patch | ||
|
||
import numpy as np | ||
import psycopg | ||
import pytest | ||
from haystack.dataclasses.document import ByteStream, Document | ||
from haystack.document_stores.errors import DuplicateDocumentError | ||
|
@@ -259,3 +263,41 @@ def test_from_pg_to_haystack_documents(): | |
assert haystack_docs[2].meta == {"meta_key": "meta_value"} | ||
assert haystack_docs[2].embedding == [0.7, 0.8, 0.9] | ||
assert haystack_docs[2].score is None | ||
|
||
|
||
@pytest.mark.integration | ||
def test_hnsw_index_recreation_in_new_schema(): | ||
Review comment:
I think the test can be improved/changed. I would probably first create a Document Store instance, I can help with the implementation, if needed.
Reply:
Your help would be greatly appreciated, as I’ve been trying to find a way to check if the index was recreated, but I haven’t found a solution yet. From this "high-level" scope, I can’t verify if the deletion was successful, because it happened inside the "process".
||
# Set your Postgres connection string (or set PG_CONN_STR in your environment directly). | ||
os.environ["PG_CONN_STR"] = "postgresql://postgres:postgres@localhost:5432/postgres" | ||
|
||
table_name = "test_table" | ||
index_name = f"{table_name}_index" | ||
schema_name = "".join(random.choices(string.ascii_letters, k=8)).lower() # noqa: S311 | ||
embedding_dimension = 1024 | ||
|
||
# Create the new schema if it doesn't exist. | ||
with psycopg.connect(os.environ["PG_CONN_STR"]) as connection: | ||
with connection.cursor() as cursor: | ||
cursor.execute(f"CREATE SCHEMA IF NOT EXISTS {schema_name};") | ||
connection.commit() | ||
|
||
# Instantiate the document store in the new schema with HNSW indexing. | ||
document_store = PgvectorDocumentStore( | ||
embedding_dimension=embedding_dimension, | ||
schema_name=schema_name, | ||
vector_function="cosine_similarity", | ||
recreate_table=False, | ||
search_strategy="hnsw", | ||
table_name=table_name, | ||
hnsw_index_name=index_name, | ||
hnsw_recreate_index_if_exists=True, # This ensures we drop/re-create the index if it exists | ||
keyword_index_name=f"{table_name}_keyword_index", | ||
) | ||
|
||
# First write documents | ||
docs1 = [Document(content="Test Content 1", embedding=[0.8] * embedding_dimension)] | ||
document_store.write_documents(docs1) | ||
|
||
# Second write documents | ||
docs2 = [Document(content="Test Content 2", embedding=[0.7] * embedding_dimension)] | ||
document_store.write_documents(docs2) |
Review comment:
I would remove this check.
If the HNSW index is not dropped, the following invocation
self._create_hnsw_index()
will fail. Right?
Reply:
You are right; I overthought that. The only reason it might fail during creation is if it wasn't deleted beforehand. However, you wouldn't know that was the cause. That said, this will be tested moving forward.
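One possible way to verify the index from the test's "high-level" scope is to query PostgreSQL's `pg_indexes` catalog view directly. The sketch below is only an illustration, not part of this PR: the helper name `index_exists` and the constant `INDEX_EXISTS_QUERY` are hypothetical, and the helper assumes a DB-API style cursor (such as the one returned by `psycopg`'s `connection.cursor()`) that accepts `%s` placeholders.

```python
# Hypothetical helper (not part of the PR): checks the pg_indexes catalog view
# for an index with the given name on the given table in the given schema.
INDEX_EXISTS_QUERY = (
    "SELECT 1 FROM pg_indexes "
    "WHERE schemaname = %s AND tablename = %s AND indexname = %s"
)


def index_exists(cursor, schema_name: str, table_name: str, index_name: str) -> bool:
    """Return True if pg_indexes lists the index for this schema and table.

    `cursor` is assumed to be a DB-API style cursor that supports
    execute(query, params) with %s placeholders and fetchone().
    """
    cursor.execute(INDEX_EXISTS_QUERY, (schema_name, table_name, index_name))
    # fetchone() returns None when the query produced no rows.
    return cursor.fetchone() is not None
```

In the test above, this could be called with a cursor opened on `os.environ["PG_CONN_STR"]`, passing `schema_name`, `table_name`, and `index_name`, once the document store has created its index. Note that it only confirms the index is present in the expected schema; it does not show whether the index was dropped and created again.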