pgvector: ensure vector is sent in binary representation
PostgreSQL supports two methods of passing data from client to
server: text and binary. While for many data types the difference
may not be noticeable, we can see a significant performance impact
when converting a vector from binary => text => binary representation.
See previous explanation here[1].

While the pgvector loading code accounts for this, the query code
did not. This is due to the use of a list[float] type, which
the pgvector-python adapter currently doesn't support. However,
this adapter does support direct binary transfer if the data
is represented as a NumPy array[2]. Testing shows that moving to
a direct binary representation has a significant impact on
query results (my tests show a 3x impact) and provides
a more accurate representation of how this workload would execute.

[1] erikbern/ann-benchmarks#488
[2] https://github.com/pgvector/pgvector-python?tab=readme-ov-file#psycopg-3
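
To make the text versus binary distinction concrete, here is a minimal standard-library sketch of the two encodings for a small vector. The helper names are hypothetical and are not part of this commit or of pgvector-python. The binary layout (a uint16 dimension count, a uint16 reserved field, then big-endian float32 values) is an assumption about how the adapter serializes vectors, so treat it as illustrative rather than authoritative.

```python
import struct


def encode_text(vec):
    """Text form: a bracketed, comma-separated literal such as '[0.5,-1.25]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def decode_text(payload):
    """Parse the text literal back into floats (one string parse per element)."""
    return [float(x) for x in payload.strip("[]").split(",")]


def encode_binary(vec):
    """Binary form (assumed layout): uint16 dim, uint16 reserved, big-endian float32s."""
    return struct.pack(f">HH{len(vec)}f", len(vec), 0, *vec)


def decode_binary(payload):
    """Read the 4-byte header, then unpack `dim` float32 values after it."""
    dim, _reserved = struct.unpack_from(">HH", payload)
    return list(struct.unpack_from(f">{dim}f", payload, 4))


# Values chosen to be exactly representable as float32, so both round trips are lossless.
query = [0.5, -1.25, 3.0]
text_payload = encode_text(query)
binary_payload = encode_binary(query)
```

The text path formats and re-parses every element as a string, while the binary path copies fixed-width values directly. That per-element conversion is the cost the commit message describes avoiding.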
jkatz authored and alwayslove2013 committed Jun 28, 2024
1 parent 9f8fbbb commit 5265f2f
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion vectordb_bench/backend/clients/pgvector/pgvector.py
@@ -341,9 +341,10 @@ def search_embedding(
         assert self.conn is not None, "Connection is not initialized"
         assert self.cursor is not None, "Cursor is not initialized"

+        q = np.asarray(query)
         # TODO add filters support
         result = self.cursor.execute(
-            self._unfiltered_search, (query, k), prepare=True, binary=True
+            self._unfiltered_search, (q, k), prepare=True, binary=True
         )

         return [int(i[0]) for i in result.fetchall()]
