sql: support pgvector #121432

Open
10 of 31 tasks
jordanlewis opened this issue Mar 31, 2024 · 0 comments
Labels
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries: SQL Queries Team

jordanlewis commented Mar 31, 2024

pgvector is a popular Postgres extension that adds support for vector similarity indexing and search.

This issue tracks adding pgvector support to CockroachDB.

Types

  • vector type
  • bit type
  • halfvec type
  • sparsevec type

Operators

  • + operator
  • - operator
  • * operator
  • || operator (concatenate)
  • <-> operator (L2 distance)
  • <#> operator (negative inner product)
  • <=> operator (cosine distance)
  • <+> operator (taxicab/L1 distance)
  • <~> operator (Hamming distance for bit type)
  • <%> operator (Jaccard distance for bit type)
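
For reference, the distance operators above can be sketched in plain Python. This is only an illustration of the intended semantics, not the CockroachDB or pgvector implementation: the function names are made up for this sketch, vectors are modeled as lists of floats, and bit values as strings of '0'/'1'. The handling of zero vectors and of bit strings with no common set bits is an assumption.

```python
import math

def l2_distance(a, b):
    """<-> : Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """<#> : inner product, negated so that smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """<=> : 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")  # assumption: undefined for zero vectors
    return 1.0 - dot / (norm_a * norm_b)

def l1_distance(a, b):
    """<+> : taxicab (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    """<~> : number of differing positions in two equal-length bit strings."""
    return sum(1 for x, y in zip(a, b) if x != y)

def jaccard_distance(a, b):
    """<%> : 1 - |a AND b| / |a OR b| over set bits."""
    both = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    either = sum(1 for x, y in zip(a, b) if x == "1" or y == "1")
    if both == 0:
        return 1.0  # assumption: no shared set bits means maximal distance
    return 1.0 - both / either

print(l2_distance([1, 2, 3], [4, 6, 3]))
print(negative_inner_product([1, 2, 3], [4, 5, 6]))
print(hamming_distance("1010", "0011"))
```

Note that `<#>` returns the negated inner product, so an ascending `ORDER BY` on any of these operators lists the most similar rows first.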

Builtin functions

  • cosine_distance function
  • inner_product function
  • l2_distance function
  • l1_distance function
  • l2_normalize function
  • vector_dims function
  • vector_norm function
  • subvector function
  • binary_quantize function
  • hamming_distance function
  • jaccard_distance function
  • avg aggregate function
  • sum aggregate function
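
A rough Python sketch of what some of the non-distance builtins are expected to compute follows. It is illustrative only: the Python names mirror the SQL builtins, the `vector_sum`/`vector_avg` helpers are hypothetical stand-ins for the `sum` and `avg` aggregates, and the edge-case behavior (zero vectors, out-of-range `subvector` arguments) is an assumption rather than a statement about pgvector or CockroachDB.

```python
import math

def vector_dims(v):
    """Number of dimensions."""
    return len(v)

def vector_norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = vector_norm(v)
    if norm == 0:
        return list(v)  # assumption: a zero vector is returned unchanged
    return [x / norm for x in v]

def subvector(v, start, count):
    """Slice of `count` elements beginning at 1-based position `start`."""
    # Assumption: no validation or clamping of out-of-range arguments here.
    return v[start - 1:start - 1 + count]

def binary_quantize(v):
    """One bit per dimension: '1' where the element is positive, else '0'."""
    return "".join("1" if x > 0 else "0" for x in v)

def vector_sum(rows):
    """Element-wise sum over rows (stand-in for the sum aggregate)."""
    return [sum(col) for col in zip(*rows)]

def vector_avg(rows):
    """Element-wise mean over rows (stand-in for the avg aggregate)."""
    return [sum(col) / len(rows) for col in zip(*rows)]

print(vector_norm([3, 4]))
print(subvector([1, 2, 3, 4, 5], 2, 3))
print(binary_quantize([1, -2, 0, 3.5]))
```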

Indexing

  • ivfflat index
  • hnsw index
  • Acceleration of <->, <#> and <=> operators depending on which opclass is chosen at index creation time (vector_l2_ops, vector_ip_ops, vector_cosine_ops)

Configuration options

  • ivfflat.probes - defines the number of centroid lists to search during lookup
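
To make the role of ivfflat.probes concrete, here is a minimal Python sketch of an IVFFlat-style lookup. It assumes centroids are already given (the k-means training step is omitted), uses L2 distance only, and all function names are hypothetical; it is not how CockroachDB or pgvector implements the index.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_lists(vectors, centroids):
    """Assign every vector to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[nearest].append(v)
    return lists

def search(query, centroids, lists, probes=1, k=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[1.0, 1.0], [4.9, 4.9], [6.0, 6.0], [9.0, 9.0]]
lists = build_lists(vectors, centroids)

query = [5.2, 5.2]
print(search(query, centroids, lists, probes=1))  # scans one list only
print(search(query, centroids, lists, probes=2))  # scans both lists
```

In this toy data set the true nearest neighbor of the query sits in the list of the farther centroid, so `probes=1` misses it and `probes=2` finds it. That is the recall versus speed trade-off the setting controls.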

Jira issue: CRDB-37252

@jordanlewis jordanlewis added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 31, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries May 17, 2024
@jordanlewis jordanlewis moved this from Triage to Backlog in SQL Queries May 17, 2024
@yuzefovich yuzefovich added the T-sql-queries SQL Queries Team label May 17, 2024
craig bot pushed a commit that referenced this issue Jul 6, 2024
124292: sql: implement pgvector datatype and evaluation r=jordanlewis a=jordanlewis

This commit adds the pgvector datatype and associated evaluation operators and functions. It doesn't include index acceleration.

Functionality included:

- `CREATE EXTENSION vector`
- `vector` datatype with optional length, storage and retrieval in non-indexed table columns
- Equality and inequality operators
- `<->` operator - L2 distance
- `<#>` operator - (negative) inner product
- `<=>` operator - cosine distance
- `l1_distance` builtin
- `l2_distance` builtin
- `cosine_distance` builtin
- `inner_product` builtin
- `vector_dims` builtin
- `vector_norm` builtin

Updates #121432
Epic: None

Release note (sql change): implement pgvector encoding, decoding, and operators, without index acceleration.

Co-authored-by: Jordan Lewis
Projects
Status: Backlog