pgvector dim size #513
Comments
Can you create a PR changing it in this file? Updating the
Hi @nathan-vo810, Thanks for updating here.
I think it would be reasonable to keep the primary field as 2000 then. A viewer of this table shouldn't have to try a vector search with a 2k+ vector on pgvector and then hit the error after we listed 16k. At most, it can be added as a note in the comment section of the cell.
As a user, when I first saw the vector size limit, I assumed it wouldn’t be possible to store embeddings larger than 2000 dimensions. However, it turns out they can be stored, and vector search (nearest-neighbor search) works just fine. The only aspect affected by the 2000-dimension limit is indexing. In any case, it’s up to you to decide how to handle this. Thanks for providing such a comprehensive tool for comparison!
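To make the distinction concrete, here is a minimal SQL sketch of the behavior described above. It is not from the thread: the `items` table is hypothetical, and it assumes PostgreSQL with pgvector (≥ 0.5, for the array-to-vector cast). Storing and exact-searching a 3072-dimension column works; building an ANN index on it does not.

```sql
-- Sketch: storage and exact search work above 2,000 dims, ANN indexing does not.
-- Assumes pgvector >= 0.5; the "items" table is hypothetical.
CREATE EXTENSION IF NOT EXISTS vector;

-- 3072 dims is fine for the column type (the type limit is 16,000).
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3072));

-- Insert a random 3072-dim vector (float8[] cast to vector).
INSERT INTO items (embedding)
VALUES ((SELECT array_agg(random()) FROM generate_series(1, 3072))::vector);

-- Exact (sequential-scan) nearest-neighbor search works without any index.
SELECT id
FROM items
ORDER BY embedding <-> (SELECT array_agg(random()) FROM generate_series(1, 3072))::vector
LIMIT 5;

-- Building an ANN index fails: HNSW (and IVFFlat) indexes are limited to 2,000 dims,
-- so this statement errors out with a dimension message.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```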
@nathan-vo810 so you are saying that full-scan search works with longer vectors, just the approximate nearest-neighbor search doesn't? I'd be curious what your use case for larger vectors is and what full-scan does to your latency, if you're open to sharing!
In my current use case, I'm using LangChain with pgvector, specifically with the text-embedding-3-large model, which has a vector dimension of 3072. I'm able to perform similarity searches on the database without any issues.

The 2,000 limit only applies to indexing, so it doesn't prevent vector searches. What I'm trying to clarify is that the vector dim column in the table should list 16,000 to match the vector type in pgvector.
Got it! How many vectors do you have and what search latency do you observe?
I don’t actually measure the latency, but with my current DB of 15,000 records, I think it’s less than 1s.
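For anyone who wants to put a number on it, a hedged sketch of how that full-scan latency could be checked directly in `psql`, reusing the hypothetical `items` table from the earlier example:

```sql
-- Sketch: time the exact kNN query and confirm it runs as a sequential scan.
EXPLAIN ANALYZE
SELECT id
FROM items
ORDER BY embedding <-> (SELECT array_agg(random()) FROM generate_series(1, 3072))::vector
LIMIT 5;
-- With no usable index, the plan shows a Seq Scan feeding a top-N sort under Limit;
-- the reported "Execution Time" is the per-query latency on the table.
```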
@nathan-vo810 got it - and probably quite low queries-per-second? As in, sub-1 QPS?
@dhruv-anand-aintech I think the proper way to handle this would be to have a dim limit per indexing algorithm (since we now have a list of supported algos), but that sounds like a nightmare to maintain. Alternatively, we add a comment for the dims column to clarify that it applies to the ANN-type indexes. What do you think?
Yeah, I would prefer the latter suggestion (clarifying further in the column description), as this kind of case is not common.
The limit for the vector type is 16,000 dimensions (docs). 2,000 is the limit for indexing it (you'll see an error if you try).