pgvector dim size #513

Open
nathan-vo810 opened this issue Sep 30, 2024 · 9 comments
@nathan-vo810

The limit for the vector type is 16,000 dimensions (docs). 2,000 is the limit for indexing it (you'll see an error if you try).
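To make the two limits concrete, here is a minimal sketch that separates "can be stored" from "can be indexed". The constants are the figures quoted in this comment; the helper name `pgvector_support` is hypothetical and not part of pgvector or of this project.

```python
# Limits as quoted in this thread; verify them against the pgvector docs
# for the version you run.
MAX_STORAGE_DIMS = 16_000  # limit for the vector column type
MAX_INDEX_DIMS = 2_000     # limit for indexing a vector column


def pgvector_support(dims: int) -> dict:
    """Hypothetical helper: report what a vector with `dims` dimensions allows."""
    if dims < 1:
        raise ValueError("dims must be a positive integer")
    return {
        "storable": dims <= MAX_STORAGE_DIMS,
        "indexable": dims <= MAX_INDEX_DIMS,
    }


# A 3072-dimensional embedding fits in the column but cannot be indexed.
print(pgvector_support(3072))  # {'storable': True, 'indexable': False}
```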

@AruneshSingh
Collaborator

Can you create a PR changing it in this file, updating the vector_dims property and adding the doc as the source URL? Then @dhruv-anand-aintech can review it.

@dhruv-anand-aintech
Contributor

dhruv-anand-aintech commented Oct 1, 2024

Hi @nathan-vo810,

Thanks for updating here.

> 2,000 is the limit for indexing it

I think it would be reasonable to keep the primary field as 2000 then. A viewer of this table should not have to try vector search with a 2k+ vector on pgvector and then see the error after we listed 16k.

At most, the 16k figure can be added as a note in the comment section of the cell.

@nathan-vo810
Author

nathan-vo810 commented Oct 1, 2024

As a user, when I first saw the vector size limit, I assumed it wouldn’t be possible to store embeddings larger than 2000 dimensions. However, it turns out that storing them is possible, and vector search (nearest neighbor search) works just fine.

The only aspect affected by the 2000-dimension limit is indexing.

In any case, it’s up to you to decide how to handle this. Thanks for providing such a comprehensive tool for comparison!

@svonava-superlinked
Contributor

@nathan-vo810 so you are saying that full-scan search works with longer vectors, just the approximate nearest neighbor search doesn't?

I'd be curious what your use case for larger vectors is and what full-scan does to your latency, if you are open to sharing!
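For readers unfamiliar with the distinction, here is a rough in-memory sketch of what a full-scan (exact) nearest neighbor search does. It is an illustration only, not how pgvector implements it; the function name `full_scan_nearest` and the sample rows are made up. The query is compared with every stored vector, so the cost grows with both the number of rows and the number of dimensions, which is why latency is the open question here.

```python
import math


def full_scan_nearest(query, rows, k=1):
    """Exact k-nearest-neighbor search: score every stored vector, keep the best k."""
    scored = []
    for row_id, vec in rows:
        if len(vec) != len(query):
            raise ValueError(f"row {row_id!r} has {len(vec)} dims, expected {len(query)}")
        # Euclidean (L2) distance between the query and this row.
        scored.append((math.dist(query, vec), row_id))
    scored.sort()
    return [row_id for _, row_id in scored[:k]]


# Made-up 3000-dimensional vectors, above the 2,000-dimension indexing limit.
rows = [
    ("a", [0.0] * 3000),
    ("b", [1.0] * 3000),
    ("c", [0.5] * 3000),
]
print(full_scan_nearest([0.9] * 3000, rows, k=2))  # ['b', 'c']
```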

@nathan-vo810
Author

nathan-vo810 commented Oct 1, 2024 via email

@svonava-superlinked
Contributor

Got it! How many vectors do you have and what search latency do you observe?

@nathan-vo810
Author

nathan-vo810 commented Oct 1, 2024 via email

@svonava-superlinked
Contributor

@nathan-vo810 got it - and probably quite low query-per-second? as in, sub 1 QPS?

@dhruv-anand-aintech I think the proper way to handle this would be to have a dim limit per indexing algorithm (since we now have a list of supported algos), but that sounds like a nightmare to maintain.

Alternatively, we add a comment to the dims column to clarify that this limit is for the ANN-type indexes.

What do you think?

@dhruv-anand-aintech
Contributor

Yeah I would prefer the latter suggestion (clarify further in column description), as this kind of case is not common.
