dense vector/embeddings dimension size #92458

Closed
aykutfirat opened this issue Dec 20, 2022 · 9 comments
Labels
>enhancement :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@aykutfirat
Description

The latest OpenAI embedding model (text-embedding-ada-002) produces vectors of size 1536. OpenAI embeddings are perceived as state of the art and are offered at a very good price.

Can you increase the maximum dense vector size so that we can use these kinds of models?

Thank you!

@aykutfirat aykutfirat added >enhancement needs:triage Requires assignment of a team area label labels Dec 20, 2022
@DJRickyB DJRickyB added :Search Relevance/Vectors Vector search and removed needs:triage Requires assignment of a team area label labels Dec 22, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Dec 22, 2022
@carloslfu
carloslfu commented Dec 30, 2022

I just ran into this limit. For people facing the same limitation, AWS OpenSearch supports 10,000 dimensions. I love Elasticsearch Cloud because of its UX, but I'm going to AWS for now because my use case, the same as @aykutfirat's, requires 1536 floats.

@carloslfu
carloslfu commented Dec 31, 2022

The limit is because Elasticsearch uses the Lucene implementation of vector values:

  /** The maximum length of a vector */
  public static final int MAX_DIMENSIONS = 1024;

Code here: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/VectorValues.java#L33
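To illustrate the effect of that constant, here is a minimal sketch of how a hard cap like this rejects a 1536-dimensional embedding. This is not Lucene's actual validation code; the class name `VectorDimensionCheck` and the method `checkDimension` are hypothetical, and only the value 1024 comes from the snippet above.

```java
class VectorDimensionCheck {
    // Same value as the Lucene constant quoted above.
    static final int MAX_DIMENSIONS = 1024;

    // Hypothetical helper: throws if a vector dimension is outside the allowed range.
    static void checkDimension(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException(
                "vector dimension must be positive, got " + dimension);
        }
        if (dimension > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector dimension " + dimension + " exceeds the maximum of " + MAX_DIMENSIONS);
        }
    }

    public static void main(String[] args) {
        checkDimension(768); // within the limit, no exception
        try {
            checkDimension(1536); // size of text-embedding-ada-002 vectors
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a cap of 1024, any model that emits larger vectors cannot be indexed as-is, no matter how the field is configured.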

There is an ongoing discussion on apache/lucene#874 and apache/lucene#11507, but the maintainers seem very reluctant to make that change.

Elasticsearch could do what OpenSearch did: they implemented multiple engines and let people decide (docs here). This way OpenSearch supports up to 10,000 dimensions with the Faiss or nmslib engines, while people can still use the Lucene engine, limited to 1024 dimensions, and make their own tradeoffs.

@mayya-sharipova
Contributor

Addressed by #95257

@nik13
nik13 commented May 12, 2023

How do I use 2048 dimensions? Do I need to upgrade my Elasticsearch version?

@benwtrent
Copy link
Member

@nik13 when 8.8 is released, you can specify your dimensions in the mapping up to the limit of 2048.

So, upgrade to 8.8 when it is released.
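As a rough sketch of what "specify your dimensions in the mapping" could look like, the snippet below builds an index-creation body with a `dense_vector` field and a `dims` value. The field name `embedding`, the class `DenseVectorMapping`, and the helper `mappingFor` are made up for illustration. The 2048 cap is the figure quoted in this thread for 8.8 and may differ in other versions.

```java
class DenseVectorMapping {
    // Limit quoted in this thread for Elasticsearch 8.8; treat as an assumption elsewhere.
    static final int MAX_DIMS = 2048;

    // Hypothetical helper: returns a JSON body for creating an index
    // with a single dense_vector field of the requested size.
    static String mappingFor(String fieldName, int dims) {
        if (dims < 1 || dims > MAX_DIMS) {
            throw new IllegalArgumentException(
                "dims must be between 1 and " + MAX_DIMS + ", got " + dims);
        }
        return "{\n"
            + "  \"mappings\": {\n"
            + "    \"properties\": {\n"
            + "      \"" + fieldName + "\": {\n"
            + "        \"type\": \"dense_vector\",\n"
            + "        \"dims\": " + dims + "\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // 1536 matches the text-embedding-ada-002 vector size from the original request.
        System.out.println(mappingFor("embedding", 1536));
    }
}
```

The printed JSON is meant to be sent as the body of an index-creation request (for example `PUT /my-index`, where the index name is also just a placeholder).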

@The-Redhat

@benwtrent sorry to bother you. Do you know the timeline for the 8.8 release?

@maxwill9457

Hi, I understand that 2048 dimensions for dense_vector have been available since the 8.8 release.
However, I also found that this change is still marked as [preview] in the documentation.
https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#index-vectors-knn-search

Can anyone tell me when this feature will reach GA, or how to track its progress?

@benwtrent
Member

@maxwill9457 #96850

8.10

@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024