[DocDB] Create our own experimental HNSW implementation #23376

Open
1 task done
mbautin opened this issue Aug 3, 2024 · 1 comment
Labels: area/docdb (YugabyteDB core features), kind/enhancement (This is an enhancement of an existing feature), priority/medium (Medium priority issue)

Comments


mbautin commented Aug 3, 2024

Jira Link: DB-12298

Description

Create our own experimental HNSW implementation.

  • Initial implementation will index vectors in memory.
  • DocDB integration can be added gradually.
  • Parts of the experimental implementation can be moved to the production implementation.
  • Command-line tools and benchmarks, with the ability to tune parameters, can be added on top of the experimental implementation.

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
mbautin added the area/docdb and status/awaiting-triage labels on Aug 3, 2024
mbautin self-assigned this on Aug 3, 2024
yugabyte-ci added the kind/enhancement and priority/medium labels on Aug 3, 2024
mbautin added a commit to mbautin/yugabyte-db that referenced this issue Aug 17, 2024
mbautin added a commit that referenced this issue Aug 17, 2024
Summary:
Some utilities needed for the HNSW vector index implementation and benchmarking.

Adding a new directory, "vector", and the new library yb_vector. The namespace is called vectorindex.

benchmark_data.{h,cc} -- implements readers for the .fvec file format (see http://corpus-texmex.irisa.fr/).
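A minimal sketch of such a reader, assuming the standard TEXMEX layout: each record is a little-endian int32 dimension d followed by d float32 values (the corpus files themselves use the .fvecs extension, and ground-truth neighbor lists ship as .ivecs with the same layout but int32 components). It also assumes a little-endian host. ReadFvecs and WriteFvecs are hypothetical names, not the benchmark_data.h API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader for TEXMEX .fvecs files (not the benchmark_data.h API).
// Record layout: int32 dimension d, then d float32 values, little-endian.
// Assumes a little-endian host, so bytes can be read directly into memory.
std::vector<std::vector<float>> ReadFvecs(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  std::vector<std::vector<float>> vectors;
  int32_t dim = 0;
  // Each iteration consumes one record; the loop stops at end of file.
  while (in.read(reinterpret_cast<char*>(&dim), sizeof(dim))) {
    if (dim <= 0) {
      throw std::runtime_error("invalid dimension in " + path);
    }
    std::vector<float> v(static_cast<size_t>(dim));
    const auto num_bytes = static_cast<std::streamsize>(v.size() * sizeof(float));
    if (!in.read(reinterpret_cast<char*>(v.data()), num_bytes)) {
      throw std::runtime_error("truncated record in " + path);
    }
    vectors.push_back(std::move(v));
  }
  return vectors;
}

// Hypothetical writer producing the same layout, e.g. for test fixtures.
void WriteFvecs(const std::string& path, const std::vector<std::vector<float>>& vectors) {
  std::ofstream out(path, std::ios::binary);
  if (!out) {
    throw std::runtime_error("cannot create " + path);
  }
  for (const auto& v : vectors) {
    const auto dim = static_cast<int32_t>(v.size());
    out.write(reinterpret_cast<const char*>(&dim), sizeof(dim));
    out.write(reinterpret_cast<const char*>(v.data()),
              static_cast<std::streamsize>(v.size() * sizeof(float)));
  }
}
```

Reading the raw bytes straight into the vector avoids per-value parsing; a portable version would byte-swap on big-endian hosts.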

distance.{h,cc} -- functions for distance calculation, currently only for L2 squared and cosine.
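A sketch of the two distance functions, written against plain std::vector<float> for clarity. The function names are hypothetical, and a production version may want vectorized loops. Cosine distance is taken here as 1 minus cosine similarity; returning 1 when either vector is all zeros is an assumed convention, since similarity is undefined in that case.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Euclidean (L2) distance: sum over i of (a[i] - b[i])^2.
// Skipping the square root preserves nearest-neighbor ordering and saves work.
float L2SquaredDistance(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (size_t i = 0; i < a.size(); ++i) {
    const float diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Cosine distance: 1 - (a . b) / (|a| * |b|).
// Returns 1 if either vector is zero (assumed convention).
float CosineDistance(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float dot = 0.0f;
  float norm_a = 0.0f;
  float norm_b = 0.0f;
  for (size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  if (norm_a == 0.0f || norm_b == 0.0f) {
    return 1.0f;
  }
  return 1.0f - dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```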

vector_index_if.h -- intended to contain high-level interfaces exposed by a vector index such as HNSW. Currently only the reader API is included, which will be needed by the recall computation utility.
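To illustrate why a read-only interface is enough for recall measurement, here is a hypothetical VectorIndexReader with a single Search method, an exact brute-force implementation to act as ground truth, and a recall@k helper. None of these names come from vector_index_if.h; the real interface may look quite different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

using VertexId = uint64_t;  // Hypothetical id type.

// Hypothetical read-only index interface.
class VectorIndexReader {
 public:
  virtual ~VectorIndexReader() = default;
  // Returns ids of (approximately) the k nearest vectors, closest first.
  virtual std::vector<VertexId> Search(const std::vector<float>& query, size_t k) const = 0;
};

// Exact search under squared L2 distance; a ground-truth baseline.
class BruteForceReader : public VectorIndexReader {
 public:
  explicit BruteForceReader(std::vector<std::vector<float>> vectors)
      : vectors_(std::move(vectors)) {}

  std::vector<VertexId> Search(const std::vector<float>& query, size_t k) const override {
    std::vector<std::pair<float, VertexId>> scored;
    scored.reserve(vectors_.size());
    for (VertexId id = 0; id < vectors_.size(); ++id) {
      float d = 0.0f;
      for (size_t i = 0; i < query.size(); ++i) {
        const float diff = query[i] - vectors_[id][i];
        d += diff * diff;
      }
      scored.emplace_back(d, id);
    }
    k = std::min(k, scored.size());
    // Only the k smallest (distance, id) pairs need to be sorted.
    std::partial_sort(scored.begin(), scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end());
    std::vector<VertexId> ids;
    for (size_t i = 0; i < k; ++i) ids.push_back(scored[i].second);
    return ids;
  }

 private:
  std::vector<std::vector<float>> vectors_;
};

// Recall@k: fraction of the true k nearest neighbors that appear among the
// k returned ids, averaged over all queries.
double RecallAtK(const VectorIndexReader& index,
                 const std::vector<std::vector<float>>& queries,
                 const std::vector<std::vector<VertexId>>& ground_truth, size_t k) {
  assert(queries.size() == ground_truth.size());
  if (queries.empty() || k == 0) return 0.0;
  size_t hits = 0;
  for (size_t q = 0; q < queries.size(); ++q) {
    const auto result = index.Search(queries[q], k);
    const std::unordered_set<VertexId> found(result.begin(), result.end());
    const size_t limit = std::min(k, ground_truth[q].size());
    for (size_t i = 0; i < limit; ++i) {
      if (found.count(ground_truth[q][i]) > 0) ++hits;
    }
  }
  return static_cast<double>(hits) / static_cast<double>(queries.size() * k);
}
```

Because returned ids are compared as a set, their order does not affect recall; an approximate index would be measured by passing it in place of the brute-force reader.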

hnsw_util.{h,cc} -- various types and functions needed in the HNSW implementation: level selection, and min/max priority queues for (vector, distance) pairs.
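A sketch of these two building blocks, assuming the level-assignment rule from the original HNSW paper (level = floor(-ln(U) * mL), with U uniform on (0, 1) and mL = 1/ln(M)). Pairs are stored as (distance, vertex) so that std::pair comparison orders by distance. SelectRandomLevel, PushBounded, and the queue aliases are hypothetical names, not the ones in hnsw_util.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <random>
#include <utility>
#include <vector>

using VertexId = uint64_t;  // Hypothetical vertex id type.
// Stored as (distance, vertex) so std::pair comparison orders by distance.
using DistanceAndVertex = std::pair<float, VertexId>;

// Hypothetical HNSW level selection: floor(-ln(U) * ml) with U uniform on (0, 1)
// and ml = 1 / ln(M), capped at max_level. Most vertices land on level 0.
int SelectRandomLevel(std::mt19937_64& rng, double ml, int max_level) {
  // Lower bound is the smallest positive double, so log(u) is always finite.
  std::uniform_real_distribution<double> uniform(std::numeric_limits<double>::min(), 1.0);
  const double u = uniform(rng);
  const double level = std::floor(-std::log(u) * ml);
  return static_cast<int>(std::min(level, static_cast<double>(max_level)));
}

// Min-heap: top() is the closest vertex; used for candidates still to expand.
using MinDistanceQueue = std::priority_queue<DistanceAndVertex, std::vector<DistanceAndVertex>,
                                             std::greater<DistanceAndVertex>>;
// Max-heap: top() is the farthest vertex; used for the current best results.
using MaxDistanceQueue = std::priority_queue<DistanceAndVertex>;

// Adds `item` to `results` while keeping at most `ef` closest entries:
// when full, the farthest entry is evicted only if `item` is closer.
void PushBounded(MaxDistanceQueue& results, const DistanceAndVertex& item, size_t ef) {
  if (results.size() < ef) {
    results.push(item);
  } else if (!results.empty() && item.first < results.top().first) {
    results.pop();
    results.push(item);
  }
}
```

The min-heap yields the next candidate to expand during a layer search, while the bounded max-heap keeps the current best ef results and can evict the farthest one in O(log ef).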

The vector_types.h header in the common directory is needed by the dockv code, so it can't be in the vector directory. The yb_dockv library is not allowed to depend on the yb_vector library.
Jira: DB-12298

Test Plan: Jenkins

Reviewers: sergei, aleksandr.ponomarenko

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D37340
jasonyb pushed a commit that referenced this issue Aug 20, 2024
Summary:
 b8cd4da Fix broken header links in Explore section (#23522)
 cc43b2e [DEVOPS-3048] test automation: Backup unit tests use YBC by default
 43652cc [PLAT-13836] Upgrading python setuptools
 71f5eeb [doc][yba] Note on deleting KMS config (#23527)
 a7061c6 [PLAT-14912] Adding replicated migrate guardrails for subdirectories
 9b21783 [#23286] xCluster: Speedup setup for large table counts
 6d4d8f6 [DEVOPS-3048] test automation: Fix ybc extraction
 15fd362 [PLAT-14951] Add positive interger error message to pitr param step
 c8cbcbf [#23243] docdb: Fix tablet bootstrap stuck when replaying truncate operation
 5f286f5 [PLAT-14976] Make node agent silent parameters more obvious by showing in usage
 68ac66e [#23492] DocDB: Upgrade and Rollback tests
 Excluded: 516ead0 [#23304] xCluster: fix ysql_dump/Postgres so pg_class OIDs are preserved
 Excluded: 16941de [#23304] fix Postgres so old dumps can be loaded that do not have pg_class OIDs
 404075d [#23376] DocDB: Utilities needed for HNSW
 875ccc1 [PLAT-14077] Update /get endpoint to support db scoped replication tables + metrics
 71610b5 [#23536] fix test_macros.h to avoid problems with complaints about capturing variables
 027f0e1 [#23493] xCluster: implement function for scanning sequences_data table
 9103885 [#22462] DocDB: Enable pg_cron tests in TSAN
 7e1f72c [PLAT-14963]Clicking use same/diff replica while DR repair is throwing a permission error for a superadmin user
 b983d56 [PLAT-14595] Ability to change communication ports via edit universe
 a6ee050 [PLAT-13285] Make cloud provider edit retryable

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37382

ZhenNan2016 commented Aug 27, 2024

Hi, @mbautin
May I ask when YBDB will fully support the HNSW index?
Thanks a lot.
