[DocDB] A templatized framework to support different coordinate types #23613

Closed
1 task done
mbautin opened this issue Aug 25, 2024 · 1 comment
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

Comments

mbautin commented Aug 25, 2024

Jira Link: DB-12525

Description

We need C++ infrastructure (macros/templates) to support different vector coordinate types.

This is needed to support the various SIFT datasets at http://corpus-texmex.irisa.fr/, which use the bvecs format (unsigned byte coordinates) for SIFT1B and the fvecs format (float coordinates) elsewhere.

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@mbautin mbautin added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Aug 25, 2024
@mbautin mbautin self-assigned this Aug 25, 2024
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue labels Aug 25, 2024
mbautin added a commit that referenced this issue Aug 26, 2024
…, SIFT 1B hnsw_tool support

Summary:
A framework for supporting different vector coordinate types. Many vector-related classes have to become templates. In this diff they are parameterized with the vector type rather than the coordinate type, because most of the code works with whole vectors, not individual coordinates. The vector type must satisfy the IndexableVectorType concept.

Usearch does not currently support the u8 (unsigned byte) data type directly, so we still convert vectors to float before invoking usearch. With this setup we can still run some tests on the SIFT 1B dataset.
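The conversion step could look roughly like the sketch below. `ToFloatVector` is a hypothetical helper, not a function from the diff, and no usearch call is shown because the exact call sites are not part of this message.

```cpp
#include <cstdint>
#include <vector>

// Widens each coordinate to float. Every std::uint8_t value (0..255) is
// exactly representable as a float, so this conversion loses no information
// for bvecs data; it only costs extra memory (4 bytes per coordinate).
template <typename Coord>
std::vector<float> ToFloatVector(const std::vector<Coord>& v) {
  std::vector<float> result;
  result.reserve(v.size());
  for (const Coord c : v) {
    result.push_back(static_cast<float>(c));
  }
  return result;
}
```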

We are also doing significant refactoring in hnsw_tool as part of making sure we can run tests on the SIFT 1B dataset.
Jira: DB-12525

Test Plan:
Jenkins

Manual testing:

  bin/hnsw_tool benchmark \
    --build_vecs_path sift1b/bigann_base.bvecs \
    --query_vecs_path sift1b/bigann_query.bvecs \
    --ground_truth_path sift1b/gnd/idx_5M.ivecs \
    --num_vectors_to_insert=5000000 \
    --k=1000 \
    --num_indexing_threads=15 \
    --num_query_threads=80

Reviewers: sergei, aleksandr.ponomarenko

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D37459
jasonyb pushed a commit that referenced this issue Aug 28, 2024
Summary:
 0c5102e [doc][yba] xCluster Replication update (#23417)
 Excluded: a9466df [#22325] YSQL, QueryDiagnostics: Adding a catalog view for queryDiagnostics
 b9597b3 [#23612] YSQL: Fix java unit test misuse of == for string comparison
 bb72624 [#23613] DocDB: Framework for different vector index coordinate types, SIFT 1B hnsw_tool support
 12032f3 [PLAT-12510] Add option to use UTC for cron expression backup schedule time calculation
 Excluded: 141703a [#22533] YSQL: fix setrefs for index scan
 1bb8c62 [#23543] docdb: Update tablegroup manager in RepartitionTable
 1e28b8a [#23518] Do not include full snapshot info for list snapshot schedules RPC.
 e98c383 [PLAT-15048] Fix auto-master failover local test
 f606132 [doc][yba] Backup clarification (#23611)
 e80d60f [PLAT-14973] Precheck for node agent install to verify that we have correct permissions to execute in the installer directory
 5230f5a [#23630]yugabyted: Modiying the APIs required for the new Migrate Schema Page.
 0a310d3 [PLAT-15042] Add default pitr retention period
 aa15c81 [PLAT-12435] Adding a precheck for libselinux bindings for system python3
 525672e [#23632] DocDB: Unify GetFlagInfos and remove duplicate code
 4ab5ca0 [#23601] YSQL: Fix TestPreparedStatements tests with connection manager enabled
 57a7690 [PLAT-12222][PLAT-15036][PLAT-14333] Add connection pooling support for create universe API
 3407682 [PLAT-10119]: Do not allow back-tick for DB password in YBA

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Differential Revision: https://phorge.dev.yugabyte.com/D37578
@mbautin mbautin closed this as completed Oct 16, 2024