[DocDB] Make dummy ordered read path for single-table vector indexes #22825
Labels
area/docdb: YugabyteDB core features
kind/new-feature: This is a request for a completely new feature
priority/medium: Medium priority issue
Comments
tanujnay112 added the area/docdb and status/awaiting-triage labels on Jun 11, 2024
yugabyte-ci added the kind/new-feature and priority/medium labels on Jun 11, 2024
This was referenced Jun 11, 2024
tanujnay112 added a commit that referenced this issue on Sep 5, 2024
Summary: This change lays some foundations for the read path of vector indexes. It relies on a dummy implementation on the DocDB side that materializes all of the tablet's rows in memory and then iterates through the rows closest to the query vector in increasing order of (distance from query vector, ybctid). This dummy in-memory logic is added in `vectorann.cc` in the `DummyANN` class.

`DummyANN` implements the `VectorANN` interface, which more sophisticated ANN index algorithms are expected to satisfy later. A `VectorANN` takes in pairs of vectors and values, where each value is a RocksDB key that can be used to read the corresponding RocksDB row. A `VectorANNIterator` is instantiated with a query vector and yields the rows stored in the `VectorANN` in increasing order of `(distance from query vector, ybctid)`. The iterator interface has an `ANNPagingState` that can be used to page through the results: it holds the pair (distance from query vector, ybctid) of the current position within the iteration, which enables paged responses.

For each result of the vector search, DocDB needs to serialize the distance (score) between that result and the query vector so that results can be merged at the Pggate layer. For this reason, this diff adds a `distances` field to `PgsqlResponsePB`; its ith value is the distance from the query vector to the ith response row. The merging logic in Pggate will be implemented in a follow-up diff, which is why this diff forces DummyANN vector index tables to have just one tablet.

Note that `DummyANN` is not MVCC-aware. This is not a problem for this change, because all visible rows are loaded into a `DummyANN` at read time.

The new logic grabs a batch of RocksDB keys from a `VectorANN` and performs point lookups in RocksDB, very much like what `ExecuteBatchYbctids` did before this change.
To reuse the point-lookup logic of `ExecuteBatchYbctids`, this change abstracts the method's source of ybctids by passing it an iterable `KeyProvider` class. The method is also renamed to `ExecuteBatchKeys`, because in the future vector index keys might not be just a `ybctid`.

Other changes:
- Removed the dependency of the yb_vector library on yb_docdb; the dependency should be the other way around.
- Renamed TestThreadHolder to ThreadHolder and moved it to the util library, since it is used for parallelizing index load and validation in hnsw_tool.

**Upgrade/Rollback safety:**
This adds vector index protobuf fields that should not be used by any production customer right now.

Jira: DB-11724

Test Plan: Jenkins: test regex: .*TestPgRegressThirdPartyExtensionsPgvector.*

Reviewers: sergei, mbautin
Reviewed By: sergei, mbautin
Subscribers: svc_phabricator, robert, yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D35708
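The read path described in this commit (an in-memory index that yields keys in `(distance from query vector, ybctid)` order, a paging state that resumes after the last returned pair, and a batch executor that point-looks-up whatever keys a `KeyProvider` yields) can be sketched in C++ as follows. This is a minimal, self-contained sketch under stated assumptions, not the code in `vectorann.cc`: the names `DummyAnnSketch`, `PagingState`, `Hit`, `HitKeyProvider`, and `ExecuteBatchKeysSketch` are hypothetical, squared Euclidean distance is an assumed metric, and a `std::map` stands in for RocksDB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical simplified types; these are NOT the real YugabyteDB classes.
using Vector = std::vector<float>;
using Key = std::string;  // stands in for an encoded ybctid / RocksDB key

// Position of the last returned row, similar in spirit to ANNPagingState.
struct PagingState {
  float distance;
  Key ybctid;
};

struct Hit {
  float distance;
  Key ybctid;
};

// Orders hits by (distance, ybctid); the ybctid breaks ties deterministically.
bool HitLess(const Hit& a, const Hit& b) {
  return std::tie(a.distance, a.ybctid) < std::tie(b.distance, b.ybctid);
}

// Squared Euclidean distance (assumed metric for this sketch).
float SquaredL2(const Vector& a, const Vector& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return sum;
}

// Brute-force in-memory index in the spirit of DummyANN: all (vector, key)
// pairs are held in memory and fully sorted on every search.
class DummyAnnSketch {
 public:
  void Add(Vector v, Key key) { entries_.emplace_back(std::move(v), std::move(key)); }

  // Returns up to `limit` hits in increasing (distance, ybctid) order,
  // starting strictly after `resume` when a paging state is given.
  std::vector<Hit> Search(const Vector& query, std::size_t limit,
                          const std::optional<PagingState>& resume) const {
    std::vector<Hit> hits;
    for (const auto& [vec, key] : entries_) hits.push_back({SquaredL2(query, vec), key});
    std::sort(hits.begin(), hits.end(), HitLess);
    auto it = hits.begin();
    if (resume) {
      it = std::upper_bound(hits.begin(), hits.end(),
                            Hit{resume->distance, resume->ybctid}, HitLess);
    }
    std::vector<Hit> page;
    for (; it != hits.end() && page.size() < limit; ++it) page.push_back(*it);
    return page;
  }

 private:
  std::vector<std::pair<Vector, Key>> entries_;
};

// Iterable key source in the spirit of KeyProvider: the batch executor only
// sees keys, not whether they came from a ybctid list or a vector search.
class KeyProvider {
 public:
  virtual ~KeyProvider() = default;
  virtual std::optional<Key> Next() = 0;  // std::nullopt when exhausted
};

// Yields the keys of vector-search hits in the order they were returned.
class HitKeyProvider : public KeyProvider {
 public:
  explicit HitKeyProvider(std::vector<Hit> hits) : hits_(std::move(hits)) {}
  std::optional<Key> Next() override {
    if (pos_ >= hits_.size()) return std::nullopt;
    return hits_[pos_++].ybctid;
  }

 private:
  std::vector<Hit> hits_;
  std::size_t pos_ = 0;
};

// Point-looks-up every key the provider yields; std::map stands in for RocksDB.
std::vector<std::string> ExecuteBatchKeysSketch(KeyProvider& keys,
                                                const std::map<Key, std::string>& store) {
  std::vector<std::string> rows;
  while (auto key = keys.Next()) {
    if (auto it = store.find(*key); it != store.end()) rows.push_back(it->second);
  }
  return rows;
}
```

Because the paging state stores the full `(distance, ybctid)` pair and each search resumes strictly after it, rows at equal distance are ordered by ybctid and are neither skipped nor repeated across pages. Re-sorting every row on each call mirrors the "materialize everything" nature of the dummy implementation; a real ANN index would avoid that cost.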
jasonyb pushed a commit that referenced this issue on Sep 6, 2024
Summary:
- c587efd [docs] minor edit (#23796)
- 31e09f3 [PLAT-15029] yba installer split data and software directory setup
- f5ba17d [PLAT-15039]: Fix bootstrap on bi-directional xCluster config creation
- 578248a [#23770] YSQL: Deterministically populate catalog cache in tests with Connection Manager enabled
- 7b1f22a [#23799] test: Fixed PgTableSizeTest.SharedTableSize test for pg15
- 788434a [#18771, #21352] docdb: Fix LightweightMessage max size when parsing
- 02ced43 [#22821] YSQL: Preserve local limit in a multi-page read
- 50ff737 [#23741] docdb: Fix cloning of colocated databases with only parent table
- Excluded: 9889df7 [#23706] YSQL: Add table-level catcache Prometheus metrics
- c770d79 [#23747] MetaCache: Callback should not be called while holding the lock
- Excluded: 40689bc [#22150] YSQL, QueryDiagnostics: EXPLAIN (ANALYZE, DIST) support for queryDiagnostics
- 1655e69 [PLAT-15148]: Set XCluster Table Status to DroppedFromTarget if table in replication is dropped from target only
- 16262f7 [#22519] YSQL: Simplify API of the ExplicitRowLockBuffer class
- 6614afb [PLAT-14958][PLAT-14959] Make ssh fields optional if skipProvisioning is true
- 1153b56 [PLAT-14867] Make sure restart alerts don't trigger for small time updates during NTP sync
- bf1c7bc [PLAT-12226] Add connection pooling status to universe health check
- 38d8ae8 [PLAT-14805]Support adding EAR configs
- a180bef [#19134] YSQL, ASH: Setting ASH circular buffer size based on the number of cores
- 7d8fc76 Adjust heading link (#23807)
- 4c6cf5a [PLAT-4899]Basic validation of certificates
- f24eb10 [#23787] YSQL: Avoid executing conn mgr guc variables hooks for parallel workers
- ee18df8 [PLAT-13921] [K8] [UI] Universe action tasks are disabled after a failed shrink rr node task
- Excluded: dcf1821 [#23797] YSQL: Modify some tests to run in single connection mode with Connection Manager
- 0ac22cd [Docs] Remove Drift chat bot (#23802)
- 0e91003 [#22825] DocDB: Vector Index General Read Path with DummyANN
- e8f09b5 [PLAT-15175] Make runtime conf for skipping cluster consistency check public
- ee479ee Versionwarning (#23781)
- a05c6a3 [DB-12681] yugabyted-ui: Add Voyager commands to different Voyager phases in the UI.
- cc80d59 [#23777] yugabyted: updating the pg parity testcase to reflect the new gflags enabled for the pg parity feature.

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D37822
Jira Link: DB-11724
Description
We can make a dummy implementation of an ordered read path for a single-table vector index. For now, we can materialize all rows of a tablet in memory and find the top-K vectors closest to the query vector. This will be useful for laying some foundational read-path logic before we make vector indexes persistent.
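The materialize-then-select approach in the description (compute the distance from the query vector to every row in the tablet, then keep the K closest) can be sketched with `std::partial_sort`. This is an illustrative sketch only: `Row`, `Hit`, and `TopK` are hypothetical names, the squared Euclidean metric is an assumption, and ties are broken by ybctid to match the `(distance from query vector, ybctid)` ordering used in the commit that implements this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row shape: a ybctid plus its embedding vector.
struct Row {
  std::string ybctid;
  std::vector<float> embedding;
};

struct Hit {
  float distance;
  std::string ybctid;
};

// Squared Euclidean distance; the metric is an illustrative assumption.
float SquaredL2(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Materializes a distance for every row, then keeps only the k smallest,
// ordered by (distance, ybctid). If k exceeds the row count, all rows are kept.
std::vector<Hit> TopK(const std::vector<Row>& rows, const std::vector<float>& query,
                      std::size_t k) {
  std::vector<Hit> hits;
  hits.reserve(rows.size());
  for (const Row& row : rows) hits.push_back({SquaredL2(row.embedding, query), row.ybctid});
  k = std::min(k, hits.size());
  const auto mid = hits.begin() + static_cast<std::ptrdiff_t>(k);
  // Only the first k positions end up sorted; the rest are left unordered.
  std::partial_sort(hits.begin(), mid, hits.end(), [](const Hit& a, const Hit& b) {
    return std::tie(a.distance, a.ybctid) < std::tie(b.distance, b.ybctid);
  });
  hits.erase(mid, hits.end());
  return hits;
}
```

`std::partial_sort` fully orders only the first K elements, which avoids sorting the whole tablet when K is small; the cost still grows with the number of rows, because every distance has to be computed, which is why this is only a stand-in until vector indexes are persistent.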
Issue Type: kind/new-feature