Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vecstore: implement partition manipulation functions #137319

Merged
merged 3 commits into from
Jan 14, 2025

Conversation

mw5h
Copy link
Contributor

@mw5h mw5h commented Dec 12, 2024

Stub out the persistent storage vecstore implementation. Implement the methods for partition creation, retrieval and deletion. Add a simple test to ensure that these functions work properly.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Mostly just nits, and one concern about a possible bug.

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)


pkg/sql/vecindex/vecstore/persistent_store.go line 17 at r1 (raw file):

	db        *kv.DB // Needed for index maintenance functions
	quantizer quantize.Quantizer
	prefix    []byte

[nit] Can we make this roachpb.Key?


pkg/sql/vecindex/vecstore/persistent_store.go line 19 at r1 (raw file):

	prefix    []byte
	dims      int
	seed      int64

[nit] doesn't look like dims or seed are used here, and it seems like seed should never be needed. dims could maybe be accessed through quantizer, though I'm fine with keeping it here if we do end up needing it.


pkg/sql/vecindex/vecstore/persistent_store_test.go line 40 at r1 (raw file):

	store := NewPersistentStore(kvDB, quantizer, prefix, 2, 0)

	t.Run("insert a root partition into the store and read it back", func(t *testing.T) {

Could we use this to verify the partition is correct?

func testingAssertPartitionsEqual(t *testing.T, l, r *Partition) {


pkg/sql/vecindex/vecstore/persistent_txn.go line 76 at r1 (raw file):

	// partition metadata but is part of the data that all implementations
	// QuantizedVectorSet need.
	vectorSet := psTxn.store.quantizer.Quantize(ctx, &vector.Set{})

[nit] It seems strange to quantize the empty set - can we instead have var vectorSet QuantizedVectorSet and then initialize it with MakeRaBitQCodeSet and &UnQuantizedVectorSet{Centroid: centroid} respectively in the switch?

Alternatively, maybe it's time to add some methods to either the QuantizedVectorSet interface, or Partition, to avoid having to switch on the quantization type all the time.


pkg/sql/vecindex/vecstore/persistent_txn.go line 142 at r1 (raw file):

	// Store the length of the partition's metadata key so that we can truncate back
	// to that length instead of building a new key from scratch.

[nit] This comment looks stale.


pkg/sql/vecindex/vecstore/persistent_txn.go line 166 at r1 (raw file):

	// batch? Could this be batched with the insertPartition? Docs say operations
	// happen out of order, but maybe there's something to keep a single batch from
	// self-interfering?

Which docs are you referring to? I'd expect write batches to evaluate requests in order. Separate batches seems fine for now, though.

// Sequence numbers serve a few roles in the transaction model:
//
// 1. they are used to enforce an ordering between read and write operations in a
// single transaction that go to the same key. Each read request that travels
// through the interceptor is assigned the sequence number of the most recent
// write. Each write request that travels through the interceptor is assigned
// a sequence number larger than any previously allocated.


pkg/sql/vecindex/vecstore/persistent_txn.go line 180 at r1 (raw file):

	return partitionID, psTxn.insertPartition(ctx, partitionID, partition,
		func(b *kv.Batch, k roachpb.Key, v []byte) {
			b.CPutAllowingIfNotExists(k, v, nil /* expValue */)

[nit] CPut has the same behavior for nil expValue, right?


pkg/sql/vecindex/vecstore/persistent_txn.go line 215 at r1 (raw file):

	partitionCounts []int,
) (Level, error) {
	panic("SearchPartitions() unimplemented")

I think we could implement this relatively easily by calling GetPartition


pkg/sql/vecindex/vecstore/persistent_txn.go line 226 at r1 (raw file):

// the next partition's metadata.
func (psTxn *PersistentStoreTxn) encodePartitionKey(partitionKey PartitionKey) roachpb.Key {
	psTxn.keyBuffer = psTxn.keyBuffer[:psTxn.keyPrefixLen]

I'm not certain it's safe to use a buffer like this. I think it's possible for the KV layer to hold on to keys for the duration of the txn (to be used for things like read refreshes and async consensus for writes?). I ran into a problem like that while working on encoding the keys for vectorized index joins a while ago.


pkg/sql/vecindex/vecstore/persistent_txn.go line 227 at r1 (raw file):

func (psTxn *PersistentStoreTxn) encodePartitionKey(partitionKey PartitionKey) roachpb.Key {
	psTxn.keyBuffer = psTxn.keyBuffer[:psTxn.keyPrefixLen]
	psTxn.keyBuffer = encoding.EncodeUvarintAscending(psTxn.keyBuffer, uint64(partitionKey))

[nit] Let's add an EncodePartitionKey helper function as well.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mw5h)


pkg/sql/vecindex/vecstore/persistent_store.go line 19 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] doesn't look like dims or seed are used here, and it seems like seed should never be needed. dims could maybe be accessed through quantizer, though I'm fine with keeping it here if we do end up needing it.

I think we should drop the dims, since there's no situation where it can ever be different than Quantizer.GetOriginalDims / Quantizer.GetRandomDims. Having it separate could cause confusion/bugs over which it is and whether it can be different.


pkg/sql/vecindex/vecstore/persistent_txn.go line 76 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] It seems strange to quantize the empty set - can we instead have var vectorSet QuantizedVectorSet and then initialize it with MakeRaBitQCodeSet and &UnQuantizedVectorSet{Centroid: centroid} respectively in the switch?

Alternatively, maybe it's time to add some methods to either the QuantizedVectorSet interface, or Partition, to avoid having to switch on the quantization type all the time.

I think we have a few options to discover the right quantization type:

  1. We could store it in the partition metadata.
  2. We could add methods to Quantizer or QuantizedSet that decode/encode.
  3. We could switch on the type of psTxn.store.quantizer or the set it produces like we do here.

The concern with #1 is that there might be cases where we'd have to fetch the metadata just to know how to decode the partition data - I'm not sure it's worth the risk.

The concern with #2 is that it seems like encoding/decoding is a Store concern, not a Quantizer concern. Maybe the in-memory store wants to encode/decode using protobuf and the persistent store wants to use a SQL-like encoding, which seems like the right separation of concerns.

For #3, I think we should probably switch on the type psTxn.store.quantizer rather than creating the empty set. That way, we can preallocate the buffers in UnQuantizedSet/RaBitQuantizedSet so that they don't need to be resized by the decode methods. We could add a helper method to UnQuantizedVectorSet and RaBitQuantizedVectorSet to do this, e.g. NewEmptyRaBitQuantizedVectorSet(capacity int).


pkg/sql/vecindex/vecstore/persistent_txn.go line 180 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] CPut has the same behavior for nil expValue, right?

In both cases, we're expecting there to be no existing rows (b/c we deleted the root partition and b/c we generated a unique key for other partitions). Is expValue intended to assert that there really is no existing partition in this case? I'd be a bit concerned that CPut is quite a bit more expensive than Put - maybe worth asking the KV team in case it's not worth the cost of the assertion. Regardless, I'd expect to use Put in both cases or CPut in both cases, since that would simplify psTxn.insertPartition to not require a higher-order function.


pkg/sql/vecindex/vecstore/persistent_txn.go line 226 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I'm not certain it's safe to use a buffer like this. I think it's possible for the KV layer to hold on to keys for the duration of the txn (to be used for things like read refreshes and async consensus for writes?). I ran into a problem like that while working on encoding the keys for vectorized index joins a while ago.

If you do switch to always allocate here, you should add a separate helper function that does reuse keyBuffer for the PrefixEnd case. That function allocates a new buffer anyways, so no reason to allocate twice.


pkg/sql/vecindex/vecstore/encoding.go line 55 at r1 (raw file):

		return append(appendTo, key.PrimaryKey...)
	}
	return encoding.EncodeUvarintAscending(appendTo, uint64(key.PartitionKey))

How come you switched to the varint encoding? I seem to remember @DrewKimball believing that the fixed length encoding was preferable for this use case, maybe he can describe his reasoning.


pkg/sql/vecindex/vecstore/persistent_txn.go line 58 at r1 (raw file):

	endKey := startKey.PrefixEnd()

	b.Scan(startKey, endKey)

I think we'll need to lock the keyspan that we read here, e.g. using equivalent of SELECT FOR UPDATE. GetPartition is only used by split/merge, and we want any concurrent insert/delete ops on this partition to block until commit.


pkg/sql/vecindex/vecstore/persistent_txn.go line 76 at r1 (raw file):

	// partition metadata but is part of the data that all implementations
	// QuantizedVectorSet need.
	vectorSet := psTxn.store.quantizer.Quantize(ctx, &vector.Set{})

For the root partition, we always want to use UnQuantizer. We can check for that case using partitionKey == RootKey. All other partitions should use the deserialization rules for psTxn.store.quantizer.


pkg/sql/vecindex/vecstore/persistent_txn.go line 177 at r1 (raw file):

	ctx context.Context, partition *Partition,
) (PartitionKey, error) {
	partitionID := PartitionKey(builtins.GenerateUniqueInt(builtins.ProcessUniqueID(psTxn.store.db.Context().NodeID.SQLInstanceID())))

NIT: line longer than 100 chars, also would be good to break up expression for more clarity.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)


pkg/sql/vecindex/vecstore/encoding.go line 55 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

How come you switched to the varint encoding? I seem to remember @DrewKimball believing that the fixed length encoding was preferable for this use case, maybe he can describe his reasoning.

It seems fine in this case because this is the last thing encoded in the key, so we can know the offset of each encoded element. The PK encoding for leaf partitions will be variable length, anyway.


pkg/sql/vecindex/vecstore/persistent_txn.go line 226 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

If you do switch to always allocate here, you should add a separate helper function that does reuse keyBuffer for the PrefixEnd case. That function allocates a new buffer anyways, so no reason to allocate twice.

What do you mean by this? Where exactly would the buffer be used?

@mw5h mw5h force-pushed the vector-partition-ops branch from da2a5c5 to 4b65130 Compare December 13, 2024 22:45
Copy link
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @DrewKimball)


pkg/sql/vecindex/vecstore/encoding.go line 55 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

How come you switched to the varint encoding? I seem to remember @DrewKimball believing that the fixed length encoding was preferable for this use case, maybe he can describe his reasoning.

I did this while debugging an unrelated issue. The fixed with encoding was being displayed as a series of NULLs and a single byte of data. It made me wonder if the KV layer didn't like the input, since the rest of the key prefix is built using varint encoding. My problem was elsewhere, but I left this change in because it seemed more in line with what was happening elsewhere.


pkg/sql/vecindex/vecstore/persistent_store.go line 17 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Can we make this roachpb.Key?

Done.


pkg/sql/vecindex/vecstore/persistent_store.go line 19 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think we should drop the dims, since there's no situation where it can ever be different than Quantizer.GetOriginalDims / Quantizer.GetRandomDims. Having it separate could cause confusion/bugs over which it is and whether it can be different.

I agree.


pkg/sql/vecindex/vecstore/persistent_txn.go line 58 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think we'll need to lock the keyspan that we read here, e.g. using equivalent of SELECT FOR UPDATE. GetPartition is only used by split/merge, and we want any concurrent insert/delete ops on this partition to block until commit.

Sure, that makes sense.


pkg/sql/vecindex/vecstore/persistent_txn.go line 76 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think we have a few options to discover the right quantization type:

  1. We could store it in the partition metadata.
  2. We could add methods to Quantizer or QuantizedSet that decode/encode.
  3. We could switch on the type of psTxn.store.quantizer or the set it produces like we do here.

The concern with #1 is that there might be cases where we'd have to fetch the metadata just to know how to decode the partition data - I'm not sure it's worth the risk.

The concern with #2 is that it seems like encoding/decoding is a Store concern, not a Quantizer concern. Maybe the in-memory store wants to encode/decode using protobuf and the persistent store wants to use a SQL-like encoding, which seems like the right separation of concerns.

For #3, I think we should probably switch on the type psTxn.store.quantizer rather than creating the empty set. That way, we can preallocate the buffers in UnQuantizedSet/RaBitQuantizedSet so that they don't need to be resized by the decode methods. We could add a helper method to UnQuantizedVectorSet and RaBitQuantizedVectorSet to do this, e.g. NewEmptyRaBitQuantizedVectorSet(capacity int).

The next version adds a method to the Quantizer interface to produce a pre-allocated but empty QuantizedSet. I still switch on the quantized set type, but this feels quite a bit cleaner.


pkg/sql/vecindex/vecstore/persistent_txn.go line 76 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

For the root partition, we always want to use UnQuantizer. We can check for that case using partitionKey == RootKey. All other partitions should use the deserialization rules for psTxn.store.quantizer.

Oh, that's important!


pkg/sql/vecindex/vecstore/persistent_txn.go line 142 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] This comment looks stale.

Done.


pkg/sql/vecindex/vecstore/persistent_txn.go line 166 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Which docs are you referring to? I'd expect write batches to evaluate requests in order. Separate batches seems fine for now, though.

// Sequence numbers serve a few roles in the transaction model:
//
// 1. they are used to enforce an ordering between read and write operations in a
// single transaction that go to the same key. Each read request that travels
// through the interceptor is assigned the sequence number of the most recent
// write. Each write request that travels through the interceptor is assigned
// a sequence number larger than any previously allocated.

Done.


pkg/sql/vecindex/vecstore/persistent_txn.go line 177 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: line longer than 100 chars, also would be good to break up expression for more clarity.

Fixed.


pkg/sql/vecindex/vecstore/persistent_txn.go line 180 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

In both cases, we're expecting there to be no existing rows (b/c we deleted the root partition and b/c we generated a unique key for other partitions). Is expValue intended to assert that there really is no existing partition in this case? I'd be a bit concerned that CPut is quite a bit more expensive than Put - maybe worth asking the KV team in case it's not worth the cost of the assertion. Regardless, I'd expect to use Put in both cases or CPut in both cases, since that would simplify psTxn.insertPartition to not require a higher-order function.

The intention was to ensure that we're not overwriting data, yes. I can lose this in favor of Put().


pkg/sql/vecindex/vecstore/persistent_txn.go line 226 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

If you do switch to always allocate here, you should add a separate helper function that does reuse keyBuffer for the PrefixEnd case. That function allocates a new buffer anyways, so no reason to allocate twice.

Okay, thanks for the info.


pkg/sql/vecindex/vecstore/persistent_txn.go line 227 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Let's add an EncodePartitionKey helper function as well.

Done.


pkg/sql/vecindex/vecstore/persistent_store_test.go line 40 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Could we use this to verify the partition is correct?

func testingAssertPartitionsEqual(t *testing.T, l, r *Partition) {

I bet we could.

@mw5h mw5h force-pushed the vector-partition-ops branch 2 times, most recently from 58cbd1c to d1cff40 Compare December 14, 2024 23:42
Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mw5h)


pkg/sql/vecindex/quantize/unquantizer.go line 80 at r7 (raw file):

// NewQuantizedVectorSet implements the Quantizer interface
func (q *unQuantizer) NewQuantizedVectorSet(size int) QuantizedVectorSet {
	dataBuffer := make([]float32, 0, size*q.GetOriginalDims())

Same here, for consistency, even though it's the same value.


pkg/sql/vecindex/vecstore/persistent_txn.go line 20 at r7 (raw file):

)

// The PersistentStoreTxn provides a context to make transactional changes to a

NIT: Linter might not like that comment does not begin with name of the struct - by convention, should drop "The".


pkg/sql/vecindex/vecstore/persistent_txn.go line 33 at r7 (raw file):

var _ Txn = (*PersistentStoreTxn)(nil)

func NewPersistentStoreTxn(store *PersistentStore, kv *kv.Txn) *PersistentStoreTxn {

NIT: comment for exported func.


pkg/sql/vecindex/vecstore/persistent_txn.go line 38 at r7 (raw file):

		store: store,
	}
	// TODO (mw5h): This doesn't take into account session variables that control

You should maybe check with the KV team on whether we need GuaranteedDurability for non-Serializable isolation levels for this scenario.


pkg/sql/vecindex/vecstore/persistent_txn.go line 72 at r7 (raw file):

		return nil, ErrPartitionNotFound
	}
	// Partition metadata is stored in /Prefix/PartitionID, with vector data following in /Prefix/PartitionID/ChildKey

NIT: break comment lines at 80 chars, above as well.


pkg/sql/vecindex/vecstore/persistent_txn.go line 233 at r7 (raw file):

// the next partition's metadata.
func (psTxn *PersistentStoreTxn) encodePartitionKey(partitionKey PartitionKey) roachpb.Key {
	keyBuffer := append(roachpb.Key{}, psTxn.store.prefix...)

This looks like it might have to reallocate the underlying array buffer in EncodePartitionKey - you should preallocate enough bytes to prevent that.


pkg/sql/vecindex/vecstore/persistent_store.go line 24 at r7 (raw file):

var _ Store = (*PersistentStore)(nil)

func NewPersistentStore(

NIT: should have comments on exported funcs.


pkg/sql/vecindex/vecstore/persistent_store.go line 39 at r7 (raw file):

}

// Begin() is part of the vecstore.Store interface. Begin() creates a new KV

NIT: style-wise, we don't add () parens in comments.


pkg/sql/vecindex/quantize/rabitq.go line 147 at r7 (raw file):

// NewQuantizedVectorSet implements the Quantizer interface
func (q *raBitQuantizer) NewQuantizedVectorSet(size int) QuantizedVectorSet {
	dataBuffer := make([]uint64, 0, size*RaBitQCodeSetWidth(q.GetOriginalDims()))

This should use GetRandomDims, because the vector set is populated after the randomization step.

@mw5h mw5h force-pushed the vector-partition-ops branch from d1cff40 to b9ba263 Compare December 18, 2024 19:46
Copy link
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @DrewKimball)


pkg/sql/vecindex/quantize/rabitq.go line 147 at r7 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This should use GetRandomDims, because the vector set is populated after the randomization step.

Done.


pkg/sql/vecindex/quantize/unquantizer.go line 80 at r7 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Same here, for consistency, even though it's the same value.

Done.


pkg/sql/vecindex/vecstore/encoding.go line 55 at r1 (raw file):

Previously, mw5h (Matt White) wrote…

I did this while debugging an unrelated issue. The fixed with encoding was being displayed as a series of NULLs and a single byte of data. It made me wonder if the KV layer didn't like the input, since the rest of the key prefix is built using varint encoding. My problem was elsewhere, but I left this change in because it seemed more in line with what was happening elsewhere.

Done.


pkg/sql/vecindex/vecstore/persistent_txn.go line 215 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think we could implement this relatively easily by calling GetPartition

GetPartition takes a lock, so no. I'll factor the partition decoding out so that it can be reused.


pkg/sql/vecindex/vecstore/persistent_txn.go line 226 at r1 (raw file):

Previously, mw5h (Matt White) wrote…

Okay, thanks for the info.

Done.


pkg/sql/vecindex/vecstore/persistent_txn.go line 38 at r7 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

You should maybe check with the KV team on whether we need GuaranteedDurability for non-Serializable isolation levels for this scenario.

This is encoded in optbuilder, but the clean way of accessing it is a method that we don't have access to.


pkg/sql/vecindex/vecstore/persistent_txn.go line 233 at r7 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This looks like it might have to reallocate the underlying array buffer in EncodePartitionKey - you should preallocate enough bytes to prevent that.

Done.

@mw5h mw5h force-pushed the vector-partition-ops branch from b9ba263 to 1c10b1f Compare December 18, 2024 21:22
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 5 files at r7, 8 of 8 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)


pkg/sql/vecindex/vecstore/persistent_txn.go line 77 at r8 (raw file):

	}
	// Partition metadata is stored in /Prefix/PartitionID, with vector data
	// following in /Prefix/PartitionID/ChildKey

nit: comment needs a period


pkg/sql/vecindex/vecstore/persistent_txn.go line 92 at r8 (raw file):

	// Build a decoder function for deserializing encoded vectors. We also set the
	// centroid of the empty vector set here because the centroid is stored in the
	// partition metadata but is part of the data that all implementations

nit: should be: all implementations of

@mw5h mw5h force-pushed the vector-partition-ops branch from 1c10b1f to 1e8c022 Compare December 18, 2024 23:11
@mw5h mw5h force-pushed the vector-partition-ops branch from 1e8c022 to d426d97 Compare December 28, 2024 01:23
Copy link

blathers-crl bot commented Dec 28, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@mw5h mw5h force-pushed the vector-partition-ops branch 2 times, most recently from 8122477 to 3aad54a Compare December 30, 2024 18:22
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r9, 2 of 2 files at r10, 4 of 4 files at r11, 1 of 1 files at r12, 6 of 6 files at r13, 2 of 2 files at r14, 4 of 4 files at r15, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @mw5h)


pkg/sql/vecindex/vecstore/store_test.go line 17 at r15 (raw file):

)

func commonStoreTests(

Nice!


pkg/sql/vecindex/quantize/quantizer.go line 120 at r11 (raw file):

	// Clear removes all the elements of the vector set so that it may be reused. The
	// new centroid is copied over the existing centroid.
	Clear(centroid vector.T)

[nit] Given that this method sets a new centroid and prepares the set for reuse, would Reset() be a better name?


pkg/sql/vecindex/vecstore/persistent_txn.go line 236 at r13 (raw file):

// existing metadata, but existing vectors will not be deleted. Vectors in the
// new partition will overwrite existing vectors if child keys collide, but
// otherwise the resulting partition will be a union of the two partitions.

Should we consider it a bug if a caller is trying to overwrite a partition without having first deleted it?


pkg/sql/vecindex/vecstore/persistent_txn.go line 302 at r14 (raw file):

	ctx context.Context, partitionKey PartitionKey, vector vector.T, childKey ChildKey,
) (int, error) {
	// TODO(mw5h): Add to an existing batch instead of starting a new one.

This concern might go away when buffered writes are implemented.


pkg/sql/vecindex/vecstore/persistent_txn.go line 320 at r14 (raw file):

	}

	entryKey := make([]byte, len(metadataKey))

[nit] Since we have to reallocate on the append anyway, we could just cap the metadataKey and use it directly:

// Cap the metadata key so that the append allocates a new slice for the child key.
prefix := metadataKey[:len(metadataKey):len(metadataKey)]
entryKey := EncodeChildKey(prefix, childKey)

@mw5h mw5h force-pushed the vector-partition-ops branch from 3aad54a to 0faf35f Compare December 31, 2024 00:07
Copy link
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @DrewKimball)


pkg/sql/vecindex/quantize/quantizer.go line 120 at r11 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Given that this method sets a new centroid and prepares the set for reuse, would Reset() be a better name?

Sadly, the protobuf generated code already defines Reset(). Not sure we want to mess with that. I do agree that Clear() leaves something to be desired -- I just grabbed that because there are some Clear methods hanging about that do similar things. The 'centroid' argument is suboptimal.


pkg/sql/vecindex/vecstore/persistent_txn.go line 236 at r13 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Should we consider it a bug if a caller is trying to overwrite a partition without having first deleted it?

To do that, we'd have to go back to using CPut for writing the partition's metadata. Andy's comments made it sound like this was not free. Are we willing to spend cycles on this?


pkg/sql/vecindex/vecstore/persistent_txn.go line 320 at r14 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Since we have to reallocate on the append anyway, we could just cap the metadataKey and use it directly:

// Cap the metadata key so that the append allocates a new slice for the child key.
prefix := metadataKey[:len(metadataKey):len(metadataKey)]
entryKey := EncodeChildKey(prefix, childKey)

Done. Thanks for that!

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @mw5h)


pkg/sql/vecindex/quantize/quantizer.go line 120 at r11 (raw file):

Previously, mw5h (Matt White) wrote…

Sadly, the protobuf generated code already defines Reset(). Not sure we want to mess with that. I do agree that Clear() leaves something to be desired -- I just grabbed that because there are some Clear methods hanging about that do similar things. The 'centroid' argument is suboptimal.

I see, too bad.


pkg/sql/vecindex/vecstore/persistent_txn.go line 236 at r13 (raw file):

Previously, mw5h (Matt White) wrote…

To do that, we'd have to go back to using CPut for writing the partition's metadata. Andy's comments made it sound like this was not free. Are we willing to spend cycles on this?

I wouldn't think CPut overhead should matter for partition creation, but if it's a concern we could just make it a documented requirement of the function without actually validating it during execution. It's minor though, so I'm fine with whatever you end up deciding.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mw5h)


pkg/sql/vecindex/vecstore/persistent_store.go line 16 at r22 (raw file):

)

type PersistentStore struct {

NIT: Add comment for exported type.xN


pkg/sql/vecindex/vecstore/persistent_txn.go line 40 at r22 (raw file):

// PersistentStoreCodec abstracts quantizer-specific aspects of reading and writing vector sets.
type PersistentStoreCodec struct {

Does this need to be exported?


pkg/sql/vecindex/vecstore/persistent_txn.go line 43 at r22 (raw file):

	quantizer quantize.Quantizer

	encoder func(quantize.QuantizedVectorSet, []byte, int) ([]byte, error)

Why have functors like this, rather than using an interface? It creates extra objects to do it this way, as well as being less idiomatic.

In fact, I'm not even sure an interface is worth it, given the simplicity of what's going on here. You could embed the codec struct implementation directly in PersistentStoreTxn so that you don't create any extra objects - getCodecForPartitionKey would then just return a pointer to one or the other. The implementation would just do the switch on the quantizer type each time it's called rather than having to create separate objects for dynamic dispatch.

Copy link
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @DrewKimball)


pkg/sql/vecindex/vecstore/persistent_txn.go line 302 at r14 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

This concern might go away when buffered writes are implemented.

Done.


pkg/sql/vecindex/vecstore/persistent_txn.go line 40 at r22 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Does this need to be exported?

No, it doesn't. Good catch.


pkg/sql/vecindex/vecstore/persistent_txn.go line 43 at r22 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Why have functors like this, rather than using an interface? It creates extra objects to do it this way, as well as being less idiomatic.

In fact, I'm not even sure an interface is worth it, given the simplicity of what's going on here. You could embed the codec struct implementation directly in PersistentStoreTxn so that you don't create any extra objects - getCodecForPartitionKey would then just return a pointer to one or the other. The implementation would just do the switch on the quantizer type each time it's called rather than having to create separate objects for dynamic dispatch.

Yeah, I started with an interface implementation and it just created a lot of boilerplate for very little gain. From there I went to functors because that's the C thing to do. I'll take another shot at it.

@mw5h mw5h force-pushed the vector-partition-ops branch 3 times, most recently from 4a65e46 to cd29e35 Compare January 11, 2025 01:15
@mw5h mw5h marked this pull request as ready for review January 11, 2025 01:15
@mw5h
Copy link
Contributor Author

mw5h commented Jan 11, 2025

This is ready for review (as if we hadn't already been doing that). The GetFullVectors changes are big enough that I'll post it as a separate PR.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 6 files at r20, 12 of 12 files at r23, 8 of 8 files at r24, 10 of 10 files at r25, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)


pkg/sql/vecindex/vecstore/persistent_txn.go line 94 at r25 (raw file):

// encodeVectorFromSet encodes the vector indicated by 'idx' from an external
// vector set.
func (psc *persistentStoreCodec) encodeVectorFromSet(

nit: for this and Encode*Vector, it doesn't look like we're ever appending to a non-empty slice, so we could lose the appendTo argument. I'm fine leaving that as future refactoring, though.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball and @mw5h)


pkg/sql/vecindex/vecstore/persistent_txn.go line 26 at r25 (raw file):

// vector index's internal data. Committing changes is the responsibility of the
// caller.
type PersistentStoreTxn struct {

NIT: Why export this type?


pkg/sql/vecindex/vecstore/persistent_txn.go line 380 at r25 (raw file):

	}

	if err := psTxn.kv.Run(ctx, b); err != nil {

Does this code path try to parallelize the scans?

Copy link
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @andy-kimball and @DrewKimball)


pkg/sql/vecindex/vecstore/persistent_txn.go line 236 at r13 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I wouldn't think CPut overhead should matter for partition creation, but if it's a concern we could just make it a documented requirement of the function without actually validating it during execution. It's minor though, so I'm fine with whatever you end up deciding.

I'm going to leave this as is for now. Happy to revisit later if we want.


pkg/sql/vecindex/vecstore/persistent_txn.go line 26 at r25 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: Why export this type?

It shouldn't need to be exported. Thanks for pointing that out.


pkg/sql/vecindex/vecstore/persistent_txn.go line 380 at r25 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Does this code path try to parallelize the scans?

My understanding is that the KV layer will attempt to parallelize requests received in a batch.


pkg/sql/vecindex/vecstore/store_test.go line 17 at r15 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Nice!

Done.


pkg/sql/vecindex/quantize/quantizer.go line 120 at r11 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I see, too bad.

Done.

mw5h added 3 commits January 13, 2025 14:45
Add persistent storage to vecstore. Implement the methods for partition
creation, retrieval and deletion. Add a simple test to ensure these
methods are functioning.

Epic: CRDB-42943
Add implementations of AddToPartition, RemoveFromPartition and
SearchPartitions. AddToPartition and RemoveFromPartition turn into
simple KV operations since each encoded vector is written as a single KV
value. SearchPartitions reuses a lot of the code for GetPartition, but
without the locking.

Epic: CRDB-42943
Previously, the in-memory and persistent vector stores maintained their
own disjoint tests. This patch moves those tests that test common
functionality into a single joint test that is referenced from both
implementation unit tests.

Epic: CRDB-42943
@mw5h mw5h force-pushed the vector-partition-ops branch from cd29e35 to 780f5a6 Compare January 13, 2025 22:47
@mw5h
Copy link
Contributor Author

mw5h commented Jan 13, 2025

TFTR

@mw5h
Copy link
Contributor Author

mw5h commented Jan 13, 2025

bors r+

craig bot pushed a commit that referenced this pull request Jan 14, 2025
137319: vecstore: implement partition manipulation functions r=mw5h a=mw5h

Stub out the persistent storage vecstore implementation. Implement the methods for partition creation, retrieval and deletion. Add a simple test to ensure that these functions work properly.

137426: sql: make memo.IsStale more efficient when schema objects aren't modified r=fqazi a=fqazi

This patch speeds up memo staleness checks for prepared statements by:

1. Updates the lease manager to expose a generation ID which allows you to know if any descriptors changed
2. Updates the table stats cache to add a generation ID to determine if any new stats were added
3. Updates the IsStale check to confirm if either the generation IDs have changed, zone configs, search_path, changed or current database has changed. If everything was the same in the last execution, the full CheckDependencies can be skipped. This is only used for prepared statements.
5. Updates the schema changer logic to lease new descriptors if the schema is already leased. This allows invalidation to work within a search path if new objects are made.

Fixes: #105867


Numbers from sysbench (before these changes) using `rpsak.sh repeated`:
```
Before the changes (geomean TPS): 1121.75
After the changes (geomean TPS): 1165.47
Before the changes (geomean QPS): 22435.06
After the changes (geomean QPS): 23309.41
Percent improvement: 3.89%
```

Also from the sysbench workload numbers (from `@tbg)` write_only:
```
benchdiff --old lastmerge  ./pkg/sql/tests -b -r 'Sysbench/SQL/3node/oltp_write_only' -d 1000x -c 10

name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_write_only-24    4.38ms ± 1%    4.23ms ± 1%  -3.38%  (p=0.000 n=10+10)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_write_only-24     935kB ± 6%     933kB ± 6%    ~     (p=0.739 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_write_only-24     6.08k ± 2%     6.04k ± 3%    ~     (p=0.060 n=10+10)
```

read_only:
```
benchdiff --old lastmerge  ./pkg/sql/tests -b -r 'Sysbench/SQL/3node/oltp_read_only' -d 1000x -c 10
test binaries already exist for d2bd29e: Merge #137750
test binaries already exist for 3cc7a6c: sql: add avoid_catalog_generation_for_staleness to

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/sql/tests \

name                                  old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_only-24    7.88ms ± 1%    7.44ms ± 2%  -5.62%  (p=0.000 n=10+10)

name                                  old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_only-24    1.20MB ± 4%    1.19MB ± 5%    ~     (p=0.353 n=10+10)

name                                  old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_only-24     4.71k ± 3%     4.64k ± 3%  -1.48%  (p=0.022 n=10+10)
```

Co-authored-by: Matt White <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 14, 2025

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Jan 14, 2025
137319: vecstore: implement partition manipulation functions r=mw5h a=mw5h

Stub out the persistent storage vecstore implementation. Implement the methods for partition creation, retrieval and deletion. Add a simple test to ensure that these functions work properly.

Co-authored-by: Matt White <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 14, 2025

Build failed:

@mw5h
Copy link
Contributor Author

mw5h commented Jan 14, 2025

bors retry

@craig craig bot merged commit 8cb8291 into cockroachdb:master Jan 14, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants