Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: create LinkExternalSSTRequest kv rpc #120570

Merged
merged 1 commit into from
Mar 23, 2024

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Mar 15, 2024

Previously, the AddSStable rpc would process linking external files into
pebble, even though the linking logic is largely seperate from a regular
AddSSTable request. This patch splits external file linking into a new rpc:
LinkExternalSST.

After this patch, the only significant code overlap between the AddSStable and
LinkExternalSST rpcs is in raft sideloading.

Fixes #120526

Release note: none

Copy link

blathers-crl bot commented Mar 15, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

pkg/kv/kvpb/api.proto Show resolved Hide resolved
@@ -100,6 +100,7 @@ func (c *replicatedCmd) getStoreWriteByteSizes() (writeBytes int64, ingestedByte
if c.Cmd.WriteBatch != nil {
writeBytes = int64(len(c.Cmd.WriteBatch.Data))
}
// TODO(msbutler): do we need extra logic for external ssts?
if c.Cmd.ReplicatedEvalResult.AddSSTable != nil {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the answer here is no: writeBytes and ingestedBytes should be 0 for an external file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with 0, to keep the accounting consistent with AddSStable: we only increment for data, not metadata.

@@ -181,6 +181,7 @@ func (r *Replica) evalAndPropose(
// granularity of admission control doing token and size estimation (which
// is 15s). Also, admission control corrects for gaps in reporting.
writeBytes := kvadmission.NewStoreWriteBytes()
// TODO(msbutler): anything to do here for external sst linking?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe its fine to tell admission control that the writeBytes and ingestedBytes is 0.

Copy link
Member

@dt dt Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, or perhaps the size of the struct and thus the underlying manifest write?

pkg/kv/kvserver/kvserverpb/proposer_kv.proto Show resolved Hide resolved
@msbutler msbutler force-pushed the butler-split-addsst branch from ae447ea to ae2eadf Compare March 15, 2024 21:05
@msbutler msbutler requested a review from dt March 15, 2024 21:05
@msbutler msbutler force-pushed the butler-split-addsst branch from ae2eadf to 6b91fd1 Compare March 15, 2024 21:10
@msbutler
Copy link
Collaborator Author

@dt once you give this a look, I'll ask kv to take a look.

@msbutler
Copy link
Collaborator Author

unrelated flakes

@msbutler msbutler marked this pull request as ready for review March 18, 2024 17:55
@msbutler msbutler requested a review from a team as a code owner March 18, 2024 17:55
@msbutler msbutler requested a review from a team March 18, 2024 17:55
@msbutler msbutler requested review from a team as code owners March 18, 2024 17:55
@dt dt requested a review from erikgrinaker March 18, 2024 18:04
// LinkExternalSSTable, like AddSSTable, is a side effect that must execute
// before the Raft application is committed. It must be idempotent to account
// for an ill-timed crash after applying the side effect, but before
// committing the batch.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be transparent: we considered reducing the scope of this pr to be only above raft: i.e. cmd_link_external_sstable would create a addsstable replicated eval result. One reason we want to plumb the logic below raft is because it makes it more clear that regular addsstable requests always have data to sideload.

pkg/kv/kvpb/api.proto Outdated Show resolved Hide resolved
uint64 approximate_physical_size = 4;
bytes synthetic_prefix = 5;
}
ExternalFile external_file = 4 [(gogoproto.nullable) = false];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this repeated then just make it an error response if it is len()!=1 for now, but that way it is clearer that folks should add new file-specific fields in the file struct.

Copy link
Collaborator Author

@msbutler msbutler Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm, is this pattern used elsewhere? I feel like a comment here could also be sufficient.

// before the Raft application is committed. It must be idempotent to account
// for an ill-timed crash after applying the side effect, but before
// committing the batch.
message LinkExternalSSTable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this struct does differ from the request one, e.g. mvcc stats need to make it to kv server but don't go into (this part) of the proposal so I think this justifies the somewhat duplicated structs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. we shouldn't pass the ExternalFile message currently defined in api.proto? agreed.

@@ -288,6 +288,13 @@ func (b *replicaAppBatch) runPostAddTriggersReplicaOnly(
}
res.AddSSTable = nil
}
if res.LinkExternalSSTable != nil {
if res.LinkExternalSSTable.RemoteRewriteTimestamp.IsSet() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this timestamp check for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was copying the AddSSTable req code above. I'll copy the comment:

// We've ingested the SST already (via the appBatch), so all that's left
// to do here is notify the rangefeed, if appropriate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we check the timestamp bc non mvcc ExternalSSTable requests are not sent to rangefeeds.

@nvanbenschoten nvanbenschoten requested a review from dt March 21, 2024 21:35
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 17 of 21 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @msbutler)


pkg/kv/kvpb/method.go line 142 at r2 (raw file):

	// Best-effort.
	AdminScatter
	// AddSSTable links a file into the RocksDB log-structured merge-tree.

What's RocksDB? The only log-structured merge tree I know is pebble.


pkg/kv/kvserver/app_batch.go line 186 at r2 (raw file):

		)
		b.updateSSTStats(res, copied)
	} else if res.LinkExternalSSTable != nil {

nit: I don't think this needs to be an else, it can just be a standalone if. That will keep this consistent with the rest of post-application side effect handling and avoid an unnecessary assumption about mutual exclusive handling of these fields.


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

	if res.LinkExternalSSTable != nil {
		if res.LinkExternalSSTable.RemoteRewriteTimestamp.IsSet() {
			b.r.handleSSTableRaftMuLocked(ctx, nil, res.LinkExternalSSTable.Span, res.WriteTimestamp)

This is going to publish the contents of the sst onto any rangefeeds. Is that the desired effect?


pkg/kv/kvserver/replica_app_batch.go line 296 at r2 (raw file):

		}
		res.LinkExternalSSTable = nil

nit: stray line


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 1 at r2 (raw file):

// Copyright 2017 The Cockroach Authors.

2024


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 33 at r2 (raw file):

}

// EvalLinkExternalSSTable evaluates an LinkExternalSSTable command. For details, see doc comment

s/an/a/


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 44 at r2 (raw file):

	var span *tracing.Span
	ctx, span = tracing.ChildSpan(ctx, "LinkExternalSSTable")

Do we need this tracing span? The evaluation looks pretty cheap.


pkg/kv/kvserver/raftlog/payload.go line 48 at r2 (raw file):

		// raft.ConfChange.
		prefix = false
	} else if command.ReplicatedEvalResult.AddSSTable != nil || command.ReplicatedEvalResult.LinkExternalSSTable != nil {

@dt and I discussed this. There's not a good reason to sideload LinkExternalSSTable entries. Doing so leads to a 0 byte file being written and read and requires all of the new addSSTable != nil checks. I think we can skip sideloading without much work and avoid all of this — just revert the changes this PR is making to payload.go and sideload.go.

@msbutler msbutler force-pushed the butler-split-addsst branch from 6b91fd1 to 4ecf40c Compare March 22, 2024 15:10
@msbutler msbutler requested a review from a team as a code owner March 22, 2024 15:10
Copy link
Collaborator Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @nvanbenschoten)


pkg/kv/kvpb/method.go line 142 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's RocksDB? The only log-structured merge tree I know is pebble.

Ah good q. RocksDB was crdb's old storage engine. The blog post below explains why we built Pebble to replace RocksDB. I'll update the comment to prevent future confusion :)

https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/


pkg/kv/kvserver/app_batch.go line 186 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: I don't think this needs to be an else, it can just be a standalone if. That will keep this consistent with the rest of post-application side effect handling and avoid an unnecessary assumption about mutual exclusive handling of these fields.

Done.

One follow up q: do we still need to update sst stats here if we no longer sideload External SSTs? The comment above may be referring to sideloading, but I don't know enough about the mechanical differences between sideloading and ingesting an sst into pebble.


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is going to publish the contents of the sst onto any rangefeeds. Is that the desired effect?

yeah i think so. If the online restore is running from a source tenant in a physical cluster replication setup, these SSTs must be replicated to the standby tenant.


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 44 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need this tracing span? The evaluation looks pretty cheap.

Good point. Removing.


pkg/kv/kvserver/kvserverpb/proposer_kv.proto line 261 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I'd just leave it for now, and keep this PR a pure code-move to split the RPC and raft proposal paths out of AddSSTable with no behavior change.

sounds good.


pkg/kv/kvserver/raftlog/payload.go line 48 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

@dt and I discussed this. There's not a good reason to sideload LinkExternalSSTable entries. Doing so leads to a 0 byte file being written and read and requires all of the new addSSTable != nil checks. I think we can skip sideloading without much work and avoid all of this — just revert the changes this PR is making to payload.go and sideload.go.

Will do.


pkg/kv/kvserver/replica_raft.go line 184 at r1 (raw file):

Previously, dt (David Taylor) wrote…

yes, or perhaps the size of the struct and thus the underlying manifest write?

Going with 0, to keep the accounting consistent with AddSStable: we only increment for data, not metadata. @nvanbenschoten if you disagree, let us know.

@msbutler msbutler force-pushed the butler-split-addsst branch from 4ecf40c to a708512 Compare March 22, 2024 15:36
@msbutler
Copy link
Collaborator Author

rebased on master

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 10 of 15 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @msbutler)


pkg/kv/kvpb/method.go line 142 at r2 (raw file):

Previously, msbutler (Michael Butler) wrote…

Ah good q. RocksDB was crdb's old storage engine. The blog post below explains why we built Pebble to replace RocksDB. I'll update the comment to prevent future confusion :)

https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/

One day we'll finish getting rid of all of these stale references :)


pkg/kv/kvserver/app_batch.go line 186 at r2 (raw file):

Previously, msbutler (Michael Butler) wrote…

Done.

One follow up q: do we still need to update sst stats here if we no longer sideload External SSTs? The comment above may be referring to sideloading, but I don't know enough about the mechanical differences between sideloading and ingesting an sst into pebble.

I think you probably want to handle stats differently for LinkExternalSSTable. numAddSST and numAddSSTCopies are used to drive updates the AddSSTableApplications and AddSSTableApplicationCopies counters. numMutations is used to update load stats for load-based rebalancing. I don't think we want to update the first two, and based on a conversation with @dt, not the third either. So we can probably start by inlining updateSSTStats in the AddSSTable branch, getting rid of the helper, and not updating these fields on the LinkExternalSSTable path.


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, msbutler (Michael Butler) wrote…

yeah i think so. If the online restore is running from a source tenant in a physical cluster replication setup, these SSTs must be replicated to the standby tenant.

Will this work though? We're passing nil for sst []byte. It doesn't looks like ConsumeSSTable is handling this case, and if it does, does that mean it's going to download the external file synchronously in order to publish it on a rangefeed?


pkg/kv/kvserver/replica_raft.go line 184 at r1 (raw file):

Previously, msbutler (Michael Butler) wrote…

Going with 0, to keep the accounting consistent with AddSStable: we only increment for data, not metadata. @nvanbenschoten if you disagree, let us know.

0 SGTM.

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 21 files at r1, 1 of 5 files at r2, 10 of 15 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @msbutler, and @nvanbenschoten)


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Will this work though? We're passing nil for sst []byte. It doesn't looks like ConsumeSSTable is handling this case, and if it does, does that mean it's going to download the external file synchronously in order to publish it on a rangefeed?

What @nvanbenschoten said.


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 48 at r4 (raw file):

	// We have no idea if the SST being ingested contains keys that will shadow
	// existing keys or not, so we need to force its mvcc stats to be estimates.

Is the common case an empty span? Is it worth probing the keyspace to see if it's empty first, and only mark as estimate if it isn't empty?


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 66 at r4 (raw file):

			// Since the remote SST could contain keys at any timestamp, consider it
			// a history mutation.
			MVCCHistoryMutation: &kvserverpb.ReplicatedEvalResult_MVCCHistoryMutation{

Is this still a history mutation when RemoteRewriteTimestamp is used? Is it because we could still write below other keys?

This will cause a rangefeed to error when ingested. Is that the desired result? I guess it has to be if the above is true.

Copy link
Collaborator Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @nvanbenschoten)


pkg/kv/kvserver/app_batch.go line 186 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think you probably want to handle stats differently for LinkExternalSSTable. numAddSST and numAddSSTCopies are used to drive updates the AddSSTableApplications and AddSSTableApplicationCopies counters. numMutations is used to update load stats for load-based rebalancing. I don't think we want to update the first two, and based on a conversation with @dt, not the third either. So we can probably start by inlining updateSSTStats in the AddSSTable branch, getting rid of the helper, and not updating these fields on the LinkExternalSSTable path.

Thanks for the clarification. Done.


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

What @nvanbenschoten said.

hrm. It does seem that C2C would currently panic. I'll remove this for now. I'll continue to nil out res.LinkExternalSSTable here to follow the AddSStable pattern.


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 48 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Is the common case an empty span? Is it worth probing the keyspace to see if it's empty first, and only mark as estimate if it isn't empty?

Ah, this comment should be updated. MVCCStats are always going to be estimates, regardless of the empty key space question, as they're currently hacked together from backup file stats here.
https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/backupccl/restore_online.go#L303


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 66 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Is this still a history mutation when RemoteRewriteTimestamp is used? Is it because we could still write below other keys?

This will cause a rangefeed to error when ingested. Is that the desired result? I guess it has to be if the above is true.

Oof, really nice catch. It is no longer a history mutation if RemoteRewriteTimestamp is used.

@msbutler msbutler force-pushed the butler-split-addsst branch from a708512 to 5709f0a Compare March 22, 2024 17:49
@msbutler msbutler force-pushed the butler-split-addsst branch from 5709f0a to 160856e Compare March 22, 2024 17:53
Copy link
Collaborator Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @erikgrinaker, and @nvanbenschoten)


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, msbutler (Michael Butler) wrote…

hrm. It does seem that C2C would currently panic. I'll remove this for now. I'll continue to nil out res.LinkExternalSSTable here to follow the AddSStable pattern.

one follow up thought: if ingested external ssts cannot not be pushed onto rangefeeds for now, i feel like we want the rangefeed to error out instead of silently ignoring these linked files. To get this behavior, should we universally set MVCCHistoryMutation during eval? cc @dt

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @msbutler, and @nvanbenschoten)


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, msbutler (Michael Butler) wrote…

hrm. It does seem that C2C would currently panic. I'll remove this for now. I'll continue to nil out res.LinkExternalSSTable here to follow the AddSStable pattern.

Consider adding a comment for why we don't do anything here.


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 61 at r4 (raw file):

				BackingFileSize:         args.ExternalFile.BackingFileSize,
				Span:                    roachpb.Span{Key: start.Key, EndKey: end.Key},
				RemoteRewriteTimestamp:  sstToReqTS,

I'm not sure this rewrite works correctly.

With AddSSTable, this gives the timestamp written in the SST. If this timestamp is different from the request timestamp (e.g. because the request got pushed), the SST must be rewritten at the request timestamp, otherwise it can e.g. violate the closed timestamp. I don't see us handling that case here anywhere and plumbing through the request timestamp instead.


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 66 at r4 (raw file):

Previously, msbutler (Michael Butler) wrote…

Oof, really nice catch. It is no longer a history mutation if RemoteRewriteTimestamp is used.

Ok. We don't check for conflicts here though, so we can write below an existing value -- is that right and ok?

I think this still enforces that we write above the closed timestamp, so it should still work with rangefeeds and such.

Copy link
Collaborator Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @erikgrinaker, and @nvanbenschoten)


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):
Done. Wdyt of my previous q here?

if ingested external ssts cannot not be pushed onto rangefeeds for now, i feel like we want the rangefeed to error out instead of silently ignoring these linked files. To get this behavior, should we universally set MVCCHistoryMutation during eval?


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 61 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I'm not sure this rewrite works correctly.

With AddSSTable, this gives the timestamp written in the SST. If this timestamp is different from the request timestamp (e.g. because the request got pushed), the SST must be rewritten at the request timestamp, otherwise it can e.g. violate the closed timestamp. I don't see us handling that case here anywhere and plumbing through the request timestamp instead.

Ah. Another nice catch. Fixed.


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 66 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Ok. We don't check for conflicts here though, so we can write below an existing value -- is that right and ok?

I think this still enforces that we write above the closed timestamp, so it should still work with rangefeeds and such.

I don't think this new request needs to check for conflicts -- addsstable requests in conventional restore don't check for conflicts (at least in release builds), and further the client should never write into existing key space below an existing value.

@msbutler msbutler force-pushed the butler-split-addsst branch from 160856e to 4d293fd Compare March 22, 2024 18:46
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r5, 2 of 2 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @msbutler, and @nvanbenschoten)


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, msbutler (Michael Butler) wrote…

Done. Wdyt of my previous q here?

if ingested external ssts cannot not be pushed onto rangefeeds for now, i feel like we want the rangefeed to error out instead of silently ignoring these linked files. To get this behavior, should we universally set MVCCHistoryMutation during eval?

I'd rather we returned an appropriate explicit error in this case. Something like:

if res.LinkExternalSSTable != nil {
	b.r.disconnectRangefeedSpanWithErr(res.LinkExternalSSTable.Span, errors.New("LinkExternalSSTable not supported in rangefeeds"))
	res.LinkExternalSSTable = nil
}

pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 61 at r4 (raw file):

Previously, msbutler (Michael Butler) wrote…

Ah. Another nice catch. Fixed.

nit: replacing sstToReqTS seems a bit iffy here, and we don't actually need it either since we don't appear to care about the original timestamp in the SSTs (this was a safety mechanism in AddSSTable to make sure we didn't clobber SSTs unintentionally). Something like:

var rewriteTimestamp hlc.Timestamp
if sstToReqTS.IsSet() {
  // LinkExternalSSTable doesn't care about the original SST timestamp, so just
  // always use the request write timestamp.
  rewriteTimestamp = cArgs.Header.Timestamp 
}

pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 66 at r4 (raw file):

Previously, msbutler (Michael Butler) wrote…

I don't think this new request needs to check for conflicts -- addsstable requests in conventional restore don't check for conflicts (at least in release builds), and further the client should never write into existing key space below an existing value.

Got it.

@msbutler msbutler force-pushed the butler-split-addsst branch from 4d293fd to a260342 Compare March 22, 2024 19:38
Copy link
Collaborator Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @erikgrinaker, and @nvanbenschoten)


pkg/kv/kvserver/batcheval/cmd_link_external_sstable.go line 61 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: replacing sstToReqTS seems a bit iffy here, and we don't actually need it either since we don't appear to care about the original timestamp in the SSTs (this was a safety mechanism in AddSSTable to make sure we didn't clobber SSTs unintentionally). Something like:

var rewriteTimestamp hlc.Timestamp
if sstToReqTS.IsSet() {
  // LinkExternalSSTable doesn't care about the original SST timestamp, so just
  // always use the request write timestamp.
  rewriteTimestamp = cArgs.Header.Timestamp 
}

Nice idea. that cleans up the proto: no need to manually pass a timestamp anymore!

Copy link
Collaborator Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @erikgrinaker, and @nvanbenschoten)


pkg/kv/kvserver/replica_app_batch.go line 293 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I'd rather we returned an appropriate explicit error in this case. Something like:

if res.LinkExternalSSTable != nil {
	b.r.disconnectRangefeedSpanWithErr(res.LinkExternalSSTable.Span, errors.New("LinkExternalSSTable not supported in rangefeeds"))
	res.LinkExternalSSTable = nil
}

Done.

Previously, the AddSStable rpc would process linking external files into
pebble, even though the linking logic is largely seperate from a regular
AddSSTable request. This patch splits external file linking into a new rpc:
LinkExternalSST.

NB: after this patch, External SSTs will no longer sideload into each pebble
instance.

Fixes cockroachdb#120526

Release note: none
@msbutler msbutler force-pushed the butler-split-addsst branch from a260342 to 9320691 Compare March 22, 2024 20:03
@msbutler
Copy link
Collaborator Author

TFTRs!!

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 23, 2024

@craig craig bot merged commit c950a9e into cockroachdb:master Mar 23, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvapi: split AddRemoteSSTable out of AddSSTable RPC
5 participants