Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streamingccl: store replicated time in details #103835

Merged
merged 1 commit into from
May 25, 2023

Conversation

stevendanna
Copy link
Collaborator

@stevendanna stevendanna commented May 24, 2023

The cutover process uses the progress field to record how many ranges need to be reverted. This, however, wipes out the high watermark that was previously stored in that progress field.

Here, we move the canonical copy of the high watermark to the progress details.

For clarity, we rename "high watermark" as "replicated time".

Further, we delete a long skipped test that isn't providing much value.

Fixes #103534

Epic: none

Release note: None

@stevendanna stevendanna requested review from a team as code owners May 24, 2023 12:49
@stevendanna stevendanna requested review from lidorcarmel and msirek and removed request for a team May 24, 2023 12:49
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna added the backport-23.1.x Flags PRs that need to be backported to 23.1 label May 24, 2023
Copy link
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for doing this! I think the only thing I'm unsure about is if we should even continue logging the hwm. I don't think we should as it opens the door to footguns, but curious why you think we should.

@@ -151,6 +151,11 @@ message StreamIngestionProgress {
// consistent state as of the CutoverTime.
util.hlc.Timestamp cutover_time = 1 [(gogoproto.nullable) = false];

// ReplicatedTime is the ingestion frontier. This is the canonical
// value of the frontier. The HighWater in the job progress is for
// informational purposes only.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should never log the HighWater in the job progress?

pkg/ccl/streamingccl/streamingest/stream_ingestion_job.go Outdated Show resolved Hide resolved
// Keep the recorded replicatedTime empty until some advancement has been made
if sf.replicatedTimeAtStart.Less(replicatedTime) {
streamProgress.ReplicatedTime = replicatedTime
// The HighWater is for informational purposes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we even want to record the hwm, as it opens the door to foot guns.

streamProgress := progress.Details.(*jobspb.Progress_StreamIngest).StreamIngest
streamProgress.ReplicatedTime = hlcReplicatedTime
progress.Progress = &jobspb.Progress_HighWater{
HighWater: &hlcReplicatedTime,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I've said elsewhere, I wonder if we should not record the HighWater in Progress.

streamProgress := progress.Details.(*jobspb.Progress_StreamIngest).StreamIngest
streamProgress.ReplicatedTime = cutover
progress.Progress = &jobspb.Progress_HighWater{
HighWater: &cutover,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: highWater log alert.

Copy link
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for doing this! I think the only thing I'm unsure about is if we should even continue logging the hwm. I don't think we should as it opens the door to footguns, but curious why you think we should.

@stevendanna
Copy link
Collaborator Author

@msbutler My only hesitation about removing the high watermark completely is that it allows it to be visible in DBConsole, but I agree it is a bit of a footgun.

@msbutler
Copy link
Collaborator

it allows it to be visible in DBConsole

Ah I see. FWIW, I hadn't even realized the dbconsole shows the hwm. I feel its fine to drop it from the the db console. It's a time series metric anyway, and I think the replication lag graph provides richer information.

one other thing: I think there's some roachtest code you'll need to update as well.

The cutover process uses the progress field to record how many ranges
need to be reverted. This, however, wipes out the high watermark that
was previously stored in that progress field.

Here, we move the canonical copy of the high watermark to the progress
details.

For clarity, we rename "high watermark" as "replicated time".

Further, we delete a long skipped test that isn't providing much
value.

Epic: none

Release note: None
@stevendanna stevendanna force-pushed the replicated-time-refactor branch from 0b2499a to 327947f Compare May 25, 2023 10:22
@stevendanna stevendanna requested a review from a team as a code owner May 25, 2023 10:22
@stevendanna stevendanna requested review from smg260 and renatolabs and removed request for a team May 25, 2023 10:22
@stevendanna
Copy link
Collaborator Author

Updated the roachtests and addressed comments other than the high watermark issue. Going to spend a few minutes looking into the ux differences

@msbutler
Copy link
Collaborator

hrm, I don't see any updates to pkg/cmd/roachtest/tests/cluster_to_cluster.go ?

I defer to you on the dbconsole question. I'll approve whatever.

@msbutler
Copy link
Collaborator

one more nit: update the commit/PR message to Fix the original issue? #103534

@stevendanna
Copy link
Collaborator Author

hrm, I don't see any updates to pkg/cmd/roachtest/tests/cluster_to_cluster.go ?

Should be here: https://github.com/cockroachdb/cockroach/pull/103835/files#diff-b713f571096d10de2fe8f435442a825c8b41faaa685a6606a8432a61cf663f02

func (c *streamIngesitonJobInfo) GetHighWater() time.Time { return c.highwaterTime }
// GetHighWater returns the replicated time. The GetHighWater name is
// retained here as this is implementing the jobInfo interface used by
// the latency verifier.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alas.

@stevendanna
Copy link
Collaborator Author

bors r=msbutler

@craig
Copy link
Contributor

craig bot commented May 25, 2023

Build failed:

@msbutler
Copy link
Collaborator

unrelated flake, logged issue here.

@msbutler
Copy link
Collaborator

bors retry

@craig
Copy link
Contributor

craig bot commented May 25, 2023

Build succeeded:

@craig craig bot merged commit 0ab47ef into cockroachdb:master May 25, 2023
@blathers-crl
Copy link

blathers-crl bot commented May 25, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 327947f to blathers/backport-release-23.1-103835: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

c2c: cutover is not resilient to node shutdown
3 participants