Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: initialize tenant span config with rangefeeds enabled #76420

Merged

Conversation

ajwerner
Copy link
Contributor

Not doing this causes problems when we construct new tenants as they won't be
able to establish a rangefeed until their first full reconciliation completes
and then propagates.

Even this is not great. If the preceding range did not have rangefeeds enabled,
it would take a closed timestamp interval for this enablement to propagate.
Perhaps this is evidence that we should always carve out a span at the end of
the keyspace and set it to have rangefeeds enabled.

I'll leave that fix and testing of this to somebody else. Hope this is helpful
enough on its own. cc @irfansharif and @arulajmani. I believe this problem
has been blocking @RaduBerinde.

Fixes #76331

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix looks tenant-specific, but we have seen the same problem on a non-tenant node (see #76331 (comment)).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @irfansharif)

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Could you please add a comment about the placeholder config we install when creating new tenants when you update the reconciler tests?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)

@irfansharif
Copy link
Contributor

I still don't fully understand the failure in #76331. So rangefeeds from the tenant pods, in order to work, need this configuration to have applied over ranges its looking to establish the rangefeed on. That, as things are today, would only happen after the tenant is able to first reconcile (something that does eventually happen). These errors then should be transient. That's where I'm confused -- with Radu's repro, I'm still seeing the test stall indefinitely. What gives? Is some component not retrying properly, and wedging startup? I agree this PR would help, but I think there might be another issue present.

we have seen the same problem on a non-tenant node

This too I'm not following. I looked at the data directory passed around in slack. What were the exact repro steps? Is that a separate bug where in mixed version state we can't rely on the rangefeed.enabled config to have been applied yet, and because we ripped out the code that determines whether things should be enabled or not by looking at key prefixes, we're just blocking all rangefeed attempts?

@ajwerner
Copy link
Contributor Author

we can't rely on the rangefeed.enabled config to have been applied yet, and because we ripped out the code that determines whether things should be enabled or not by looking at key prefixes, we're just blocking all rangefeed attempts?

Yes, correct

Fixes cockroachdb#76331.

Not doing this causes problems when we construct new tenants as they
won't be able to establish a rangefeed until their first full
reconciliation completes and then propagates. This isn't "buggy", but it
is slow.

Even this is not great. If the preceding range did not have rangefeeds
enabled, it would take a closed timestamp interval for this enablement
to propagate. Perhaps this is evidence that we should always carve out a
span at the end of the keyspace and set it to have rangefeeds enabled.
I'll leave that fix and testing of this to somebody else. Hope this is
helpful enough on its own.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
@irfansharif irfansharif force-pushed the ajwerner/fix-tenant-rangefeeds-after-create branch from db28d4a to eee2148 Compare February 11, 2022 22:15
@ajwerner ajwerner requested a review from a team as a code owner February 11, 2022 22:15
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even this is not great. If the preceding range did not have rangefeeds enabled,
it would take a closed timestamp interval for this enablement to propagate.
Perhaps this is evidence that we should always carve out a span at the end of
the keyspace and set it to have rangefeeds enabled.

This doesn't seem like that bad a problem to me, especially with our 3s interval. At least it doesn't seem that urgent a problem.
Even if we didn't have this PR, I don't think it'd be "buggy"; it would however noticeably slow down tenant pod startup (which is what I was seeing in my reproductions for #76331). That's a good enough reason to make this change than not, so LGTM. Took the liberty of updating some tests here.

bors r+

@irfansharif
Copy link
Contributor

This fix looks tenant-specific, but we have seen the same problem on a non-tenant node

I don't actually know yet what's happening there. There's a real migration bug lurking I think (discussed in #76331), I'll send a separate PR for that.

@irfansharif
Copy link
Contributor

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 12, 2022

Already running a review

@craig
Copy link
Contributor

craig bot commented Feb 12, 2022

Build succeeded:

@craig craig bot merged commit 0aacc53 into cockroachdb:master Feb 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: unexpected "rangefeeds require kv.rangefeed.enabled" on system tables
5 participants