-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: initialize tenant span config with rangefeeds enabled #76420
sql: initialize tenant span config with rangefeeds enabled #76420
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix looks tenant-specific, but we have seen the same problem on a non-tenant node (see #76331 (comment)).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @irfansharif)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment about the placeholder config we install when creating new tenants when you update the reconciler tests?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
I still don't fully understand the failure in #76331. So rangefeeds from the tenant pods, in order to work, need this configuration to have applied over ranges its looking to establish the rangefeed on. That, as things are today, would only happen after the tenant is able to first reconcile (something that does eventually happen). These errors then should be transient. That's where I'm confused -- with Radu's repro, I'm still seeing the test stall indefinitely. What gives? Is some component not retrying properly, and wedging startup? I agree this PR would help, but I think there might be another issue present.
This too I'm not following. I looked at the data directory passed around in slack. What were the exact repro steps? Is that a separate bug where in mixed version state we can't rely on the rangefeed.enabled config to have been applied yet, and because we ripped out the code that determines whether things should be enabled or not by looking at key prefixes, we're just blocking all rangefeed attempts? |
Yes, correct |
Fixes cockroachdb#76331. Not doing this causes problems when we construct new tenants as they won't be able to establish a rangefeed until their first full reconciliation completes and then propagates. This isn't "buggy", but it is slow. Even this is not great. If the preceding range did not have rangefeeds enabled, it would take a closed timestamp interval for this enablement to propagate. Perhaps this is evidence that we should always carve out a span at the end of the keyspace and set it to have rangefeeds enabled. I'll leave that fix and testing of this to somebody else. Hope this is helpful enough on its own. Release note: None Co-authored-by: irfan sharif <[email protected]>
db28d4a
to
eee2148
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even this is not great. If the preceding range did not have rangefeeds enabled,
it would take a closed timestamp interval for this enablement to propagate.
Perhaps this is evidence that we should always carve out a span at the end of
the keyspace and set it to have rangefeeds enabled.
This doesn't seem like that bad a problem to me, especially with our 3s interval. At least it doesn't seem that urgent a problem.
Even if we didn't have this PR, I don't think it'd be "buggy"; it would however noticeably slow down tenant pod startup (which is what I was seeing in my reproductions for #76331). That's a good enough reason to make this change than not, so LGTM. Took the liberty of updating some tests here.
bors r+
I don't actually know yet what's happening there. There's a real migration bug lurking I think (discussed in #76331), I'll send a separate PR for that. |
bors r+ |
Already running a review |
Build succeeded: |
Not doing this causes problems when we construct new tenants as they won't be
able to establish a rangefeed until their first full reconciliation completes
and then propagates.
Even this is not great. If the preceding range did not have rangefeeds enabled,
it would take a closed timestamp interval for this enablement to propagate.
Perhaps this is evidence that we should always carve out a span at the end of
the keyspace and set it to have rangefeeds enabled.
I'll leave that fix and testing of this to somebody else. Hope this is helpful
enough on its own. cc @irfansharif and @arulajmani. I believe this problem
has been blocking @RaduBerinde.
Fixes #76331
Release note: None