sql: unexpected "rangefeeds require kv.rangefeed.enabled" on system tables #76331
@yuzefovich ran into this as well (he hit
Not doing this causes problems when we construct new tenants, as they won't be able to establish a rangefeed until their first full reconciliation completes and then propagates. Even this is not great. If the preceding range did not have rangefeeds enabled, it would take a closed timestamp interval for this enablement to propagate. Perhaps this is evidence that we should always carve out a span at the end of the keyspace and set it to have rangefeeds enabled. I'll leave that fix and testing of this to somebody else. Hope this is helpful enough on its own. cc @irfansharif and @arulajmani. I believe this problem has been blocking @RaduBerinde.

Fixes cockroachdb#76331

Release note: None
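The change described in this commit message can be pictured with a minimal, hypothetical Go model. It is not CockroachDB's code: `SpanConfig`, `initialTenantSpanConfig`, `reconcile`, and `checkRangefeed` are illustrative names, the cluster setting is ignored, and the closed-timestamp propagation delay is not modeled. It only shows why a tenant seeded with rangefeeds disabled rejects rangefeeds until reconciliation, while a tenant seeded with rangefeeds enabled accepts them immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// SpanConfig is a hypothetical, heavily simplified stand-in for a span
// configuration; only the field relevant to this issue is modeled.
type SpanConfig struct {
	RangefeedEnabled bool
}

// errRangefeedsDisabled reuses the error text reported in this issue.
var errRangefeedsDisabled = errors.New("rangefeeds require kv.rangefeed.enabled")

// initialTenantSpanConfig returns the config seeded for a brand-new tenant
// before its first full reconciliation. Passing true models the fix.
func initialTenantSpanConfig(enableRangefeeds bool) SpanConfig {
	return SpanConfig{RangefeedEnabled: enableRangefeeds}
}

// reconcile models the first full reconciliation completing and
// propagating, after which the tenant's config enables rangefeeds.
func reconcile(conf SpanConfig) SpanConfig {
	conf.RangefeedEnabled = true
	return conf
}

// checkRangefeed models the replica-side check that rejects a rangefeed
// when the governing span config does not enable it.
func checkRangefeed(conf SpanConfig) error {
	if !conf.RangefeedEnabled {
		return errRangefeedsDisabled
	}
	return nil
}

func main() {
	seeded := initialTenantSpanConfig(false)
	fmt.Println("without fix, before reconciliation:", checkRangefeed(seeded))
	fmt.Println("without fix, after reconciliation:", checkRangefeed(reconcile(seeded)))
	fmt.Println("with fix, immediately:", checkRangefeed(initialTenantSpanConfig(true)))
}
```

The design point the sketch isolates is that the seed value only matters during the window before reconciliation; after it, both paths converge.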
#76420 is something, and it attacks the heart of the problem. There are still some rough edges to iron out here.
Some more info here: Yahor ran into the problem and gave me the cockroach-data dir. I started a regular node with that dir and saw these messages:
And I was able to reproduce
Any chance I can get this data directory? On some level it's going to have to do with the span configs being incorrect. A scan of
Sent you a link on Slack (out of paranoia that the directory contains some confidential information).
Classic lack of version upgrade testing. I'm now getting some ambient feels of concern about upgrade considerations for the tenants themselves, but I think if we just wholesale put the old logic back, we'll be fine.
@RaduBerinde, I think you'll be fine with #76420 for new tenants in a new cluster, and new clusters bootstrapped off of master will be fine. We'll definitely need to do some mixed-version testing. It's sort of appalling that there's no mixed-version test that sets a cluster setting.
Maybe I'm missing something, but I'm not entirely sure how we end up in this mixed-version case. Looking here (cockroach/pkg/config/system.go, lines 311 to 319 in 4b8258c):
Doesn't this mean we should be supporting the rangefeeds we need? It also looks like we have this logic to support rangefeeds if a config is not explicitly set on a replica. After typing all of that, I came across this logic where we seem to be swallowing an error and applying the default span config on the replica; maybe this is what's causing the issue? (cockroach/pkg/kv/kvserver/store.go, lines 2316 to 2328 in 14001c5)
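The failure mode suspected here can be modeled in a few lines of hypothetical Go. This is not the code in store.go: `lookupSpanConfig`, `defaultSpanConfig`, `confForReplica`, and the `ready` flag are invented for illustration, and whether the real default leaves rangefeeds disabled is an assumption. If the lookup fails and the error is only logged, the replica runs with the fallback config, so a rangefeed that the real config would allow gets rejected.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// SpanConfig is a hypothetical, simplified span configuration.
type SpanConfig struct {
	RangefeedEnabled bool
}

// defaultSpanConfig is a hypothetical fallback; in this model it leaves
// rangefeeds disabled.
var defaultSpanConfig = SpanConfig{RangefeedEnabled: false}

var errNotReady = errors.New("span config not yet available")

// lookupSpanConfig is a hypothetical lookup that fails until the span
// config state has caught up (modeled by the ready flag).
func lookupSpanConfig(ready bool) (SpanConfig, error) {
	if !ready {
		return SpanConfig{}, errNotReady
	}
	return SpanConfig{RangefeedEnabled: true}, nil
}

// confForReplica logs and swallows the lookup error, applying the default.
func confForReplica(ready bool) SpanConfig {
	conf, err := lookupSpanConfig(ready)
	if err != nil {
		log.Printf("falling back to default span config: %v", err)
		return defaultSpanConfig
	}
	return conf
}

func main() {
	fmt.Println("lookup failed, rangefeeds enabled:", confForReplica(false).RangefeedEnabled)
	fmt.Println("lookup succeeded, rangefeeds enabled:", confForReplica(true).RangefeedEnabled)
}
```

Whether this path is actually hit in the reported case is exactly what the rest of the thread investigates.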
If we don't want to bring back the old key-stripping logic, maybe we can set (cockroach/pkg/server/server.go, lines 540 to 559 in 6e1cbfe)
76394: sql: use IndexFetchSpec in TableReader r=RaduBerinde a=RaduBerinde

#### row: replace fetcher args with IndexFetchSpec

This commit replaces the row fetcher table args with IndexFetchSpec.

Release note: None

#### sql: use IndexFetchSpec in TableReader

This commit removes the table descriptor from TableReader and replaces it with an IndexFetchSpec. Eventually, we will use the same `IndexFetchSpec` to form the columnar batch directly in KV.

Release note: None

76415: dev: address a few paper cuts r=irfansharif a=irfansharif

See individual commits, works through a few of the known paper cuts with dev/bazel (#75453).

76420: sql: initialize tenant span config with rangefeeds enabled r=irfansharif a=ajwerner

Not doing this causes problems when we construct new tenants as they won't be able to establish a rangefeed until their first full reconciliation completes and then propagates. Even this is not great. If the preceding range did not have rangefeeds enabled, it would take a closed timestamp interval for this enablement to propagate. Perhaps this is evidence that we should always carve out a span at the end of the keyspace and set it to have rangefeeds enabled. I'll leave that fix and testing of this to somebody else. Hope this is helpful enough on its own. cc `@irfansharif` and `@arulajmani`. I believe this problem has been blocking `@RaduBerinde`.

Fixes #76331

Release note: None

76437: clisqlshell: handle interactive query cancellation r=rafiss a=knz

Fixes #76433.

As of this PR, there's a bug in lib/pq which forces the session to terminate when any query gets cancelled. We find this unacceptable and we plan to fix it later.

Release note (cli change): The interactive SQL shell (`cockroach sql`, `cockroach demo`) now supports interrupting a currently running query with Ctrl+C, without losing access to the shell.

76442: roachtest: try to stabilize ORM nightlies r=RichardJCai a=rafiss

fixes #76428
fixes #76426
fixes #68363
touches #76231
touches #76016
touches #76011

Ruby-pg and psycopg2 tests started passing because we now support pgwire cancel. Other tests (Django, Hibernate, and PGJDBC) seem flaky after the testing settings change, so try to use slightly more conservative values.

Release note: None

76450: ci: add extra logging of `$TC_SERVER_URL` in pebble metamorphic nightly r=nicktrav a=rickystewart

Release note: None

Co-authored-by: Radu Berinde
Co-authored-by: irfan sharif
Co-authored-by: Andrew Werner
Co-authored-by: Raphael 'kena' Poss
Co-authored-by: Rafi Shamim
Co-authored-by: Ricky Stewart
Yes, I think so. There's a migration bug theorized above, but I don't think it holds; I'm sending a PR with a test to prove it. As for how #74555 could affect the non-system tenant, I'm not sure yet. Do you have a repro I could use to see what you were seeing?
cockroachdb#74555 starts using the span configs infrastructure to control whether rangefeeds are enabled over a given range. Before dynamic system table IDs (cockroachdb#76003), we used the range's key boundaries to determine whether the range in question was for a system table ID. In mixed-version clusters, it's possible to have both forms of this check. To verify that things work in this state (which we suspected they might not in cockroachdb#76331), we add a test.

NB: Things still work because in cockroachdb#74555 we modified the system config span to hard-code the relevant config fields for constant system table IDs, behaving identically to previous-version nodes.

Release note: None
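The two forms of the check described above can be contrasted in a small, hypothetical Go model. All names are illustrative, the range's key boundaries are reduced to a single table ID, and `maxSystemTableID` is an arbitrary stand-in rather than CockroachDB's real constant. It also assumes the new-version check still honors the cluster setting. Old-version nodes derive enablement from the table ID; new-version nodes read a span config, and because the system config span hard-codes `RangefeedEnabled` for the constant system table IDs, both forms agree for those IDs.

```go
package main

import "fmt"

// maxSystemTableID is an illustrative threshold: in this model, table IDs
// at or below it are the constant (non-dynamic) system table IDs.
const maxSystemTableID = 49

// SpanConfig is a hypothetical, simplified span configuration.
type SpanConfig struct {
	RangefeedEnabled bool
}

// legacyRangefeedEnabled models the previous-version check: rangefeeds are
// allowed on system table ranges (identified by key boundaries, reduced
// here to a table ID) or when the cluster setting is on.
func legacyRangefeedEnabled(tableID uint32, clusterSettingOn bool) bool {
	return tableID <= maxSystemTableID || clusterSettingOn
}

// systemConfigSpanConf models the hard-coded config fields for constant
// system table IDs; ok is false for any other ID.
func systemConfigSpanConf(tableID uint32) (conf SpanConfig, ok bool) {
	if tableID <= maxSystemTableID {
		return SpanConfig{RangefeedEnabled: true}, true
	}
	return SpanConfig{}, false
}

// newRangefeedEnabled models the new-version check, which consults span
// configs (hard-coded for system IDs, reconciled for everything else).
func newRangefeedEnabled(tableID uint32, reconciled SpanConfig, clusterSettingOn bool) bool {
	if conf, ok := systemConfigSpanConf(tableID); ok {
		return conf.RangefeedEnabled || clusterSettingOn
	}
	return reconciled.RangefeedEnabled || clusterSettingOn
}

func main() {
	for _, id := range []uint32{3, 13, 100} {
		fmt.Printf("table %d: legacy=%t new=%t\n", id,
			legacyRangefeedEnabled(id, false),
			newRangefeedEnabled(id, SpanConfig{}, false))
	}
}
```

In this model, a mixed-version cluster behaves consistently because both paths give the same answer for every constant system table ID, which is the property the added test is meant to pin down.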
I took a look at the zipped store referenced above and confirmed that I saw the same thing as described: rangefeeds were wedged on server startup. I also noticed that if the server process was restarted, things worked just fine. Eventually I just disabled the reconciliation job and poked at the store's contents of

Anyway, I don't think there's anything to do here; it's not a bug AFAICT. #76466 confirms that a mixed-version state with rangefeeds works just fine; we were careful about hardcoding the right defaults in #74555.
Fixes cockroachdb#76331.

Not doing this causes problems when we construct new tenants as they won't be able to establish a rangefeed until their first full reconciliation completes and then propagates. This isn't "buggy", but it is slow. Even this is not great. If the preceding range did not have rangefeeds enabled, it would take a closed timestamp interval for this enablement to propagate. Perhaps this is evidence that we should always carve out a span at the end of the keyspace and set it to have rangefeeds enabled. I'll leave that fix and testing of this to somebody else. Hope this is helpful enough on its own.

Release note: None

Co-authored-by: irfan sharif
76466: spanconfig: verify migration for rangefeed enablement r=irfansharif a=irfansharif

#74555 starts using the span configs infrastructure to control whether rangefeeds are enabled over a given range. Before dynamic system table IDs (#76003), we used the range's key boundaries to determine whether the range in question was for a system table ID. In mixed-version clusters, it's possible to have both forms of this check. To ensure things work in this form (something we suspected in #76331), we add a test.

NB: The reason things still work is because in #74555 we modified the system config span to hard code the relevant config fields for constant system table IDs, behaving identically to previous version nodes.

Release note: None

Co-authored-by: irfan sharif
I am running into a problem that seems to be caused by system table rangefeeds not working in the tenant. I distilled the change I was working on down to just a delay during tenant initialization.
Apply this diff:
Then run:
Sometimes the test passes, but usually it gets stuck and you see this in the logs:
This is on 4df7f50.
@ajwerner suspects #74555.