-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/schemachanger: declarative state generated on gc_job_mixed is not backwards compatible #91528
Comments
@chengxiong-ruan can you take a look at this when you have a moment, please? @fqazi added these corpus tests and they have found a serious bug for the first time! Unfortunately, reproducing these failures is not straightforward and there is no tooling to help do so. I'm deliberately relying on the fact that you know little about these tests, to ask @fqazi the right questions with a fresh perspective, the goal of this task is not particularly to fix this bug but for you to figure out how to make it easy to repro these test failures to identify the root cause. We'll deal with the bug later (probably me, because I probably was the one that introduced it) |
I finally had a chance to repro this, sorry for the delay
# create the cluster (2 nodes cluster is fine as well)
roachprod create local -n 3
# stage with latest v22.2 branch build and start
roachprod stage local cockroach 87ce2ca0f017e92e86b04f76de12347b47de1dd8
roachprod start local
# stop one node, replace cockroach binary with your binary built and restart the node
roachprod stop local:1
roachprod put local:1 ./cockroach
roachprod start local:1
roachprod run local:1-2 -- touch data/DISABLE_STARTING_BACKGROUND_JOBS
CREATE SEQUENCE s6;
CREATE VIEW v AS SELECT nextval('s6');
DROP SEQUENCE s6 CASCADE; Now you should see panics from the node |
If we can have a tool to replay a logic test file until the failing statement, that'd be super cool that we don't need to figure what's going on with the statements before the failing statement. |
Looks like we are moving columns from |
Looks like this new behavior was introduced by #90544 |
@Xiang-Gu this might be interesting to you. |
Another thing that might help debug this sort of failure might just be to have a string associated with each state in the corpus that tells us the test and line it came from. It's not as fancy, but it's a lot simpler to implement and almost as useful. |
Something is suspicious here:
I will try to look a bit more into this after I'm done with some other work at hand. |
@chengxiong-ruan is right: The error is bc of the change in behavior on column element when dropping the table.
Detailed explanation: So far everything is fine bc ideally we should just allow such an operation to be scheduled. However, the logic inside |
@Xiang-Gu please look into this at your leisure. If this is not enough throw it back into triage please. |
Previously, there is a niche case where we might run into a backward incompatible issue (see cockroachdb#91528). We decided to fix it by relaxing the sanity check that caused the issue and backport the change to relase-22.2, so this backward incompatibility issue is contained only between release-22.2.0 and master, which we think should be a rare occurrance in the wild. Epic: None Release note: None
Reopening to not forget about the backport. |
Previously, there is a niche case where we might run into a backward incompatible issue (see cockroachdb#91528). We decided to fix it by relaxing the sanity check that caused the issue and backport the change to relase-22.2, so this backward incompatibility issue is contained only between release-22.2.0 and master, which we think should be a rare occurrance in the wild. Epic: None Release note: None
The mixed version corpus validation test for gc_job_mixed encountered a potential state that is not backwards compatible between master and 22.2. As a result when we attempt to plan on this state generated on a mixed version state from master:
failed to validate TestLogic_gc_job_mixed_TestLogic_gc_job_mixed_DROP TABLE db.kv_0 with error [[Column:{DescID: 108, ColumnID: 1}, ABSENT], ABSENT] should not yet be scheduled for this stage, ERROR: [[Column:{DescID: 160, ColumnID: 1}, ABSENT], ABSENT] should not yet be scheduled for this stage SQLSTATE: XX000 DETAIL: dependencies graphviz: https://cockroachdb.github.io/scplan/viz.html#H4sIAAAAAAAA/+xc727juBH/HD+F4AJFe9juWv+lXmIgG2fbPbRJEGfRD8HBoEXGZleWVIrabK5Y4F6rfZx7kkKiTVmMbFMWrTjFfow4nOH8ZjicGdKBeEZAMte0f/dO0mzK/gjCLKWITFI9/1zz3Sy+n4RgisKzfkoBRQsU0bT/Y/45su4f4oim+Bd01vf7bxjZ6SkF0xBpQQjS9KwfZxSR/vCUkuEphauvcwRg/jkNkunb8Yrv6TsKh6fvCtKCfFjlVfzBeQ1vEQQBRVCYT+FwdHt9o43O787fn48vtd9+/Q/87df/ahfn44vz0eVzKbsYQImpd2BWP3ttzrtCgWH5pbf61Bu+SecgQWf9KI5Q/+cC397JybdasxgVs1BAZojbxG5tkx8Ko1yBBUoTEKCGRoEoDQhOaEw+Qg6IbnvP2URggTgF3B+myFGk8vVjhMjB1I1L7hQOSRzTFiq7ilT+lCJyQ/AXHKIZSg+meyKIoHBoPKfKUkSu1p0CwAWOWqDkvX6UBlYN4TOgkmwa4qAFUv7rR0oCppabTh8oQmkEKJiCtHFoXU7biE5jhXTFCt3GIRojSnE0U63bko7EIarYdDJJQhCgeRxCRCb58CQ/WCaTNrgYL32ISqKxYcf4Ow7b1uFCN4/vyK1Runrktj1N9Pbp7uGCZI32L3To6qoy0JeDydadLg5dXVXietwO1fbUVZXrjoM5WoCG6OD0Zs3MFA4pyWqieVrw3ghgY51VZa5M5xtAmlfXSTFpJHkSqQZAVUL6MiewM9hxAlN7MHD8SY03KgbSUJWzqjup68BRWhwbqtLaA8TVOuVf6KA2VGW5xw1TW2dSlere5RMaglP8uRGXxqqoSmCvp/9EAd3rVImLqTtNXbAeKw6FqhLTizjMFtEHsMDhU0P1q2UYwQtAalgoNruqNJPp3VDjoJi0rkyNuWfnlBI8zSi6yhbbKBUjoyq1ZMiUYUcpOhWf+XpwTFSlngyTu6fkEJjg9CoLwzKibkzMt4OzIuKL3LW2h7VNT+HwY0Q3hYEYl0KNGpmPGNI5p3CsbSZrbkRV6bOaLV9zTOP0rxhCFO0w34bIUMNQ7S4wVaXN6iJDjc5RNc95xPDguKhKrdVFh8a+8P+66U1V+TyzzQg9gCykl18TgtIUx4eIAuhrUlZ9WYT/laFJ4cd/+OPBPVlVXq8mRFpvDd/yHdfwbfT731nmj4OaBpxk0MTp+CmlaLG+sqYRVmo9ii2iqjxRF3OlUKhE4YDA6QRHFJEIhJPFlyCYULxAKQWL5OAAqi1wVARnSbc+imRuhAK8AKFMbNfdwVZDNTfdcdVopdmsI4tGW9ajeDMdX20ohUK1zZxzjDtIC4+vaJR04aOIPNcYSmWUjtqYo6pI/BhB9PVwlWLOfXs3QK0zW6pqP4W41LVAduPyGUfl+Pju+vZydHDwVBWIN6w5WmDYGL0opQTgiLZGEKefiqqkVXhojKGqQq4Ab4+TRwKY2tvMSfIZHbqdvXwUvXz7DFHCHj5Hlnm//Db48/n78eXVHZsQ2dZqQBdHbD6iCyMuHzGEEWdtZHR7fXNzOVoNeXzIFCb5fMSqjjgDPmILIzofcYQRg4+4wkipqieMmGsj1WU7JQp+dZJVshO4cRAMAVKHA2eIczg6hgCpw9ExBNxcjo4h4OZydAwBN5ejYwi4uRwDQ8DN5YoaAm4uB8cQwSlHhCkcHFMAx+XgmAI4LgfHFLlxcEwBHI+DYwrgeBwcUwDH4+CYAjgeB8cUwLFKfYQBro4g3+JrFsRbXjkg7B2ujLAum+siLMvmqggms7kmvrh1/zTMd3bb2Hr/cxrEydvrmnp+a1Qd8AD4Q8Hgb/Hs8svqBrW3c/plWP2Jylbi9bdOO6P9YZ441XxZaZJr/Z6/f961vvOMzmOCfwGUNyF3TfnU9BEAx03ZT4lkfg+0nHNX/IDn74gCCKik0cZxRgK09ImPo20J9Dib/iMmn9eI9P3sxtaZa5iVTzDYFpOZrgv+P0IhomjE339I7oNywrpGTX1xLclYjw76PSAkfsy3+lkfYrCII9h/E8RhTM76BMFVwOiXz1Y0SOJEI3g2p9oUPcQEaRAlKIIoohpiBtIIWsRfQFiJREaHsswOZVkdyrI7lOXeb+e7PNE0HGmUgCgFQR6wVjLqOHrdrd5Rg1TBGUezNaGPmM41QCnRCHrgiWgu0t1D5HLpzTQsxLmDbsXt4+YtxHULprdPcNpPnJm7Svurn9eQld1VOo0qX/59T7ZeItkyX0+ydUEQoOgvFz/F0w8xKR1xt4dzkD/EZETi5Kd4KonaHr+q3wJHvh9G+++HZwlfEXhsv5MjmAW5fQ6ofWV1k8gyWd0ksquDQl0Sxjh2k0IyWU6HsrpJLpmsDneR26Fnux16tttNicZkdejzboc+73bo826HPu91eHJ4He4vryOf91g1Wst3OTVlcxBcMWVXsPyKoi0HV2clagsORmsOphoO0lGRzdXoU7JmpFJCxYhrtPzuhZXZLZbrtObgsdq7BQe/LQdv0NZsBQfPaMahuATmN0pt/b/g0GoJZksGVm5Kq/2bl9fQaKj+g5ld1M3/r8z3dsNLtBtqnmG8jnZDxR0lbnhW/jja4I/H0ZnYCIZx0IuuhhvzWd+DxcFuMi8mq5sLISarm8qGyeqmsmGy9mn/7yvLV9hjKTja3dQvTJaa+kXyoouJVFN2NhKppvqUE9k7+dY76Z1ENgscS8Y01pbPa96k9ClEZ30I0jmCxZzIYRtfitZlG1eKdrnxpGh9tnGkaPU8RS7fJ+0g1plTyxEbzCvliE32MkCO2GJX+3LENrublyNeXq7LEef2syVtoi83jByxz1xditgYsB67HLHOmuRyxAbrcssRFw1RSaMYFutAyxHbrIUsR+ywik2O2GUNYzlij3V85Yh9VrtKEZsD1t+VI9ZZK0KO2GAdVjlik3UN5Igt1k+VI7ZZQ1SO2GGNAzlil7U/5Yg91r+UI/ZZA1KK2BqwDqIc8bIlIEdssH6hDHHvW+9/AQAA//+FBjL+71wAAA== HINT: You have encountered an unexpected error.
We need to root cause and identify the first post commit phase in this plan has a non-plannable state on 22.2
For manual reproduction fetch the corpus:
Jira issue: CRDB-21304
The text was updated successfully, but these errors were encountered: