-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: -0 is not converted to 0, causing spurious tlp failure #77279
Comments
roachtest.tlp failed with artifacts on master @ 0e1edf3ad9da29b1d3bb28fcfc0f293d700e3cb9:
|
Neither of these failures seem to be a logic error, as the only difference in output is that one includes 0 while the other includes -0. We should figure out how to remove this source of spurious failures. |
Latest one reduced down to: CREATE TABLE table1 (
col1_0 UUID,
col1_1 DATE,
col1_3 INT8,
col1_4 BIT(48) NULL,
col1_5 BOOL,
col1_6 FLOAT4,
col1_7 REGPROC,
col1_8 BIT(9),
col1_10 TIME,
col1_12 STRING AS (lower(CAST(col1_10 AS STRING))) VIRTUAL,
PRIMARY KEY (col1_6 DESC),
UNIQUE (lower(CAST(col1_5 AS STRING)), col1_12 ASC) STORING(col1_1, col1_4, col1_8),
UNIQUE (col1_12 DESC),
INDEX (col1_1 ASC),
INDEX (col1_8)
);
INSERT
INTO table1 (col1_0, col1_1, col1_3, col1_5, col1_6)
VALUES ('a4ecdba0-6c5c-4427-b6d2-a856d72d1b29', '1979-10-30', 0, false, 0);
SELECT col1_6 FROM table1;
SELECT col1_6 FROM table1 WHERE NOT col1_5; Interestingly we never insert -0 into the table. |
So we have some kind of corruption:
|
Reduced a little further: CREATE TABLE table1 (
col1_0 UUID,
col1_1 DATE,
col1_3 INT8,
col1_4 BIT(48),
col1_5 BOOL,
col1_6 FLOAT4,
col1_8 BIT(9),
col1_10 TIME,
col1_12 STRING,
PRIMARY KEY (col1_6 DESC),
UNIQUE (col1_12 ASC) STORING(col1_1, col1_4, col1_8),
INDEX (col1_1 ASC)
);
INSERT
INTO table1 (col1_0, col1_1, col1_3, col1_5, col1_6)
VALUES ('a4ecdba0-6c5c-4427-b6d2-a856d72d1b29', '1979-10-30', 0, false, 0);
SELECT col1_6 FROM table1@table1_pkey; |
I couldn't stop myself from looking into this :) One way to fix this is:
|
Simplest repro:
|
That was fast! |
Is it possible that this wasn't a TLP false-positive all along, and instead this bug? |
I reduced the failure involving MAX to: CREATE TABLE table1 (
col1_0 FLOAT8 NOT NULL,
PRIMARY KEY (col1_0 ASC),
INDEX (col1_0 DESC),
INDEX (col1_0 ASC),
INDEX (col1_0 DESC)
);
INSERT INTO table1 (col1_0) VALUES (0); And in this case I think one query uses an ascending index and the other a descending index (the bug only affects descending indexes). So yeah, I think this bug was probably the underlying issue for all of these. |
That's great news! Shame on us for doubting TLP. 😉 |
Fixes: cockroachdb#77279 Our old friend -0 is back. This time the problem was in DecodeFloatDescending, which was decoding both +0 and -0 as -0. Fix it to decode both as +0. (-0 is then properly decoded as -0 when the value part of the composite encoding is decoded.) Note that the on-disk data was correct. Only the decoding was affected. Release note (bug fix): Queries reading from an index or primary key on FLOAT or REAL columns DESC would read -0 for every +0 value stored in the index. Fix this to correctly read +0 for +0 and -0 for -0. Co-authored-by: Yahor Yuzefovich <[email protected]>
Well, during the last collab session we did come up with a valid case when TLP could produce a false positive with 0 vs -0:
|
@yuzefovich good point, thanks for reminding me. I'll go ahead and make changes to TLP, too. |
79085: ingesting,backupccl: fix bug in privileges of restored descriptors r=adityamaru a=adityamaru Previously, when restoring a table descriptor as part of a non-cluster backup, we were incorrectly generating its privileges without passing in the parent schema's DefaultPrivilegeDescriptor. This change fixes that. Additionally, this change rewrites the `restore-grants` datadriven test to be more understandble. There were also some errors in the test that have been fixed in this change. The expected behavior is that: 1) A cluster restore will just restore all privileges as was backed up. 2) A database restore will first generate ALL privileges for root and admin on the database, and subsequent table, type, schema descriptors will inherit these privileges. 3) A table restore into an existing database will generate its set of privileges based on the default privileges of the parentDB and parentSchema. All restored schema objects are owned by the user running the restore. All backed up privileges on these descriptors are wiped, since there is no guarantee that a backed up user will be present in the restoring cluster. Release note (bug fix): Privileges for restored table's were being generated incorrectly without taking into consideration their parent schema's default privilege descriptor. 79407: cloud: bump orchestrator to v21.2.8 r=aliher1911 a=celiala Release note: None Jira issue: CRDB-14811 79473: encoding: fix DecodeFloatDescending for positive zero r=yuzefovich,rytaft,mgartner a=michae2 Fixes: #77279 Our old friend -0 is back. This time the problem was in DecodeFloatDescending, which was decoding both +0 and -0 as -0. Fix it to decode both as +0. (-0 is then properly decoded as -0 when the value part of the composite encoding is decoded.) Note that the on-disk data was correct. Only the decoding was affected. Release note (bug fix): Queries reading from an index or primary key on FLOAT or REAL columns DESC would read -0 for every +0 value stored in the index. Fix this to correctly read +0 for +0 and -0 for -0. Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: Celia La <[email protected]> Co-authored-by: Michael Erickson <[email protected]>
Fixes: #77279 Our old friend -0 is back. This time the problem was in DecodeFloatDescending, which was decoding both +0 and -0 as -0. Fix it to decode both as +0. (-0 is then properly decoded as -0 when the value part of the composite encoding is decoded.) Note that the on-disk data was correct. Only the decoding was affected. Release note (bug fix): Queries reading from an index or primary key on FLOAT or REAL columns DESC would read -0 for every +0 value stored in the index. Fix this to correctly read +0 for +0 and -0 for -0. Co-authored-by: Yahor Yuzefovich <[email protected]>
Fixes: #77279 Our old friend -0 is back. This time the problem was in DecodeFloatDescending, which was decoding both +0 and -0 as -0. Fix it to decode both as +0. (-0 is then properly decoded as -0 when the value part of the composite encoding is decoded.) Note that the on-disk data was correct. Only the decoding was affected. Release note (bug fix): Queries reading from an index or primary key on FLOAT or REAL columns DESC would read -0 for every +0 value stored in the index. Fix this to correctly read +0 for +0 and -0 for -0. Release justification: low-risk change to existing functionality. Co-authored-by: Yahor Yuzefovich <[email protected]>
Assists: cockroachdb#77279 TLP compares results of two queries, one unpartitioned and one partitioned three ways by a predicate. We were comparing TLP results by printing values to strings. Unfortunately there are some values which are equal according to SQL comparison, but print differently. And TLP, being the nefarious chaos monkey that it is, tends to find these. For example: ```sql 0::FLOAT -0::FLOAT 'hello' COLLATE "en-US-u-ks-level2" 'HELLO' COLLATE "en-US-u-ks-level2" INTERVAL '1 day' INTERVAL '24 hours' ``` When equal-yet-different values like these are used in MAX or MIN aggregations, or are the keys for GROUP BY bucketing, it is nondeterministic which value will be chosen. (This is also true in PostgreSQL.) And sometimes the two TLP queries choose differently. This is not indicative of a bug, yet TLP fails. So this patch teaches TLP to use SQL comparison instead of string comparison. To do so, we wrap both queries in a bigger query: ```sql WITH unpart AS MATERIALIZED ( <unpartitioned query> ), part AS MATERIALIZED ( <partitioned query> ), undiff AS ( TABLE unpart EXCEPT ALL TABLE part ), diff AS ( TABLE part EXCEPT ALL TABLE unpart ) SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff) ``` Only if this query returns counts other than 0 (meaning the SQL comparison detected a real mismatch) do we diff the printed results. Release note: None
Assists: cockroachdb#77279 TLP compares results of two queries, one unpartitioned and one partitioned three ways by a predicate. We were comparing TLP results by printing values to strings. Unfortunately there are some values which are equal according to SQL comparison, but print differently. And TLP, being the nefarious chaos monkey that it is, tends to find these. For example: ```sql 0::FLOAT -0::FLOAT 'hello' COLLATE "en-US-u-ks-level2" 'HELLO' COLLATE "en-US-u-ks-level2" INTERVAL '1 day' INTERVAL '24 hours' ``` When equal-yet-different values like these are used in MAX or MIN aggregations, or are the keys for GROUP BY bucketing, it is nondeterministic which value will be chosen. (This is also true in PostgreSQL.) And sometimes the two TLP queries choose differently. This is not indicative of a bug, yet TLP fails. So this patch teaches TLP to use SQL comparison instead of string comparison. To do so, we wrap both queries in a bigger query: ```sql WITH unpart AS MATERIALIZED ( <unpartitioned query> ), part AS MATERIALIZED ( <partitioned query> ), undiff AS ( TABLE unpart EXCEPT ALL TABLE part ), diff AS ( TABLE part EXCEPT ALL TABLE unpart ) SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff) ``` Only if this query returns counts other than 0 (meaning the SQL comparison detected a real mismatch) do we diff the printed results. Release note: None
76504: *: fix system config for tenant r=ajwerner a=ajwerner Follow-on from #76279. Fixes #75864. This commit adds a mechanism to combine the system config data of the tenant with the data provided over the GossipSubscription from the system tenant. It then plumbs a version into the zone config methods. In the mixed version state, the tenant uses the existing override from the system tenant. After the span config infrastructure has been activated, the tenant uses the overrides they've set. This affects, realistically, just the GC job, and, to a lesser extent, the optimizer. Release note: None 78595: Week -1: follow-up emails r=rail a=celiala This PR adds follow-up emails for: - the day before `PrepDate` (to be sent if only there are still blockers) - the day of `PrepDate` (to be sent if only there are still blockers) - the day after `PrepDate` (to be sent if only there are still blockers) The `post-blockers` command should be used for all emails before and up to `PrepDate`: ``` # where `DAYS_BEFORE_PREP` is one of: many-days | day-before | day-of SMTP_PASSWORD=$MY_SMTP_PWD SMTP_USER=$MY_SMTP_USER GITHUB_TOKEN=$MY_GITHUB_TOKEN \ $(bazel info bazel-bin)/pkg/cmd/release/release_/release post-blockers \ --release-series 21.1 \ --smtp-user $MY_SMTP_USER --smtp-host smtp.gmail.com --smtp-port 587 \ --to $MY_TO_EMAIL \ --template-dir ./pkg/cmd/release/templates \ --prep-date "Apr 1, 2022" --publish-date "May 1, 2022" \ --days-before-prep-date $DAYS_BEFORE_PREP ``` The `cancel-release-date` should be used for the day after `PrepDate` email: ``` SMTP_PASSWORD=$MY_SMTP_PWD SMTP_USER=$MY_SMTP_USER GITHUB_TOKEN=$MY_GITHUB_TOKEN \ $(bazel info bazel-bin)/pkg/cmd/release/release_/release cancel-release-date \ --release-series 21.1 \ --smtp-user $MY_SMTP_USER --smtp-host smtp.gmail.com --smtp-port 587 \ --to $MY_TO_EMAIL \ --template-dir ./pkg/cmd/release/templates \ --publish-date "May 1, 2022" --next-publish-date "Jun 1, 2022" ``` ### TeamCity changes This will be paired with the following TeamCity jobs: * [Send release blockers - 78595](https://teamcity.cockroachdb.com/buildConfiguration/Internal_Cockroach_Release_Process_SendReleaseBlockers_78595?branch=78595&buildTypeTab=overview&mode=builds#all-projects) * [Cancel release date - 78595](https://teamcity.cockroachdb.com/buildConfiguration/Internal_Cockroach_Release_Process_CancelReleaseDate_78595?branch=78595&mode=builds#all-projects) ![image](https://user-images.githubusercontent.com/3051672/160359980-93ac438f-d8e6-4f6c-a0e0-00fab55b93bc.png) #### Manual tests ##### Test 1: post-blockers --days-before-prep-date many-days <img width="300" src="https://user-images.githubusercontent.com/3051672/160353978-12e8d2e9-1900-471b-b785-f247d19f474a.png"> ##### Test 2: post-blockers --days-before-prep-date day-before <img width="300" src="https://user-images.githubusercontent.com/3051672/160353943-8514a0d4-87f4-4ab7-8d67-78bb42f427c2.png"> ##### Test 3: post-blockers --days-before-prep-date day-of <img width="300" src="https://user-images.githubusercontent.com/3051672/160353904-d18e6d68-594c-48dc-8308-6f0c3621fe91.png"> ##### Test 4: cancel-release-date <img width="300" src="https://user-images.githubusercontent.com/3051672/160353668-1e2886f5-c39f-4d54-884d-f967ba06a114.png"> 79551: roachtest: use SQL comparison in TLP r=michae2 a=michae2 Assists: #77279 TLP compares results of two queries, one unpartitioned and one partitioned three ways by a predicate. We were comparing TLP results by printing values to strings. Unfortunately there are some values which are equal according to SQL comparison, but print differently. And TLP, being the nefarious chaos monkey that it is, tends to find these. For example: ```sql 0::FLOAT -0::FLOAT 'hello' COLLATE "en-US-u-ks-level2" 'HELLO' COLLATE "en-US-u-ks-level2" INTERVAL '1 day' INTERVAL '24 hours' ``` When equal-yet-different values like these are used in MAX or MIN aggregations, or are the keys for GROUP BY bucketing, it is nondeterministic which value will be chosen. (This is also true in PostgreSQL.) And sometimes the two TLP queries choose differently. This is not indicative of a bug, yet TLP fails. So this patch teaches TLP to use SQL comparison instead of string comparison. To do so, we wrap both queries in a bigger query: ```sql WITH unpart AS MATERIALIZED ( <unpartitioned query> ), part AS MATERIALIZED ( <partitioned query> ), undiff AS ( TABLE unpart EXCEPT ALL TABLE part ), diff AS ( TABLE part EXCEPT ALL TABLE unpart ) SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff) ``` Only if this query returns counts other than 0 (meaning the SQL comparison detected a real mismatch) do we diff the printed results. Release note: None 79767: util/contextutil: clarify `RunWithTimeout` error message r=erikgrinaker a=erikgrinaker `RunWithTimeout` gave an error message of this form: ``` operation "send-snapshot" timed out after 1m (took 1m): context deadline exceeded ``` However, this can be misleading because either the caller or callee can set their own context timeout that is less than then one set by `RunWithTimeout`, in which case it looks like: ``` operation "send-snapshot" timed out after 1m (took 1s): context deadline exceeded ``` This is even worse on old versions, where the "took 1s" is not included, leading the reader to believe the operation actually took 1m when it only took 1s. This patch clarifies the error message somewhat: ``` operation "send-snapshot" timed out after 1s (given timeout 1m): context deadline exceeded ``` Resolves #79424. Release note: None 79787: protectedts: fix panic when fetching protectedts records r=adityamaru a=adityamaru Previously, we incorrectly assumed that all records in the `system.protectedts_records` table would have a non NULL target column. While this is enforced for all protected timestamp records written in a 22.1+ cluster, this is not true for records that might have been prior to the 22.1 cluster version being finalized. This change makes the method responsible for reading protectedts records more defensive when encountering a NULL target column. Fixes: #79684 Release note: None 79796: Publish cockroach-sql in publish bleeding edge r=rickystewart a=rail Previously, the `cockroach-sql` tool was published only as a part of the release workflow. The first commit refactors the way how we publish and test. This is very similar to what we use in publish-provisional-artifacts. The second commit adds the cockroach-sql binary to the list of artifacts to publish. Release note: None Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Celia La <[email protected]> Co-authored-by: Michael Erickson <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: Rail Aliiev <[email protected]>
Assists: #77279 TLP compares results of two queries, one unpartitioned and one partitioned three ways by a predicate. We were comparing TLP results by printing values to strings. Unfortunately there are some values which are equal according to SQL comparison, but print differently. And TLP, being the nefarious chaos monkey that it is, tends to find these. For example: ```sql 0::FLOAT -0::FLOAT 'hello' COLLATE "en-US-u-ks-level2" 'HELLO' COLLATE "en-US-u-ks-level2" INTERVAL '1 day' INTERVAL '24 hours' ``` When equal-yet-different values like these are used in MAX or MIN aggregations, or are the keys for GROUP BY bucketing, it is nondeterministic which value will be chosen. (This is also true in PostgreSQL.) And sometimes the two TLP queries choose differently. This is not indicative of a bug, yet TLP fails. So this patch teaches TLP to use SQL comparison instead of string comparison. To do so, we wrap both queries in a bigger query: ```sql WITH unpart AS MATERIALIZED ( <unpartitioned query> ), part AS MATERIALIZED ( <partitioned query> ), undiff AS ( TABLE unpart EXCEPT ALL TABLE part ), diff AS ( TABLE part EXCEPT ALL TABLE unpart ) SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff) ``` Only if this query returns counts other than 0 (meaning the SQL comparison detected a real mismatch) do we diff the printed results. Release note: None
Assists: cockroachdb#77279 TLP compares results of two queries, one unpartitioned and one partitioned three ways by a predicate. We were comparing TLP results by printing values to strings. Unfortunately there are some values which are equal according to SQL comparison, but print differently. And TLP, being the nefarious chaos monkey that it is, tends to find these. For example: ```sql 0::FLOAT -0::FLOAT 'hello' COLLATE "en-US-u-ks-level2" 'HELLO' COLLATE "en-US-u-ks-level2" INTERVAL '1 day' INTERVAL '24 hours' ``` When equal-yet-different values like these are used in MAX or MIN aggregations, or are the keys for GROUP BY bucketing, it is nondeterministic which value will be chosen. (This is also true in PostgreSQL.) And sometimes the two TLP queries choose differently. This is not indicative of a bug, yet TLP fails. So this patch teaches TLP to use SQL comparison instead of string comparison. To do so, we wrap both queries in a bigger query: ```sql WITH unpart AS MATERIALIZED ( <unpartitioned query> ), part AS MATERIALIZED ( <partitioned query> ), undiff AS ( TABLE unpart EXCEPT ALL TABLE part ), diff AS ( TABLE part EXCEPT ALL TABLE unpart ) SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff) ``` Only if this query returns counts other than 0 (meaning the SQL comparison detected a real mismatch) do we diff the printed results. Release note: None
roachtest.tlp failed with artifacts on master @ e151a24dc1d495bf694e1e1c3ee8acb518c1424e:
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-13514
The text was updated successfully, but these errors were encountered: