Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
106485: opt: only infer self-join equality with a key over the base table r=DrewKimball a=DrewKimball

#### opt: only infer self-join equality with a key over the base table

Self-join equality inference was added by #105214, so that the `FuncDeps`
for a self-join would include equalities between *every* pair of columns
at the same ordinal position in the base table if there was an equality
between key columns (also at the same ordinal position). However, the
key column check was performed using the FDs of the join inputs rather
than the base table's FDs. This could lead to incorrectly adding self-join
equality filters. For example, consider the following:
```
CREATE TABLE t106371 (x INT NOT NULL, y INT NOT NULL);
INSERT INTO t106371 VALUES (1, 1), (1, 2);

SELECT * FROM t106371 a JOIN t106371 b ON a.x = b.x;

SELECT * FROM (SELECT * FROM t106371 ORDER BY y DESC LIMIT 1) a
JOIN (SELECT DISTINCT ON (x) * FROM t106371) b ON a.x = b.x;
```
In the first query above, `a.x = b.x` does not consitute joining on
key columns. But in the second query, one input has one row and the
other de-duplicated by the `x` column and so `x` is a key over both
inputs. However, the query as written will select different rows for
each input - `a` will return the `(1, 2)` row, while `b` will return
the `(1, 1)` row. Inferring a `a.y = b.y` filter will incorrectly cause
the join to return no rows.

This patch fixes the problem by requiring the initial self-join
equalities to form a key over the *base* table, not just the inputs
of the join.

Fixes #106371

Release note: None

106779: acceptance: Fix DockerCSharp Test r=rimadeodhar a=rimadeodhar

This PR updates the .net framework within the csproj setup to .net 6. 
Additionally, it also fixes the CS program that we run as
a part of this test with the new npgsql DateTime
changes made as a part of npgsql 6 version update. The old datetime 
types have been deprecated and needed to be removed. 
With these fixes, the test is running successfully and can be unskipped.

Epic: none
Fixes: #86852
Release note: none

107104: cli: fix debug zip with new columns for cluster settings r=maryliag a=maryliag

The addition of new columns on cluster_settings view done on #104449 were causing debug zip fails to add the information from that table, since the query used to gather the information was doing a join and not considering the new columns.

This commit updates the query to use the explicit columns, so even if new columns are added it won't be a problem in the future. It also adds tests for all custom querys used to generate the debug zip, so this type of issue would have been caught.

The file `crdb_internal.cluster_settings.txt` in debug zips was
empty due to this bug on v23.1.5 (only version affected by this bug).

Fixes #107103

Release note (bug fix): Debug zip now are properly showing the information from cluster_settings. The file `crdb_internal.cluster_settings.txt` in debug zips was
empty due to this bug on v23.1.5 (only version affected by this bug).

Co-authored-by: Drew Kimball <[email protected]>
Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: maryliag <[email protected]>
  • Loading branch information
4 people committed Jul 18, 2023
4 parents cd0381f + 1da6390 + 0d7d2cc + be8c5c1 commit 06b5ba7
Show file tree
Hide file tree
Showing 10 changed files with 162 additions and 34 deletions.
1 change: 0 additions & 1 deletion pkg/acceptance/adapter_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@ func TestDockerC(t *testing.T) {
}

func TestDockerCSharp(t *testing.T) {
skip.WithIssue(t, 86852, "test not to up to date anymore")
s := log.Scope(t)
defer s.Close(t)

Expand Down
12 changes: 6 additions & 6 deletions pkg/acceptance/testdata/csharp/Program.cs
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ static void Simple(string connString)
using (var cmd = new NpgsqlCommand("SELECT id, balance FROM bank.accounts", conn))
using (var reader = cmd.ExecuteReader())
while (reader.Read())
Console.Write("\taccount {0}: {1}\n", reader.GetString(0), reader.GetString(1));
Console.Write("\taccount {0}: {1}\n", reader.GetInt32(0), reader.GetInt32(1));
}
}

Expand Down Expand Up @@ -138,7 +138,7 @@ static void TxnSample(string connString)
using (var cmd = new NpgsqlCommand("SELECT id, balance FROM bank.accounts", conn))
using (var reader = cmd.ExecuteReader())
while (reader.Read())
Console.Write("\taccount {0}: {1}\n", reader.GetString(0), reader.GetString(1));
Console.Write("\taccount {0}: {1}\n", reader.GetInt32(0), reader.GetInt32(1));
}
}

Expand Down Expand Up @@ -200,8 +200,8 @@ static void UnitTest(string connString)
{
throw new DataException(String.Format("SELECT can't find inserted value: read {0}, expecting 42", a));
}
var ts = reader.GetTimeStamp(1);
var expectedTs = new NpgsqlTypes.NpgsqlDateTime(2015, 5, 7, 18, 20, 0, DateTimeKind.Unspecified);
var ts = reader.GetDateTime(1);
var expectedTs = new DateTime(2015, 5, 7, 18, 20, 0, DateTimeKind.Unspecified);
if (ts != expectedTs)
{
throw new DataException(String.Format("SELECT unexpected value for ts: read {0}, expecting {1}", ts, expectedTs));
Expand All @@ -212,7 +212,7 @@ static void UnitTest(string connString)
using (var cmd = new NpgsqlCommand("INSERT INTO test.f VALUES (@x, @ts)", conn))
{
cmd.Parameters.Add("x", NpgsqlTypes.NpgsqlDbType.Integer).Value = 1;
cmd.Parameters.Add("ts", NpgsqlTypes.NpgsqlDbType.Timestamp).Value = new NpgsqlTypes.NpgsqlDateTime(DateTime.Now);
cmd.Parameters.Add("ts", NpgsqlTypes.NpgsqlDbType.Timestamp).Value = DateTime.Now;
cmd.Prepare();
var rows = cmd.ExecuteNonQuery();
if (rows != 1)
Expand Down Expand Up @@ -311,7 +311,7 @@ static void UnitTest(string connString)
{
cmd.Parameters.Add("id", NpgsqlTypes.NpgsqlDbType.Integer).Value = 1;
cmd.Parameters.Add("balance", NpgsqlTypes.NpgsqlDbType.Integer).Value = 1000;
cmd.Parameters.Add("cdate", NpgsqlTypes.NpgsqlDbType.Date).Value = new NpgsqlTypes.NpgsqlDate(DateTime.Now);
cmd.Parameters.Add("cdate", NpgsqlTypes.NpgsqlDbType.Date).Value = DateTime.Now;
cmd.Prepare();
var rows = cmd.ExecuteNonQuery();
if (rows != 1)
Expand Down
2 changes: 1 addition & 1 deletion pkg/acceptance/testdata/csharp/acceptance.csproj
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp2.0</TargetFramework>
<TargetFramework>net6.0</TargetFramework>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="npgsql" Version="6.0.4" />
Expand Down
18 changes: 9 additions & 9 deletions pkg/cli/zip_table_registry.go
Original file line number Diff line number Diff line change
Expand Up @@ -217,15 +217,15 @@ var zipInternalTablesPerCluster = DebugZipTableRegistry{
},
},
"crdb_internal.cluster_settings": {
customQueryRedacted: `SELECT * FROM (
SELECT *
FROM crdb_internal.cluster_settings
WHERE type <> 's'
) UNION (
SELECT variable, '<redacted>' as value, type, public, description
FROM crdb_internal.cluster_settings g
WHERE type = 's'
)`,
customQueryRedacted: `SELECT
variable,
CASE WHEN type = 's' AND value != default_value THEN '<redacted>' ELSE value END value,
type,
public,
description,
default_value,
origin
FROM crdb_internal.cluster_settings`,
},
"crdb_internal.cluster_transactions": {
// `last_auto_retry_reason` contains error text that may contain
Expand Down
39 changes: 39 additions & 0 deletions pkg/cli/zip_table_registry_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,17 @@
package cli

import (
"context"
"fmt"
"testing"

"github.com/cockroachdb/cockroach/pkg/base"
"github.com/cockroachdb/cockroach/pkg/sql/tests"
"github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
"github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

func TestQueryForTable(t *testing.T) {
Expand Down Expand Up @@ -134,3 +140,36 @@ you must redact them at the SQL level.`
}
}
}

func executeAllCustomQuerys(
t *testing.T, sqlDB *sqlutils.SQLRunner, tableRegistry DebugZipTableRegistry,
) {
for table, regConfig := range tableRegistry {
if regConfig.customQueryRedacted != "" {
rows := sqlDB.Query(t, regConfig.customQueryRedacted)
require.NoError(t, rows.Err(), "failed to select for table %s redacted", table)
}

if regConfig.customQueryUnredacted != "" {
rows := sqlDB.Query(t, regConfig.customQueryUnredacted)
require.NoError(t, rows.Err(), "failed to select for table %s unredacted", table)
}
}
}

func TestCustomQuery(t *testing.T) {
defer leaktest.AfterTest(t)()
ctx := context.Background()

params, _ := tests.CreateTestServerParams()
cluster := serverutils.StartNewTestCluster(t, 3 /* numNodes */, base.TestClusterArgs{
ServerArgs: params,
})
testConn := cluster.ServerConn(0 /* idx */)
sqlDB := sqlutils.MakeSQLRunner(testConn)
defer cluster.Stopper().Stop(ctx)

executeAllCustomQuerys(t, sqlDB, zipInternalTablesPerCluster)
executeAllCustomQuerys(t, sqlDB, zipInternalTablesPerNode)
executeAllCustomQuerys(t, sqlDB, zipSystemTables)
}
12 changes: 12 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/join
Original file line number Diff line number Diff line change
Expand Up @@ -1205,3 +1205,15 @@ FROM task AS task0_ WHERE EXISTS (
11 taskWithPatient1WithValidSite1 5
12 taskWithPatient2WithValidSite1 6
13 taskWithPatient3WithValidSite2 7

# Regression test for #106371 - only infer self-join equalities if the original
# equality columns form a key over the *base* table, not just the join inputs.
statement ok
CREATE TABLE t106371 (x INT NOT NULL, y INT NOT NULL);
INSERT INTO t106371 VALUES (1, 1), (1, 2);

query IIII
SELECT * FROM (SELECT * FROM t106371 ORDER BY y DESC LIMIT 1) a
JOIN (SELECT DISTINCT ON (x) * FROM t106371) b ON a.x = b.x;
----
1 2 1 1
8 changes: 4 additions & 4 deletions pkg/sql/opt/memo/logical_props_builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -2550,14 +2550,15 @@ func (h *joinPropsHelper) addSelfJoinImpliedFDs(rel *props.Relational) {
return
}
for leftTable, leftTableOrds := range leftTables {
baseTabFDs := MakeTableFuncDep(md, leftTable)
for rightTable, rightTableOrds := range rightTables {
if md.TableMeta(leftTable).Table.ID() != md.TableMeta(rightTable).Table.ID() {
continue
}
// This is a self-join. If there are equalities between columns at the
// same ordinal positions in each (meta) table and those columns form a
// key on each input, *every* pair of columns at the same ordinal position
// is equal.
// key on the base table, *every* pair of columns at the same ordinal
// position is equal.
var eqCols opt.ColSet
colOrds := leftTableOrds.Intersection(rightTableOrds)
for colOrd, ok := colOrds.Next(0); ok; colOrd, ok = colOrds.Next(colOrd + 1) {
Expand All @@ -2567,8 +2568,7 @@ func (h *joinPropsHelper) addSelfJoinImpliedFDs(rel *props.Relational) {
eqCols.Add(rightCol)
}
}
if !eqCols.Empty() && h.leftProps.FuncDeps.ColsAreStrictKey(eqCols) &&
h.rightProps.FuncDeps.ColsAreStrictKey(eqCols) {
if !eqCols.Empty() && baseTabFDs.ColsAreStrictKey(eqCols) {
// Add equalities between each pair of columns at the same ordinal
// position, ignoring those that aren't part of the output.
for colOrd, ok := colOrds.Next(0); ok; colOrd, ok = colOrds.Next(colOrd + 1) {
Expand Down
12 changes: 7 additions & 5 deletions pkg/sql/opt/memo/testdata/logprops/join
Original file line number Diff line number Diff line change
Expand Up @@ -2951,16 +2951,18 @@ inner-join (hash)
├── variable: xyz.x:1 [type=int]
└── variable: foo.x:6 [type=int]

# Self-join filters can be inferred even if the join key wasn't a key in the
# Self-join filters can only be inferred even if the join key was a key in the
# base table.
# TODO(drewk): if the optimizer could see that the same row is being selected
# from each input, we could still infer equalities.
norm
SELECT * FROM xyz INNER JOIN xyz foo ON xyz.x = foo.x AND xyz.y = 1 AND foo.y = 1
----
inner-join (hash)
├── columns: x:1(int!null) y:2(int!null) z:3(int) x:6(int!null) y:7(int!null) z:8(int)
├── multiplicity: left-rows(zero-or-one), right-rows(zero-or-one)
├── key: (6)
├── fd: ()-->(2,7), (1)-->(3), (6)-->(8), (1)==(6), (6)==(1), (2)==(7), (7)==(2), (3)==(8), (8)==(3)
├── fd: ()-->(2,7), (1)-->(3), (6)-->(8), (1)==(6), (6)==(1)
├── prune: (3,8)
├── interesting orderings: (+1 opt(2)) (+6 opt(7))
├── select
Expand Down Expand Up @@ -3002,16 +3004,16 @@ inner-join (hash)
├── variable: xyz.x:1 [type=int]
└── variable: foo.x:6 [type=int]

# The optimizer doesn't detect the contradiction here because the "y" filters
# pushed down before the join properties are calculated.
# Self-join filters cannot be inferred here because different rows are selected
# from each input.
norm
SELECT * FROM xyz INNER JOIN xyz foo ON xyz.x = foo.x AND xyz.y = 1 AND foo.y = 2
----
inner-join (hash)
├── columns: x:1(int!null) y:2(int!null) z:3(int) x:6(int!null) y:7(int!null) z:8(int)
├── multiplicity: left-rows(zero-or-one), right-rows(zero-or-one)
├── key: (6)
├── fd: ()-->(2,7), (1)-->(3), (6)-->(8), (1)==(6), (6)==(1), (2)==(7), (7)==(2), (3)==(8), (8)==(3)
├── fd: ()-->(2,7), (1)-->(3), (6)-->(8), (1)==(6), (6)==(1)
├── prune: (3,8)
├── interesting orderings: (+1 opt(2)) (+6 opt(7))
├── select
Expand Down
Loading

0 comments on commit 06b5ba7

Please sign in to comment.