sql: streaming group by with redundant ordering can produce incorrect results #84191

Open
mgartner opened this issue Jul 11, 2022 · 1 comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team

Comments

@mgartner
Collaborator

mgartner commented Jul 11, 2022

If the SimplifyRootOrdering rule does not fire and a query has an ORDER BY clause with the same column and direction listed consecutively, an incorrect result can be produced. In the example below, where SimplifyRootOrdering has been commented out, the two SELECT statements are equivalent but produce different results:

statement ok
CREATE TABLE t (
  a INT,
  b INT,
  c INT
)

statement ok
INSERT INTO t VALUES
  (NULL, NULL, 100),
  (NULL, NULL, NULL),
  (1, 10, 100),
  (NULL, NULL, 100),
  (NULL, 10, 100),
  (NULL, NULL, NULL)

query IIII
SELECT a, b, c, count(*) FROM t
GROUP BY b, a, c
ORDER BY c, b DESC
----
NULL  NULL  NULL  2
1     10    100   1
NULL  10    100   1
NULL  NULL  100   2

query IIII
SELECT a, b, c, count(*) FROM t
GROUP BY b, a, c
ORDER BY c, b DESC, b DESC
----
NULL  NULL  NULL  2
1     10    100   2
NULL  NULL  100   2

It seems like there is a bug in the execution engine that is hidden by the SimplifyRootOrdering rule. As far as I know, there is no danger of this bug occurring in a production cluster, because this rule should always fire for queries like the second SELECT, making the two query plans identical.
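
To make the masking effect concrete, here is a minimal Go sketch of the idea behind removing a redundant ordering column. It is an illustration only, not the optimizer's actual `SimplifyRootOrdering` implementation, which may do more than this. The `orderingColumn` type and the `simplifyOrdering` and `formatOrdering` helpers are hypothetical names introduced for the example.

```go
package main

import "fmt"

// orderingColumn is a hypothetical representation of one ORDER BY entry.
type orderingColumn struct {
	name string
	desc bool
}

// simplifyOrdering drops every ordering entry whose column already appeared
// earlier in the list. Rows that tie on all earlier entries have the same
// value for that column, so a repeated entry cannot change the row order.
func simplifyOrdering(ordering []orderingColumn) []orderingColumn {
	seen := make(map[string]bool, len(ordering))
	out := make([]orderingColumn, 0, len(ordering))
	for _, col := range ordering {
		if seen[col.name] {
			continue
		}
		seen[col.name] = true
		out = append(out, col)
	}
	return out
}

// formatOrdering renders an ordering the way it would be written in an
// ORDER BY clause.
func formatOrdering(ordering []orderingColumn) string {
	s := ""
	for i, col := range ordering {
		if i > 0 {
			s += ", "
		}
		s += col.name
		if col.desc {
			s += " DESC"
		}
	}
	return s
}

func main() {
	// The ordering from the second SELECT: ORDER BY c, b DESC, b DESC.
	redundant := []orderingColumn{{"c", false}, {"b", true}, {"b", true}}
	fmt.Println(formatOrdering(simplifyOrdering(redundant))) // prints "c, b DESC"
}
```

After this kind of simplification, the second SELECT asks for the same ordering as the first, which is consistent with the two queries getting identical plans when the rule fires.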

Jira issue: CRDB-17517

@mgartner mgartner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 11, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 11, 2022
mgartner added a commit to mgartner/cockroach that referenced this issue Jul 11, 2022
The unoptimized query oracle, which disables rules, found a bug in the
execution engine that is only possible to hit if the
`SimplifyRootOrdering` rule is disabled (see cockroachdb#84191). Until the bug is
fixed, we mark the rule as essential so that it is not disabled by these
tests.

Fixes cockroachdb#84067

Release note: None
craig bot pushed a commit that referenced this issue Jul 11, 2022
82161: ui: Add Jest as test runner to DB Console r=nathanstilwell a=nathanstilwell

DB Console is the last place Cockroach Labs is using a test runner other than [Jest](https://jestjs.io/). This PR adds Jest as the test runner intended to replace [Mocha](https://mochajs.org/). Mocha runs in a headless browser via [Karma](https://karma-runner.github.io/latest/index.html) whereas Jest will run tests in NodeJS and simulate a browser environment using [jsdom](https://github.com/jsdom/jsdom). Due to this change in environment, you will see not only files to set up the Jest test runner, but changes to some tests, some mocking of browser globals that are not included in jsdom by default, and some configuration adjustment to the `tsconfig.json`.

Since configuration changes are infrequent and are highly contextual, we decided to err on the side of verbose inline documentation in configuration files.

Details about individual changes to configs or tests are documented in commit messages.

84068: streamingccl: fix span use-after-finish in ingestion frontier r=samiskin a=stevendanna

This fixes the following use-after-finish panic:

    panic: use of Span after Finish. Span: ingestfntr. Finish previously
    called at: <stack not captured. Set debugUseAfterFinish>

    goroutine 1617744 [running]:
    github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).detectUseAfterFinish(0xc002c5f180)
    	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:186 +0x279
    github.com/cockroachdb/cockroach/pkg/util/tracing.(*Tracer).startSpanGeneric(0xc0138b2a50, {0x72291c0, 0xc019ebbe40}, {0x68e4eb2, 0x1d}, {{0x0}, 0x0, {0x0, 0x0, {{0x0, ...}, ...}, ...}, ...})
    	github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:1207 +0x997
    github.com/cockroachdb/cockroach/pkg/util/tracing.(*Tracer).StartSpanCtx(0xc0138b2a50, {0x72291c0, 0xc019ebbe40}, {0x68e4eb2, 0x1d}, {0xc010cbb760, 0x1, 0x1})
    	github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:1062 +0x1a7
    github.com/cockroachdb/cockroach/pkg/util/tracing.ChildSpan({0x72291c0, 0xc019ebbe40}, {0x68e4eb2, 0x1d})
    	github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:1577 +0x145
    github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient.(*partitionedStreamClient).Heartbeat(0xc00b311080, {0x72291c0, 0xc019ebbe40}, 0xac93aa873318001, {0x16ffc388c7d7c42a, 0x0, 0x0})
    	github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamclient/partitioned_stream_client.go:83 +0xb7
    github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamingest.(*heartbeatSender).maybeHeartbeat(0xc01980f880, {0x72291c0, 0xc019ebbe40}, {0x16ffc388c7d7c42a, 0x0, 0x0})
    	github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamingest/stream_ingestion_frontier_processor.go:180 +0x250
    github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamingest.(*heartbeatSender).startHeartbeatLoop.func1.1()
    	github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamingest/stream_ingestion_frontier_processor.go:207 +0x41b
    github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamingest.(*heartbeatSender).startHeartbeatLoop.func1({0x72291c0, 0xc019ebbe40})
    	github.com/cockroachdb/cockroach/pkg/ccl/streamingccl/streamingest/stream_ingestion_frontier_processor.go:227 +0xb5
    github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()
    	github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:169 +0x52
    golang.org/x/sync/errgroup.(*Group).Go.func1()
    	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:74 +0xb4
    created by golang.org/x/sync/errgroup.(*Group).Go
    	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:71 +0xdd
    I220708 05:29:42.017167 1 (gostd) testmain.go:90  [-] 1  Test //pkg/ccl/streamingccl/streamingest:streamingest_test exited with error code 2

The use after finish was caused by goroutines in the ingestion
frontier processor that lived past a call to
(*ProcessorBase).InternalClose, which finishes the span attached to
the context passed to the processor in Start.

To address this we:

- ensure that we stop our heartbeat thread before calling
  InternalClose in ConsumerClosed, and

- provide a TrailingMetaCallback so that we can perform cleanup when
  DrainHelper() is called. When a TrailingMetaCallback is provided,
  DrainHelper() calls it instead of InternalClose(), allowing us to
  correctly clean up before the span is closed.

- use context cancellation rather than a channel to control the
  heartbeat loop exit, which avoids having to guard against
  double-closes of the channel.
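
The last point can be sketched in Go as follows. This is a simplified illustration, not the code from the ingestion frontier processor. The `heartbeatLoop` type and the `startHeartbeatLoop`, `stop`, and `stopTwice` names are hypothetical. The sketch relies on the documented behavior that calling a `context.CancelFunc` more than once is a no-op after the first call, whereas closing an already closed channel panics.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// heartbeatLoop is a hypothetical stand-in for a background heartbeat sender.
type heartbeatLoop struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

// startHeartbeatLoop runs beat every interval until the derived context is
// cancelled.
func startHeartbeatLoop(parent context.Context, interval time.Duration, beat func()) *heartbeatLoop {
	ctx, cancel := context.WithCancel(parent)
	h := &heartbeatLoop{cancel: cancel}
	h.wg.Add(1)
	go func() {
		defer h.wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				beat()
			}
		}
	}()
	return h
}

// stop cancels the loop and waits for the goroutine to exit. It is safe to
// call more than once, unlike a stop method built on close(ch).
func (h *heartbeatLoop) stop() {
	h.cancel()
	h.wg.Wait()
}

// stopTwice starts a loop, stops it twice, and reports whether the beat
// counter stayed unchanged after the loop was stopped.
func stopTwice() bool {
	var mu sync.Mutex
	beats := 0
	h := startHeartbeatLoop(context.Background(), time.Millisecond, func() {
		mu.Lock()
		beats++
		mu.Unlock()
	})
	time.Sleep(5 * time.Millisecond)
	h.stop()
	h.stop() // A second close() of a channel would panic here.

	mu.Lock()
	after := beats
	mu.Unlock()
	time.Sleep(5 * time.Millisecond)
	mu.Lock()
	defer mu.Unlock()
	return beats == after
}

func main() {
	fmt.Println("loop stayed stopped:", stopTwice())
}
```

Because `stop` waits for the goroutine to exit before returning, nothing started by the loop can outlive the cleanup step, which is the property that matters for avoiding a use-after-finish on the span.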

Fixes #84054

Release note: None

84100: kvserver: Clean up empty range directories after snapshots r=nicktrav a=itsbilal

Previously, we were creating subdirectories for ranges and
range snapshots in the auxiliary directory every time we
accepted a snapshot, but only cleaning up the snapshot
subdirectories after a snapshot scratch space closed. This
left empty parent range directories around on the FS,
slowing down future calls to Pebble.Capacity() and indirectly
slowing down AddSSTable in the future.

This change adds code to clean up empty range directories
in the aux directory if they're not being used. Some coordination
and synchronization code had to be added to ensure we wouldn't
remove a directory that was just created by a concurrent snapshot.
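
The coordination described above might look roughly like the following Go sketch. It is not the kvserver implementation. The `auxDirs` type, its methods, and the directory names are hypothetical, and a single mutex is assumed to be an acceptable way to serialize creation and cleanup.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// auxDirs is a hypothetical guard that serializes directory creation and
// cleanup. It is an illustration only.
type auxDirs struct {
	mu sync.Mutex
}

// createSnapshotDir creates rangeDir/snapshotName while holding the lock, so
// a concurrent cleanup cannot remove rangeDir halfway through the creation.
func (a *auxDirs) createSnapshotDir(rangeDir, snapshotName string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	return os.MkdirAll(filepath.Join(rangeDir, snapshotName), 0o755)
}

// closeSnapshotDir removes the snapshot subdirectory and then removes the
// parent range directory, but only if no other snapshot still lives in it.
func (a *auxDirs) closeSnapshotDir(rangeDir, snapshotName string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if err := os.RemoveAll(filepath.Join(rangeDir, snapshotName)); err != nil {
		return err
	}
	entries, err := os.ReadDir(rangeDir)
	if err != nil {
		return err
	}
	if len(entries) > 0 {
		return nil // Still in use by another snapshot.
	}
	return os.Remove(rangeDir)
}

// dirExists reports whether path exists and is a directory.
func dirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

// runCleanupDemo creates two snapshot directories under one range directory,
// closes them one at a time, and reports whether the range directory existed
// after each close.
func runCleanupDemo() (afterFirst, afterSecond bool, err error) {
	root, err := os.MkdirTemp("", "aux")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	rangeDir := filepath.Join(root, "r1")
	var a auxDirs
	for _, name := range []string{"snap-a", "snap-b"} {
		if err := a.createSnapshotDir(rangeDir, name); err != nil {
			return false, false, err
		}
	}
	if err := a.closeSnapshotDir(rangeDir, "snap-a"); err != nil {
		return false, false, err
	}
	afterFirst = dirExists(rangeDir)
	if err := a.closeSnapshotDir(rangeDir, "snap-b"); err != nil {
		return false, false, err
	}
	afterSecond = dirExists(rangeDir)
	return afterFirst, afterSecond, nil
}

func main() {
	afterFirst, afterSecond, err := runCleanupDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(afterFirst, afterSecond)
}
```

In this sketch the range directory survives the first close because another snapshot still uses it, and is removed only after the last snapshot directory is gone.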

Fixes #83137 

Release note (bug fix, performance improvement): Addresses issue where
imports and rebalances were being slowed down due to the accumulation
of empty directories from range snapshot applications.

84170: sql/sqlstats: record QuerySummary when merging stats r=ericharmeling a=stevendanna

During execution of a transaction, all statement statistics are
collected in a struct local to that transaction, and then flushed to
the main ApplicationStats container when the transaction finishes.

Previously, when flushing, we failed to copy the QuerySummary field,
leading to `metadata->'querySummary'` being empty in most cases.

Prior to ce1b42b this only affected
statements in an explicit transaction. After that commit, it affected
all statements.

Release note (bug fix): Fix a bug that led to the querySummary field
in crdb_internal.statements_statistics's metadata column being empty.

84194: opt: mark SimplifyRootOrdering as an essential rule r=mgartner a=mgartner

The unoptimized query oracle, which disables rules, found a bug in the
execution engine that is only possible to hit if the
`SimplifyRootOrdering` rule is disabled (see #84191). Until the bug is
fixed, we mark the rule as essential so that it is not disabled by these
tests.

Fixes #84067

Release note: None

Co-authored-by: Nathan Stilwell <[email protected]>
Co-authored-by: Sean Barag <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023

github-actions bot commented Jan 3, 2024

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Jan 5, 2024
@mgartner mgartner moved this from New Backlog to Bugs to Fix in SQL Queries Jan 5, 2024
@mgartner mgartner added P-3 Issues/test failures with no fix SLA and removed no-issue-activity labels Jan 5, 2024
Projects
Status: Bugs to Fix
Development

No branches or pull requests

1 participant