roachtest: cdc/initial-scan-only failed [NPE in btree-based frontier] #115411

Closed
cockroach-teamcity opened this issue Dec 1, 2023 · 2 comments · Fixed by #115509
Labels: A-cdc (Change Data Capture); branch-master (Failures and bugs on the master branch.); C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.); C-test-failure (Broken test (automatically or manually discovered).); O-roachtest; O-robot (Originated from a bot.); P-1 (Issues/test failures with a fix SLA of 1 month); release-blocker (Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.); T-cdc

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Dec 1, 2023

roachtest.cdc/initial-scan-only failed with artifacts on master @ 02e46d54d0b0bf63f43592709d551534edb54be6:

(monitor.go:153).Wait: monitor failure: read tcp 172.17.0.3:51266 -> 34.23.67.255:26257: read: connection reset by peer
test artifacts and logs in: /artifacts/cdc/initial-scan-only/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_metamorphicBuild=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-34017

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-cdc labels Dec 1, 2023
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Dec 1, 2023
@blathers-crl blathers-crl bot added the A-cdc Change Data Capture label Dec 1, 2023
@stevendanna stevendanna changed the title roachtest: cdc/initial-scan-only failed roachtest: cdc/initial-scan-only failed [NPE in btree-based frontier] Dec 1, 2023
@stevendanna
Collaborator

This appears to be an NPE (nil pointer dereference) in the new btree-based frontier:

goroutine 16380 [running]:
panic({0x5775ae0?, 0xb1ca010?})
	GOROOT/src/runtime/panic.go:1017 +0x3ac fp=0xc05ba089f8 sp=0xc05ba08948 pc=0x49c4cc
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Wait(0xc006376f00)
	github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:614 +0x19b fp=0xc05ba08a88 sp=0xc05ba089f8 pc=0x26cb9db
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run.func1()
	github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:556 +0x25 fp=0xc05ba08aa0 sp=0xc05ba08a88 pc=0x26cb805
runtime.deferCallSave(0xc05ba08b58, 0xc05ba09460?)
	GOROOT/src/runtime/panic.go:798 +0x84 fp=0xc05ba08ab0 sp=0xc05ba08aa0 pc=0x49c084
runtime.runOpenDeferFrame(0xc032eca0a0)
	GOROOT/src/runtime/panic.go:771 +0x1b8 fp=0xc05ba08af0 sp=0xc05ba08ab0 pc=0x49beb8
panic({0x5775ae0?, 0xb1ca010?})
	GOROOT/src/runtime/panic.go:914 +0x21f fp=0xc05ba08ba0 sp=0xc05ba08af0 pc=0x49c33f
runtime.panicmem(...)
	GOROOT/src/runtime/panic.go:261
runtime.sigpanic()
	GOROOT/src/runtime/signal_unix.go:861 +0x378 fp=0xc05ba08c00 sp=0xc05ba08ba0 pc=0x4b3bd8
github.com/cockroachdb/cockroach/pkg/util/span.(*iterator).findNextOverlap(0xc05ba08df0, 0xc004829600?)
	github.com/cockroachdb/cockroach/pkg/util/span/frontierentry_interval_btree.go:1109 +0xdb fp=0xc05ba08c70 sp=0xc05ba08c00 pc=0x1ee1b7b
github.com/cockroachdb/cockroach/pkg/util/span.(*iterator).FirstOverlap(0xc05ba08df0, 0xc0008ce000?)
	github.com/cockroachdb/cockroach/pkg/util/span/frontierentry_interval_btree.go:1071 +0xf4 fp=0xc05ba08ca0 sp=0xc05ba08c70 pc=0x1ee1774
github.com/cockroachdb/cockroach/pkg/util/span.(*btreeFrontier).forward(0xc008a21bd0, {{0xc0e5b2e7c8, 0x7, 0x8}, {0xc0e5b2e7d0, 0x3, 0x8}}, {0x179cb0f9a6341da3, 0x0, 0x0})
	github.com/cockroachdb/cockroach/pkg/util/span/frontier.go:467 +0x1e5 fp=0xc05ba08e90 sp=0xc05ba08ca0 pc=0x1eda2e5
github.com/cockroachdb/cockroach/pkg/util/span.(*btreeFrontier).Forward(0xc008a21bd0, {{0xc0e5b2e7c8, 0x7, 0x8}, {0xc0e5b2e7d0, 0x3, 0x8}}, {0x179cb0f9a6341da3, 0x0, 0x0})
	github.com/cockroachdb/cockroach/pkg/util/span/frontier.go:300 +0x135 fp=0xc05ba08f10 sp=0xc05ba08e90 pc=0x1ed8fb5
github.com/cockroachdb/cockroach/pkg/util/span.(*concurrentFrontier).Forward(0x0?, {{0xc0e5b2e7c8, 0x7, 0x8}, {0xc0e5b2e7d0, 0x3, 0x8}}, {0x179cb0f9a6341da3, 0x0, 0x0})
	github.com/cockroachdb/cockroach/pkg/util/span/frontier.go:817 +0x115 fp=0xc05ba08fa0 sp=0xc05ba08f10 pc=0x1edc3f5
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.(*schemaChangeFrontier).ForwardResolvedSpan(0xc0060123c0?, {{{0xc0e5b2e7c8, 0x7, 0x8}, {0xc0e5b2e7d0, 0x3, 0x8}}, {0x179cb0f9a6341da3, 0x0, 0x0}, ...})
	github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/changefeed_processors.go:1795 +0x222 fp=0xc05ba09038 sp=0xc05ba08fa0 pc=0x4066842
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.(*changeFrontier).forwardFrontier(0xc00638b000, {{{0xc0e5b2e7c8, 0x7, 0x8}, {0xc0e5b2e7d0, 0x3, 0x8}}, {0x179cb0f9a6341da3, 0x0, 0x0}, ...})
	github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/changefeed_processors.go:1410 +0x66 fp=0xc05ba090b0 sp=0xc05ba09038 pc=0x40642c6
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.(*changeFrontier).noteAggregatorProgress(0xc00638b000, {0x0, {0xc0025a07e0, 0xff, 0x102}, {0x7908d90, 0xc068aeaf30}})
	github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/changefeed_processors.go:1401 +0x4d8 fp=0xc05ba09218 sp=0xc05ba090b0 pc=0x40641d8
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.(*changeFrontier).Next(0xc00638b000)
	github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/changefeed_processors.go:1362 +0x21a fp=0xc05ba092e8 sp=0xc05ba09218 pc=0x406393a
github.com/cockroachdb/cockroach/pkg/sql/execinfra.Run({0x78a1ad0, 0xc0074e8210}, {0x78c2900, 0xc00638b000}, {0x7870ba8?, 0xc005666e00?})
	github.com/cockroachdb/cockroach/pkg/sql/execinfra/base.go:198 +0x4f fp=0xc05ba09338 sp=0xc05ba092e8 pc=0x230890f
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBaseNoHelper).Run(0xc00638b000, {0x78a1a98?, 0xc008a21b80?}, {0x7870ba8?, 0xc005666e00})
	github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:726 +0x6d fp=0xc05ba09380 sp=0xc05ba09338 pc=0x230da8d
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.(*changeFrontier).Run(0xc0063ef450?, {0x78a1a98?, 0xc008a21b80?}, {0x7870ba8?, 0xc005666e00?})
	<autogenerated>:1 +0x32 fp=0xc05ba093b8 sp=0xc05ba09380 pc=0x40bf2b2
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run(0xc006376f00, {0x78a1a98?, 0xc008a21b80}, 0x0?)
	github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:579 +0x222 fp=0xc05ba09488 sp=0xc05ba093b8 pc=0x26cb5c2
github.com/cockroachdb/cockroach/pkg/sql/rowflow.(*rowBasedFlow).Run(0xc005245800?, {0x78a1a98?, 0xc008a21b80?}, 0x0?)
	<autogenerated>:1 +0x29 fp=0xc05ba094b8 sp=0xc05ba09488 pc=0x31c1be9
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run(0xc000e5fcc0, {0x78a1a98, 0xc008a21b30}, 0xc003d15ef0, 0x0, 0xc007112880, 0xc005666e00, 0xc005245800, 0xc0063eff18)
	github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:910 +0xb67 fp=0xc05ba09e08 sp=0xc05ba094b8 pc=0x363a447
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.startDistChangefeed.func1({0x78a1a98, 0xc008a21ae0})
	github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/changefeed_dist.go:311 +0x3fb fp=0xc05ba09f58 sp=0xc05ba09e08 pc=0x405a43b
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.GoAndWait.Group.GoCtx.func1()
	github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x22 fp=0xc05ba09f78 sp=0xc05ba09f58 pc=0x1e8a082
golang.org/x/sync/errgroup.(*Group).Go.func1()
	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75 +0x56 fp=0xc05ba09fe0 sp=0xc05ba09f78 pc=0x1b05416
runtime.goexit()
	src/runtime/asm_amd64.s:1650 +0x1 fp=0xc05ba09fe8 sp=0xc05ba09fe0 pc=0x4d3121
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 16361
	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:72 +0x96

Similar errors are happening on other roachtests that make use of frontiers, such as the C2C roachtests.

@cockroach-teamcity
Member Author

roachtest.cdc/initial-scan-only failed with artifacts on master @ 74ae9a18d82fdc80e5e9b71b3f06b258ae5cb91b:

(monitor.go:153).Wait: monitor failure: read tcp 172.17.0.3:54874 -> 35.185.19.87:26257: read: connection reset by peer
test artifacts and logs in: /artifacts/cdc/initial-scan-only/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_metamorphicBuild=false , ROACHTEST_ssd=0


miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Dec 3, 2023
Re-initialize the iterator when forwarding the span
frontier timestamp. The underlying btree may be
mutated (by a merge operation), invalidating a previously
constructed iterator.

Fixes cockroachdb#115411

Release notes: None
craig bot pushed a commit that referenced this issue Dec 4, 2023
113952: log: add protobuf messages for telemetry txn events r=xinhaoz a=xinhaoz

This commit adds the messages listed below to `telemetry.proto` in
preparation for sending transaction executions to the telemetry
channel. The transaction event that is eventually sent should contain
all execution information currently being tracked for transaction
fingerprints.

- `SampledTransaction`: contains fields equivalent to the execution
information stored by `CollectedTransactionStatistics` from
app_stats.proto, but represents a single txn execution instead
of aggregated executions of a transaction fingerprint.
- `SampledExecStats`: used as a field in `SampledTransaction`, it
contains execution stats that are sampled. This event is the equivalent
to `ExecStats` from app_stats.proto but for a single execution.
- `MVCCIteratorStats`: used in `SampledExecStats` above, the equivalent of
MVCCIteratorStats from app_stats.proto but for a single execution.

In addition, in order to support the above fields a couple of additional
code templates have been added for generating json log encoding:
- array_of_uint64 type is now being handled for json logs
- `nestedMessage` has been added as a custom type in `gen.go`. Object field
types can be assigned to this type in order to generate them as nested
objects.

Part of: #108284

Release note: None

114666: opt: reduce planning time for queries with many joins r=mgartner a=mgartner

Prior to this commit, some queries with many joins would perform a large
number of allocations calculating the selectivity of null-rejecting join
filters. This was due to `statisticsBuilder.selectivityFromNullsRemoved`
allocating a single-column set for each not-null column, and allocating
column statistics for each set.

Many of those allocations, and much of the unnecessary computation spent
traversing the expression tree, are now avoided. This is made possible by
the realization that the selectivity of a null-rejecting filter is always 1
if the column was already not-null in the input.

Epic: None

Release note: None


115509: span: Re-initialize iterator when forwarding r=miretskiy a=miretskiy

Re-initialize the iterator when forwarding the span
frontier timestamp. The underlying btree may be
mutated (by a merge operation), invalidating a previously
constructed iterator.

The btree implementation is also hardened against misuse
when mutating the span frontier while iterating.
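
The failure mode and the fix pattern can be sketched in Go as follows. This is a toy model, not the code in `pkg/util/span`: the frontier is a sorted slice rather than a btree, all type and function names are hypothetical, forwarded spans are assumed to align with entry boundaries, and the generation counter is only one possible way to detect a stale iterator. The sketch shows `forward` building a fresh iterator after a merge changes the structure, instead of continuing with the old one.

```go
package main

import "fmt"

// entry is a hypothetical frontier entry: key span [start, end) at timestamp ts.
type entry struct {
	start, end string
	ts         int64
}

func (e *entry) overlaps(start, end string) bool {
	return e.start < end && start < e.end
}

// frontier is a toy stand-in: a sorted slice of adjacent entries, not a btree.
type frontier struct {
	entries []*entry
	gen     int // bumped on every structural mutation
}

// iterator records the generation it was built at so stale use is detectable.
type iterator struct {
	f          *frontier
	start, end string
	pos, gen   int
}

func (f *frontier) firstOverlap(start, end string) *iterator {
	it := &iterator{f: f, start: start, end: end, gen: f.gen}
	for it.pos < len(f.entries) && !f.entries[it.pos].overlaps(start, end) {
		it.pos++
	}
	return it
}

func (it *iterator) valid() bool {
	if it.gen != it.f.gen {
		panic("iterator used after frontier mutation")
	}
	return it.pos < len(it.f.entries) && it.f.entries[it.pos].overlaps(it.start, it.end)
}

// mergeAround merges entry i with neighbors that share its timestamp.
// It reports whether the structure changed.
func (f *frontier) mergeAround(i int) bool {
	merged := false
	if i+1 < len(f.entries) && f.entries[i+1].ts == f.entries[i].ts {
		f.entries[i].end = f.entries[i+1].end
		f.entries = append(f.entries[:i+1], f.entries[i+2:]...)
		merged = true
	}
	if i > 0 && f.entries[i-1].ts == f.entries[i].ts {
		f.entries[i-1].end = f.entries[i].end
		f.entries = append(f.entries[:i], f.entries[i+1:]...)
		merged = true
	}
	if merged {
		f.gen++
	}
	return merged
}

// forward advances every entry overlapping [start, end) to at least ts.
func (f *frontier) forward(start, end string, ts int64) {
	it := f.firstOverlap(start, end)
	for it.valid() {
		e := f.entries[it.pos]
		if e.ts < ts {
			e.ts = ts
		}
		resume := e.end // remember where to continue before any merge
		if !f.mergeAround(it.pos) {
			it.pos++
			continue
		}
		if resume >= end {
			return
		}
		// The merge mutated the structure: re-initialize the iterator.
		it = f.firstOverlap(resume, end)
	}
}

func newToyFrontier() *frontier {
	return &frontier{entries: []*entry{{"a", "b", 5}, {"b", "c", 1}, {"c", "d", 5}}}
}

// staleIteratorPanics reports whether an iterator built before a mutation is rejected.
func staleIteratorPanics() (panicked bool) {
	f := newToyFrontier()
	it := f.firstOverlap("b", "c")
	f.forward("b", "c", 5)
	defer func() { panicked = recover() != nil }()
	it.valid()
	return false
}

func main() {
	f := newToyFrontier()
	f.forward("b", "c", 5)
	fmt.Println(len(f.entries), *f.entries[0])
	fmt.Println(staleIteratorPanics())
}
```

In this sketch a stale iterator fails loudly with a descriptive panic. In a pointer-based tree, the same misuse can instead follow a node that a merge has already emptied, which is consistent with the nil dereference in `findNextOverlap` shown in the stack trace.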

Fixes #115411
Fixes #115528
Fixes #115512
Fixes #115490
Fixes #115488
Fixes #115487
Fixes #115483

Release notes: None

Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
@craig craig bot closed this as completed in e571747 Dec 4, 2023
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Dec 4, 2023
Re-initialize the iterator when forwarding the span
frontier timestamp. The underlying btree may be
mutated (by a merge operation), invalidating a previously
constructed iterator.

Fixes cockroachdb#115411

Release notes: None
@rharding6373 rharding6373 added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 6, 2024
4 participants