Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

teamcity: failed test: TestChangefeedTruncateRenameDrop #41566

Closed
cockroach-teamcity opened this issue Oct 14, 2019 · 8 comments · Fixed by #42650
Closed

teamcity: failed test: TestChangefeedTruncateRenameDrop #41566

cockroach-teamcity opened this issue Oct 14, 2019 · 8 comments · Fixed by #42650
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

The following tests appear to have failed on release-19.2 (testrace): TestChangefeedTruncateRenameDrop, TestChangefeedTruncateRenameDrop/sinkless, TestChangefeedTruncateRenameDrop/enterprise

You may want to check for open issues.

#1537922:

TestChangefeedTruncateRenameDrop/enterprise
... minutes]:
database/sql.(*DB).connectionOpener(0xc00db42a80, 0x6a76780, 0xc00accc080)
	/usr/local/go/src/database/sql/sql.go:1000 +0x140
created by database/sql.OpenDB
	/usr/local/go/src/database/sql/sql.go:670 +0x2b4

goroutine 63441 [chan receive]:
github.com/cockroachdb/cockroach/pkg/storage/closedts/provider.(*Provider).Notify.func1(0x6a76840, 0xc001a40510)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/closedts/provider/provider.go:217 +0x1ae
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc006cd27e0, 0xc00abe86e0, 0xc0092790a0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:189 +0xc4

goroutine 63696 [sync.Cond.Wait]:
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:307
sync.runtime_notifyListWait(0xc008a37890, 0xc000025ea0)
	/usr/local/go/src/runtime/sema.go:510 +0xf9
sync.(*Cond).Wait(0xc008a37880)
	/usr/local/go/src/sync/cond.go:56 +0x8e
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc0010314d0, 0x6a76840, 0xc001a40c00)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:192 +0x9c
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x6a76840, 0xc001a40c00)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:161 +0x56
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0068ccbf0, 0xc00abe86e0, 0xc0068ccbe0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:189 +0xc4

goroutine 65120 [chan receive, 41 minutes]:
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).RangeFeed(0xc00465a900, 0xc00c00b900, 0x6abd9c0, 0xc00c430ec0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_rangefeed.go:229 +0x90d
github.com/cockroachdb/cockroach/pkg/storage.(*Store).RangeFeed(0xc00147bc00, 0xc00c00b900, 0x6abd9c0, 0xc00c430ec0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2213 +0x4c6
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).RangeFeed(0xc007531550, 0xc00c00b900, 0x6abd9c0, 0xc00c430ec0, 0x5ab6980)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:206 +0x24b
github.com/cockroachdb/cockroach/pkg/server.(*Node).RangeFeed(0xc0089d6a00, 0xc00c00b900, 0x6abd9c0, 0xc00c430ec0, 0x6a76780, 0xc002a96300)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1033 +0x7e
github.com/cockroachdb/cockroach/pkg/rpc.internalClientAdapter.RangeFeed.func1(0xc00a2fb870, 0x6a30100, 0xc0089d6a00, 0xc00c00b900, 0x6a76780, 0xc002a96480, 0xc00c356240, 0xc00c3562a0)
	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:610 +0xcc
created by github.com/cockroachdb/cockroach/pkg/rpc.internalClientAdapter.RangeFeed
	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:608 +0x12e

goroutine 63745 [select, 41 minutes]:
github.com/cockroachdb/cockroach/pkg/server.periodicallyPersistHLCUpperBound(0xc00129f4f0, 0xc003923340, 0xc0068cd900, 0x5d09da8, 0xc008a8a480, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1030 +0x1f8
github.com/cockroachdb/cockroach/pkg/server.(*Server).startPersistingHLCUpperBound.func2(0x6a76840, 0xc0018b5f80)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1104 +0xde
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0068cd910, 0xc00abe86e0, 0xc002910690)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:189 +0xc4
I191014 14:23:02.501369 1 rand.go:85  Random seed: -3946147195552034504



TestChangefeedTruncateRenameDrop
--- FAIL: testrace/TestChangefeedTruncateRenameDrop (0.000s)
Test ended in panic.




TestChangefeedTruncateRenameDrop/sinkless
...is range at key /Table/60 [r30] (zone config)
I191014 14:26:51.237618 62064 sql/event_log.go:130  [n1,client=127.0.0.1:43882,user=root] Event: "create_table", target: 60, info: {TableName:d.public.drop Statement:CREATE TABLE drop (a INT8 PRIMARY KEY) User:root}
I191014 14:26:51.954272 62064 sql/sqlbase/structured.go:1516  [n1,client=127.0.0.1:43882,user=root] publish: descID=60 (drop) version=2 mtime=1970-01-01 00:00:00 +0000 UTC
I191014 14:26:52.156068 62064 sql/event_log.go:130  [n1,client=127.0.0.1:43882,user=root] Event: "drop_table", target: 60, info: {TableName:d.public.drop Statement:DROP TABLE drop User:root CascadeDroppedViews:[]}
I191014 14:26:52.225290 62064 sql/sqlbase/structured.go:1516  [n1,client=127.0.0.1:43882,user=root,scExec] publish: descID=60 (drop) version=3 mtime=1970-01-01 00:00:00 +0000 UTC
I191014 14:26:52.749493 61341 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] storage.IntentResolver: cleanup txn records
13     gossip update (merge)
10     gossip update (split)
1      [async] transport racer
1      [async] intent_resolver_ir_batcher
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.749900 61341 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] storage.IntentResolver: cleanup txn records
13     gossip update (merge)
10     gossip update (split)
1      [async] transport racer
1      [async] intent_resolver_ir_batcher
1      [async] closedts-rangefeed-subscriber
W191014 14:26:52.750060 63439 storage/intentresolver/intent_resolver.go:818  failed to gc transaction record: could not GC completed transaction anchored at /Table/SystemConfigSpan/Start: node unavailable; try another peer
I191014 14:26:52.755778 61528 storage/queue.go:520  [n1,s1] rate limited in gossip update (merge): node unavailable; try another peer
I191014 14:26:52.772680 61388 kv/transport_race.go:108  transport race promotion: ran 18 iterations on up to 1120 requests
I191014 14:26:52.773013 61341 util/stop/stopper.go:542  quiescing; tasks left:
5      gossip update (split)
3      gossip update (merge)
1      [async] transport racer
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.774161 61341 util/stop/stopper.go:542  quiescing; tasks left:
4      gossip update (split)
3      gossip update (merge)
1      [async] transport racer
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.774449 61341 util/stop/stopper.go:542  quiescing; tasks left:
4      gossip update (split)
2      gossip update (merge)
1      [async] transport racer
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.774709 61341 util/stop/stopper.go:542  quiescing; tasks left:
4      gossip update (split)
2      gossip update (merge)
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.774988 61341 util/stop/stopper.go:542  quiescing; tasks left:
3      gossip update (split)
2      gossip update (merge)
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.775233 61341 util/stop/stopper.go:542  quiescing; tasks left:
3      gossip update (split)
1      gossip update (merge)
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.775539 61341 util/stop/stopper.go:542  quiescing; tasks left:
2      gossip update (split)
1      gossip update (merge)
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.775849 61341 util/stop/stopper.go:542  quiescing; tasks left:
1      gossip update (split)
1      gossip update (merge)
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.776037 61528 storage/queue.go:520  [n1,s1] rate limited in gossip update (split): node unavailable; try another peer
I191014 14:26:52.776500 61341 util/stop/stopper.go:542  quiescing; tasks left:
1      gossip update (merge)
1      [async] closedts-rangefeed-subscriber
I191014 14:26:52.777106 61341 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-rangefeed-subscriber




Please assign, take a look and update the issue accordingly.

@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 14, 2019
@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Oct 14, 2019
@RaduBerinde
Copy link
Member

panic: test timed out after 45m0s
CC @danhhz

@RaduBerinde
Copy link
Member

Subsequent build passed, so maybe something is flaky or the timeout was too low.

@aayushshah15
Copy link
Contributor

I stressraced this test for about an hour (with a 45s timeout instead of a 45m timeout) and couldn't reproduce it.

@danhhz
Copy link
Contributor

danhhz commented Oct 14, 2019

In the stacktrace, I see it the test stuck on a TRUNCATE

sqlDB.Exec(t, `TRUNCATE TABLE truncate CASCADE`)

goroutine 63567 [IO wait, 41 minutes]:
internal/poll.runtime_pollWait(0x7fd20023dea8, 0x72, 0x6a049e0)
        /usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc0076a3d98, 0x72, 0x700, 0x791, 0xffffffffffffffff)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0xe5
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0076a3d80, 0xc009f56000, 0x791, 0x791, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:169 +0x221
net.(*netFD).Read(0xc0076a3d80, 0xc009f56000, 0x791, 0x791, 0x146, 0x146, 0x141)
        /usr/local/go/src/net/fd_unix.go:202 +0x66
net.(*conn).Read(0xc0027c7ed0, 0xc009f56000, 0x791, 0x791, 0xdc729d, 0xc005b97758, 0x7ffff80000000000)
        /usr/local/go/src/net/net.go:177 +0xa2
crypto/tls.(*atLeastReader).Read(0xc00e6139a0, 0xc009f56000, 0x791, 0x791, 0xc00e6139a0, 0xc00c445180, 0xc00df0ad18)
        /usr/local/go/src/crypto/tls/conn.go:761 +0xa7
bytes.(*Buffer).ReadFrom(0xc005b97758, 0x69fc1e0, 0xc00e6139a0, 0x5ae56a0, 0x6a01d40, 0x0)
        /usr/local/go/src/bytes/buffer.go:207 +0x165
crypto/tls.(*Conn).readFromUntil(0xc005b97500, 0x6a01d40, 0xc0027c7ed0, 0x5, 0xc0027c7ed0, 0xc00df0abb8)
        /usr/local/go/src/crypto/tls/conn.go:783 +0x222
crypto/tls.(*Conn).readRecordOrCCS(0xc005b97500, 0x5d09b00, 0xc005b97638, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:590 +0x2e4
crypto/tls.(*Conn).readRecord(...)
        /usr/local/go/src/crypto/tls/conn.go:558
crypto/tls.(*Conn).Read(0xc005b97500, 0xc00d25c000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:1236 +0x1f6
bufio.(*Reader).Read(0xc006a23140, 0xc00d1c4860, 0x5, 0x200, 0x5d09b88, 0xc000d407d0, 0xc00dbc2a50)
        /usr/local/go/src/bufio/bufio.go:223 +0x7bc
io.ReadAtLeast(0x69fbfe0, 0xc006a23140, 0xc00d1c4860, 0x5, 0x200, 0x5, 0x86, 0x0, 0x0)
        /usr/local/go/src/io/io.go:310 +0x96
io.ReadFull(...)
        /usr/local/go/src/io/io.go:329
github.com/cockroachdb/cockroach/vendor/github.com/lib/pq.(*conn).recvMessage(0xc00d1c4840, 0xc001340c58, 0x1, 0xc00d1c4860, 0xc00dbc2c00)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/lib/pq/conn.go:929 +0x236
github.com/cockroachdb/cockroach/vendor/github.com/lib/pq.(*conn).recv1Buf(0xc00d1c4840, 0xc001340c58, 0x1fb)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/lib/pq/conn.go:978 +0x47
github.com/cockroachdb/cockroach/vendor/github.com/lib/pq.(*conn).recv1(...)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/lib/pq/conn.go:999
github.com/cockroachdb/cockroach/vendor/github.com/lib/pq.(*conn).simpleExec(0xc00d1c4840, 0x5bd382b, 0x1f, 0xc00dbc2d50, 0x0, 0x0, 0xc00dbc2d00, 0xcc64cc, 0x0)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/lib/pq/conn.go:603 +0x28d
github.com/cockroachdb/cockroach/vendor/github.com/lib/pq.(*conn).Exec(0xc00d1c4840, 0x5bd382b, 0x1f, 0xa36a000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/lib/pq/conn.go:857 +0x2bf
[15:17:29] :                     [TestChangefeedTruncateRenameDrop/enterprise] [Test Output]
github.com/cockroachdb/cockroach/vendor/github.com/lib/pq.(*conn).ExecContext(0xc00d1c4840, 0x6a767c0, 0xc000104010, 0x5bd382b, 0x1f, 0xa36a000, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/lib/pq/conn_go18.go:42 +0x226
database/sql.ctxDriverExec(0x6a767c0, 0xc000104010, 0x7fd20024e8e0, 0xc00d1c4840, 0x0, 0x0, 0x5bd382b, 0x1f, 0xa36a000, 0x0, ...)
        /usr/local/go/src/database/sql/ctxutil.go:31 +0x2e1
database/sql.(*DB).execDC.func2()
        /usr/local/go/src/database/sql/sql.go:1467 +0x2c3
database/sql.withLock(0x6a2f240, 0xc0076a3e00, 0xc00dbc3150)
        /usr/local/go/src/database/sql/sql.go:3097 +0x75
database/sql.(*DB).execDC(0xc008cde6c0, 0x6a767c0, 0xc000104010, 0xc0076a3e00, 0xc00dbc3280, 0x5bd382b, 0x1f, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/database/sql/sql.go:1462 +0x4d9
database/sql.(*DB).exec(0xc008cde6c0, 0x6a767c0, 0xc000104010, 0x5bd382b, 0x1f, 0x0, 0x0, 0x0, 0xc00dbc3301, 0xdde0ff, ...)
        /usr/local/go/src/database/sql/sql.go:1447 +0x175
database/sql.(*DB).ExecContext(0xc008cde6c0, 0x6a767c0, 0xc000104010, 0x5bd382b, 0x1f, 0x0, 0x0, 0x0, 0x55ad5c0, 0xc00e6138e0, ...)
        /usr/local/go/src/database/sql/sql.go:1425 +0xef
github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).Exec(0xc00dbc3518, 0x6afd060, 0xc000d07600, 0x5bd382b, 0x1f, 0x0, 0x0, 0x0, 0x0, 0x6a94040)
        /go/src/github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:52 +0x100
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.TestChangefeedTruncateRenameDrop.func1(0xc000d07600, 0xc008cde6c0, 0x6a2f740, 0xc00d3a2500)
        /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/changefeed_test.go:864 +0x48f
github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl.enterpriseTest.func1(0xc000d07600)
        /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/helpers_test.go:266 +0x847
testing.tRunner(0xc000d07600, 0xc002f99950)
        /usr/local/go/src/testing/testing.go:865 +0x164
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:916 +0x65b

@danhhz
Copy link
Contributor

danhhz commented Oct 14, 2019

Yeah, there's thousands of lines of this in the logs

I191014 14:44:10.368889 63982 sql/sqlbase/structured.go:1516  [n1,client=127.0.0.1:45572,user=root] publish: descID=54 (truncate_cascade) version=3 mtime=1970-01-01 00:00:00 +0000 UTC
I191014 14:44:11.305333 63982 sql/sqlbase/structured.go:1516  [n1,client=127.0.0.1:45572,user=root] publish: descID=53 (truncate) version=3 mtime=1970-01-01 00:00:00 +0000 UTC

@tbg
Copy link
Member

tbg commented Oct 15, 2019

@aayushshah15 have you tried roachprod-stressrace? It's not frequent on master, but did happen a few times, so I we need to keep pushing (timeouts are really shitty). There's a chance you have to tweak the -p flag to STRESSFLAGS, or try stressing the package instead of the particular test. If that doesn't bear fruit, the next thing to do is to instrument the log messages better so that we can figure out something new the next time it does happen.

https://teamcity.cockroachdb.com/project.html?projectId=Cockroach_UnitTests&testNameId=-8759347941440480963&tab=testDetails is the history of that test, fwiw. @ajwerner I think you stressed that test a while ago when it was flaky, any wisdom to share with @aayushshah15?

@ajwerner ajwerner assigned ajwerner and unassigned aayushshah15 Oct 25, 2019
@ajwerner
Copy link
Contributor

I would not be stunned if this is fixed by the bug fix that lives in #41842. I'll take this from @aayushshah15.

@ajwerner
Copy link
Contributor

Here's my best guess. When I did #40581 I forgot to fix https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/truncate.go#L193. This is problematic because it means that TRUNCATE cannot be pushed. I'm not clear on why we sometimes hit this problem. Maybe it has something to do with bad timing in coordination with reading either the jobs or descriptors tables. I verified that the same behavior occurs if I set a very short closed timestamp duration at the beginning of the test. I'm going to address this by fixing the observation of the commit timnestamp in TRUNCATE TABLE (which should have been done in 19.2 and I might try to advocate for a backport depending on how invasive it gets).

This isn't a changefeed bug per se.

craig bot pushed a commit that referenced this issue Nov 25, 2019
42650: sql: stop observing the CommitTimestamp in TRUNCATE r=ajwerner a=ajwerner

In #40581 we stopped observing the commit timestamp to write it into table
descriptors. In this change I overlooked (rather forgot) about this additional
place in the code where we observed the commit timestamp. As far as I can tell
we don't read this field anywhere ever. Furthermore we know that the the table
descriptor in question to which we are referring must be alive and equal to
the provided value at the timestamp at which it was read due to serializability.
In short, this minor change continues to populate the field with a sensible
value and will permit TRUNCATE to be pushed.

Fixes #41566.

Release note (bug fix): Long running transactions which attempt to TRUNCATE
can now be pushed and will commit in cases where they previously could fail
or retry forever.

42746: roachtest/cdc: fix cdc/bank and cdc/schemareg r=nvanbenschoten a=nvanbenschoten

Fixes #41177.
Fixes #42690.

These were both broken by #41793 because prior versions of crdb didn't support the `WITH diff` option.

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in 3003a79 Nov 25, 2019
ajwerner added a commit to ajwerner/cockroach that referenced this issue Nov 26, 2019
In cockroachdb#40581 we stopped observing the commit timestamp to write it into table
descriptors. In this change I overlooked (rather forgot) about this additional
place in the code where we observed the commit timestamp. As far as I can tell
we don't read this field anywhere ever. Furthermore we know that the the table
descriptor in question to which we are referring must be alive and equal to
the provided value at the timestamp at which it was read due to
serializability. In short, this minor change continues to populate the field
with a sensible value and will permit TRUNCATE to be pushed.

Fixes cockroachdb#41566.

Release note (bug fix): Long running transactions which attempt to TRUNCATE
can now be pushed and will commit in cases where they previously could fail
or retry forever.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants