teamcity: failed test: TestRestoreReplicas [skipped] #40351

Closed
cockroach-teamcity opened this issue Aug 29, 2019 · 3 comments · Fixed by #56812
Labels
- branch-master: Failures and bugs on the master branch.
- C-test-failure: Broken test (automatically or manually discovered).
- O-robot: Originated from a bot.

Comments

@cockroach-teamcity
Member

The following tests appear to have failed on master (testrace): TestRestoreReplicas

You may want to check for open issues.

#1461211:

TestRestoreReplicas
....go:542  quiescing; tasks left:
7      mtc send
I190829 21:57:09.860764 105833 internal/client/txn.go:634  async rollback failed: failed to send RPC: sending to all 2 replicas failed; last error: <nil> failed to send RPC: store is stopped
I190829 21:57:09.867459 106600 util/stop/stopper.go:542  quiescing; tasks left:
6      mtc send
I190829 21:57:09.870267 105918 internal/client/txn.go:634  async rollback failed: failed to send RPC: sending to all 2 replicas failed; last error: <nil> failed to send RPC: store is stopped
I190829 21:57:09.870672 106081 internal/client/txn.go:634  async rollback failed: failed to send RPC: sending to all 2 replicas failed; last error: <nil> failed to send RPC: store is stopped
I190829 21:57:09.870781 106600 util/stop/stopper.go:542  quiescing; tasks left:
2      mtc send
I190829 21:57:09.871656 105718 storage/client_test.go:1359  [txn=65f1a4cb] test clock advanced to: 169.200000311,0
I190829 21:57:09.871806 106432 internal/client/txn.go:634  async rollback failed: failed to send RPC: sending to all 2 replicas failed; last error: <nil> failed to send RPC: store is stopped
W190829 21:57:09.873172 93401 internal/client/txn.go:524  [liveness-hb] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=failed to send RPC: store is stopped [exhausted])
I190829 21:57:09.873478 93401 storage/node_liveness.go:836  [liveness-hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (error=failed to send RPC: store is stopped [exhausted])
I190829 21:57:09.873719 105718 internal/client/txn.go:634  async rollback failed: failed to send RPC: sending to all 2 replicas failed; last error: <nil> failed to send RPC: store is stopped
I190829 21:57:09.874511 106273 internal/client/txn.go:634  async rollback failed: failed to send RPC: sending to all 2 replicas failed; last error: <nil> failed to send RPC: store is stopped
W190829 21:57:09.875827 93401 internal/client/txn.go:524  [liveness-hb] failure aborting transaction: node unavailable; try another peer; abort caused by: node unavailable; try another peer
W190829 21:57:09.891890 93401 storage/node_liveness.go:484  [liveness-hb] failed node liveness heartbeat: node unavailable; try another peer
I190829 21:57:09.908716 106599 util/stop/stopper.go:542  quiescing; tasks left:
4      rpc heartbeat
I190829 21:57:09.910447 106599 util/stop/stopper.go:542  quiescing; tasks left:
3      rpc heartbeat
W190829 21:57:09.911913 93150 storage/raft_transport.go:620  while processing outgoing Raft queue to node 1: EOF:
W190829 21:57:09.915158 93320 storage/raft_transport.go:620  while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
    soon.go:35: condition failed to evaluate within 45s: node not live
        goroutine 92967 [running]:
        runtime/debug.Stack(0x62ec8a0, 0xc0034bee00, 0xc005745be0)
        	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
        github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x62ec8a0, 0xc0034bee00, 0xc005745be0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:36 +0x87
        github.com/cockroachdb/cockroach/pkg/storage_test.(*multiTestContext).restartStore(0xc000be6000, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/storage/client_test.go:1055 +0x149
        github.com/cockroachdb/cockroach/pkg/storage_test.(*multiTestContext).restart(0xc000be6000)
        	/go/src/github.com/cockroachdb/cockroach/pkg/storage/client_test.go:1108 +0xc1
        github.com/cockroachdb/cockroach/pkg/storage_test.TestRestoreReplicas(0xc0034bee00)
        	/go/src/github.com/cockroachdb/cockroach/pkg/storage/client_raft_test.go:373 +0x7ea
        testing.tRunner(0xc0034bee00, 0x53922f0)
        	/usr/local/go/src/testing/testing.go:865 +0x164
        created by testing.(*T).Run
        	/usr/local/go/src/testing/testing.go:916 +0x65b




Please assign, take a look and update the issue accordingly.

@tbg
Member

tbg commented Aug 30, 2019

Survived standard make stress on my gceworker for... a long time. Trying stressrace now.

@tbg
Member

tbg commented Aug 30, 2019

Stressrace repros this quickly. I'm just not sure whether there's an actual problem or whether we're just overloading the machine. Going to dig into this in a bit, may not be today.

@tbg
Member

tbg commented Sep 4, 2019

There's not much going on in this test, just a two-node MTC. I'm suspecting it's something like #40362 but I don't have time to dig in before I go on vacation. Going to skip this test.

tbg added a commit to tbg/cockroach that referenced this issue Sep 4, 2019
tbg changed the title from "teamcity: failed test: TestRestoreReplicas" to "teamcity: failed test: TestRestoreReplicas [skipped]" on Sep 4, 2019
tbg added a commit to tbg/cockroach that referenced this issue Sep 5, 2019
craig bot pushed a commit that referenced this issue Sep 5, 2019
40489: storage: skip TestRestoreReplicas r=knz a=tbg

See #40351.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
tbg added the branch-master label on Jan 22, 2020
lunevalex added a commit to lunevalex/cockroach that referenced this issue Nov 19, 2020
Makes progress on cockroachdb#8299
Fixes cockroachdb#40351

multiTestContext is a legacy construct that is deprecated in favor of running
tests via TestCluster. This is one PR out of many to remove the usage of
multiTestContext in the client_raft test cases. This does not remove all the
uses of mtc, just the simple ones; the more complex use cases are left for a later PR.

With this switch we can also clean up some TestingKnobs and TestServer interfaces.
    - The DisablePeriodicGossips flag is removed; it does not work with TestCluster
      and is no longer used.
    - The DontPreventUseOfOldLeaseOnStart flag is removed; it did not work consistently
      in TestCluster. This flag tries to keep the lease on the same node after a
      restart, but CRDB makes no such guarantee in the real world, so testing it
      artificially does not prove anything. The affected tests were reworked to
      not rely on this condition and can handle the leaseholder moving on a restart.
    - TestServerFactory.New was changed to explicitly return an error rather than
      overloading a single return type. This allows for proper error propagation
      and does not swallow the underlying problem.

Release note: None
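To illustrate the TestServerFactory.New change, here is a sketch contrasting a constructor that overloads a single `interface{}` return value with one that returns `(value, error)`. The `server`, `serverArgs`, `newServerOverloaded`, and `newServer` names are hypothetical and are not the actual TestServerFactory signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// server and serverArgs are hypothetical placeholders, not CockroachDB types.
type server struct{ addr string }

type serverArgs struct{ addr string }

// newServerOverloaded shows the "overloaded single return" style: the result
// is either a *server or an error, and callers must type-assert to tell.
func newServerOverloaded(args serverArgs) interface{} {
	if args.addr == "" {
		return errors.New("missing address")
	}
	return &server{addr: args.addr}
}

// newServer returns the error explicitly, so callers propagate it with
// ordinary Go error handling instead of risking silently dropping it.
func newServer(args serverArgs) (*server, error) {
	if args.addr == "" {
		return nil, errors.New("missing address")
	}
	return &server{addr: args.addr}, nil
}

func main() {
	if _, ok := newServerOverloaded(serverArgs{}).(*server); !ok {
		// A careless caller only learns "not a server" and loses the reason.
		fmt.Println("overloaded: got something other than a server")
	}
	if _, err := newServer(serverArgs{}); err != nil {
		fmt.Println("explicit:", err) // explicit: missing address
	}
}
```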
lunevalex added a commit to lunevalex/cockroach that referenced this issue Nov 20, 2020
Makes progress on cockroachdb#8299
Fixes cockroachdb#40351

multiTestContext is a legacy construct that is deprecated in favor of running
tests via TestCluster. This is one PR out of many to remove the usage of
multiTestContext in the client_raft test cases. This does not remove all the
uses of mtc, just the simple ones; the more complex use cases are left for a later PR.

With this switch we can also clean up some TestingKnobs and TestServer interfaces.
    - The DisablePeriodicGossips flag is removed; it does not work with TestCluster
      and is no longer used.
    - The DontPreventUseOfOldLeaseOnStart flag is removed; it did not work consistently
      in TestCluster. This flag tries to keep the lease on the same node after a
      restart, but CRDB makes no such guarantee in the real world, so testing it
      artificially does not prove anything. The affected tests were reworked to
      not rely on this condition and can handle the leaseholder moving on a restart.

Release note: None
lunevalex added a commit to lunevalex/cockroach that referenced this issue Jan 14, 2021
Makes progress on cockroachdb#8299
Fixes cockroachdb#40351
Fixes cockroachdb#57560
Fixes cockroachdb#57537

multiTestContext is a legacy construct that is deprecated in favor of running
tests via TestCluster. This is one PR out of many to remove the usage of
multiTestContext in the client_raft test cases. This does not remove all the
uses of mtc, just the simple ones; the more complex use cases are left for a later PR.

With this switch we can also clean up some TestingKnobs and TestServer interfaces.
    - The DisablePeriodicGossips flag is removed; it does not work with TestCluster
      and is no longer used.
    - The DontPreventUseOfOldLeaseOnStart flag is removed; it did not work consistently
      in TestCluster. This flag tries to keep the lease on the same node after a
      restart, but CRDB makes no such guarantee in the real world, so testing it
      artificially does not prove anything. The affected tests were reworked to
      not rely on this condition and can handle the leaseholder moving on a restart.
    - GetRaftLeader is ported from multiTestContext to TestCluster

Release note: None
craig bot pushed a commit that referenced this issue Jan 19, 2021
56812: kvserver: replace multiTestContext with TestCluster in client_raft_test r=lunevalex a=lunevalex

Makes progress on #8299
Fixes #40351
Fixes #57560
Fixes #57537

multiTestContext is a legacy construct that is deprecated in favor of running
tests via TestCluster. This is one PR out of many to remove the usage of
multiTestContext in the client_raft test cases. This does not remove all the
uses of mtc, just the simple ones; the more complex use cases are left for a later PR.

With this switch we can also clean up some TestingKnobs and TestServer interfaces.
    - The DisablePeriodicGossips flag is removed; it does not work with TestCluster
      and is no longer used.
    - The DontPreventUseOfOldLeaseOnStart flag is removed; it did not work consistently
      in TestCluster. This flag tries to keep the lease on the same node after a
      restart, but CRDB makes no such guarantee in the real world, so testing it
      artificially does not prove anything. The affected tests were reworked to
      not rely on this condition and can handle the leaseholder moving on a restart.
    - GetRaftLeader is ported from multiTestContext to TestCluster

Release note: None
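The description notes that tests now tolerate the leaseholder moving on a restart and that `GetRaftLeader` was ported to TestCluster. The sketch below shows the underlying idea of looking up the current Raft leader from replica status instead of assuming it stayed put. `replicaStatus`, `raftState`, and `findRaftLeader` are hypothetical names, not the TestCluster API. Returning an error when no leader is visible lets a caller wrap the lookup in a SucceedsSoon-style retry.

```go
package main

import (
	"errors"
	"fmt"
)

// raftState is a hypothetical encoding of a replica's Raft role.
type raftState int

const (
	follower raftState = iota
	candidate
	leader
)

// replicaStatus is a hypothetical per-store snapshot of one range's replica.
type replicaStatus struct {
	storeID int
	term    uint64
	state   raftState
}

// findRaftLeader returns the store whose replica claims leadership in the
// highest term, since a stale leader from an older term may not yet know it
// was deposed. It errors if no replica currently claims leadership.
func findRaftLeader(replicas []replicaStatus) (int, error) {
	found := false
	var best int
	var bestTerm uint64
	for _, r := range replicas {
		if r.state == leader && (!found || r.term > bestTerm) {
			best, bestTerm, found = r.storeID, r.term, true
		}
	}
	if !found {
		return 0, errors.New("no raft leader yet")
	}
	return best, nil
}

func main() {
	// After a restart, store 2 won a newer term; store 1's claim is stale.
	id, err := findRaftLeader([]replicaStatus{
		{storeID: 1, term: 5, state: leader},
		{storeID: 2, term: 6, state: leader},
	})
	fmt.Println(id, err) // 2 <nil>
}
```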


58265: sql: fix substring(byte[]) to treat input as raw bytes without escaping r=solongordon a=rafiss

Fixes #57367

Release note (bug fix): The substring function on byte arrays would
treat its input as Unicode code points, which would cause the wrong
bytes to be returned. Now it only operates on the raw bytes.

Release note (bug fix): The substring(byte[]) functions could not handle
input bytes containing the `\` character because they treated it as
the beginning of an escape sequence. This is now fixed.
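A sketch of the raw-bytes semantics described in these release notes, next to the code-point behavior they replace. `substringBytes`, `substringCodePoints`, and `clamp` are hypothetical helpers, not CockroachDB's builtin code. They assume a PostgreSQL-style 1-based start with out-of-range bounds clamped, and they treat a negative length as an empty result even though PostgreSQL reports an error for it.

```go
package main

import "fmt"

// clamp converts a 1-based SQL start and a length into a 0-based half-open
// range [from, to) that lies within [0, n].
func clamp(start, length, n int) (from, to int) {
	from, to = start-1, start-1+length
	if from < 0 {
		from = 0
	}
	if from > n {
		from = n
	}
	if to > n {
		to = n
	}
	if to < from {
		to = from
	}
	return from, to
}

// substringBytes slices raw bytes: no UTF-8 decoding, no escape handling.
func substringBytes(b []byte, start, length int) []byte {
	from, to := clamp(start, length, len(b))
	return b[from:to]
}

// substringCodePoints decodes the input as UTF-8 and counts code points,
// which is the incorrect behavior for byte inputs described above.
func substringCodePoints(b []byte, start, length int) []byte {
	r := []rune(string(b))
	from, to := clamp(start, length, len(r))
	return []byte(string(r[from:to]))
}

func main() {
	in := []byte("€A") // bytes e2 82 ac 41
	fmt.Printf("%x\n", substringBytes(in, 1, 1))      // e2
	fmt.Printf("%x\n", substringCodePoints(in, 1, 1)) // e282ac
	// A backslash is just byte 0x5c; it is not treated as an escape.
	fmt.Printf("%q\n", substringBytes([]byte(`a\x`), 2, 1)) // "\\"
}
```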

58902: tracing,testutils: detect span leaks r=irfansharif a=irfansharif

With always-on tracing, we're maintaining an in-memory registry of active
spans (#58490). Spans are added to and removed from this registry when
they're Start()-ed and Finish()-ed. Spans that are not explicitly finished
(typically using `defer sp.Finish()`) are now a resource leak, as they
take up room in the registry. We'll want to find instances of this leak
as soon as they crop up.

To that end, we add a check in our TestCluster.Stop codepaths that
asserts that the registry is empty. This should give us wide
coverage given its usage throughout. We expect this change to capture
the cases described in #58721.

---

This check is currently failing, because we're actually leaking spans in
`txnState.resetForNewSQLTxn`. Fixing that doesn't look simple; it ties
into questions around draining SQL and rolling back open txns.

Release note: None
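A minimal sketch of the leak check described above: spans register themselves on start, deregister on finish, and a stop-time assertion reports whatever is still registered. The `registry`, `span`, and `checkNoLeaks` names are hypothetical; CockroachDB's tracer and the TestCluster.Stop hook are considerably more involved.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// registry is a hypothetical, simplified registry of active spans.
type registry struct {
	mu     sync.Mutex
	nextID uint64
	active map[uint64]string // span ID -> operation name
}

func newRegistry() *registry { return &registry{active: make(map[uint64]string)} }

type span struct {
	id  uint64
	reg *registry
}

// Start registers a new span as active and returns it.
func (r *registry) Start(op string) *span {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	r.active[r.nextID] = op
	return &span{id: r.nextID, reg: r}
}

// Finish deregisters the span; a span that is never finished stays behind.
func (s *span) Finish() {
	s.reg.mu.Lock()
	defer s.reg.mu.Unlock()
	delete(s.reg.active, s.id)
}

// checkNoLeaks is the kind of assertion a cluster's Stop path could run.
// It lists leaked operation names in sorted order for stable output.
func (r *registry) checkNoLeaks() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.active) == 0 {
		return nil
	}
	ops := make([]string, 0, len(r.active))
	for _, op := range r.active {
		ops = append(ops, op)
	}
	sort.Strings(ops)
	return fmt.Errorf("%d leaked span(s): %v", len(ops), ops)
}

func main() {
	reg := newRegistry()
	func() {
		sp := reg.Start("query")
		defer sp.Finish() // finished, so it is removed from the registry
	}()
	reg.Start("txn") // never finished, so it leaks
	fmt.Println(reg.checkNoLeaks()) // 1 leaked span(s): [txn]
}
```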

59049: sql: implement alter sequence/view owner r=arulajmani a=RichardJCai

Resolves #57965

Release note (sql change): Add support for ALTER VIEW/SEQUENCE OWNER TO commands.

59055: colexec: fix external aggregator fallback and bool agg functions reset r=yuzefovich a=yuzefovich

Previously, the `reset` method of the ordered aggregator would always set
the flag to reset the internal batch to `true`. However, that batch is
only allocated when `Next` is called at least once with a non-zero batch
coming from the input, which is not the case when the fallback to the
disk-backed strategy occurs in the external aggregator (there, we call
`reset` before we use the operator every time). This would lead to
a nil pointer dereference, and it is now fixed.

Our unit tests didn't catch it because we forgot to set the forced
number of repartitions, which is now also fixed.

This also revealed a bug in resetting the `bool_and` and `bool_or`
aggregates: we forgot to reset whether they had seen a non-null value.

Fixes: #59043.

Release note: None (no stable release with these bugs)
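A sketch of the two fixes, under hypothetical names (`boolAndAgg`, `orderedAgg`, `scratchBatch`) that only loosely model the colexec operators: `reset` tolerates a batch that was never allocated, and the `bool_and` state clears its seen-a-non-null flag so that an all-NULL group after a reset yields NULL. `bool_or` would need the same treatment.

```go
package main

import "fmt"

// boolAndAgg is a hypothetical sketch of a bool_and aggregate. In SQL,
// bool_and over a group containing only NULLs is NULL, so the aggregate
// must track whether it has seen any non-null value.
type boolAndAgg struct {
	cur          bool
	foundNonNull bool
}

// add folds one input value; valid == false stands for SQL NULL.
func (a *boolAndAgg) add(v, valid bool) {
	if !valid {
		return
	}
	if !a.foundNonNull {
		a.cur, a.foundNonNull = v, true
		return
	}
	a.cur = a.cur && v
}

// result returns the aggregate value and whether it is non-NULL.
func (a *boolAndAgg) result() (value, valid bool) {
	return a.cur, a.foundNonNull
}

// reset clears foundNonNull too; forgetting it makes an all-NULL group that
// follows a reset report a stale non-NULL result.
func (a *boolAndAgg) reset() {
	a.cur, a.foundNonNull = false, false
}

// scratchBatch is a hypothetical stand-in for the internal batch.
type scratchBatch struct{ rows []bool }

// orderedAgg is a hypothetical, heavily simplified ordered aggregator.
type orderedAgg struct {
	scratch *scratchBatch // stays nil until the first non-empty input
	agg     boolAndAgg
}

// reset may run before any input was seen (as on the external aggregator's
// fallback path), so it guards against a nil batch instead of assuming one.
func (o *orderedAgg) reset() {
	if o.scratch != nil {
		o.scratch.rows = o.scratch.rows[:0]
	}
	o.agg.reset()
}

func main() {
	var o orderedAgg
	o.reset() // no nil pointer dereference on a never-used aggregator

	o.agg.add(true, true)
	o.agg.add(false, true)
	fmt.Println(o.agg.result()) // false true

	o.reset()
	o.agg.add(false, false) // a group containing only NULLs
	fmt.Println(o.agg.result()) // false false (i.e. NULL)
}
```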

Co-authored-by: Alex Lunev <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: richardjcai <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot closed this as completed in 3876b44 on Jan 19, 2021