roachtest: tpccbench/nodes=3/cpu=4 failed #53282

Closed
cockroach-teamcity opened this issue Aug 23, 2020 · 5 comments · Fixed by #53697
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity

(roachtest).tpccbench/nodes=3/cpu=4 failed on master@7425e857e62fe4280f614f9076f310322cc78649:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=3/cpu=4/run_1
	cluster.go:1612,context.go:135,cluster.go:1601,test_runner.go:823: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2212553-1598162915-15-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1
		4: skipped
		2: dead
		3: 5248
		1: 6976
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1143
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:267
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/pkg/mod/github.com/spf13/[email protected]/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/pkg/mod/github.com/spf13/[email protected]/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/pkg/mod/github.com/spf13/[email protected]/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1839
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (3) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /tpccbench/nodes=3/cpu=4

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 23, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Aug 23, 2020
@irfansharif

Same as #53285 (comment).

@cockroach-teamcity

(roachtest).tpccbench/nodes=3/cpu=4 failed on master@ef55609797c92e46eb9efd08facc9db49558291d:

test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=3/cpu=4/run_1
	cluster.go:2597,tpcc.go:782,tpcc.go:617,test_runner.go:754: monitor failure: monitor task failed: output in run_063314.808_n4_workload_fixtures_load_tpcc: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2223072-1598421936-04-n4cpu4:4 -- ./workload fixtures load tpcc --warehouses=1000 --scatter --checks=false {pgurl:1} returned: exit status 20
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2585
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2593
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:782
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:754
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2641
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2245
		  | main.loadTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:684
		  | main.runTPCCBench.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:780
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2575
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) output in run_063314.808_n4_workload_fixtures_load_tpcc
		Wraps: (7) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2223072-1598421936-04-n4cpu4:4 -- ./workload fixtures load tpcc --warehouses=1000 --scatter --checks=false {pgurl:1} returned
		  | stderr:
		  | Error: finding fixture: fixture not found: workload/tpcc/version=2.1.0,deprecated-fk-indexes=false,fks=true,interleaved=false,seed=1,warehouses=1000
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./workload fixtures load tpcc --warehouses=1000 --scatter --checks=false {pgurl:1}
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (8) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *main.withCommandDetails (8) *exec.ExitError


Artifacts: /tpccbench/nodes=3/cpu=4
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@irfansharif

Error: finding fixture: fixture not found

@rohany were you in the area recently? (Only the last failure.)

@rohany

rohany commented Aug 26, 2020

I just bors'd something to fix this: http://github.com/cockroachdb/cockroach/pull/53450

@knz

knz commented Aug 31, 2020

related to #53285

@knz knz assigned andreimatei and unassigned aayushshah15 Aug 31, 2020
craig bot pushed a commit that referenced this issue Sep 1, 2020
53589: jobs: improve job adoption r=ajwerner a=ajwerner

#### jobs: don't hold mutex during adoption, launch in parallel

#### jobs: break up new stages of job lifecycle movement

In the PR which adopted the sqlliveness sessions, we shoved all of the stages
of adopting jobs into the same stage and invoked that stage on each adoption
interval and on each send to the adoption channel.

These stages are:

 * Cancel jobs
 * Serve pause and cancel requests
 * Delete claims due to dead sessions
 * Claim jobs
 * Process claimed jobs

This is problematic for tests which send on the adoption channel at a high
rate. One important thing to note is that all jobs which are sent on the
adoption channel are already claimed.

After this PR we move the first three steps above into the cancellation
loop we were already running. We also increase the default interval for
that loop as it was exceedingly frequent at 1s for no obvious reason.

Many of the testing changes are due to this change in the cancellation loop
interval. The tests in this package now run 3x faster (10s vs 30s).

Then, on sends to the adoption channel, we only process claimed jobs.
When the adoption interval elapses, we attempt to both claim and
process jobs.
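A minimal Go sketch of the scheduling after this change, assuming hypothetical names (`trigger`, `stagesFor`, and the trigger constants) that are not part of CockroachDB's jobs package. It only maps each wake-up source to the stages that run, which makes the key point visible: a burst of sends on the adoption channel now does one stage of work per send instead of all five.

```go
package main

import "fmt"

// trigger identifies what woke up the (hypothetical) jobs registry.
type trigger int

const (
	cancelLoopTick    trigger = iota // the cancellation loop's interval fired
	adoptIntervalTick                // the adoption interval elapsed
	adoptChannelSend                 // something was sent on the adoption channel
)

// stagesFor returns the job-lifecycle stages run for a trigger after the
// change: the first three stages live in the cancellation loop, sends on the
// adoption channel only process jobs that are already claimed, and the
// adoption interval both claims and processes jobs.
func stagesFor(t trigger) []string {
	switch t {
	case cancelLoopTick:
		return []string{
			"cancel jobs",
			"serve pause and cancel requests",
			"delete claims due to dead sessions",
		}
	case adoptIntervalTick:
		return []string{"claim jobs", "process claimed jobs"}
	case adoptChannelSend:
		return []string{"process claimed jobs"}
	default:
		return nil
	}
}

func main() {
	// Two rapid channel sends followed by one adoption-interval tick.
	for _, t := range []trigger{adoptChannelSend, adoptChannelSend, adoptIntervalTick} {
		fmt.Println(stagesFor(t))
	}
}
```

In a real registry each trigger would be a `case` in a `select` loop; keeping the mapping in a pure function is just a convenient way to show (and test) which work each event performs.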

Release justification: bug fixes and low-risk updates to new functionality
Release note: None

53697: kv: attach txn to error from detectIntentMissingDueToIntentResolution r=nvanbenschoten a=nvanbenschoten

Fixes #53189.
Fixes #53282.
Fixes #53285.
Fixes #53469.

3dcb6f1 improved the detection of missing intents during parallel commit attempts to distinguish between certain classes of ambiguous errors and transaction aborted errors. This was a nice improvement, as it dramatically reduced the number of situations where we returned ambiguous errors during normal operation (see #52566).

However, in introducing a new location where transaction retry errors could be generated, it accidentally violated the invariant that all transaction retry errors have transaction protos attached to them. This was causing panics in TPC-C roachtests. This commit fixes this issue by properly attaching transaction protos to these new errors, along with any others returned from `detectIntentMissingDueToIntentResolution`.

Release justification: bug fix

53708: builtins: implement ST_MemCollect and ST_MemUnion r=otan a=erikgrinaker

These are implemented as aliases for ST_Collect and ST_Union. In PostGIS
these are memory-optimized versions, but for now it should be sufficient
to simply make them functional.
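A minimal Go sketch of "implemented as aliases", assuming a hypothetical name-keyed registry (`builtins`, `aggregate`, `registerAlias`). This is not how CockroachDB actually defines builtins. The function bodies are placeholders that only join their WKT inputs; the point is that the `Mem*` names resolve to the very same implementations.

```go
package main

import (
	"fmt"
	"strings"
)

// aggregate is a hypothetical, simplified signature for a geometry
// aggregate over WKT strings.
type aggregate func(wkts []string) string

// builtins is a hypothetical registry keyed by lower-case function name.
// Placeholder bodies: they do not compute real collections or unions.
var builtins = map[string]aggregate{
	"st_collect": func(w []string) string { return "GEOMETRYCOLLECTION(" + strings.Join(w, ",") + ")" },
	"st_union":   func(w []string) string { return "UNION(" + strings.Join(w, ",") + ")" },
}

// registerAlias makes alias resolve to the same implementation as target.
func registerAlias(alias, target string) error {
	impl, ok := builtins[target]
	if !ok {
		return fmt.Errorf("unknown builtin %q", target)
	}
	builtins[alias] = impl
	return nil
}

func init() {
	// The Mem* variants reuse the existing implementations rather than
	// providing memory-optimized versions.
	for alias, target := range map[string]string{
		"st_memcollect": "st_collect",
		"st_memunion":   "st_union",
	} {
		if err := registerAlias(alias, target); err != nil {
			panic(err)
		}
	}
}

func main() {
	fmt.Println(builtins["st_memcollect"]([]string{"POINT(1 2)", "POINT(3 4)"}))
}
```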

Release justification: low risk, high benefit changes to existing functionality

Release note (sql change): Implement the geometry aggregate builtins
`ST_MemCollect` and `ST_MemUnion`.

Closes #48984.
Closes #48986.

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
@craig craig bot closed this as completed in 5430170 Sep 1, 2020