
roachtest: disk-stalled/log=true,data=true failed #58021

Closed
cockroach-teamcity opened this issue Dec 17, 2020 · 1 comment · Fixed by #58081
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity

(roachtest).disk-stalled/log=true,data=true failed on master@eda9189cecbbc279f1857f6e6b992bdfd363397e:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/disk-stalled/log=true_data=true/run_1
```
disk_stall.go:129,disk_stall.go:40,test_runner.go:760: unexpected output: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2525484-1608188837-29-n1cpu4:1 -- timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_DEFAULT=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs: exit status 20 Flag --logtostderr has been deprecated, use --log instead to specify sinks.stderr.filter.
	Flag --log-dir has been deprecated, use --log instead to specify file-defaults.dir.
	Error: COMMAND_PROBLEM: exit status 7
	(1) COMMAND_PROBLEM
	Wraps: (2) Node 1. Command with error:
	  | ```
	  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_DEFAULT=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs
	  | ```
	Wraps: (3) exit status 7
	Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
```


Artifacts: /disk-stalled/log=true,data=true

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Dec 17, 2020
@petermattis

@itsbilal Can you triage this failure?

craig bot pushed a commit that referenced this issue Dec 21, 2020
57990: eventpb: new JSON serialization with redaction markers r=itsbilal a=knz

Fixes #45643

All commits but the last are from #58070 and earlier PRs.
This is a prerequisite for presenting structured events "inline" in #57170.

This patch introduces a new code generator and infrastructure to emit
structured event payloads using the JSON syntax, but also including
redaction markers for fields not marked explicitly as safe for
reporting.

This is not yet connected to the remainder of the logging system --
this change will be performed in a later commit.

Release note: None

58015: backupccl: fix progress updating test r=pbardea a=pbardea

This test was previously flaky because it assumed that backup would
issue one request per span. That assumption was incorrect, and fixing
the progress accounting for backup revealed that the test was faulty.

While planning the work distribution for backup worker nodes,
PartitionSpans automatically merges spans that are co-located on the
same node, thereby reducing the number of ExportRequests issued.
Before this commit, the test blocked on ExportRequest responses at the
per-range-request level. However, progress is only updated when the
processor-level batch request returns. As a result, the test could
block an entire batch request, leaving the job's progress lower than
the test expected.

This is now fixed by moving both the blocking mechanism and the range
counting to the level of the merged ranges on which the backup
processor operates.

Fixes #57831.

Before this patch, the test would fail under stress fairly quickly (after
~40 runs); with this patch, I was able to get over 600 runs without a failure.

Release note: none

58054: opt: generate semi and anti inverted joins on multi-column indexes r=rytaft a=mgartner

This commit adds tests for generating semi and anti inverted joins on
multi-column inverted indexes. A `ConstFilters` field was added to
`InvertedJoinPrivate`, similar to `LookupJoinPrivate`, so that row count
estimates for these expressions are more accurate. This was necessary to
make the semi and anti joins the chosen plans for the exploration tests.
A future commit will add more comprehensive stats tests.

Release note: None

58081: roachtest,roachprod: Use new --log flag, fix parameterRE in expander r=itsbilal a=itsbilal

Now that logging configuration is specified using --log, and the
deprecated --logtostderr and --log-dir flags print deprecation
warnings (excess output), move the disk-stalled roachtest to the
new flag.

Also fixes a bug in roachprod's regular-expression expander
(parameterRE), which was matching parts of YAML arguments
that it should have left untouched.

Fixes #58021.

Release note: None.

58117: sql/rowexec: don't allocate buf per row in sketchInfo.addRow r=nvanbenschoten a=nvanbenschoten

The `intbuf` array was meant to stay on the stack, but was escaping to the heap because the call through the `hash` function variable was opaque to escape analysis.

At the end of a 4 hour, 2.2 TB IMPORT of TPC-E, this was responsible for **76.70%** of all heap allocations (by object).

<img width="1684" alt="Screen Shot 2020-12-20 at 9 58 04 PM" src="https://user-images.githubusercontent.com/5438456/102735277-fb065a00-430f-11eb-837a-7ad1c903cf55.png">

58120: storage: avoid heap allocation per value in pebbleIterator.FindSplitKey r=nvanbenschoten a=nvanbenschoten

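This was fallout from 95b836d: `pebble.Iterator.Value` is unsafe (it returns the value without copying), but `pebbleIterator.Value` is safe (it allocates and copies), so going through the latter cost one heap allocation per value.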

```
name                                     old time/op    new time/op    delta
MVCCFindSplitKey_Pebble/valueSize=32-16    97.0ms ± 5%    76.5ms ±15%   -21.08%  (p=0.000 n=9+10)

name                                     old speed      new speed      delta
MVCCFindSplitKey_Pebble/valueSize=32-16   693MB/s ± 5%   881MB/s ±13%   +27.19%  (p=0.000 n=9+10)

name                                     old alloc/op   new alloc/op   delta
MVCCFindSplitKey_Pebble/valueSize=32-16    27.8MB ± 0%     0.0MB ±27%   -99.99%  (p=0.000 n=10+10)

name                                     old allocs/op  new allocs/op  delta
MVCCFindSplitKey_Pebble/valueSize=32-16      580k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+9)
```

At the end of a 4 hour, 2.2 TB IMPORT of TPC-E, this was responsible for 15.96% of all heap allocations (by object).

<img width="1680" alt="Screen Shot 2020-12-20 at 10 13 35 PM" src="https://user-images.githubusercontent.com/5438456/102735999-d6ab7d00-4311-11eb-81fb-a1bd6d807323.png">


58121: storage: re-use buffer across calls to sstIterator.SeekGE r=nvanbenschoten a=nvanbenschoten

At the end of a 4 hour, 2.2 TB IMPORT of TPC-E, this was responsible for 2.67% of all heap allocations (by object) due to the call to `sstIterator.SeekGE` from `checkForKeyCollisionsGo`.

```
File: cockroach
Type: alloc_objects
Time: Dec 21, 2020 at 2:51am (UTC)
Duration: 5.32s, Total samples = 6963804
Active filters:
   focus=SeekGE
Showing nodes accounting for 212999, 3.06% of 6963804 total
----------------------------------------------------------+-------------
                                            185692   100% |   github.com/cockroachdb/cockroach/pkg/storage.checkForKeyCollisionsGo /go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:3807
         0     0%     0%     185692  2.67%                | github.com/cockroachdb/cockroach/pkg/storage.(*sstIterator).SeekGE /go/src/github.com/cockroachdb/cockroach/pkg/storage/sst_iterator.go:93
                                            185692   100% |   github.com/cockroachdb/cockroach/pkg/storage.EncodeKey /go/src/github.com/cockroachdb/cockroach/pkg/storage/batch.go:126
```

58135: util: add crdb_test_off build tag to disable metamorphization r=yuzefovich a=yuzefovich

We are now metamorphizing most of the test builds. However, there are
cases where that behavior is undesirable (for example, when using the
`-rewrite` test flag with logic tests); in such scenarios we want the
"production" build to occur. To support this, this commit makes the
`crdb_test` build behavior conditional on the absence of the
`crdb_test_off` build tag. An example usage of the `-rewrite` option
would be `make testoptlogic TESTFLAGS='-rewrite' TAGS=crdb_test_off`.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Paul Bardea <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig craig bot closed this as completed in f8fd60c Dec 21, 2020