jobs: flake in TestRegistryLifecycle/rollback #52767

knz · 2020-08-13T13:00:22Z

Found in https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests/2178285?

------- Stdout: -------
=== RUN   TestRegistryLifecycle/rollback
--- FAIL: TestRegistryLifecycle/rollback (2.75s)
jobs_test.go:200: Starting resume
jobs_test.go:208: Exiting resume
jobs_test.go:200: Starting resume
jobs_test.go:208: Exiting resume
jobs_test.go:250: Starting success
jobs_test.go:254: Exiting success
jobs_test.go:200: Starting resume
jobs_test.go:648: expected job status: succeeded but got: running
jobs_test.go:208: Exiting resume

blathers-crl · 2020-08-13T13:00:24Z

Hi @knz, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

knz · 2020-08-13T13:12:08Z

I'll add a commit to skip the test for the time being.

spaskob · 2020-08-18T15:26:27Z

make stress PKG=./pkg/jobs TESTFLAGS='-v ' TESTS=TestRegistryLifecycle/rollback
does not reproduce the flake locally

spaskob · 2020-08-18T18:19:59Z

make stress PKG=./pkg/jobs TESTFLAGS='-v ' TESTS=TestRegistryLifecycle
also does not reproduce the flake locally in >2000 runs.
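
A clean run of this size lowers the odds of a frequent flake but does not rule out a rare one. The sketch below illustrates the idea only: it is not how `make stress` is implemented, and `runUntilFailure` and `flakyEvery` are hypothetical helpers written for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// runUntilFailure calls fn up to maxRuns times. It returns the 1-based index
// of the first failing run and its error, or (0, nil) if every run passed.
func runUntilFailure(maxRuns int, fn func(run int) error) (int, error) {
	for run := 1; run <= maxRuns; run++ {
		if err := fn(run); err != nil {
			return run, err
		}
	}
	return 0, nil
}

// flakyEvery returns a hypothetical test body that fails on every n-th run.
// A real flake is nondeterministic; a fixed period keeps the example simple.
func flakyEvery(n int) func(run int) error {
	return func(run int) error {
		if run%n == 0 {
			return errors.New("expected job status: succeeded but got: running")
		}
		return nil
	}
}

func main() {
	// 2000 runs are not enough to hit a failure that occurs once in 3000 runs.
	run, err := runUntilFailure(2000, flakyEvery(3000))
	fmt.Println(run, err)

	// A larger budget does hit it.
	run, err = runUntilFailure(5000, flakyEvery(3000))
	fmt.Println(run, err)
}
```

The run budget therefore has to be chosen relative to the suspected failure rate, which is unknown here.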

knz · 2020-08-18T19:37:05Z

Have you tried the git SHA where the issue was originally found?

It's possible the root cause has been fixed on master already. If that is the case (i.e. you can repro in the past but not any more) it would be useful to determine which change was responsible.
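
Finding the responsible change amounts to a binary search over the commit history, which is the same idea `git bisect` automates. Here is a minimal sketch, assuming the commits are ordered oldest to newest, that the flake reproduces on every commit before the fix and on none after it, and that `reproduces` runs the test often enough to be reliable. The name `firstFixed` and the commit labels are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// firstFixed returns the index of the first commit at which the flake no
// longer reproduces. It returns len(commits) if it reproduces on all of them.
// sort.Search needs a predicate that is false, then true, along the slice,
// which holds under the ordering assumption stated above.
func firstFixed(commits []string, reproduces func(sha string) bool) int {
	return sort.Search(len(commits), func(i int) bool {
		return !reproduces(commits[i])
	})
}

func main() {
	// Hypothetical history, oldest first; the fix is assumed to land at "c3".
	commits := []string{"c0", "c1", "c2", "c3", "c4"}
	broken := map[string]bool{"c0": true, "c1": true, "c2": true}

	i := firstFixed(commits, func(sha string) bool { return broken[sha] })
	if i < len(commits) {
		fmt.Println("first commit without the flake:", commits[i])
	} else {
		fmt.Println("flake reproduces on every commit")
	}
}
```

Because a flaky test gives a noisy answer, a single passing run at a commit is weak evidence. In practice `reproduces` would wrap a stress run with a generous iteration budget.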

spaskob · 2020-08-19T06:45:13Z

What would be that SHA? TC information is extremely hard to parse.

knz · 2020-08-19T07:52:39Z

> What would be that SHA? TC information is extremely hard to parse.

You always have two different ways to identify this:

- Identify the PR that the build is coming from. The PR number is at the top left. Then go to the PR, look at its history to find which build failed, and scan through the list of commit SHAs to find which one was used for the build.
- Or, go to the full test log like this:
  - Dependencies
  - Click the failed CI target, then click on the red failure message
  - Click "download log" at the top right
  - The SHA is at the top

52836: bulkio: Make incremental scheduled backup wait for full backup. r=miretskiy a=miretskiy

Fixes #52835

Add ability to record schedule groups: set of related schedules. Use this functionality to make an incremental schedule wait until the full one completes before it begins its execution.

Release Notes: None

53016: jobs: unskip TestRegistryLifecycle/rollback r=spaskob a=spaskob

The test flakiness was introduced by #52697 and fixed by #52710.

Fixes #52767.

Release note: none.

53018: colexec: create new message to send metadata in unordered synchronizer r=yuzefovich a=asubiotto

This commit fixes a race condition where a metadata message would be double-freed and therefore the same object returned to two different goroutines from a sync.Pool.

The root cause of this issue was that input goroutines in the parallel unordered synchronizer use a single message that is sent repeatedly over a channel instead of multiple messages to avoid allocations. A scenario could occur where an input would drain metadata and set its message's metadata field while its message was still unread in the channel. The message would then be sent on the channel again, and the synchronizer's DrainMeta method would read the first message with the metadata field set, followed by the same message a second time. This results in returning the same metadata message twice to the distsql receiver, which would release the same metadata twice.

The solution is to instead allocate a new message when draining, which will leave message already present in the channel untouched.

Release note: None (no release with bug)

Fixes #52890
Fixes #52948

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Spas Bojanov <[email protected]>
Co-authored-by: Alfonso Subiotto Marques <[email protected]>

knz added the C-test-failure Broken test (automatically or manually discovered). label Aug 13, 2020

knz added branch-master Failures and bugs on the master branch. A-jobs labels Aug 13, 2020

thoszhang assigned spaskob Aug 18, 2020

spaskob mentioned this issue Aug 19, 2020

jobs: unskip TestRegistryLifecycle/rollback #53016

Merged

craig bot closed this as completed in 71288b9 Aug 19, 2020
