Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs: flake in TestRegistryLifecycle/rollback #52767

Closed
knz opened this issue Aug 13, 2020 · 7 comments · Fixed by #53016
Closed

jobs: flake in TestRegistryLifecycle/rollback #52767

knz opened this issue Aug 13, 2020 · 7 comments · Fixed by #53016
Assignees
Labels
A-jobs branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered).

Comments

@knz
Copy link
Contributor

knz commented Aug 13, 2020

Found in https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests/2178285?

------- Stdout: -------
=== RUN   TestRegistryLifecycle/rollback
--- FAIL: TestRegistryLifecycle/rollback (2.75s)
jobs_test.go:200: Starting resume
jobs_test.go:208: Exiting resume
jobs_test.go:200: Starting resume
jobs_test.go:208: Exiting resume
jobs_test.go:250: Starting success
jobs_test.go:254: Exiting success
jobs_test.go:200: Starting resume
jobs_test.go:648: expected job status: succeeded but got: running
jobs_test.go:208: Exiting resume
@blathers-crl
Copy link

blathers-crl bot commented Aug 13, 2020

Hi @knz, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@knz knz added the C-test-failure Broken test (automatically or manually discovered). label Aug 13, 2020
@knz knz added branch-master Failures and bugs on the master branch. A-jobs labels Aug 13, 2020
@knz
Copy link
Contributor Author

knz commented Aug 13, 2020

I'll add a commit to skip the test for the time being.

@spaskob
Copy link
Contributor

spaskob commented Aug 18, 2020

make stress PKG=./pkg/jobs TESTFLAGS='-v ' TESTS=TestRegistryLifecycle/rollback
does not reproduce the flake locally

@spaskob
Copy link
Contributor

spaskob commented Aug 18, 2020

make stress PKG=./pkg/jobs TESTFLAGS='-v ' TESTS=TestRegistryLifecycle
also does not reproduce the flake locally in >2000 runs.

@knz
Copy link
Contributor Author

knz commented Aug 18, 2020

Have you tried the git SHA where the issue was found originally.

It's possible the root cause has been fixed on master already. If that is the case (i.e. you can repro in the past but not any more) it would be useful to determine which change was responsible.

@spaskob
Copy link
Contributor

spaskob commented Aug 19, 2020

What would be that SHA? TC information is extremely hard to parse.

@knz
Copy link
Contributor Author

knz commented Aug 19, 2020

What would be that SHA? TC information is extremely hard to parse.

You always have two differnt ways to identify this:

  1. identify the PR that the build is coming from. The PR number is at the top left
    image

    Then go to the PR, look at the history of the PR and find which build failed, and scan through the list of commit SHAs to find which one was used for the build.

  2. or, go to the full test log like this:

    • Dependencies
      image

    • Click the failed CI target, then click on the red failure message
      image

    • Click "download log" at the top right
      image

    • The SHA is at the top
      image

craig bot pushed a commit that referenced this issue Aug 19, 2020
52836: bulkio: Make incremental scheduled backup wait for full backup. r=miretskiy a=miretskiy

Fixes #52835

Add ability to record schedule groups: set of related schedules.
Use this functionality to makean incremental schedule wait
until the full one completes before it begins its execution.

Release Notes: None

53016: jobs: unskip TestRegistryLifecycle/rollback r=spaskob a=spaskob

The test flakiness was introduced by #52697 and fixed by #52710.

Fixes #52767.

Release note: none.

53018: colexec: create new message to send metadata in unordered synchronizer r=yuzefovich a=asubiotto

This commit fixes a race condition where a metadata message would be
double-freed and therefore the same object returned to two different goroutines
from a sync.Pool.

The root cause of this issue was that input goroutines in the parallel
unordered synchronizer use a single message that is sent repeatedly over a
channel instead of multiple messages to avoid allocations. A scenario could
occur where an input would drain metadata and set its message's metadata field
while its message was still unread in the channel. The message would then be
sent on the channel again, and the synchronizer's DrainMeta method would read
the first message with the metadata field set, followed by the same message a
second time. This results in returning the same metadata message twice to the
distsql receiver, which would release the same metadata twice.

The solution is to instead allocate a new message when draining, which will
leave message already present in the channel untouched.

Release note: None (no release with bug)

Fixes #52890 
Fixes #52948 

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Spas Bojanov <[email protected]>
Co-authored-by: Alfonso Subiotto Marques <[email protected]>
@craig craig bot closed this as completed in 71288b9 Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-jobs branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered).
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants