release: v20.1.4 #51919

Closed
25 tasks done
RaduBerinde opened this issue Jul 27, 2020 · 9 comments

@RaduBerinde
Member

RaduBerinde commented Jul 27, 2020

Candidate SHA: 12049d3fe3650660e1b6abf1e522d9bb016acb88
Deployment status: restarted 20.1.4 cluster with new SHA

Qualification Suite: https://teamcity.cockroachdb.com/viewLog.html?buildId=2129919

Nightly Suite: https://teamcity.cockroachdb.com/viewLog.html?buildId=2129939

Admin UI for Qualification Clusters:

Release process checklist

Prep date: 2020-07-27

  • Pick a SHA
    • fill in Candidate SHA above
    • email thread on releases@
  • Tag the provisional SHA
  • Publish provisional binaries
  • Ack security@ and release-engineering-team@ on the generated AWS S3 bucket write alert to confirm that these writes were part of a planned release (just reply to the alert email, acknowledging that the writes were part of the release process)

Release Qualification

One day after prep date:

Release date: 2020-08-03

Cleanup:

  • Clean up provisional tag from repository
  • Destroy roachprod clusters
@RaduBerinde
Member Author

RaduBerinde commented Jul 28, 2020

Nightly test failures

Roachtest GCE

https://teamcity.cockroachdb.com/viewLog.html?buildId=2129939

[appdev]

  • activerecord
  • sqlalchemy
  • django
  • pgjdbc
  • psycopg

[bulkio]

  • import/tpch/nodes=8
  • jobs/mixed-versions

[kv]

  • kv0bench/nodes=10/cpu=8/shards=20/sequential
  • kv0bench/nodes=20/cpu=8/sequential
  • kv0bench/nodes=20/cpu=8/shards=80/sequential
  • kv0bench/nodes=5/cpu=8/sequential/2nd_idx
  • tpccbench/nodes=9/cpu=4/chaos/partition
  • ycsb/D/nodes=3/cpu=32
  • scaledata/job-coordinator/nodes=3
  • scaledata/job-coordinator/nodes=6

[sql-exec]

  • tpchvec/perf

Random Syntax Tests

  • TestRandomSyntaxSchemaChangeColumn (similar failure to the previous release)

SQLite Logic Test High VModule Nightly

@RaduBerinde
Member Author

RaduBerinde commented Jul 28, 2020

Signing off on TestRandomSyntaxSchemaChangeColumn (similar failure to the last release) and TestRandomSyntaxGeneration (it's a minor issue; I backported a fix for it in #51948, but there's no need to pick a new SHA).

@RaduBerinde
Member Author

Restarting the process with new SHA 97b6fc07480fc845a60f73224d28cf2e6dcda25d (to include #52003).

For the docs team: if you started preparing notes based on the old SHA, these are the release notes for the changes on top of it:

    Release note (bug fix): Previously, RESTORE would sometimes block at the
    end of the job when sending its results back if the connection that
    started the job disconnected. This is now fixed.

    Release note (bug fix): CockroachDB previously could crash on some
    queries with merge joins, and this has now been fixed.

    Release note (bug fix): Previously, a BACKUP job would block once it
    finished backing up the data.

    Release note (bug fix): CockroachDB could previously encounter benign
    internal "context canceled" errors when queries were executed by the
    vectorized engine.

    Release note (bug fix): Increase robustness of restore against
    descriptors which may be in an unexpected state.

Full git commit logs here: https://gist.github.com/RaduBerinde/8fad858dbbe41dad9a9b28e8b5571b05

@RaduBerinde
Member Author

Restarting the process with new SHA 12049d3fe3650660e1b6abf1e522d9bb016acb88 (to include #52072).

There is only one change on top of the previous SHA, and it has no release note.

@yuzefovich
Member

Signing off on tpchvec/perf (with vectorize ON, Q15 is more than 20% slower than with vectorize OFF). This seems like noise and shouldn't block the release, but I'll look into it more closely.

@adityamaru
Contributor

Signing off on import/tpch/nodes=8. There was a dead node because of a clock synchronization error:
remote wall time is too far ahead (996.848547ms) to be trustworthy [n5] clock synchronization error: this node is more than 500ms away from at least half of the known nodes (4 of 9 are within the offset)

Signing off on jobs/mixed-versions. Will be fixed by #51951.

@pbardea
Contributor

pbardea commented Jul 30, 2020

Re: jobs/mixed-versions, this is a different failure from the one that will be fixed by #51951. However, this test has been quite flaky, and it looks like we've been seeing this error since at least June 6: #48194 (comment). It's certainly not a regression, but it's worth looking into separately.

@rafiss
Collaborator

rafiss commented Jul 30, 2020

Signed off on AppDev tests. There are a few existing flaky tests that we are excluding now.

@andreimatei
Contributor

Signed off on all KV tests.
