
release: 20.1.3 #50528

Closed
24 tasks done
asubiotto opened this issue Jun 23, 2020 · 7 comments

@asubiotto
Contributor

asubiotto commented Jun 23, 2020

Candidate SHA: 7fd454f
Deployment status: Qualifying
Qualification Suite: https://teamcity.cockroachdb.com/viewType.html?buildTypeId=Cockroach_ReleaseQualification&tab=buildTypeStatusDiv&branch_Cockroach=provisional_202006230817_v20.1.3
Nightly Suite: https://teamcity.cockroachdb.com/viewType.html?buildTypeId=Cockroach_Nightlies_NightlySuite&tab=buildTypeStatusDiv&branch_Cockroach_Nightlies=provisional_202006230817_v20.1.3

Admin UI for Qualification Clusters:

Release process checklist

Prep date: Tuesday 6/23/2020

  • Pick a SHA
    • fill in Candidate SHA above
    • email thread on releases@
  • Tag the provisional SHA
  • Publish provisional binaries
  • Ack security@ on the generated Stackdriver Alert to confirm these writes were part of a planned release (just reply to the received alert email, acknowledging that it was part of the release process)

Release Qualification

One day after prep date:

Release date: Monday 6/29/2020

@asubiotto asubiotto self-assigned this Jun 23, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Jun 23, 2020
@asubiotto
Contributor Author

asubiotto commented Jun 23, 2020

Starting the test failure checkoff process while the build is still running to save time.

Test Failures List

Roachtest GCE

Failures: https://teamcity.cockroachdb.com/viewLog.html?buildId=2032995&buildTypeId=Cockroach_Nightlies_WorkloadNightly

[storage]

  • clearrange/checks=true

[kv]

  • gossip/chaos/nodes=9
  • jepsen/sets/parts-start-kill-2
  • transfer-leases/drain
  • transfer-leases/quit

[appdev]

  • django
  • lib/pq
  • pgx

[bulkio]

  • import/tpch/nodes=8
  • restore2TB/nodes=10

Random Syntax Tests

Failures: https://teamcity.cockroachdb.com/viewLog.html?buildId=2032054&tab=buildResultsDiv&buildTypeId=Cockroach_Nightlies_RandomSyntaxTests

[sql-schema]

  • TestRandomSyntaxSchemaChangeColumn

SQLite Logic Test High VModule Nightly

Failures: https://teamcity.cockroachdb.com/viewLog.html?buildId=2032052&tab=buildResultsDiv&buildTypeId=Cockroach_Nightlies_SqlLogicTestHighVModuleNightly

[sql]

@tbg
Member

tbg commented Jun 23, 2020

In mixed-versions, we weren't able to execute ./cockroach-v19.2.7; not sure why. Usually this means chmod +x ${FILE} was never run (see the sketch after the log below).

   2: done
   3: done
   4: done
10:51:43 test.go:190: test status: 
10:51:43 test.go:190: test status: starting cluster
10:51:43 cluster.go:348: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --binary=./cockroach-19.2.7 --encrypt=false --sequential=false teamcity-2032053-1592909135-13-n4cpu4:1-4
teamcity-2032053-1592909135-13-n4cpu4: starting
0: ~ ././cockroach-19.2.7 version: exit status 126
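
For context on the exit status 126 above: that is the shell's "command not executable" error. Below is a minimal sketch (not roachprod's actual code) of checking for the missing execute bit before invoking a staged binary; the path is just a stand-in mirroring the one from this failure.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical path, mirroring the binary from the failure above.
	const path = "./cockroach-v19.2.7"

	info, err := os.Stat(path)
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	if info.Mode().Perm()&0o111 == 0 {
		// No execute bit anywhere: invoking the file fails with the
		// shell's "permission denied", surfaced as exit status 126.
		fmt.Printf("missing execute bit; run: chmod +x %s\n", path)
		return
	}
	fmt.Println("execute bit is set; exit status 126 likely has another cause")
}
```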

@asubiotto
Contributor Author

asubiotto commented Jun 23, 2020

This roachtest build also ran into CPU quota issues, so I'll unfortunately have to restart it. The quota has been increased from 4k to 6k CPUs in us-central1 (although I'll still run this new run in us-east1), so this hopefully won't happen again: https://teamcity.cockroachdb.com/viewLog.html?buildId=2032995&buildTypeId=Cockroach_Nightlies_WorkloadNightly

@tbg
Member

tbg commented Jun 24, 2020

gossip/chaos: infra flake

dial tcp 34.70.82.23:26257: connect: connection timed out

jepsen: infra flake

test_runner.go:725: failed to extend cluster: roachprod extend failed: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod extend teamcity-2032995-1592933600-42-n6cpu4 --lifetime=10m: exit status 1

The quit tests failed because they exceeded the one-minute timeout in restartNode() (sketched below):

19:31:33 cluster.go:348: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --env=COCKROACH_SCAN_MAX_IDLE_TIME=5ms -a --vmodule=store=1,replica=1,replica_proposal=1 teamcity-2032995-1592933600-22-n3cpu4:2
teamcity-2032995-1592933600-22-n3cpu4: starting...........................................................19:32:33 test.go:325: test failure: 	cluster.go:1969,quit.go:96,quit.go:88,quit.go:142,context.go:135,quit.go:141,quit.go:88,quit.go:46,quit.go:340,test_runner.go:753: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --env=COCKROACH_SCAN_MAX_IDLE_TIME=5ms -a --vmodule=store=1,replica=1,replica_proposal=1 teamcity-2032995-1592933600-22-n3cpu4:2 returned: context deadline exceeded

n2's logs show no signs of the node actually trying to start (the log is still the one from the earlier graceful shutdown), so I would call this an infra flake and am signing off.
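
To illustrate the failure mode: this is a hedged sketch of the one-minute restart timeout described above, assuming the usual context.WithTimeout pattern rather than quoting the actual restartNode() from the roachtest code. A node that never comes back up surfaces as "context deadline exceeded", exactly as in the log.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// restartNode is a hypothetical stand-in for the helper referenced above:
// the start command is bounded by a one-minute context, so a node that
// never restarts fails the test instead of hanging it.
func restartNode(binary string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// CommandContext kills the child process once the deadline passes.
	cmd := exec.CommandContext(ctx, binary, args...)
	if err := cmd.Run(); err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			return fmt.Errorf("node did not restart within 1m: %w", ctx.Err())
		}
		return err
	}
	return nil
}

func main() {
	// Hypothetical invocation mirroring the roachprod start call in the log.
	err := restartNode("roachprod", "start", "teamcity-2032995-1592933600-22-n3cpu4:2")
	if err != nil {
		fmt.Println(err)
	}
}
```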

@petermattis
Collaborator

clearrange/checks=true infra strangeness:

19:31:31 cluster.go:348: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod stop teamcity-2032995-1592933600-30-n10cpu4
teamcity-2032995-1592933600-30-n10cpu4: stopping and waiting.................................................................................................................................
6: exit status 255: 
I200623 19:33:41.767276 1 cluster_synced.go:1749  command failed

I'm not quite sure what is going on yet, but this wasn't a failure of the test itself.

@rafiss
Collaborator

rafiss commented Jun 24, 2020

Signed off on appdev tests. We have just pinned the versions of ORMs under test so after we update the list of expected failures, they will be more stable.

@pbardea
Contributor

pbardea commented Jun 24, 2020

import/tpch/nodes=8: infra flake

monitor task failed: dial tcp 34.71.144.14:26257: connect: connection timed out

restore2TB/nodes=10

monitor task failed: dial tcp 35.224.219.136:26257: connect: connection timed out
