release: 19.2.8 #50465
Comments
Restarting with a SHA that includes the security fix: 0421678
Looks like the Roachtest GCE nightly could not be started because it exceeded a CPU quota. Update: re-running the Roachtest GCE nightly independently. We'll see if we still run into quota issues. If we do, I'll change zones.
Looks like it still ran into an issue; we're running close to the 4k CPU limit in us-central. Will use us-east (2400 unused CPUs) instead of us-central and see what happens: https://teamcity.cockroachdb.com/viewLog.html?buildId=2032238&buildTypeId=Cockroach_Nightlies_WorkloadNightly
Starting the test failure checkoff process while the build is still running to save time.
Test Failures List
Roachtest GCE
[kv]
[appdev]
[sql-schema]
@nvanbenschoten looks like your wheelhouse
Signed off on the AppDev tests.
I suspect the quota issue was hit because we are running three releases at the same time on top of the normal nightlies. I've put in a request to raise the quota on CPUs as well as on in-use IPs and local SSDs, since those numbers were close enough to their limits that they might also be hit when running all three releases.
The quota limit increases were approved.
I think we should reduce the aggressiveness of the test to avoid some of the starvation that results from these fairness issues under such severe contention. In the meantime, signing off.
tpccbench timed out. It did run some of the workloads, but look at the last one's logs:
This basically just goes on and on; we never manage to prepare all statements before a node gets chaos-killed. @nvanbenschoten is that something we're aware of? I feel like I've seen this test time out in that way a few times. What do we do to fix it? Is the chaos timing too aggressive? I assume it's not possible (feasible) to prepare the statements before starting the workload+chaos.
The schema change one seems potentially bad:
I cannot imagine how that's true.
Chalking the schema change failure up to #44301, which has been newly prioritized.
I think there's something more going on. We've dropped the chaos aggressiveness in the past and it didn't seem to help. These prepared statements are usually very quick, so it's surprising to see them stall for over a minute. It makes me wonder if they're getting stuck in some backoff loop if they start the process at the wrong time. I'll confirm the part about prepared statements being quick, though. We are initializing 2100 connections and 10500 workers, so maybe I'm misremembering how long that scale takes. There was also some movement in this area when we moved tpcc to pgx. That could be related.
Candidate SHA: 0421678
Deployment status: Qualifying
Qualification Suite: https://teamcity.cockroachdb.com/viewType.html?buildTypeId=Cockroach_ReleaseQualification&tab=buildTypeStatusDiv&branch_Cockroach=provisional_202006230817_v19.2.8
Nightly Suite: https://teamcity.cockroachdb.com/viewType.html?buildTypeId=Cockroach_Nightlies_NightlySuite&tab=buildTypeStatusDiv&branch_Cockroach_Nightlies=provisional_202006230817_v19.2.8
Admin UI for Qualification Clusters:
Release process checklist
Prep date:
Monday 6/22/2020
Candidate SHA above
Release Qualification
One day after prep date:
Get signoff on roachtest failures
Keep an eye on clusters until release date. Do not proceed below until the release date.
Release date:
Monday 6/29/2020
Check cluster status
Tag release
Bless provisional binaries
If applicable, update the map that roachtests use to map a version to a previous version, to reference the newly tagged version
For production or stable releases in the latest major release series
Update the Homebrew Formula
Update the image tag in our Orchestrator configurations
For production or stable releases
Update docs
External communications for release
Clean up provisional tag from repository
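For the checklist step about the map that roachtests use to map a version to a previous version, the update can be pictured roughly as below. This is only a sketch of the idea, not the actual roachtest source: the names `predecessors`, `recordRelease`, and `predecessorOf` are hypothetical, and it assumes the map is keyed by release series with the newly tagged version as the value.

```go
package main

import "fmt"

// predecessors maps a release series to the most recent release of the
// series before it. Hypothetical name and layout, not the roachtest source.
var predecessors = map[string]string{}

// recordRelease notes that version is now the release that nextSeries
// should treat as its predecessor, overwriting any older entry.
func recordRelease(nextSeries, version string) {
	predecessors[nextSeries] = version
}

// predecessorOf returns the recorded predecessor for a series, if any.
func predecessorOf(series string) (string, bool) {
	v, ok := predecessors[series]
	return v, ok
}

func main() {
	// Assumption: after tagging 19.2.8, the 20.1 series should reference it.
	recordRelease("20.1", "19.2.8")
	if v, ok := predecessorOf("20.1"); ok {
		fmt.Printf("20.1 upgrades from %s\n", v)
	}
}
```

Keeping the lookup behind a small function means a missing entry surfaces as an explicit `false` instead of an empty version string.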