upgrades: upgrades don't seem to run when `developmentBranch` is set to `false` #100685

knz · 2023-04-05T07:32:31Z

When manually modifying developmentBranch (as in #99387) a couple of tests[1] fail.

For example:

--- FAIL: TestServerStartupGuardrails (0.23s)
    test_server_shim.go:353: migration-manager-find-jobs: system-jobs-scan: relation "system.job_info" does not exist

This suggests that we are not properly running the SQL migrations in that case.

[1] Tests that fail:

TestAlreadyRunningJobsAreHandledProperly, TestPauseMigration - fixed by release-23.1: upgrademanager: fix upgrade manager tests that relied on wrong invariants #100867
TestServerStartupGuardrails - fixed by release-23.1: serverccl: conditionally apply DevOffset in TestServerStartupGuardrails #100729

Jira issue: CRDB-26555

The text was updated successfully, but these errors were encountered:

celiala · 2023-04-05T07:42:18Z

3 tests that fail in #99387 are:

(1) TestAlreadyRunningJobsAreHandledProperly. in pkg/upgrade/upgrademanager/upgrademanager_test

=== RUN   TestAlreadyRunningJobsAreHandledProperly
    manager_external_test.go:120:
          Error Trace:  pkg/upgrade/upgrademanager/upgrademanager_test_test/pkg/upgrade/upgrademanager/manager_external_test.go:120
          Error:        Received unexpected error:
                        sql: no rows in result set

(2) TestPauseMigration. in pkg/upgrade/upgrademanager/upgrademanager_test

=== RUN   TestPauseMigration
    manager_external_test.go:464: error scanning '&{<nil> 0xc003f80200}': sql: no rows in result set

(3) TestServerStartupGuardrails. in github.com/cockroachdb/cockroach/pkg/ccl/serverccl

I230405 05:30:21.943950 189123 upgrade/upgrademanager/manager.go:168  [T1,n1] 1170  running permanent upgrades up to version: 1000022.2
I230405 05:30:21.944731 189876 sql/temporary_schema.go:505  [T1,n1] 1171  running temporary object cleanup background job
I230405 05:30:21.945053 189877 sql/sqlstats/persistedsqlstats/provider.go:129  [T1,n1] 1172  starting sql-stats-worker with initial delay: 10m15.992188656s
I230405 05:30:21.946647 189895 kv/kvclient/rangefeed/rangefeedcache/watcher.go:335  [T1,n1] 1173  system-config-cache: established range feed cache
I230405 05:30:21.947169 189864 kv/kvserver/replica_rangefeed.go:701  [T1,n1,rangefeed=table-stats-cache,mux_n=1,gen=2,r21/1:/Table/2{0-1},s1] 1174  RangeFeed closed timestamp is empty
I230405 05:30:21.950611 189876 sql/temporary_schema.go:555  [T1,n1] 1175  found 0 temporary schemas
I230405 05:30:21.950720 189876 sql/temporary_schema.go:558  [T1,n1] 1176  early exiting temporary schema cleaner as no temporary schemas were found
I230405 05:30:21.950747 189876 sql/temporary_schema.go:559  [T1,n1] 1177  completed temporary object cleanup job
I230405 05:30:21.950765 189876 sql/temporary_schema.go:640  [T1,n1] 1178  temporary object cleaner next scheduled to run at 2023-04-05 06:00:21.944702855 +0000 UTC m=+1944.305053323
E230405 05:30:21.994611 189894 spanconfig/spanconfigmanager/manager.go:141  [T1,n1] 1179  error starting auto span config reconciliation job: get-jobs: system-jobs-scan: relation "system.job_info" does not exist
I230405 05:30:22.008682 189123 upgrade/upgrademanager/manager.go:229  [T1,n1] 1180  the last permanent upgrade (v22.2-98) does not appear to have completed; attempting to run all upgrades
I230405 05:30:22.009905 189123 upgrade/upgrademanager/manager.go:270  [T1,n1] 1181  running permanent upgrade for version 0.0-2
    test_server_shim.go:331: migration-manager-find-jobs: system-jobs-scan: relation "system.job_info" does not exist
        (1) attached stack trace
          -- stack trace:
          | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).execInternal.func1.1

celiala · 2023-04-05T07:45:32Z

Note that TestPauseMigration passed on March 23[1] but then failed on April 4[2].

[1] March 23: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_BazelEssentialCi/9223737?showRootCauses=false&expandBuildChangesSection=true&expandBuildProblemsSection=true&expandBuildDeploymentsSection=true&expandBuildTestsSection=true
[2] April 4: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_BazelEssentialCi/9416194?showRootCauses=false&expandBuildChangesSection=true&expandBuildDeploymentsSection=true&expandBuildProblemsSection=true&expandBuildTestsSection=true

renatolabs · 2023-04-05T13:07:13Z

TestServerStartupGuardrails shouldn't be using versions that unconditionally apply the dev offset.

cockroach/pkg/ccl/serverccl/server_startup_guardrails_test.go

Lines 33 to 35 in ff2c4a3

    
           v := func(major, minor int32) roachpb.Version { 
        
           	return roachpb.Version{Major: clusterversion.DevOffset + major, Minor: minor} 
        
           }

knz · 2023-04-05T13:17:36Z

@renatolabs any suggestion as to how to achieve what the test wants to exercise, instead?

renatolabs · 2023-04-05T13:52:12Z

@renatolabs any suggestion as to how to achieve what the test wants to exercise, instead?

Since that test already relies on version keys to pass a custom TenantLogicalVersionKey, we could be fetching the other version fields in that test by key as well instead of manually creating a roachpb.Version. I can open a PR if you think that's acceptable. Feel free to suggest anything else you may have in mind too.

knz · 2023-04-05T13:58:34Z

I can open a PR if you think that's acceptable

Yes I would love that ❇️

When overwriting versions for that test, we were unconditionally applying `clusterversion.DevOffset` to all versions. While that works on master, it makes the test break when it runs on a release branch (where no offsetting happens). In this commit, we check whether binary versions are offset, and only add the DevOffset if they are. Unfortunately, we can't reference versions by key in this test as it references versions from two releases ago. Informs: cockroachdb#100685 Epic: none Release note: None

98240: release: split release process r=jlinder a=rail Previously, the release publishing step contained building, signing and other operations that can happen in advance. This PR splits the release process into 2 steps: build/sign and publishing. Artifacts generated during the build stage are published to a "staged" bucket and copied to the final bucket during the publishing phase. * Extracted docker verification into a function. * The version is read from a file now. * Use `gsutil` in MacOS signing in order to download from private repos. * Removed `PRE_RELEASE` variable set from TeamCity in favour of "guessing" it from the version. * Simplified dry-run logic to define version/build names, because the version is defined in the source code now. * The publishing phase updates docker image packages. Epic: RE-307 Fixes: RE-211 Release note: None 100641: upgrades: update helper fn ExecForCountInTxns to retry transactions r=rafiss a=andyyang890 This patch updates the ExecForCountInTxns helper function to retry transactions to prevent spurious test failures. Fixes: #100158 Release note: None 100657: builtins: remove crdb_internal.no_constant_folding builtin r=yuzefovich a=yuzefovich This function no longer does what it's intended to do (namely disabling the constant folding of the whole expression, for testing purposes) because it was only supported in the heuristics planner. This builtin is also not used anywhere in our tests. Epic: None Release note: None 100700: kvtenantccl: unskip TestTenantUpgrade r=rafiss a=nicktrav This test was skipped as part of #97076. Re-enable the test. Fix #97076. Release note: None. 100701: serverccl: conditionally apply DevOffset in TestServerStartupGuardrails r=knz a=renatolabs When overwriting versions for that test, we were unconditionally applying `clusterversion.DevOffset` to all versions. While that works on master, it makes the test break when it runs on a release branch (where no offsetting happens). In this commit, we check whether binary versions are offset, and only add the DevOffset if they are. Unfortunately, we can't reference versions by key in this test as it references versions from two releases ago. Informs: #100685 Epic: none Release note: None Co-authored-by: Rail Aliiev <[email protected]> Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Nick Travers <[email protected]> Co-authored-by: Renato Costa <[email protected]>

When overwriting versions for that test, we were unconditionally applying `clusterversion.DevOffset` to all versions. While that works on master, it makes the test break when it runs on a release branch (where no offsetting happens). In this commit, we check whether binary versions are offset, and only add the DevOffset if they are. Unfortunately, we can't reference versions by key in this test as it references versions from two releases ago. Informs: #100685 Epic: none Release note: None

renatolabs · 2023-04-05T18:34:47Z

I'm not sure if this has already been identified, but setting developmentBranch to false was fine as far as TestPauseMigration is concerned up until #99458, at which point it started failing with that same error message (error scanning '&{<nil> 0xc009df9600}': sql: no rows in result set).

@adityamaru any ideas on this failure?

adityamaru · 2023-04-05T19:22:08Z

@renatolabs I'll spend some time understanding what developmentBranch has to do with the failure but applying this diff seems to fix the test. We don't expect system.jobs to contain any payload after 99458, this information has been moved to the system.job_info table. crdb_internal.system_jobs is a virtual table that joins those two tables so we can read the right columns from the right table:

diff --git a/pkg/upgrade/upgrademanager/manager_external_test.go b/pkg/upgrade/upgrademanager/manager_external_test.go
index f268b226cc1..f2960a57a51 100644
--- a/pkg/upgrade/upgrademanager/manager_external_test.go
+++ b/pkg/upgrade/upgrademanager/manager_external_test.go
@@ -454,7 +454,7 @@ func TestPauseMigration(t *testing.T) {
        var id int64
        tdb.QueryRow(t, `
 SELECT id
-  FROM system.jobs
+  FROM crdb_internal.system_jobs
  WHERE (
         crdb_internal.pb_to_json(
             'cockroach.sql.jobs.jobspb.Payload',

99288: cdc: add apache arrow parquet library and writer r=miretskiy a=jayshrivastava #### cdc: add apache arrow parquet library This commit installs the apache arrow parquet library for Go at version 11. The release can be found here: https://github.com/apache/arrow/releases/tag/go%2Fv11.0.0 This library is licensed under the Apache License 2.0. Informs: #99028 Epic: None Release note: None --- #### util/parquet: create parquet writer library This change implements a `Writer` struct in the new `util/parquet` package. This `Writer` writes datums to the `io.Writer` sink using a configurable parquet version (defaults to v2.6). The package implements several features internally required to write in the parquet format: - schema creation - row group / column page management - encoding/decoding of CRDB datums to parquet datums Currently, the writer only supports types found in the TPCC workload, namely INT, DECIMAL, STRING UUID, TIMESTAMP and BOOL. This change also adds a benchmark and tests which verify the correctness of the writer and test utils for reading datums from parquet files. Informs: #99028 Epic: None Release note: None --- #### changefeedccl: add parquet writer This change adds the file `parquet.go` which contains helper functions to help create parquet writers and export data via `cdcevent.Row` structs. This change also adds tests to ensure rows are written to parquet files correctly. Epic: None Release note: None 100830: upgrademanager: fix upgrade manager tests that relied on wrong invariants r=knz a=adityamaru The two tests in question were low level tests that upgrade clusters from mock version 41 to 42 and override the upgrade flow to test the internals of the upgrade manager. These mock cluster versions were not tied to our real cluster versions and so were not offset with the `DevOffset` when a branch is marked as a development branch. For this reason when a branch was a developmentBranch they would sort under all the "real" cluster versions while when a branch was marked as a release branch they would sort above all the "real" cluster versions. This undeterministic behaviour did not play well with certain job migrations that were added this release. This change rewrites the test to use real cluster versions so that their values are in sync with whether the branch is a developmentBranch or not. This change also removes some overrides to make the test more intuitive. Cocnretely, the test servers now bootstrap at the minimum supported binary version and run all the upgrades until the `startCV` before executing the body of the test. Fixes: #100685 Informs: #100552 Release note: None Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: adityamaru <[email protected]>

…ants The two tests in question were low level tests that upgrade clusters from mock version 41 to 42 and override the upgrade flow to test the internals of the upgrade manager. These mock cluster versions were not tied to our real cluster versions and so were not offset with the `DevOffset` when a branch is marked as a development branch. For this reason when a branch was a developmentBranch they would sort under all the "real" cluster versions while when a branch was marked as a release branch they would sort above all the "real" cluster versions. This undeterministic behaviour did not play well with certain job migrations that were added this release. This change rewrites the test to use real cluster versions so that their values are in sync with whether the branch is a developmentBranch or not. This change also removes some overrides to make the test more intuitive. Cocnretely, the test servers now bootstrap at the minimum supported binary version and run all the upgrades until the `startCV` before executing the body of the test. Fixes: #100685 Informs: #100552 Release note: None

adityamaru · 2023-04-06T21:32:41Z

reopening until the backport is merged

adityamaru · 2023-04-06T22:17:10Z

backport has been merged!

celiala · 2023-04-06T22:22:40Z

Thank you, all! 🙌🏼

knz mentioned this issue Apr 5, 2023

release-23.1: clusterversion: set developmentBranch to false #99387

Merged

renatolabs mentioned this issue Apr 5, 2023

serverccl: conditionally apply DevOffset in TestServerStartupGuardrails #100701

Merged

blathers-crl bot mentioned this issue Apr 5, 2023

release-23.1: serverccl: conditionally apply DevOffset in TestServerStartupGuardrails #100729

Merged

knz added A-jobs T-jobs labels Apr 6, 2023

adityamaru mentioned this issue Apr 6, 2023

upgrademanager: fix upgrade manager tests that relied on wrong invariants #100830

Merged

craig bot closed this as completed in ba03007 Apr 6, 2023

blathers-crl bot mentioned this issue Apr 6, 2023

release-23.1: upgrademanager: fix upgrade manager tests that relied on wrong invariants #100867

Merged

adityamaru reopened this Apr 6, 2023

adityamaru closed this as completed Apr 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

upgrades: upgrades don't seem to run when `developmentBranch` is set to `false` #100685

upgrades: upgrades don't seem to run when `developmentBranch` is set to `false` #100685

knz commented Apr 5, 2023 •

edited by adityamaru

Loading

celiala commented Apr 5, 2023

celiala commented Apr 5, 2023

renatolabs commented Apr 5, 2023

knz commented Apr 5, 2023

renatolabs commented Apr 5, 2023

knz commented Apr 5, 2023

renatolabs commented Apr 5, 2023

adityamaru commented Apr 5, 2023 •

edited

Loading

adityamaru commented Apr 6, 2023

adityamaru commented Apr 6, 2023

celiala commented Apr 6, 2023

upgrades: upgrades don't seem to run when developmentBranch is set to false #100685

upgrades: upgrades don't seem to run when developmentBranch is set to false #100685

Comments

knz commented Apr 5, 2023 • edited by adityamaru Loading

celiala commented Apr 5, 2023

celiala commented Apr 5, 2023

renatolabs commented Apr 5, 2023

knz commented Apr 5, 2023

renatolabs commented Apr 5, 2023

knz commented Apr 5, 2023

renatolabs commented Apr 5, 2023

adityamaru commented Apr 5, 2023 • edited Loading

adityamaru commented Apr 6, 2023

adityamaru commented Apr 6, 2023

celiala commented Apr 6, 2023

upgrades: upgrades don't seem to run when `developmentBranch` is set to `false` #100685

upgrades: upgrades don't seem to run when `developmentBranch` is set to `false` #100685

knz commented Apr 5, 2023 •

edited by adityamaru

Loading

adityamaru commented Apr 5, 2023 •

edited

Loading