Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrades: upgrades don't seem to run when developmentBranch is set to false #100685

Closed
2 tasks done
knz opened this issue Apr 5, 2023 · 11 comments · Fixed by #100830
Closed
2 tasks done

upgrades: upgrades don't seem to run when developmentBranch is set to false #100685

knz opened this issue Apr 5, 2023 · 11 comments · Fixed by #100830
Labels
A-jobs branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-jobs

Comments

@knz
Copy link
Contributor

knz commented Apr 5, 2023

When manually modifying developmentBranch (as in #99387) a couple of tests[1] fail.

For example:

--- FAIL: TestServerStartupGuardrails (0.23s)
    test_server_shim.go:353: migration-manager-find-jobs: system-jobs-scan: relation "system.job_info" does not exist

This suggests that we are not properly running the SQL migrations in that case.

[1] Tests that fail:

Jira issue: CRDB-26555

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 blocks-23.1.0-beta.1 labels Apr 5, 2023
@celiala
Copy link
Collaborator

celiala commented Apr 5, 2023

3 tests that fail in #99387 are:

(1) TestAlreadyRunningJobsAreHandledProperly. in pkg/upgrade/upgrademanager/upgrademanager_test

=== RUN   TestAlreadyRunningJobsAreHandledProperly
    manager_external_test.go:120:
          Error Trace:  pkg/upgrade/upgrademanager/upgrademanager_test_test/pkg/upgrade/upgrademanager/manager_external_test.go:120
          Error:        Received unexpected error:
                        sql: no rows in result set

(2) TestPauseMigration. in pkg/upgrade/upgrademanager/upgrademanager_test

=== RUN   TestPauseMigration
    manager_external_test.go:464: error scanning '&{<nil> 0xc003f80200}': sql: no rows in result set

(3) TestServerStartupGuardrails. in github.com/cockroachdb/cockroach/pkg/ccl/serverccl

I230405 05:30:21.943950 189123 upgrade/upgrademanager/manager.go:168  [T1,n1] 1170  running permanent upgrades up to version: 1000022.2
I230405 05:30:21.944731 189876 sql/temporary_schema.go:505  [T1,n1] 1171  running temporary object cleanup background job
I230405 05:30:21.945053 189877 sql/sqlstats/persistedsqlstats/provider.go:129  [T1,n1] 1172  starting sql-stats-worker with initial delay: 10m15.992188656s
I230405 05:30:21.946647 189895 kv/kvclient/rangefeed/rangefeedcache/watcher.go:335  [T1,n1] 1173  system-config-cache: established range feed cache
I230405 05:30:21.947169 189864 kv/kvserver/replica_rangefeed.go:701  [T1,n1,rangefeed=table-stats-cache,mux_n=1,gen=2,r21/1:/Table/2{0-1},s1] 1174  RangeFeed closed timestamp is empty
I230405 05:30:21.950611 189876 sql/temporary_schema.go:555  [T1,n1] 1175  found 0 temporary schemas
I230405 05:30:21.950720 189876 sql/temporary_schema.go:558  [T1,n1] 1176  early exiting temporary schema cleaner as no temporary schemas were found
I230405 05:30:21.950747 189876 sql/temporary_schema.go:559  [T1,n1] 1177  completed temporary object cleanup job
I230405 05:30:21.950765 189876 sql/temporary_schema.go:640  [T1,n1] 1178  temporary object cleaner next scheduled to run at 2023-04-05 06:00:21.944702855 +0000 UTC m=+1944.305053323
E230405 05:30:21.994611 189894 spanconfig/spanconfigmanager/manager.go:141  [T1,n1] 1179  error starting auto span config reconciliation job: get-jobs: system-jobs-scan: relation "system.job_info" does not exist
I230405 05:30:22.008682 189123 upgrade/upgrademanager/manager.go:229  [T1,n1] 1180  the last permanent upgrade (v22.2-98) does not appear to have completed; attempting to run all upgrades
I230405 05:30:22.009905 189123 upgrade/upgrademanager/manager.go:270  [T1,n1] 1181  running permanent upgrade for version 0.0-2
    test_server_shim.go:331: migration-manager-find-jobs: system-jobs-scan: relation "system.job_info" does not exist
        (1) attached stack trace
          -- stack trace:
          | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).execInternal.func1.1

@renatolabs
Copy link
Contributor

TestServerStartupGuardrails shouldn't be using versions that unconditionally apply the dev offset.

v := func(major, minor int32) roachpb.Version {
return roachpb.Version{Major: clusterversion.DevOffset + major, Minor: minor}
}

@knz
Copy link
Contributor Author

knz commented Apr 5, 2023

@renatolabs any suggestion as to how to achieve what the test wants to exercise, instead?

@renatolabs
Copy link
Contributor

@renatolabs any suggestion as to how to achieve what the test wants to exercise, instead?

Since that test already relies on version keys to pass a custom TenantLogicalVersionKey, we could be fetching the other version fields in that test by key as well instead of manually creating a roachpb.Version. I can open a PR if you think that's acceptable. Feel free to suggest anything else you may have in mind too.

@knz
Copy link
Contributor Author

knz commented Apr 5, 2023

I can open a PR if you think that's acceptable

Yes I would love that ❇️

renatolabs added a commit to renatolabs/cockroach that referenced this issue Apr 5, 2023
When overwriting versions for that test, we were unconditionally
applying `clusterversion.DevOffset` to all versions. While that works
on master, it makes the test break when it runs on a release
branch (where no offsetting happens). In this commit, we check whether
binary versions are offset, and only add the DevOffset if they are.

Unfortunately, we can't reference versions by key in this test as it
references versions from two releases ago.

Informs: cockroachdb#100685

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Apr 5, 2023
98240: release: split release process r=jlinder a=rail

Previously, the release publishing step contained building, signing and other operations that can happen in advance.

This PR splits the release process into 2 steps: build/sign and publishing. Artifacts generated during the build stage are published to a "staged" bucket and copied to the final bucket during the publishing phase.

* Extracted docker verification into a function.
* The version is read from a file now.
* Use `gsutil` in MacOS signing in order to download from private repos.
* Removed `PRE_RELEASE` variable set from TeamCity in favour of "guessing" it from the version.
* Simplified dry-run logic to define version/build names, because the version is defined in the source code now.
* The publishing phase updates docker image packages.

Epic: RE-307
Fixes: RE-211
Release note: None

100641: upgrades: update helper fn ExecForCountInTxns to retry transactions r=rafiss a=andyyang890

This patch updates the ExecForCountInTxns helper function to retry
transactions to prevent spurious test failures.

Fixes: #100158

Release note: None

100657: builtins: remove crdb_internal.no_constant_folding builtin r=yuzefovich a=yuzefovich

This function no longer does what it's intended to do (namely disabling the constant folding of the whole expression, for testing purposes) because it was only supported in the heuristics planner. This builtin is also not used anywhere in our tests.

Epic: None

Release note: None

100700: kvtenantccl: unskip TestTenantUpgrade r=rafiss a=nicktrav

This test was skipped as part of #97076. Re-enable the test.

Fix #97076.

Release note: None.

100701: serverccl: conditionally apply DevOffset in TestServerStartupGuardrails r=knz a=renatolabs

When overwriting versions for that test, we were unconditionally applying `clusterversion.DevOffset` to all versions. While that works on master, it makes the test break when it runs on a release branch (where no offsetting happens). In this commit, we check whether binary versions are offset, and only add the DevOffset if they are.

Unfortunately, we can't reference versions by key in this test as it references versions from two releases ago.

Informs: #100685

Epic: none

Release note: None

Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nick Travers <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Apr 5, 2023
When overwriting versions for that test, we were unconditionally
applying `clusterversion.DevOffset` to all versions. While that works
on master, it makes the test break when it runs on a release
branch (where no offsetting happens). In this commit, we check whether
binary versions are offset, and only add the DevOffset if they are.

Unfortunately, we can't reference versions by key in this test as it
references versions from two releases ago.

Informs: #100685

Epic: none

Release note: None
@renatolabs
Copy link
Contributor

I'm not sure if this has already been identified, but setting developmentBranch to false was fine as far as TestPauseMigration is concerned up until #99458, at which point it started failing with that same error message (error scanning '&{<nil> 0xc009df9600}': sql: no rows in result set).

@adityamaru any ideas on this failure?

@adityamaru
Copy link
Contributor

adityamaru commented Apr 5, 2023

@renatolabs I'll spend some time understanding what developmentBranch has to do with the failure but applying this diff seems to fix the test. We don't expect system.jobs to contain any payload after 99458, this information has been moved to the system.job_info table. crdb_internal.system_jobs is a virtual table that joins those two tables so we can read the right columns from the right table:

diff --git a/pkg/upgrade/upgrademanager/manager_external_test.go b/pkg/upgrade/upgrademanager/manager_external_test.go
index f268b226cc1..f2960a57a51 100644
--- a/pkg/upgrade/upgrademanager/manager_external_test.go
+++ b/pkg/upgrade/upgrademanager/manager_external_test.go
@@ -454,7 +454,7 @@ func TestPauseMigration(t *testing.T) {
        var id int64
        tdb.QueryRow(t, `
 SELECT id
-  FROM system.jobs
+  FROM crdb_internal.system_jobs
  WHERE (
         crdb_internal.pb_to_json(
             'cockroach.sql.jobs.jobspb.Payload',

craig bot pushed a commit that referenced this issue Apr 6, 2023
99288: cdc: add apache arrow parquet library and writer r=miretskiy a=jayshrivastava

#### cdc: add apache arrow parquet library
This commit installs the apache arrow parquet library for Go
at version 11. The release can be found here:
https://github.com/apache/arrow/releases/tag/go%2Fv11.0.0

This library is licensed under the Apache License 2.0.

Informs: #99028
Epic: None
Release note: None

---
#### util/parquet: create parquet writer library
This change implements a `Writer` struct in the new `util/parquet` package.
This `Writer` writes datums to the `io.Writer` sink
using a configurable parquet version (defaults to v2.6).


The package implements several features internally required to write in the parquet format:
- schema creation
- row group / column page management
- encoding/decoding of CRDB datums to parquet datums
Currently, the writer only supports types found in the TPCC workload, namely INT, DECIMAL, STRING
UUID, TIMESTAMP and BOOL.

This change also adds a benchmark and tests which verify the correctness of the
writer and test utils for reading datums from parquet files.

Informs: #99028
Epic: None
Release note: None

--- 
#### changefeedccl: add parquet writer 
This change adds the file `parquet.go` which contains
helper functions to help create parquet writers
and export data via `cdcevent.Row` structs.

This change also adds tests to ensure rows are written
to parquet files correctly.

Epic: None
Release note: None

100830: upgrademanager: fix upgrade manager tests that relied on wrong invariants r=knz a=adityamaru

The two tests in question were low level tests that upgrade clusters from mock version 41 to 42 and override the upgrade flow to test the internals of the upgrade manager. These mock cluster versions were not tied to our real cluster versions and so were not offset with the `DevOffset` when a branch is marked as a development branch. For this reason when a branch was a developmentBranch they would sort under all the "real" cluster versions while when a branch was marked as a release branch they would sort above all the "real" cluster versions. This undeterministic behaviour did not play well with certain job migrations that were added this release.

This change rewrites the test to use real cluster versions so that their values are in sync with whether the branch is a developmentBranch or not. This change also removes some overrides to make the test more intuitive. Cocnretely, the test servers now bootstrap at the minimum supported binary version and run all the upgrades until the `startCV` before executing the body of the test.

Fixes: #100685
Informs: #100552

Release note: None

Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: adityamaru <[email protected]>
@craig craig bot closed this as completed in ba03007 Apr 6, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 6, 2023
…ants

The two tests in question were low level tests that upgrade clusters
from mock version 41 to 42 and override the upgrade flow to
test the internals of the upgrade manager. These mock cluster versions
were not tied to our real cluster versions and so were not offset
with the `DevOffset` when a branch is marked as a development branch.
For this reason when a branch was a developmentBranch they would sort
under all the "real" cluster versions while when a branch was marked
as a release branch they would sort above all the "real" cluster versions.
This undeterministic behaviour did not play well with certain job migrations
that were added this release.

This change rewrites the test to use real cluster versions so that their
values are in sync with whether the branch is a developmentBranch or not.
This change also removes some overrides to make the test more intuitive.
Cocnretely, the test servers now bootstrap at the minimum supported binary version
and run all the upgrades until the `startCV` before executing the body of the
test.

Fixes: #100685
Informs: #100552

Release note: None
@adityamaru adityamaru reopened this Apr 6, 2023
@adityamaru
Copy link
Contributor

reopening until the backport is merged

@adityamaru
Copy link
Contributor

backport has been merged!

@celiala
Copy link
Collaborator

celiala commented Apr 6, 2023

Thank you, all! 🙌🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-jobs branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-jobs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants