jobs: TestStartableJobMixedVersion failed #97450

Closed
cockroach-teamcity opened this issue Feb 22, 2023 · 2 comments · Fixed by #97539
Labels: branch-master (Failures and bugs on the master branch), C-test-failure (Broken test, automatically or manually discovered), O-robot (Originated from a bot), T-jobs


cockroach-teamcity commented Feb 22, 2023

jobs.TestStartableJobMixedVersion failed with artifacts on master @ 286b3e235171a39b8f9910555affcc7ce310741a:

github.com/cockroachdb/cockroach/pkg/server/migration.go:102 server.(*migrationServer).BumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:598 serverpb._Migration_BumpClusterVersion_Handler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:115 grpcinterceptor.ServerInterceptor.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1161 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:265 rpc.NewServerEx.func3 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:600 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???

goroutine 12162624 lock 0xc006a2c518
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:219 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:218 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:189 kvserver.newRebalanceObjectiveManager.func2 ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:227 clusterversion.(*handleImpl).SetOnChange.func1 ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:145 settings.(*Values).settingChanged ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:178 settings.(*Values).setGeneric ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:220 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:219 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:146 server.bumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:101 server.(*migrationServer).BumpClusterVersion.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:102 server.(*migrationServer).BumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:598 serverpb._Migration_BumpClusterVersion_Handler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:115 grpcinterceptor.ServerInterceptor.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1161 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:265 rpc.NewServerEx.func3 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:600 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???



Parameters: TAGS=bazel,gss,deadlock


/cc @cockroachdb/jobs


Jira issue: CRDB-24704

cockroach-teamcity added the branch-master, C-test-failure, and O-robot labels Feb 22, 2023
cockroach-teamcity added this to the 23.1 milestone Feb 22, 2023
blathers-crl bot added the T-jobs label Feb 22, 2023

kvoli commented Feb 22, 2023

Caused by #97424. I'll look into this.

kvoli self-assigned this Feb 22, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Feb 23, 2023
Previously, changing the rebalance objective could lead to inconsistent
locking order between the load-based splitter and the rebalance objective.
When the objective was updated, the previous method also blocked
batch requests from completing until every replica's load-based splitter
was reset.

This commit makes the split objective a variable owned by the
decider, rather than one inferred on each decider operation. When the
rebalance objective changes, the split objective is updated atomically
for each replica, but not atomically across a store. This removes the need
to block batch requests until every replica is updated.

Resolves: cockroachdb#97000
Resolves: cockroachdb#97445
Resolves: cockroachdb#97450
Resolves: cockroachdb#97452
Resolves: cockroachdb#97457

Release note: None
@cockroach-teamcity

jobs.TestStartableJobMixedVersion failed with artifacts on master @ 3a1564a60dc169edea9e8fdb65747d323b484df9:

github.com/cockroachdb/cockroach/pkg/server/migration.go:102 server.(*migrationServer).BumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:599 serverpb._Migration_BumpClusterVersion_Handler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:115 grpcinterceptor.ServerInterceptor.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1161 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:265 rpc.NewServerEx.func3 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:601 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???

goroutine 12266968 lock 0xc01e419658
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:219 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:218 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:189 kvserver.newRebalanceObjectiveManager.func2 ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:227 clusterversion.(*handleImpl).SetOnChange.func1 ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:145 settings.(*Values).settingChanged ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:178 settings.(*Values).setGeneric ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:220 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:219 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:146 server.bumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:101 server.(*migrationServer).BumpClusterVersion.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:102 server.(*migrationServer).BumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:599 serverpb._Migration_BumpClusterVersion_Handler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:115 grpcinterceptor.ServerInterceptor.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1161 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:265 rpc.NewServerEx.func3 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:601 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???



Parameters: TAGS=bazel,gss,deadlock


craig bot pushed a commit that referenced this issue Feb 24, 2023
97148: changefeedccl: Expire protected timestamps r=miretskiy a=miretskiy

Changefeeds use the protected timestamp system (PTS)
to ensure that the data targeted by a changefeed is not
garbage collected prematurely. A running changefeed manages
its PTS record by periodically advancing the record's
timestamp, so that data older than that timestamp may be
garbage collected. However, if the changefeed stops running
because it is paused (either due to operator action or due to
the `on_error=pause` option), the PTS record remains so that
the changefeed can be resumed at a later time. It is also possible
that the operator may not notice that the job has been paused for
too long, causing a buildup of garbage data.

Excessive buildup of GC work is undesirable because it
impacts overall cluster performance and, once GC can resume,
its cost is proportional to how much GC work needs to be done.
This PR introduces a new changefeed option,
`gc_protect_expires_after`, to automatically expire PTS records that
are too old. This automatic expiration is a safety mechanism
in case a changefeed job gets paused by an operator or due to
an error while holding onto its PTS record due to the `protect_gc_on_pause`
option.
The operator is still expected to monitor changefeed jobs
and to restart paused changefeeds promptly. If the changefeed
job remains paused and the underlying PTS record expires, then
the changefeed job will be canceled to prevent a buildup of GC data.

Epic: [CRDB-21953](https://cockroachlabs.atlassian.net/browse/CRDB-21953)
Informs #84598

Release note (enterprise change): Changefeeds will automatically
expire PTS records for paused jobs if the changefeed is configured
with the `gc_protect_expires_after` option.

97539: kvserver: fix deadlock on rebalance obj change r=kvoli a=kvoli

Previously, changing the rebalance objective could lead to inconsistent
locking order between the load-based splitter and the rebalance objective.
When the objective was updated, the previous method also blocked
batch requests from completing until every replica's load-based splitter
was reset.

This commit makes the split objective a variable owned by the
decider, rather than one inferred on each decider operation. When the
rebalance objective changes, the split objective is updated atomically
for each replica, but not atomically across a store. This removes the need
to block batch requests until every replica is updated.

Resolves: #97000
Resolves: #97445
Resolves: #97450
Resolves: #97452
Resolves: #97457

Release note: None

Co-authored-by: Yevgeniy Miretskiy
Co-authored-by: Austen McClernon
craig bot closed this as completed in 51f8f8e Feb 24, 2023