Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typedesc: copy composite type elements for hydration #119866

Closed
cockroach-teamcity opened this issue Mar 4, 2024 · 3 comments · Fixed by #119880
Closed

typedesc: copy composite type elements for hydration #119866

cockroach-teamcity opened this issue Mar 4, 2024 · 3 comments · Fixed by #119880
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Mar 4, 2024

pkg/sql/logictest/tests/fakedist-disk/fakedist-disk_test.TestLogic_composite_types failed with artifacts on master @ 51bbfff84c26be8a2b40e25b4bce3d59ea63dc59:

      github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigsqlwatcher/sqlwatcher.go:177 +0xca8
  github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigsqlwatcher.(*SQLWatcher).WatchForSQLUpdates()
      github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigsqlwatcher/sqlwatcher.go:91 +0x5c
  github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*incrementalReconciler).reconcile()
      github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler/reconciler.go:497 +0x73b
  github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*Reconciler).Reconcile()
      github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler/reconciler.go:188 +0x57e
  github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigjob.(*resumer).Resume()
      github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigjob/job.go:162 +0x1323
  github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine.func2()
      github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1597 +0x1b0
  github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine()
      github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1598 +0xf11
  github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob()
      github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:456 +0x811
  github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1()
      github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:290 +0x1ab
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x262

Goroutine 3417479 (running) created at:
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).serveImpl()
      github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:996 +0xa4f
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).ServeConn()
      github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:812 +0xe04
  github.com/cockroachdb/cockroach/pkg/server.(*systemServerWrapper).serveConn()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_sql.go:175 +0x1d7
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).sqlMux()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_sql.go:95 +0x50b
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).sqlMux-fm()
      <autogenerated>:1 +0xc4
  github.com/cockroachdb/cockroach/pkg/server.startServeSQL.func1.1()
      github.com/cockroachdb/cockroach/pkg/server/server_sql.go:1888 +0x26b
  github.com/cockroachdb/cockroach/pkg/util/netutil.(*TCPServer).ServeWith.func1()
      github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:191 +0x1c6
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x262

Goroutine 3413313 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x69c
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0x5e4
  github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob()
      github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:284 +0x25b
  github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeClaimedJobs.func3()
      github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:215 +0xc4
  github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeClaimedJobs.gowrap1()
      github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:218 +0x41
==================

Parameters:

  • TAGS=bazel,gss
  • stress=true
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-36362

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team labels Mar 4, 2024
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Mar 4, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Mar 4, 2024
@DrewKimball DrewKimball removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Mar 4, 2024
@DrewKimball DrewKimball self-assigned this Mar 4, 2024
@DrewKimball DrewKimball moved this from Triage to Active in SQL Queries Mar 4, 2024
@DrewKimball
Copy link
Collaborator

Looks like (hopefully) once last fix for the type-hydration data races - the elements of a composite type descriptor need to be copied in immutable.AsTypesT().

@yuzefovich
Copy link
Member

Same as #119780?

@DrewKimball DrewKimball changed the title pkg/sql/logictest/tests/fakedist-disk/fakedist-disk_test: TestLogic_composite_types failed typedesc: copy composite type elements for hydration Mar 4, 2024
@DrewKimball
Copy link
Collaborator

Yes, I'll close that one in favor of this.

craig bot pushed a commit that referenced this issue Mar 4, 2024
118943: kvcoord: add DistSender circuit breakers r=nvanbenschoten a=erikgrinaker

This patch adds an initial implementation of DistSender replica circuit breakers. Their primary purpose is to prevent the DistSender getting stuck on non-functional replicas. In particular, the DistSender relies on receiving a NLHE from the replica to update its range cache and try other replicas, otherwise it will keep sending requests to the same broken replica which will continue to get stuck, giving the appearance of an unavailable range. This can happen if:

- The replica stalls, e.g. with a disk stall or mutex deadlock.

- Clients time out before the replica lease acquisition attempt times out, e.g. if the replica is partitioned away from the leader.

If a replica has returned only errors in the past few seconds, or hasn't returned any responses at all, the circuit breaker will probe the replica by sending a `LeaseInfo` request. This must either return success or a NLHE pointing to a leaseholder.  Otherwise, the circuit breaker trips, and the DistSender will skip it for future requests, optionally also cancelling in-flight requests.

Currently, only replica-level circuit breakers are implemented. If a range is unavailable, the DistSender will continue to retry replicas as today. Range-level circuit breakers can be added later if needed, but are considered out of scope here.

The circuit breakers are disabled by default for now. Some follow-up work is likely needed before they can be enabled by default:

* Improve probe scalability. Currently, a goroutine is spawned per replica probe, which is likely too expensive at large scales. We should consider batching probes to nodes/stores, and using a bounded worker pool.

* Consider follower read handling, e.g. by tracking the replica's closed timestamp and allowing requests that may still be served by it even if it's partitioned away from the leaseholder.

* Improve observability, with metrics, tracing, and logging.

* Comprehensive testing and benchmarking.

This will be addressed separately.

Resolves #105168.
Resolves #104262.
Resolves  #81100.
Resolves #80713.
Epic: none
Release note (general change): gateways will now detect faulty or stalled replicas and use other replicas instead, which can prevent them getting stuck in certain cases (e.g. with disk stalls). This behavior can be disabled via the cluster setting `kv.dist_sender.circuit_breaker.enabled`.

119880: typedesc: copy composite type elements in `AsTypesT` r=DrewKimball a=DrewKimball

This commit adds copying for the elements of a composite type in the `TypeDescriptor.AsTypesT` method. This avoids data races during type hydration.

Fixes #119866

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
@craig craig bot closed this as completed in 1bec08c Mar 4, 2024
@github-project-automation github-project-automation bot moved this from Active to Done in SQL Queries Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants