Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: RPC dialback does not resolve failover/non-system/blackhole-recv #99104

Closed
erikgrinaker opened this issue Mar 21, 2023 · 1 comment · Fixed by #99840
Closed

kvserver: RPC dialback does not resolve failover/non-system/blackhole-recv #99104

erikgrinaker opened this issue Mar 21, 2023 · 1 comment · Fixed by #99840
Assignees
Labels
A-kv-server Relating to the KV-level RPC server branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-kv KV Team

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Mar 21, 2023

In #94778, we added RPC dialback to resolve asymmetric network partitions. This fixed the failover/non-system/blackhole-send roachtest, but the failover/non-system/blackhole-recv test still shows persistent unavailability. We need to find out why and fix it.

Jira issue: CRDB-25705

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-server Relating to the KV-level RPC server T-kv KV Team labels Mar 21, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 27, 2023

Hi @nvanbenschoten, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Mar 29, 2023
Fixes: cockroachdb#99104
Informs cockroachdb#84289

As part of the previous fix for partition handling, we tracked the state
of a previous attempt and use that result on the next attempt. However
if there were multiple connections, we may only block system traffic
connections and not default class connections.  This change addresses
that by ensuring a failed dialback attempt is remembered until we are
able to successfully connect back to the pinging node.

Epic: none
Release note: None
@erikgrinaker erikgrinaker added branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 labels Mar 30, 2023
craig bot pushed a commit that referenced this issue Mar 30, 2023
99429: ccl/sqlproxyccl: add PROXY protocol support via CLI flag to sqlproxy r=pjtatlow a=jaylim-crl

This commits adds a new `require-proxy-protocol` flag to `mt start-proxy`, and
that changes the sqlproxy's behavior to support the PROXY protocol. When the
flag is set, the protocol will be enforced on the SQL listener, and supported
on a best-effort basis on the HTTP listener. If the PROXY protocol isn't used,
but is enforced, the connection will be rejected. The rationale behind doing
best-effort basis on the HTTP listener is that some healthcheck systems don't
support the protocol.

This work is needed for the AWS PrivateLink work in CockroachCloud, which
requires the use of the PROXY protocol.

Release note: None

Epic: none

Release justification: SQL Proxy change only. Changes are needed for the AWS
PrivateLink work in CockroachCloud.

99840: rpc: Fix blackhole recv r=erikgrinaker a=andrewbaptist

Fixes: #99104
Informs #84289

As part of the previous fix for partition handling, we tracked the state of a previous attempt and use that result on the next attempt. However if there were multiple connections, we may only block system traffic connections and not default class connections.  This change addresses that by ensuring a failed dialback attempt is remembered until we are able to successfully connect back to the pinging node.

Epic: none
Release note: None

Co-authored-by: Jay <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
@craig craig bot closed this as completed in 53c94ad Mar 30, 2023
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Mar 31, 2023
As part of the fix of cockroachdb#99104, a cast without a nil check was introduced.
This PR addresses that by only casting if it is known to be not nil.

Epic: none
Fixes: cockroachdb#100275
Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Mar 31, 2023
As part of the fix of cockroachdb#99104, a cast without a nil check was introduced.
This PR addresses that by only casting if it is known to be not nil.

Epic: none
Fixes: cockroachdb#100275
Release note: None
andrewbaptist added a commit that referenced this issue Mar 31, 2023
Fixes: #99104
Informs #84289

As part of the previous fix for partition handling, we tracked the state
of a previous attempt and use that result on the next attempt. However
if there were multiple connections, we may only block system traffic
connections and not default class connections.  This change addresses
that by ensuring a failed dialback attempt is remembered until we are
able to successfully connect back to the pinging node.

Epic: none
Release note: None
craig bot pushed a commit that referenced this issue Mar 31, 2023
99312: sqlsmith: add DEFAULT expressions to newly added columns r=mgartner a=mgartner

Sqlsmith now builds `ALTER TABLE .. ADD COLUMN .. DEFAULT` statements
with default expressions that have different types than the column type.
This is allowed if the default expression type can be assignment-casted
to the column's type.

Fixes #98133

Release note: None


99348: testutils: move default test tenant message r=rharding6373 a=herkolategan

In order to reduce logging noise but still inform test authors of the default test tenant, the message has been moved to where there is a `testing.TB` interface.

Epic: CRDB-18499

99835: opt/execbuilder: add panic catching to buildRoutinePlanGenerator r=mgartner a=mgartner

This commit adds a panic catcher to callback functions created in
execbuilder and invoked during evaluation of UDFs and correlated
subqueries. It matches the panic catcher logic in `buildApplyJoin`.

Fixes #98786

Release note: None


100267: roachtest: own autoupgrade to TestEng r=renatolabs a=tbg

Discussed in #99479.

Epic: none
Release note: None


100286: roachtest: prevent aws roachtest panic r=rail a=msbutler

After #99723 merged as a bandaid for #98783, the aws roachtest nightly began to panic because of a different roachtest papercut #96655. Specifically, because roachtest filters which tests run on which cloud within the evaluation of the test closure, tests meant to run on gce will still get registered in an AWS run. During the registration of the gce test
`restore/tpce/400GB/gce/nodes=4/cpus=8/lowmem` _on aws_, the aws test harness panics because the aws roachprod implementation does not have a low memory cpu configuration. This patch prevents this panic and should be reverted once the pr #99402 merges.

Epic: None

Release note: None

100294: tenantcapabilitiestestutils: add a missing default case r=ajwerner a=ajwerner

The test should fail if we ever add a new type of capability and use it in the data driven test but don't update the test to handle it.

Epic: none

Follow-up from #100217 (review)

Release note: None

100296: rpc: correctly check for nil before cast r=ajwerner a=andrewbaptist

As part of the fix of #99104, a cast without a nil check was introduced. This PR addresses that by only casting if it is known to be not nil.

Epic: none
Fixes: #100275
Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Herko Lategan <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: ajwerner <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-server Relating to the KV-level RPC server branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants