Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/pgwire: TestAuthenticationAndHBARules failed #131532

Closed
cockroach-teamcity opened this issue Sep 27, 2024 · 5 comments · Fixed by #135086 or mohini-crl/cockroach#34
Closed

sql/pgwire: TestAuthenticationAndHBARules failed #131532

cockroach-teamcity opened this issue Sep 27, 2024 · 5 comments · Fixed by #135086 or mohini-crl/cockroach#34
Assignees
Labels
branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-product-security

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Sep 27, 2024

sql/pgwire.TestAuthenticationAndHBARules failed with artifacts on master @ cafd11fa98b93cdaf43f4d0210947fb2c0fbcd25:

=== RUN   TestAuthenticationAndHBARules
--- FAIL: TestAuthenticationAndHBARules (32.08s)
=== RUN   TestAuthenticationAndHBARules/insecure=false
    --- FAIL: TestAuthenticationAndHBARules/insecure=false (20.87s)
=== RUN   TestAuthenticationAndHBARules/insecure=false/special_cases
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/9df7e2b43c6703c48b37b40add1a12cd/logTestAuthenticationAndHBARules_insecure=false_special_cases3799389318
    test_server_shim.go:93: cluster virtualization disabled due to issue: #107310 (expected label: C-bug)
    datadriven.go:144: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/14828/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/pgwire/pgwire_test_/pgwire_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/pgwire/testdata/auth/special_cases:3:
        config [1 args]
        <no input to command>
        ----
    datadriven.go:144: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/14828/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/pgwire/pgwire_test_/pgwire_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/pgwire/testdata/auth/special_cases:6:
        sql [0 args]
        CREATE USER passworduser WITH PASSWORD 'pass'
        ----
        ok
    panic.go:626: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/9df7e2b43c6703c48b37b40add1a12cd/logTestAuthenticationAndHBARules_insecure=false_special_cases3799389318
        --- FAIL: TestAuthenticationAndHBARules/insecure=false/special_cases (0.70s)
=== RUN   TestAuthenticationAndHBARules/insecure=false/special_cases/root_user_cannot_use_password
    datadriven.go:259: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/14828/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/pgwire/pgwire_test_/pgwire_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/pgwire/testdata/auth/special_cases:17:
        set_hba [0 args]
        host all root 0.0.0.0/0 password
        ----
        # Active authentication configuration on this node:
        # Original configuration:
        # loopback all all all trust       # built-in CockroachDB default
        # host  all root all cert-password # CockroachDB mandatory rule
        # host all root 0.0.0.0/0 password
        #
        # Interpreted configuration:
        # TYPE   DATABASE USER ADDRESS   METHOD        OPTIONS
        loopback all      all  all       trust
        host     all      root all       cert-password
        host     all      root 0.0.0.0/0 password
    datadriven.go:259: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/14828/execroot/com_github_cockroachdb_cockroach/bazel-out/aarch64-fastbuild/bin/pkg/sql/pgwire/pgwire_test_/pgwire_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/pgwire/testdata/auth/special_cases:32:
         
        expected:
        ERROR: password authentication failed for user root (SQLSTATE 28P01)
        
        found:
        ERROR: pq: SSL is not enabled on the server
            --- FAIL: TestAuthenticationAndHBARules/insecure=false/special_cases/root_user_cannot_use_password (0.02s)
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/sql-foundations @cockroachdb/server @cockroachdb/product-security

This test on roachdash | Improve this report!

Jira issue: CRDB-42588

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-product-security T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Sep 27, 2024
souravcrl added a commit to souravcrl/cockroach that referenced this issue Sep 30, 2024
…test

informs cockroachdb#131532
informs cockroachdb#131110
informs cockroachdb#130253
informs cockroachdb#127745
Epic: CRDB-41958

`TestAuthenticationAndHBARules` fails for special_cases data driven test. We
suspect it might be due to client for `special_cases` test accessing the test
server from a previous test `secure_non_tls` which sets `accept_sql_without_tls`
to true. This results in the following error `ERROR: pq: SSL is not enabled on
the server` while the client was expecting an SSL connection with the server. We
fix this in the PR.

Release note: None
@souravcrl
Copy link
Contributor

@yuzefovich This seems to be a flake as I can't reproduce this locally. This might be related to test tenants bug which prevents TestAuthenticationAndHBARules from running in multi tenant mode #107310.
As mentioned in issue, this might be related to AcceptSQLWithoutTLS which gets set in another data driven test.
I have a PR out which addresses this #131580.
cc: @pritesh-lahoti

@souravcrl
Copy link
Contributor

souravcrl commented Sep 30, 2024

I am also seeing the log is server logs

I240927 18:43:12.286698 42625 1@util/log/event_log.go:44 ⋮ [T1,Vsystem,n1,client=127.0.0.1:41266,hostssl,user=root] 367 ={"Timestamp":1727462592284106044,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING \"server.host_based_authentication.configuration\" = $1","Tag":"SET CLUSTER SETTING","User":"root","PlaceholderValues":["‹'host all root 0.0.0.0/0 password'›"],"SettingName":"server.host_based_authentication.configuration","Value":"‹'host all root 0.0.0.0/0 password'›"}
W240927 18:43:12.299470 42884 kv/kvserver/intentresolver/intent_resolver.go:862 ⋮ [-] 368  failed to gc transaction record: could not GC completed transaction anchored at /Table/4/1/‹"passworduser"›/‹2›/‹1›: node unavailable; try another peer

The message says could not GC completed transaction anchored at /Table/4/1/‹"passworduser"›/‹2›/‹1›: node unavailable; try another peer
So might be related to kv failure

@rafiss
Copy link
Collaborator

rafiss commented Sep 30, 2024

@souravcrl in case you hadn't seen, this is some information I found when I last investigated this test: #126239 (comment)

@exalate-issue-sync exalate-issue-sync bot added P-2 Issues/test failures with a fix SLA of 3 months and removed T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 30, 2024
craig bot pushed a commit that referenced this issue Oct 1, 2024
131580: sql: fix TestAuthenticationAndHBARules for special_cases data-driven test r=rafiss a=souravcrl

informs #131532
informs #131110
informs #130253
informs #127745
Epic: CRDB-41958

`TestAuthenticationAndHBARules` fails for special_cases data driven test. We suspect it might be due to client for `special_cases` test accessing the test server from a previous test `secure_non_tls` which sets `accept_sql_without_tls` to true. This results in the following error `ERROR: pq: SSL is not enabled on the server` while the client was expecting an SSL connection with the server. We fix this in the PR.

Release note: None

Co-authored-by: souravcrl <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Oct 1, 2024
…test

informs #131532
informs #131110
informs #130253
informs #127745
Epic: CRDB-41958

`TestAuthenticationAndHBARules` fails for special_cases data driven test. We
suspect it might be due to client for `special_cases` test accessing the test
server from a previous test `secure_non_tls` which sets `accept_sql_without_tls`
to true. This results in the following error `ERROR: pq: SSL is not enabled on
the server` while the client was expecting an SSL connection with the server. We
fix this in the PR.

Release note: None
@souravcrl
Copy link
Contributor

@rafiss had missed that. Thanks for the pointer. I am also not sure how this case could be since we start the server in secure mode for the test special_cases and GetServerTLSConfig also does not return an error implying server was started in insecure mode, otherwise we would have seen a certs error:

cockroach/pkg/rpc/tls.go

Lines 147 to 166 in 607c697

// GetServerTLSConfig returns the server TLS config, initializing it if needed.
// If Insecure is true, return a nil config, otherwise ask the certificate
// manager for a server TLS config.
func (ctx *SecurityContext) GetServerTLSConfig() (*tls.Config, error) {
// Early out.
if ctx.config.Insecure {
return nil, nil
}
cm, err := ctx.GetCertificateManager()
if err != nil {
return nil, wrapError(err)
}
tlsCfg, err := cm.GetServerTLSConfig()
if err != nil {
return nil, wrapError(err)
}
return tlsCfg, nil
}
.
It is possible that the goroutine running the client for the test TestAuthenticationAndHBARules/insecure=false had a leak and is accessing the test server setup for TestAuthenticationAndHBARules/insecure=true/....

blathers-crl bot pushed a commit that referenced this issue Oct 1, 2024
…test

informs #131532
informs #131110
informs #130253
informs #127745
Epic: CRDB-41958

`TestAuthenticationAndHBARules` fails for special_cases data driven test. We
suspect it might be due to client for `special_cases` test accessing the test
server from a previous test `secure_non_tls` which sets `accept_sql_without_tls`
to true. This results in the following error `ERROR: pq: SSL is not enabled on
the server` while the client was expecting an SSL connection with the server. We
fix this in the PR.

Release note: None
blathers-crl bot pushed a commit that referenced this issue Oct 1, 2024
…test

informs #131532
informs #131110
informs #130253
informs #127745
Epic: CRDB-41958

`TestAuthenticationAndHBARules` fails for special_cases data driven test. We
suspect it might be due to client for `special_cases` test accessing the test
server from a previous test `secure_non_tls` which sets `accept_sql_without_tls`
to true. This results in the following error `ERROR: pq: SSL is not enabled on
the server` while the client was expecting an SSL connection with the server. We
fix this in the PR.

Release note: None
souravcrl added a commit that referenced this issue Oct 1, 2024
…test

informs #131532
informs #131110
informs #130253
informs #127745
Epic: CRDB-41958

`TestAuthenticationAndHBARules` fails for special_cases data driven test. We
suspect it might be due to client for `special_cases` test accessing the test
server from a previous test `secure_non_tls` which sets `accept_sql_without_tls`
to true. This results in the following error `ERROR: pq: SSL is not enabled on
the server` while the client was expecting an SSL connection with the server. We
fix this in the PR.

Release note: None
souravcrl added a commit to souravcrl/cockroach that referenced this issue Oct 18, 2024
informs cockroachdb#131110
informs cockroachdb#130253
informs cockroachdb#127745
Epic CRDB-41958

TestAuthenticationAndHBARules fails for `special_cases`,
`hba_default_equivalence`, `empty_hba` data driven tests for secure mode. The
failures occur when root user is trying to authenticate with cert-password auth
method and `sslmode` is set to `verify-ca` with `sslcert` being empty. The
expected behavior is root authentication defaults to password method and fails
as no password is set for root, but instead we get:
```
SSL is not enabled on the server
```
Since the failures are there only under stress, it might be because db server
shutdown or paused before responding to request for upgrade connection to SSL
from lib/pq client from here
https://github.com/lib/pq/blob/3d613208bca2e74f2a20e04126ed30bcb5c4cc27/conn.go#L1116-L1130.
Retrying connection establishment when this specific error is obtained might fix
the problem as this logic seems faulty(it checks for absence of 'S' in server
response whereas the correct check should be for 'N' in response).

Release note: None
souravcrl added a commit to souravcrl/cockroach that referenced this issue Oct 19, 2024
informs cockroachdb#131532
informs cockroachdb#131110
informs cockroachdb#130253
informs cockroachdb#127745
Epic CRDB-41958

TestAuthenticationAndHBARules fails for `special_cases`,
`hba_default_equivalence`, `empty_hba` data driven tests for secure mode. The
failures occur when root user is trying to authenticate with cert-password auth
method and `sslmode` is set to `verify-ca` with `sslcert` being empty. The
expected behavior is root authentication defaults to password method and fails
as no password is set for root, but instead we get:
```
SSL is not enabled on the server
```
Since the failures are there only under stress, it might be because db server
shutdown or paused before responding to request for upgrade connection to SSL
from lib/pq client from here
https://github.com/lib/pq/blob/3d613208bca2e74f2a20e04126ed30bcb5c4cc27/conn.go#L1116-L1130.
Retrying connection establishment when this specific error is obtained might fix
the problem as this logic seems faulty(it checks for absence of 'S' in server
response whereas the correct check should be for 'N' in response).

Release note: None
craig bot pushed a commit that referenced this issue Oct 29, 2024
133688: pgwire: add more test logs to debug flaky test r=rafiss a=rafiss

informs #127745
informs #133360
informs #131532
informs #131110
Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
@craig craig bot closed this as completed in afd5d06 Nov 14, 2024
Copy link

blathers-crl bot commented Nov 14, 2024

Based on the specified backports for linked PR #135086, I applied the following new label(s) to this issue: branch-release-23.1, branch-release-23.2, branch-release-24.1, branch-release-24.2, branch-release-24.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-product-security
Projects
None yet
4 participants