-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: TestDatabasesTablesV2 is flakey under testrace #87074
Comments
Refs: cockroachdb#87074 Reason: flaky test Generated by bin/skip-test. Release justification: non-production code changes Release note: None
This is not a SQL experience test. Moving to SQL Observability. |
87075: server: skip TestDatabasesTablesV2 under race r=srosenberg a=adityamaru Refs: #87074 Reason: flaky test Generated by bin/skip-test. Release justification: non-production code changes Release note: None Co-authored-by: adityamaru <[email protected]>
Before I understand anything here, on a gceworker, I can get --- a/pkg/rpc/tls.go
+++ b/pkg/rpc/tls.go
@@ -231,7 +231,7 @@ func (ctx *SecurityContext) GetUIServerTLSConfig() (*tls.Config, error) {
// if needed. It uses the client TLS config.
func (ctx *SecurityContext) GetHTTPClient() (http.Client, error) {
ctx.lazy.httpClient.Do(func() {
- ctx.lazy.httpClient.httpClient.Timeout = 10 * time.Second
+ ctx.lazy.httpClient.httpClient.Timeout = 30 * time.Second
var transport http.Transport
ctx.lazy.httpClient.httpClient.Transport = &transport
transport.TLSClientConfig, ctx.lazy.httpClient.err = ctx.getUIClientTLSConfig() I also notice that the failures, both historically and that I've observed in my own testing, have all been on the same endpoint, the Both of which lead me to believe that this endpoint could just be slow? I'll dial down the timeout to see if I can find a reliable threshold (trying 20s now) and also start looking into the implementation to see if there's anything obvious there. If not, options seem to be bumping the timeout, keeping the test skipped, or commenting out just the slow part. |
And, a clue, the first failures here happened in the development of #82617, which would corroborate the hypothesis that the endpoint just got slower. I'll look for the base commit there and try a git bisect to confirm. |
It's indeed the While we could bump the timeout globally (unnecessarily lowering our signal for other slow endpoints), refactor to handle overriding the timeout on a request basis (less-than-straightforward with the http client library), or make a copy of the client with a higher timeout for this one request (current code would require dubious restructuring), I'm inclined to keep this test skipped until the underlying work in #84105 helps speed up the endpoint for real. See also #90196. |
106072: server: close net.Conn unconditionally after accepting r=rafiss,knz a=ecwall Informs #105448 Previously after the server accepted a connection, it was closed in multiple paths and in multiple layers after it was done being used. Now it is always closed in the same layer that accepted it after the serveConn callback returns. Release note: None 106238: server: fix crdb_internal.cluster_inflight_traces in shared process mode r=yuzefovich a=yuzefovich This commit fixes a panic that would previously occur when querying `crdb_internal.cluster_inflight_traces` virtual table when running in shared-process multi-tenant mode. In particular, the problem was that we tried to access node liveness which isn't available, and now we will fall back to the multi-tenant way of doing things (using the instances reader). Additionally, this commit extends the existing test to also run in shared-process multi-tenant config which serves as a regression test for this bug. There is no release note since it's not a user-visible bug. Fixes: #106182. Epic: CRDB-26691 Release note: None 106320: sql: unskip TestSchemaChangeGCJob r=chengxiong-ruan a=chengxiong-ruan Informs: #85876 fixes #60664 A few assumptions has been changed since the test was skipped. For example, the first user defined table descriptor ID and expected error messages. There was actually also bugs in the error assertion in the test for some reason, that is fixed by this PR too. Release note: None 106407: sql: add tests for CTAS, CMVAS with every SHOW statement r=chengxiong-ruan a=ecwall Fixes #105895 Adds tests for CREATE TABLE AS, CREATE MATERIALIZED VIEW AS sourcing from all SHOW statements. Release note: None 106420: server: unskip TestDatabasesTablesV2 r=THardy98 a=THardy98 Resolves: #87074 This test no longer seems flaky after testing (via `./dev test pkg/server --filter=TestDatabasesTablesV2 --stress --stress-args="-p=4" --race --timeout=30m` on gceworker). Release note: None Co-authored-by: Evan Wall <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Chengxiong Ruan <[email protected]> Co-authored-by: Thomas Hardy <[email protected]>
Resolves: #87074 This test no longer seems flaky after testing (via `./dev test pkg/server --filter=TestDatabasesTablesV2 --stress --stress-args="-p=4" --race --timeout=30m` on gceworker). Release note: None
Recently
TestDatabasesTablesV2
has been flakey under testrace - https://teamcity.cockroachdb.com/project.html?projectId=Cockroach_Ci_Tests&testNameId=-5148906532967248544&tab=testDetails. All the failures seem to be off the form:Jira issue: CRDB-19138
The text was updated successfully, but these errors were encountered: