Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: set 23.1.8 as MinSupportedVersion for sql-stats/mixed-version #123321

Merged
merged 1 commit into from
May 1, 2024

Conversation

xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Apr 30, 2024

There is a bug in the sql stats api prior to 23.1.8, where
passing a requested start time of 0 or nil results in a
nil pointer dereference.

Setting this min supported version also resolves some mixed
version authentication limitations.

Epic: none
Fixes: #123278

Release note: None

@xinhaoz xinhaoz requested a review from a team as a code owner April 30, 2024 20:00
@xinhaoz xinhaoz requested review from DarrylWong and renatolabs and removed request for a team April 30, 2024 20:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xinhaoz xinhaoz requested review from a team, nkodali, dhartunian and abarganier and removed request for a team April 30, 2024 20:01
@@ -203,7 +203,9 @@ func (s *sqlStatsRequestHelper) requestSQLStats(
for _, addr := range adminUIAddrs {
url := getCombinedStatementStatsURL(addr, statsType, requestedRange)
statsResponse := &serverpb.StatementsResponse{}
if err := s.client.GetJSON(ctx, url, statsResponse, httputil.IgnoreUnknownFields()); err != nil {
if err := retry.ForDuration(10*time.Second, func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the retry logic? What kinds of spurious failures do we expect here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly motivated from seeing it applied here:

return client.GetJSON(ctx, url, &details)
, where it seems like the status server was not yet ready to serve requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, but that change was done in 2018 (#29492), does it still apply today? Have we seen this test flake because the status server was not ready to handle requests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay that seems pretty outdated. So far this test has failed twice in the nightly run at this step, but with different error messages:

  1. Get "https://3.147.89.154:26258/_status/combinedstmts?fetch_mode.stats_type=0&start=0&end=100&limit=10": EOF [ref]

  2. roachtestutil.addCookie: unable to extract sessionID: roachtestutil.GetSessionID: failed to authenticate: COMMAND_PROBLEM: exit status 1 [ref]

Do these point to an issue with the mixed version test or an issue within roachtest? Is there guidance on restricting the test to certain cloud providers? Running the test locally passed so these failures seemed sporadic to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For failure 2, I see: ERROR: column "user_id" does not exist when trying to auth on 23.1. Since I removed the version gate for that, I think the fix is for the mvt framework to also drop support for 23.1 on master?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix is for the mvt framework to also drop support for 23.1 on master?

Ah my bad, I didn't realize this is why the PR had MaxUpgrades(2) originally. I forgot the details on when the auth stuff works. I suggest passing the MinimumSupportedVersion("v23.1.0") option to the test, which will make sure user functions will only run once the cluster is at least that version.

Do these point to an issue with the mixed version test or an issue within roachtest?

I think neither in this case? Failure 2 is a limitation in cockroach with authenticating in mixed-version state, as I understand it. It can be worked around by specifying the minimum version as I mentioned above.

Issue 1 actually seems to be a bug in the stats code -- n4 crashed with this error (only a few frames here for clarity):

E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506  a panic has occurred!
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +runtime error: invalid memory address or nil pointer dereference
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +(1) attached stack trace
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  -- stack trace:
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | runtime.gopanic
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | 	GOROOT/src/runtime/panic.go:884
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | runtime.panicmem
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | 	GOROOT/src/runtime/panic.go:260
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | runtime.sigpanic
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | 	GOROOT/src/runtime/signal_unix.go:835
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | github.com/cockroachdb/cockroach/pkg/server.activityTablesHaveFullData
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | 	github.com/cockroachdb/cockroach/pkg/server/combined_statement_stats.go:232
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | github.com/cockroachdb/cockroach/pkg/server.getCombinedStatementStats
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | 	github.com/cockroachdb/cockroach/pkg/server/combined_statement_stats.go:100
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | github.com/cockroachdb/cockroach/pkg/server.(*statusServer).CombinedStatementStats
E240430 07:54:10.992587 21115 1@util/log/logcrash/crash_reporting.go:188 ⋮ [-] 506 +  | 	github.com/cockroachdb/cockroach/pkg/server/combined_statement_stats.go:65

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted, the error message could be more explicit (and it could be if the monitor had noticed the process death soon enough, but the client saw the EOF first, and that was the error reported). A client seeing an EOF from a connection often means that the server crashed, FWIW.

Running the test locally passed so these failures seemed sporadic to me.

As a bit of context, there's a lot of randomization happening in these tests so passing locally doesn't mean that test failures are flakes or spurious necessarily.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha, thanks, all of that was super helpful. I ended up setting the min supported version as 23.1.8 - the node crash is from a bug prior to that version (my bad for not checking n4 logs).

There is a bug in the sql stats api prior to 23.1.8, where
passing a requested start time of 0 or nil results in a
nil pointer dereference.

Setting this min supported version  also resolves some mixed
version authentication limitaitons.

Epic: none
Fixes: cockroachdb#123278

Release note: None
@xinhaoz xinhaoz changed the title roachtest: add retry to sql-stats/mixed-version http req roachtest: set 23.1.8 as MinSupportedVersion for sql-stats/mixed-version May 1, 2024
@xinhaoz xinhaoz requested a review from renatolabs May 1, 2024 18:38
@xinhaoz
Copy link
Member Author

xinhaoz commented May 1, 2024

TFTR!
bors r+

@craig craig bot merged commit e7e5ea1 into cockroachdb:master May 1, 2024
21 of 22 checks passed
@xinhaoz xinhaoz deleted the fix-roachtest branch May 7, 2024 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: sql-stats/mixed-version failed
4 participants