-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlstats: unskip tests hitting combinedstmts and stmts endpoints #109381
Conversation
33c7250
to
096560e
Compare
On the test that is failing can you add a custom message to the require to output the full respStatement? I think it will help understand what is causing the failure, and will be helpful in the future if the test fails. |
4427642
to
1eddec9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w)
pkg/server/application_api/sql_stats_test.go
line 513 at r1 (raw file):
Previously, j82w (Jake) wrote…
On the test that is failing can you add a custom message to the require to output the full respStatement? I think it will help understand what is causing the failure, and will be helpful in the future if the test fails.
Done. Full response is printed.
580abd7
to
96106c0
Compare
You could cast the value in the query |
array_agg(distinct transaction_fingerprint_id) AS txn_fingerprints, | ||
app_name, | ||
max(aggregated_ts) AS aggregated_ts, | ||
crdb_internal.merge_stats_metadata(array_agg(metadata)) AS merged_metadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this query is for the activity table, it needs to use crdb_internal.merge_aggregated_stmt_metadata
because the metadata column in the activity table is already aggregated from the full stmts system table.
96106c0
to
fda4a9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)
pkg/server/combined_statement_stats.go
line 698 at r3 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
If this query is for the activity table, it needs to use
crdb_internal.merge_aggregated_stmt_metadata
because the metadata column in the activity table is already aggregated from the full stmts system table.
Good catch! I forgot to change the function call for this table when re-writing the query.
pkg/server/combined_statement_stats.go
line 801 at r3 (raw file):
Previously, j82w (Jake) wrote…
You could cast the value in the query
COALESCE(merged_metadata ->> 'distSQLCount'::INT,0) AS distSQLCount,
to avoid the parse logic and if the row doesn't have any value.
Makes sense, done.
What is the reason for converting this use a CTE? Does it impact the query plan? |
fda4a9e
to
c5fac81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bumped additional timeout to 30s.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)
pkg/server/combined_statement_stats.go
line 654 at r4 (raw file):
Previously, j82w (Jake) wrote…
What is the reason for converting this use a CTE? Does it impact the query plan?
I used the CTE for readability. AFAIK, using a CTE doesn't impact the query plan since its result isn't stored but rather integrated into the main query. I've revised the query, eliminating the CTE and opting for a subquery, and made some minor formatting adjustments for consistency.
c5fac81
to
b62647f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes:
- Increased timeout for
TestStatusAPIStatements
to 30s when under stress. - Improved clarity by renaming variables and helper functions.
- Addressed transient 40001 error by filtering for statements associated with the test app name that also executed successfully (since statements which failed and then succeeded have different fingerprints). We also check that each executed statement is found in the response and if it is not, it's an error.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)
pkg/server/application_api/sql_stats_test.go
line 642 at r5 (raw file):
actualFailed := respStatement.Key.KeyData.Failed stmtJSONString := responseToJSON(respStatement)
I noticed that I'm currently marshalling to JSON (an expensive operation) irrespective of whether the expected data matches the actual data. Ideally, we should only marshal when there's a discrepancy between the expected and actual data. Outside of using a verbose if expectedData.fullScan != actualFullScan || expectedData.FullScan != ...
, I'm uncertain about alternative approaches. I believe the current method should be fine, but I thought it was worth highlighting.
b62647f
to
92b4279
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @gtr and @xinhaoz)
pkg/server/combined_statement_stats.go
line 798 at r5 (raw file):
query := string(tree.MustBeDString(row[7])) querySummary := string(tree.MustBeDString(row[8])) databases := string(tree.MustBeDString(row[9]))
Is this still formatted correctly? Not sure how a JSON string array is converted to a a string.
pkg/server/application_api/sql_stats_test.go
line 624 at r5 (raw file):
// statement statistics received from the server response. actualResponseStatsMap := make(map[string]serverpb.StatementsResponse_CollectedStatementStatistics) for _, respStatement := range resp.Statements {
nit: fix indents
pkg/server/application_api/sql_stats_test.go
line 626 at r5 (raw file):
for _, respStatement := range resp.Statements { // Only consider statements that were successfully executed by the test app. if respStatement.Key.KeyData.App == testAppName && !respStatement.Key.KeyData.Failed {
Please add a note of why the success check is necessary. That way if someone in the future is looking at the test they can see automatic retries are the reason.
pkg/server/application_api/sql_stats_test.go
line 633 at r5 (raw file):
for respQuery, expectedData := range expectedStatementStatsMap { respStatement, exists := actualResponseStatsMap[respQuery] if !exists {
nit: require.True
pkg/server/application_api/sql_stats_test.go
line 642 at r5 (raw file):
Previously, gtr (Gerardo Torres Castro) wrote…
I noticed that I'm currently marshalling to JSON (an expensive operation) irrespective of whether the expected data matches the actual data. Ideally, we should only marshal when there's a discrepancy between the expected and actual data. Outside of using a verbose
if expectedData.fullScan != actualFullScan || expectedData.FullScan != ...
, I'm uncertain about alternative approaches. I believe the current method should be fine, but I thought it was worth highlighting.
This is a test so deserializing a single JSON object shouldn't be an issue. If you wanted to avoid it you could use a func to make it lazy.
getStmtJsonString := func(){
return responseToJSON(respStatement)
}
Previously, j82w (Jake) wrote…
Ignore my func suggestion. The require doesn't take a func, so it will just get evaluated for each call instead of just once. |
92b4279
to
cbc28cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @xinhaoz)
pkg/server/combined_statement_stats.go
line 798 at r5 (raw file):
Previously, j82w (Jake) wrote…
Is this still formatted correctly? Not sure how a JSON string array is converted to a a string.
Fixed. Previously this was resulting in a value like ["db1", "db2"]
now it should just be db1,db2
which is what was returned before.
I was wondering if it was possible to do this string array to string conversion in the SQL query above so we could skip the JSON unmarshalling but I wasn't able to find a reliable query. Using ->
we get a JSON datum so I don't think we could have avoided that but let me know what you think.
pkg/server/application_api/sql_stats_test.go
line 624 at r5 (raw file):
Previously, j82w (Jake) wrote…
nit: fix indents
Done.
pkg/server/application_api/sql_stats_test.go
line 626 at r5 (raw file):
Previously, j82w (Jake) wrote…
Please add a note of why the success check is necessary. That way if someone in the future is looking at the test they can see automatic retries are the reason.
Done.
pkg/server/application_api/sql_stats_test.go
line 633 at r5 (raw file):
Previously, j82w (Jake) wrote…
nit: require.True
Done.
pkg/server/application_api/sql_stats_test.go
line 642 at r5 (raw file):
Previously, j82w (Jake) wrote…
Ignore my func suggestion. The require doesn't take a func, so it will just get evaluated for each call instead of just once.
Ah yes I also wanted to use the lazy func
cbc28cd
to
3088301
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @gtr, and @xinhaoz)
-- commits
line 12 at r7:
This fixes a bug customers have reported so there should be a release note.
pkg/server/application_api/sql_stats_test.go
line 510 at r7 (raw file):
additionalTimeout := 0 * time.Second if skip.Stress() { additionalTimeout = 30 * time.Second
Can there be a const for the stress timeout value? I noticed some tests seem to be using 30 and others 45 seconds.
3088301
to
ab025e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @j82w, and @xinhaoz)
Previously, j82w (Jake) wrote…
This fixes a bug customers have reported so there should be a release note.
Done.
pkg/server/combined_statement_stats.go
line 798 at r5 (raw file):
Previously, gtr (Gerardo Torres Castro) wrote…
Fixed. Previously this was resulting in a value like
["db1", "db2"]
now it should just bedb1,db2
which is what was returned before.I was wondering if it was possible to do this string array to string conversion in the SQL query above so we could skip the JSON unmarshalling but I wasn't able to find a reliable query. Using
->
we get a JSON datum so I don't think we could have avoided that but let me know what you think.
Following up on this: the following query worked for me:
defaultdb> SELECT string_agg(elem::text, ',') AS databases
FROM json_array_elements_text(('{"db": ["mo\"vr","tb2"], "db2": ["movr","tb2"]}'::jsonb)->'db') AS elem;
databases
-------------
mo"vr,tb2
(1 row)
The queries have been updated.
pkg/server/application_api/sql_stats_test.go
line 510 at r7 (raw file):
Previously, j82w (Jake) wrote…
Can there be a const for the stress timeout value? I noticed some tests seem to be using 30 and others 45 seconds.
Done. Also for that 45 second timeout, the test for which I did that times out under stress it seems to be very expensive so I skipped it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @gtr, and @xinhaoz)
Previously, gtr (Gerardo Torres Castro) wrote…
Done.
Looking at the PR it doesn't actually fix anything for users so the release note should be none like you originally had it. Can you revert it?
pkg/server/application_api/sql_stats_test.go
line 692 at r8 (raw file):
// Resource-intensive test, times out under stress. skip.UnderStressRace(t, "expensive tests")
Why is this still skipped for stress race?
Fixes: cockroachdb#109184. Previously, tests which hit the `combinedstmts` and `statements` endpoints were skipped under stress because they would occaisonally fail. This commit unskips these tests and instead of unmarshalling the metadata JSON blob, the select query directly extracts the values needed from it. The http client's timeout is also increased by 30s. Release note: None
ab025e7
to
27ed761
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @j82w, and @xinhaoz)
Previously, j82w (Jake) wrote…
Looking at the PR it doesn't actually fix anything for users so the release note should be none like you originally had it. Can you revert it?
Done.
pkg/server/application_api/sql_stats_test.go
line 692 at r8 (raw file):
Previously, j82w (Jake) wrote…
Why is this still skipped for stress race?
This specific test continues to time out regardless of any timeout extensions for the http client. A related test, TestStatusAPIStatementDetails
, has also been skipped and interestingly, it queries the stmtdetails
endpoint 10 times. This test queries the combinedStmts
endpoint 9 times. It seems the sheer number of endpoint hits may contribute to the timeouts, as other non-skipped tests only access it between 1-4 times.
I'm open to increasing the timeout for this test specifically to 45s but let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ericharmeling and @xinhaoz)
TFTR! |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 27ed761 to blathers/backport-release-22.2-109381: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from 27ed761 to blathers/backport-release-23.1-109381: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. error creating merge commit from 27ed761 to blathers/backport-release-23.1.9-rc-109381: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.9-rc failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Fixes: #109184.
Previously, tests which hit the
combinedstmts
andstatements
endpoints were skipped under stress because they would occaisonally
fail. This commit unskips these tests and instead of unmarshalling the
metadata JSON blob, the select query directly extracts the values needed
from it.
Release note: None