Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: TestStatusAPICombinedStatements is flaky #69533

Closed
Azhng opened this issue Aug 28, 2021 · 1 comment · Fixed by #69535
Closed

sql: TestStatusAPICombinedStatements is flaky #69533

Azhng opened this issue Aug 28, 2021 · 1 comment · Fixed by #69535
Assignees
Labels
A-sql-observability Related to observability of the SQL layer C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@Azhng
Copy link
Contributor

Azhng commented Aug 28, 2021

It's happening on master. https://teamcity.cockroachdb.com/viewLog.html?buildId=3376012&tab=buildResultsDiv&buildTypeId=Cockroach_UnitTests_Test

My guess is it probably come from here. It's likely a time-related race condition.

In-memory stats gets assigned aggregated_ts when it's returned from the iterator API. The timestamp for in-memory is generated by truncating the current timestamp to nearest hour.

So concretely, suppose we insert stats at 15:43, it will have aggregated_ts = 15:00. If we issue a read with read timestamp at 16:00, and the in-memory stats is not yet flushed to disk, the in-memory stats with be returned with aggregated_ts = 16:00. Hence the test failure.

Solution here would be use the existing testing knobs in SQL stats to stub out the call to timeutil.Now() function to make sure the test is 100% deterministic.

@Azhng Azhng added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-observability Related to observability of the SQL layer T-sql-observability labels Aug 28, 2021
@Azhng
Copy link
Contributor Author

Azhng commented Aug 28, 2021

This does not cause any correctness issue in production. This only causes our test to be non deterministic and flaky

craig bot pushed a commit that referenced this issue Aug 28, 2021
68831: ui: add date range selector in stmts and txns pages r=xinhaoz a=xinhaoz

Resolves #68089, #69474

This commit adds date range state
properties and a date range selector component to the DB
console's statements and transactions pages. This change
is necessary to support surfacing persisted
statistics in the DB console.

This commit adds date range state
properties and a date range selector component to the DB
console's statements and transactions pages. This change
is necessary to support surfacing persisted
statistics in the DB console.

The date range query parameters used by the api to fetch
data is stored in localSettings on db console and
localStorage in cluster-ui. The default date range
is set to 1 hour ago, and is used as the value when user's
reset the date range.

The user is able to select dates as far back as 1 year ago,
this does not mean there is data available.

Release justification: category 4 low risk, high benefit
changes to existing functionality

Release note (ui change): New date range selector component
to the DB console's statements page with the ability to show
historical data. The default date range is set to 1 hour ago,
and is used as the value when user's reset the date range.

------------------
![image](https://user-images.githubusercontent.com/20136951/131007768-9a7bfeb6-a0f0-417a-8160-51f8d636f73b.png)
![image](https://user-images.githubusercontent.com/20136951/131007809-ff98359e-bfcc-4cb1-9b0b-eb34f8d04cd0.png)
![image](https://user-images.githubusercontent.com/20136951/131036176-11b03aa6-16e5-4601-b92e-e13dd0dc03df.png)
![image](https://user-images.githubusercontent.com/20136951/131198830-6b727e68-0983-483e-9385-f23e16cde705.png)

Transactions page:
![image](https://user-images.githubusercontent.com/20136951/131222215-bb2b013c-034e-40bf-8d83-7f5d9784065a.png)


69375: tree: redact name in DEALLOCATE in FmtHideConstants r=ajwerner,maryliag a=rafiss

This will make it so that all DEALLOCATE statements show up in one row
in the SQL Statements UI and in the internal stats data structures.

Recently, a workload was updated (accidentally) to start creating more
DEALLOCATE statements, and it made the UI unusable because there were so
many DEALLOCATE statement keys.

The second commit is a rename to stop implying that statements with
redacted constants are "anonymized" -- they still have names+identifiers.
This also renames the name of a field in a response payload.

Created while working on #69317 but doesn't fix it. 

Release justification: fix a high-priority bugs with a low-risk change.
Release note: None

69457: sql, import: add metrics for max_row_size guardrails r=rytaft,yuzefovich a=michae2

**sql: rename max_row_size guardrails to match transaction row limits**

Addresses: #67400

Rename sql.mutations.max_row_size.{log|err} to
sql.guardrails.max_row_size_{log|err} for consistency with
transaction_rows_{read|written}_{log|err} and upcoming metrics.

Release justification: Low-risk update to new functionality.

Release note (ops change): New variables
sql.mutations.max_row_size.{log|err} were renamed to
sql.guardrails.max_row_size_{log|err} for consistency with other
variables and metrics.

**sql/row: remove dependency on sql/execinfra**

The next commit needs to make sql/row a dependency of sql/execinfra, so
remove two small references to sql/execinfra.

(I will squash this into the next commit before merging.)

Release justification: Low-risk update to new functionality.

Release note: None

**sql, import: plumb RowMetrics from ExecutorConfig to row.Helper**

Addresses: #67400

Add metrics for sql/row and pass them down from ExecutorConfig and
FlowCtx all the way to row.Helper. Like sql.Metrics, there are two
copies, one for user queries and one for internal queries. (These were
originally part of sql.Metrics, but there are several users of sql/row
that do not operate under a sql.Server or connExecutor so we are forced
to add these to the ExecutorConfig and FlowCtx instead.)

I ran into difficulty passing these through import, as FlowCtx itself
is not plumbed through.

There are only two metrics at first, corresponding to
sql.guardrails.max_row_size_{log|err}.

Release justification: Low-risk update to new functionality.

Release note (ops): Added four new metrics,
sql.guardrails.max_row_size_{log|err}.count{.internal} which are
incremented whenever a large row violates
sql.guardrails.max_row_size_{log|err}.

69535: server: remove non-deterministic test r=maryliag a=maryliag

A new test was added but it would fail depending on the
time it was executed. This commit remove this flaky test
so in a future PR we can add new set of tests that make
more sense to test this functionality.

Fixes #69533

Release justification: Category 3 bug fix
Release note: None

Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
@craig craig bot closed this as completed in e123d36 Aug 28, 2021
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Aug 30, 2021
Fixes: cockroachdb#69557

As mentioned in cockroachdb#69533, we have a race condition in tests
where we request statements stats with start=now. In these
tests we expect to see no results, but because in-memory
stats have the aggregated_ts field set on iterator return,
depending on the time the test is run we might see
results returned.

For example, suppose we insert stats at 15:45. If we then request
stats at 16:00, requesting only stats aggregated after or at the
current time, i.e. start=16:00, the aggregated_ts for in-memory
stats will be set to 16:00 and thus we will see results
returned.

To deflake these tests, we stub the aggregated_ts field to
a predetermined value.

Release justification: non-production code changes
Release note: None
craig bot pushed a commit that referenced this issue Aug 30, 2021
69591: sql: deflake TestStatusAPIStatements and TestStatusAPICombinedStatements r=xinhaoz a=xinhaoz

Fixes: #69557

As mentioned in #69533, we have a race condition in tests
where we request statements stats with start=now. In these
tests we expect to see no results, but because in-memory
stats have the aggregated_ts field set on iterator return,
depending on the time the test is run we might see
results returned.

For example, suppose we insert stats at 15:45. If we then request
stats at 16:00, requesting only stats aggregated after or at the
current time, i.e. start=16:00, the aggregated_ts for in-memory
stats will be set to 16:00 and thus we will see results
returned.

To deflake these tests, we stub the aggregated_ts field to
a predetermined value.

Release justification: non-production code changes
Release note: None

69592: sql: proper version gate sql stats r=maryliag,ajwerner a=Azhng

Previously, SQL Stats's implementation for version gating is faulty.
This means that SQL Stats's job monitor would attempt to start sql
stats compaction job in an incompatible cluster.
This commit fixed the faulty implementation.

Resolves #69459
Resolves #69544
Resolves #69565

Release justification: Category 2: Bug fixes and low-risk updates to
new functionality

Release note: None

Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Azhng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-observability Related to observability of the SQL layer C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants