Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Developers can view all distinct plans in an Explain plan tab including plan details #72129

Closed
kevin-v-ngo opened this issue Oct 29, 2021 · 0 comments · Fixed by #77632
Closed
Assignees
Labels
A-sql-console-general SQL Observability issues on the DB console spanning multiple areas. Includes Cockroach Cloud Console C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@kevin-v-ngo
Copy link

kevin-v-ngo commented Oct 29, 2021

This issue builds on top of the following issue by providing a guided console UX for statement plan troubleshooting.

Figma mockups: https://www.figma.com/file/xdmwvnFQd6KkO9RJ0XLDH0/22.1_SQL-obsrv_query-performance?node-id=3488%3A22028

Epic: CRDB-6564

Jira issue: CRDB-10959

@kevin-v-ngo kevin-v-ngo added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-console-general SQL Observability issues on the DB console spanning multiple areas. Includes Cockroach Cloud Console labels Oct 29, 2021
@maryliag maryliag self-assigned this Dec 21, 2021
maryliag added a commit to maryliag/cockroach that referenced this issue Jan 31, 2022
Previosuly, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.

Partially addresses cockroachdb#72129

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats.
maryliag added a commit to maryliag/cockroach that referenced this issue Feb 1, 2022
Previosuly, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.

Partially addresses cockroachdb#72129

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats.
maryliag added a commit to maryliag/cockroach that referenced this issue Feb 1, 2022
Previosuly, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.

Partially addresses cockroachdb#72129

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats.
maryliag added a commit to maryliag/cockroach that referenced this issue Feb 3, 2022
Previosuly, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.

Partially addresses cockroachdb#72129

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats.
maryliag added a commit to maryliag/cockroach that referenced this issue Feb 4, 2022
Previously, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.

Partially addresses cockroachdb#72129

The new query to collect statement now ignores plan hash and
the aggregation ts as keys. So this commits also removes the
aggreagted ts value on the UI, since it no longer applies.

Fixes cockroachdb#74513

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats.

Release note (ui change): We don't show information about
aggregations ts in Statements overview and Details pages, since
now all the statement fingerprints are grouped inside the same
time selection.
maryliag added a commit to maryliag/cockroach that referenced this issue Feb 4, 2022
Previously, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.
The combined statements groups the value no using the
plan hash as a key, and creates a list of gist executed by
the same fingerprint id.

Partially addresses cockroachdb#72129

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats inside Statistics column.
maryliag added a commit to maryliag/cockroach that referenced this issue Feb 4, 2022
Previously, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.
The combined statements groups the value no using the
plan hash as a key, and creates a list of gist executed by
the same fingerprint id.

Partially addresses cockroachdb#72129

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats inside Statistics column.
maryliag added a commit to maryliag/cockroach that referenced this issue Feb 7, 2022
Previously, we didn't have the plan hash/gist values, so
a dummy value was being used instead. Now that we have the value,
this commit uses those values to be corrected stored.
The Plan Hash is saved on its own column and is part of a
statement key. A plan gist is a string saved in the metadata
and can later on converted back into a logical plan.
The combined statements groups the value no using the
plan hash as a key, and creates a list of gist executed by
the same fingerprint id.

Partially addresses cockroachdb#72129

Release note (sql change): Saving plan hash/gist to the Statements
persisted stats inside Statistics column.
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 7, 2022
Previously, the fingerprint id and the app names were used
as a key for a statement details cache. This commits adds
the start and end time (when existing) to the key, so
the details are correctly assigned to the selected period.

This commit also rounds the selected value period to the hour,
since that is what is used on the persisted statistics, with
the start value keeping the hour and the end value adding one
hour, for example:
start: 17:45:23  ->  17:00:00
end:   20:14:32  ->  21:00:00

Partially addresses cockroachdb#72129

Release note: None
Release Justification: Low risk, high benefit change

Release justification:
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 7, 2022
Previously, the Statement Details page was being populated
by the same data displayed on Statement Table.
With a new endpoint `_status/stmtdetails/{fingerprint_id}`
we can collect more information about the statement.
This first commit changes the source of information of
the Statement Details page to the value retrieved from
the new endpoint. No changes are noticeable from the user's
perspective.
Following commits will add the new information based on the
extra information retrieved by the endpoint.

Partially addresses cockroachdb#72129

Release note: None
Release Justification: Low risk, high benefit change

Release justification:
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 7, 2022
Previously, the Statement Details page was being populated
by the same data displayed on Statement Table.
With a new endpoint `_status/stmtdetails/{fingerprint_id}`
we can collect more information about the statement.
This first commit changes the source of information of
the Statement Details page to the value retrieved from
the new endpoint. No changes are noticeable from the user's
perspective.
Following commits will add the new information based on the
extra information retrieved by the endpoint.

Partially addresses cockroachdb#72129

Release note: None
Release Justification: Low risk, high benefit change

Release justification:
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 8, 2022
Previously, the Statement Details page was being populated
by the same data displayed on Statement Table.
With a new endpoint `_status/stmtdetails/{fingerprint_id}`
we can collect more information about the statement.
This first commit changes the source of information of
the Statement Details page to the value retrieved from
the new endpoint. No changes are noticeable from the user's
perspective.
Following commits will add the new information based on the
extra information retrieved by the endpoint.

Partially addresses cockroachdb#72129

Release note: None
Release Justification: Low risk, high benefit change

Release justification:
craig bot pushed a commit that referenced this issue Mar 9, 2022
76671: importccl: test EXPORT PARQUET on randomly generated tables r=stevendanna a=msbutler

Release note: None

Release justification: bug fixes and low-risk updates to new functionality

76930: sql: set hash-sharded index column type to INT8 r=mgartner a=mgartner

Shard columns of hash-sharded indexes are computed columns with an
expression in the form `mod(fnv32(crdb_internal.datums_to_bytes(...)))`.
The `mod()` builtin returns a value of type `INT8`, while the shard
column's type was previously `INT4`. Because these types did not match,
`mod` expressions were wrapped in the assignment cast function,
`crdb_internal.assignment_cast`, in mutation query plans.

This commit changes the type of newly created shard columns to `INT8`,
eliminating the need for an assignment cast. Because all integers are
encoded as varints in keys and values, this will not increase the amount
of space on disk required for these columns.

Release justification: This is a minor change to hash-sharded indexes,
which are a newly un-experimentalized feature in the upcoming release.

Release note (sql change): The type of shard columns created for
hash-sharded indexes have changed from `INT4` to `INT8`. This should
have no effect on behavior or performance.

77215: ui: Statement Details page using new details endpoint r=maryliag a=maryliag

Previously, the Statement Details page was being populated
by the value saved on cache, where we would filter the
existing statement and retrieve the correct one.
With a new endpoint `_status/stmtdetails/{fingerprint_id}`
we can collect more information about the statement.
This first commit changes the source of information of
the Statement Details page to the value retrieved from
the new endpoint. No changes are noticeable from the user's
perspective.
Following commits will add the new information based on the
extra information retrieved by the endpoint.

Partially addresses #72129

NoteFor Reviewers: Cluster-ui selector are still being addressed

Release note: None
Release Justification: Low risk, high benefit change

77372: jobs: Set a timeout when executing schedules. r=miretskiy a=miretskiy

Improve jobs system resilience to misbehaving schedules, which take
unreasonable time to execute.

Job scheduler now sets a timeout when executing each schedule.
Scheduled jobs must ensure that they complete their execution within
specified timeout.  This does not imply that you cannot have long running
scheduled jobs.

Failed execution due to timeout are handled based on schedule policy.
The timeout is controlled via the `jobs.scheduler.schedule_execution.timeout` setting,
and can be disabled by setting timeout value to 0.

Release Notes: Improve jobs system resilience to scheduled jobs that
may lock up jobs/scheduled job table for long periods of time.  Each schedule
now has a limited amount of time to complete its execution.  The timeout
is controlled via  `jobs.scheduler.schedule_execution.timeout` setting.

Release Justification: System stability improvement.

77502: sql/contention/txnidcache: avoid recording nil txnID r=maryliag a=Azhng

Resolves #77371

Previously, there are edge cases where the txnID in the resolvedTxnID
emitted from connExecutor will be nil, (e.g. when the connExecutor closes
while a transaction is still active). Writing invalid txnID into the
writer can potentially cause data loss, since the TxnID cache stops
processing the input batch when it encounters the first invalid txnID.
This commit prevents invalid txnID from being written into TxnID Cache.

Release justification: bug fix to existing functionality
Release note: None

77507: sql: do no collect multi-column stats for indexes with only virtual columns r=mgartner a=mgartner

Fixes #76867

Release justification: This is a low-risk fix for a minor bug.

Release note (bug fix): A bug has been fixed that caused errors when
attempting to create table statistics (with CREATE STATISTICS or
ANALYZE) for a table containing an index which indexed only virtual
computed columns. This bug has been present since version 21.1.0.

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Azhng <[email protected]>
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 9, 2022
Previously, the fingerprint id and the app names were used
as a key for a statement details cache. This commits adds
the start and end time (when existing) to the key, so
the details are correctly assigned to the selected period.

This commit also rounds the selected value period to the hour,
since that is what is used on the persisted statistics, with
the start value keeping the hour and the end value adding one
hour, for example:
start: 17:45:23  ->  17:00:00
end:   20:14:32  ->  21:00:00

Partially addresses cockroachdb#72129

Release note: None
Release Justification: Low risk, high benefit change
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 9, 2022
Previously, the fingerprint id and the app names were used
as a key for a statement details cache. This commits adds
the start and end time (when existing) to the key, so
the details are correctly assigned to the selected period.

This commit also rounds the selected value period to the hour,
since that is what is used on the persisted statistics, with
the start value keeping the hour and the end value adding one
hour, for example:
start: 17:45:23  ->  17:00:00
end:   20:14:32  ->  21:00:00

Partially addresses cockroachdb#72129

Release note: None
Release Justification: Low risk, high benefit change
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 9, 2022
Previously, the fingerprint id and the app names were used
as a key for a statement details cache. This commits adds
the start and end time (when existing) to the key, so
the details are correctly assigned to the selected period.

This commit also rounds the selected value period to the hour,
since that is what is used on the persisted statistics, with
the start value keeping the hour and the end value adding one
hour, for example:
start: 17:45:23  ->  17:00:00
end:   20:14:32  ->  21:00:00

Partially addresses cockroachdb#72129

Release note: None
Release Justification: Low risk, high benefit change
craig bot pushed a commit that referenced this issue Mar 10, 2022
72991: server,sql: implement connection_wait for graceful draining r=ZhouXing19 a=ZhouXing19

Currently, the draining process is consist of three consecutive periods:

1. Server enters the "unready" state: The `/health?ready=1` http endpoint starts to show that the node is shutting down, but new SQL connections and new queries are still allowed. The server does a hard wait till the timeout. This phrase's duration is set with cluster setting `server.shutdown.drain_wait`.

2. Drain SQL connections: New SQL connections are not allowed. SQL Connections with no queries in flight will be closed by the server immediately. The rest of these SQL connections will be terminated by the server as soon as their queries are finished. Early exit if all queries are finished. This phrase's maximum duration is set with cluster setting `server.shutdown.query_wait`.

3. Drain range lease: the server keeps retrying forever until all range leases on this draining node have been transferred. Each retry iteration's duration is specified by the cluster setting `server.shutdown.lease_transfer_timeout`.

This commit reorganizes the draining process by adding a phrase where the server waits SQL connections to be closed, and once all SQL connections are closed before timeout, the server proceeds to the next draining phase.

The newly proposed draining process is:

1. (unchanged) Server enters the "unready" state: The `/health?ready=1` http endpoint starts to show that the node is shutting down, but new SQL connections and new queries are still allowed. The server does a hard wait till the timeout. This phrase's duration is set with cluster setting `server.shutdown.drain_wait`.

2. (new phase) Wait SQL connections to be closed: New SQL connections are not allowed now. The server waits for the remaining SQL connections to be closed or timeout. Once all SQL connections are closed, the draining proceed to the next phase. The maximum duration of this phase is determined by the cluster setting `server.shutdown.connection_wait`.

3. (unchanged) Drain SQL connections: New SQL connections are not allowed. SQL Connections with no queries in flight will be closed by the server immediately. The rest of these SQL connections will be terminated by the server as soon as their queries are finished. Early exit if all queries are finished. This phrase's maximum duration is set with cluster setting `server.shutdown.query_wait`.

4. (unchanged) Drain range lease: the server keeps retrying forever until all range leases on this draining node have been transferred. Each retry iteration's duration is specified by the cluster setting `server.shutdown.lease_transfer_timeout`.

The duration of the new phase ("Wait SQL connections to close") can be set similarly to the other 3 existing draining phases:
```
SET CLUSTER SETTING server.shutdown.connection_wait = '40s'
```

Resolves #66319

Release note (ops change):  add `server.shutdown.connection_wait` to the
draining process configuration. This provides a workaround when customers
encountered intermittent blips and failed requests when they were performing
operations that are related to restarting nodes.

Release justification: Low risk, high benefit changes to existing functionality
(optimize the node draining process).

76430: [CRDB-9550] kv: adjust number of voters needed calculation when determining replication status r=Santamaura a=Santamaura

Currently, when a range has non-voting replicas and it is queried through replication
stats, it will be reported as underreplicated. This is because in the case where a
zone is configured to have non-voting replicas, for the over/under replicated counts,
we compare the number of current voters to the total number of replicas which is
erroneus. Instead, we will compare current number of voters to the total number of
voters if voters has been set and otherwise will defer to the total number of replicas.
This patch ignores the desired non-voters count for the purposes of this report, for
better or worse. Resolves #69335.

Release justification: low risk bug fix

Release note (bug fix): use total number of voters if set when determining replication
status

Before change:
![Screen Shot 2022-02-11 at 10 03 57 AM](https://user-images.githubusercontent.com/17861665/153615571-85163409-5bac-40f4-9669-20dce77185cf.png)

After change:
![Screen Shot 2022-02-11 at 9 53 04 AM](https://user-images.githubusercontent.com/17861665/153615316-785b156b-bd23-4cfa-a76d-7c9fa47fbf1e.png)

77315: backupccl: backup correctly tries reading in from base directory if l… r=DarrylWong a=DarrylWong

…atest/checkpoint files aren't found

Before, we only tried reading from the base directory if we caught a ErrFileDoesNotExist error. However
this does not account for the potential error thrown when the progress/latest directories don't exist.
This changes it so we now correctly retry reading from the base directory.

We also put the latest directory inside of a metadata directory, in order to avoid any potential
conflicts with there being a latest file and latest directory in the same base directory.

Also wraps errors in findLatestFile and readLatestCheckpointFile for more clarity when both base and
latest/progress directories fail to read.

Fixes #77312

Release justification: Low risk bug fix
Release note: none

77406: backupccl: test ignore ProtectionPolicy for exclude_data_from_backup r=dt a=adityamaru

This change adds an end to end test to ensure that a table excluded
from backup will not holdup GC on its replica even in the presence
of a protected timestamp record covering the replica

From a users point of view, this allows them to mark a table whose
row data will be excluded from backup, and to set that tables gc.ttl
to a very low value. Backups that write PTS records will no longer
holdup GC on such low GC TTL tables.

Fixes: #73536

Release note: None

Release justification: low risk update to new functionality

77450: ui: add selected period as part of cached key r=maryliag a=maryliag

Previously, the fingerprint id and the app names were used
as a key for a statement details cache. This commits adds
the start and end time (when existing) to the key, so
the details are correctly assigned to the selected period.

This commit also rounds the selected value period to the hour,
since that is what is used on the persisted statistics, with
the start value keeping the hour and the end value adding one
hour, for example:
start: 17:45:23  ->  17:00:00
end:   20:14:32  ->  21:00:00

Partially addresses #72129

Release note: None
Release Justification: Low risk, high benefit change

77597: kv: Add `created` column to `active_range_feeds` table. r=miretskiy a=miretskiy

Add `created` column to `active_range_feeds` table.
This column is initialized to the time when the partial range feed
was created.  This allows us to determine, among other things,
whether or not the rangefeed is currently performing a catchup scan
(i.e. it's resolved column is 0), and how long the scan has been running
for.

Release Notes (enterprise): Add created time column
to `crdb_internal.active_range_feeds` virtual table to improve observability
and debugability of rangefeed system.

Fixes #77581

Release Justification: Low impact observability/debugability improvement.

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Santamaura <[email protected]>
Co-authored-by: Darryl <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 10, 2022
Previously, the Explain Plan tab on Statement Details was
showing only one plan. This commit introduces a table of plan
with their respective executions stats.
When a plan is clicked on the table, it shows the Plan and
its statistics.

Fixes cockroachdb#72129

Release justification: Category 4
Release note (ui change): Explain Plan tab on Statement Details
shows statistics for all the plans executed by the selected statement
on the selected period.
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 10, 2022
Previously, the Explain Plan tab on Statement Details was
showing only one plan. This commit introduces a table of plan
with their respective executions stats.
When a plan is clicked on the table, it shows the Plan and
its statistics.

Fixes cockroachdb#72129

Release justification: Category 4
Release note (ui change): Explain Plan tab on Statement Details
shows statistics for all the plans executed by the selected statement
on the selected period.
maryliag added a commit to maryliag/cockroach that referenced this issue Mar 11, 2022
Previously, the Explain Plan tab on Statement Details was
showing only one plan. This commit introduces a table of plan
with their respective executions stats.
When a plan is clicked on the table, it shows the Plan and
its statistics.

Fixes cockroachdb#72129

Release justification: Category 4
Release note (ui change): Explain Plan tab on Statement Details
shows statistics for all the plans executed by the selected statement
on the selected period.
craig bot pushed a commit that referenced this issue Mar 11, 2022
77349: spanconfig: add Record constructor and validation r=adityamaru a=adityamaru

This change adds a constructor method that returns
a new `spanconfig.Record` and conditionally performs
some validation if the target is a SystemTarget.

Informs: #73727

Release note: None

Release justification: low-risk updates to new functionality

77608: optbuilder: do not create invalid casts when building COALESCE and IF r=mgartner a=mgartner

The optbuilder no longer creates invalid casts when building COALESCE
and IF expressions that have children with different types. Expressions
that previously caused internal errors now result in user-facing errors.
Both UNION and CASE expressions had similar bugs that were recently
fixed in #75219 and #76193.

This commit also updates the `tree.ReType` function to return `ok=false`
if there is no valid cast to re-type the expression to the given type.
This forces callers to explicitly deal with situations where re-typing
is not possible and it ensures that the function never creates invalid
casts. This will make it easier to track down future related bugs
because internal errors should originate from the call site of
`tree.ReType` rather than from logic further along in the optimization
process (in the case of #76807 the internal error originated from the
logical props builder when it attempted to lookup the volatility of the
invalid cast).

This commit also adds special logic to make casts from any tuple type to
`types.AnyTuple` valid immutable, implicit casts. Evaluation of these
casts are no-ops. Users cannot construct these casts, but they are built
by optbuilder in some cases.

Fixes #76807

Release justification: This is a low-risk change that fixes a minor bug.

Release note (bug fix): A bug has been fixed that caused internal errors
when COALESCE and IF expressions had inner expressions with different
types that could not be cast to a common type.


77632: ui: new plan table on statement details r=maryliag a=maryliag

Previously, the Explain Plan tab on Statement Details was
showing only one plan. This commit introduces a table of plan
with their respective executions stats.
When a plan is clicked on the table, it shows the Plan and
its statistics.

Fixes #72129

Page on DB Console: [video](https://www.loom.com/share/0898f48021eb4037a6f86760053a5e85)
Page on CC Console: [video](https://www.loom.com/share/84d8ec40ae7e4eb19291788721ab7133)

<img width="1058" alt="Screen Shot 2022-03-10 at 2 42 25 PM" src="https://user-images.githubusercontent.com/1017486/157742210-12b79f5d-274c-48e6-8fb6-dafc74fd25b3.png">
<img width="988" alt="Screen Shot 2022-03-10 at 2 42 36 PM" src="https://user-images.githubusercontent.com/1017486/157742211-daa2a07c-b025-4b36-a49a-4cafe7117bc8.png">



Release justification: Category 4
Release note (ui change): Explain Plan tab on Statement Details
shows statistics for all the plans executed by the selected statement
on the selected period.

77688: ci: ensure all nightlies post github issues when tests fail r=rail a=rickystewart

Release justification: ensure failing nightlies post github issues
Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@craig craig bot closed this as completed in a81c820 Mar 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-console-general SQL Observability issues on the DB console spanning multiple areas. Includes Cockroach Cloud Console C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants