Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate sql-o11y endpoints to be tenant-scoped #89429

Closed
THardy98 opened this issue Oct 5, 2022 · 1 comment
Closed

Migrate sql-o11y endpoints to be tenant-scoped #89429

THardy98 opened this issue Oct 5, 2022 · 1 comment
Assignees
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker

Comments

@THardy98
Copy link

THardy98 commented Oct 5, 2022

This is an issue tracking the status of migrating sql-o11y endpoints that need work to become tenant-scoped:

ENDPOINTS

  • Databases: /_admin/v1/databases
  • Database Details: /_admin/v1/databases/{database}
  • Table Details: /_admin/v1/databases/{database}/tables/{table}
  • Table Stats: /_admin/v1/databases/{database}/tables/{table}/stats
  • Events: /_admin/v1/events
  • StatementDiagnosticsRequest: /_status/stmtdiagreports (GET)
  • CreateStatementDiagnosticsRequest: /_status/stmtdiagreports (POST)
  • CancelStatementDiagnosticsRequest: /_status/stmtdiagreports/cancel

CORRESPONDING V2 ENDPOINTS
Note that the /api/v2 endpoints use the same logic as the as their corresponding v1 endpoints above. Work done to make the v1 endpoint tenant-scoped will correspondingly make its v2 endpoint tenant-scoped.

Databases
/api/v2/databases/
/api/v2/databases/{database}/
/api/v2/databases/{database}/grants/
/api/v2/databases/{database}/tables/
/api/v2/databases/{database}/tables/{table}/

Events
/api/v2/events


TRACKING ISSUES
server: the TableDetails, TableStats, DatabaseDetails API don't decode with a proper tenant prefix:
#82879

server: tenant span stats for TableStats, TableDetails & DatabaseDetails endpoints
#90267

server: APIv2 HTTP interface missing for SQL-only servers (blocks sql-over-http endpoints)
#80789

server: make Databases endpoint tenant-scoped
#90257

server: make the DatabaseDetails endpoint tenant-scoped (has blockers)
#90261

server: make TableDetails endpoint tenant-scoped (has blockers)
#90264

server: make TableStats endpoint tenant-scoped (has blockers)
#90268

server: make the Events endpoint tenant-scoped
#90272

server: make the StatementDiagnosticsRequests endpoint tenant-scoped
#90273

server: make CreateStatementDiagnosticsReport tenant-scoped
#90274

server: make CancelStatementDiagnosticsReport tenant-scoped
#90275

Epic CRDB-16704

@THardy98 THardy98 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Oct 5, 2022
@THardy98 THardy98 changed the title Migrate sql-o11y endpoints to use sql-over-http Migrate sql-o11y endpoints to be tenant-scoped Oct 18, 2022
@THardy98 THardy98 self-assigned this Oct 18, 2022
craig bot pushed a commit that referenced this issue Nov 10, 2022
91081: ui: migrate statement diagnostics requests to use sql-over-http endpoint r=THardy98 a=THardy98

Part of: #89429

Addresses: #90273, #90274, #90275

Blocked from resolving by #80789

DB-Console Demo: https://www.loom.com/share/d74c02eb3dfb41fc913577d0e8992441
CC-Console Demo: https://www.loom.com/share/2adba8a53b8c48ac9f031e4481f284cf

This change migrates the existing statement diagnostics requests to use the sql-over-http endpoint on our apiV2, making these requests tenant-scoped once the sql-over-http endpoint is scoped to tenants (this should be the case when #91323 is completed).

91534: lint: remove Golint r=yuzefovich a=yuzefovich

This commit removes `TestGolint` linter as well as all mentions of `golint` I could find. The rationale is that
- the linter is no longer maintained
- is partially duplicated with the `staticcheck`
- due to a recent change in `gcexportdata.Find`, if we bump some dependencies, it takes on the order of an hour when run via bazel, and we'd need to fork `golint` to go around that.

Epic: None

Release note: None

91626: kvstreamer: improve the behavior with very low budget r=yuzefovich a=yuzefovich

This commit makes it so that the streamer tries a bit harder to not hit its low budget limit in `Enqueue`. Previously, due to some of the internal state being reused across different `Enqueue` batches (in particular, the truncation helper as well as some things in the results buffer), the streamer's budget might be insufficient to account for the newly-enqueued requests, and we would error out. This commit makes it so that - if the budget limit is reached - we first discard the overhead (freeing up some budget) and then try to consume the requests' footprint again. This shouldn't be a big deal with reasonably large workmem limits, but we have seen some problems when it is on the order of 100KiB or less.

Fixes: #91587.

Release note: None

91662: multi-tenant: Bump uninitialized storage cluster version r=knz a=ajstorm

Previously, in cases where the version value in system.tenant_settings was uninitialized, we assumed that that the version was 22.1. With the change of the build tag to 23.1, we need to adjust the previous version to 22.1.76. This will need one more bump when we finally mint the 22.2 version number in master.

Resolves: #91589

Release note: None

Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Adam Storm <[email protected]>
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Nov 11, 2022
Part of: cockroachdb#89429

Addresses: cockroachdb#90272 (blocked from resolving by cockroachdb#80789)

This change migrates the existing `/events` request to use the
sql-over-http endpoint on apiV2, making this request tenant-scoped once
the sql-over-http endpoint is scoped to tenants (this should be the case
when cockroachdb#91323 is completed).

Release note: None
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Nov 15, 2022
Part of: cockroachdb#89429

Addresses: cockroachdb#90272 (blocked from resolving by cockroachdb#80789)

This change migrates the existing `/events` request to use the
sql-over-http endpoint on apiV2, making this request tenant-scoped once
the sql-over-http endpoint is scoped to tenants (this should be the case
when cockroachdb#91323 is completed).

Release note: None
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Nov 21, 2022
Part of: cockroachdb#89429

Addresses: cockroachdb#90272 (blocked from resolving by cockroachdb#80789)

This change migrates the existing `/events` request to use the
sql-over-http endpoint on apiV2, making this request tenant-scoped once
the sql-over-http endpoint is scoped to tenants (this should be the case
when cockroachdb#91323 is completed).

Release note: None
craig bot pushed a commit that referenced this issue Nov 21, 2022
91745: ui: move events endpoint to use sql-over-http r=THardy98 a=THardy98

Part of: #89429

Addresses: #90272 (blocked from resolving by #80789)

Demo (DB-Console only): https://www.loom.com/share/bb67044a8560407fb703af2a6bbb2d9a

This change migrates the existing `/events` request to use the sql-over-http endpoint on apiV2, making this request tenant-scoped once the sql-over-http endpoint is scoped to tenants (this should be the case when #91323 is completed).

Release note: None

Co-authored-by: Thomas Hardy <[email protected]>
@blathers-crl
Copy link

blathers-crl bot commented Feb 27, 2023

Hi @maryliag, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@maryliag maryliag added the branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 label Feb 27, 2023
dhartunian added a commit to dhartunian/cockroach that referenced this issue Feb 28, 2023
Replaced usages of `TODOSQLCodec` with the codec from `sqlServer.execCfg`. This
enables the DB and Table stats endpoints to work from tenants.

Resolves: cockroachdb#82879

Relates to: cockroachdb#90261, cockroachdb#90267, cockroachdb#90268, cockroachdb#90264, cockroachdb#89429

Epic: CRDB-12100

Release note: None
craig bot pushed a commit that referenced this issue Mar 1, 2023
95141: storage: Add support for TargetBytes for EndTxn r=nvanbenschoten a=KaiSun314

Fixes: #77228

Intent resolution batches are sequenced on raft and each batch can
consist of 100-200 intents. If an intent key or even value in some cases
are large, it is possible that resolving all intents in the batch would
result in a raft command size exceeding the max raft command size
kv.raft.command.max_size.

To address this, we add support for TargetBytes in resolve intent and
resolve intent range commands, allowing us to stop resolving intents in
the batch as soon as we exceed the TargetBytes max bytes limit.

This PR adds byte pagination for synchronous intent resolution (i.e.
EndTxn / End Transaction).

Release note: None

97511: status: set codec from context in table stats requests r=knz a=dhartunian

Replaced usages of `TODOSQLCodec` with the codec from `sqlServer.execCfg`. This enables the DB and Table stats endpoints to work from tenants.

Resolves: #82879

Relates to: #90261, #90267, #90268, #90264, #89429

Epic: CRDB-12100

Release note: None

97657: sql,mon: expose the memory monitors as virtual table r=yuzefovich a=yuzefovich

This commit adjusts our memory monitors to be able to traverse the whole
monitor tree starting at the root monitor. In particular, this commit
introduces a doubly-linked list of "siblings" and stores the reference
to the head in the parent. Whenever a monitor is `Start`ed, it is
included as the new head of its parent's children list, whenever a monitor
is `Stop`ped, it is removed from that list. The overhead of this
additional tracking should be negligible since only the parent's lock
needs to be acquired twice throughout the lifetime of a monitor (thus,
assuming relatevily long-lived sessions, this wouldn't affect the root
monitor) and the increase in allocations is minor.

This required clarification on how locks on a parent and a child can be
held at the same time. In particular, since the main code path is
acquiring locks "upwards" (meaning when growing the child's budget we
might need to grow the parent's budget, and "growing" locks the
corresponding monitor), whenever we want to traverse the tree from the
root down, we have to unlock the parent's monitor before recursing into
the children. As a result, the traversal might give us an inconsistent
view (where a recently stopped child can contribute to the usage of the
parent while we don't recurse into that child). This seems acceptable.

This ability to traverse the whole monitor tree is now exposed as a new
virtual table `crdb_internal.node_memory_monitors` which includes a line
for each monitor active at the time of table generation (subject to
possible inconsistency mentioned above). The table includes the name of
the monitors which can be suggestive about the activity on the cluster,
thus, access to this table is gated on the "view activity" permissions.
The usage of the virtual table to expose the memory monitors information
results in flattening of the tree; however, one of the fields is
a "level" (or "generation") in relation to the root, plus the ordering
of rows is very specific, so we can still format the output to see the
hierarchy. We also assign IDs to the monitors (which is their pointer
address). Exposing this information as a virtual table allows us to use
SQL to analyze it.

Here is one example of visualizing it:
```
[email protected]:26257/defaultdb> SELECT repeat('    ', level) || name || ' ' || crdb_internal.humanize_bytes(used) FROM crdb_internal.node_memory_monitors;
                             ?column?
-------------------------------------------------------------------
  root 0 B
      internal-planner.‹root›.‹resume-job-101› 0 B
      internal-planner.‹node›.‹resume-job-100› 0 B
      internal-planner.‹node›.‹resume-job-842810460057567233› 0 B
      sql 900 KiB
          session root 20 KiB
              txn 10 KiB
                  flow e595eb80 10 KiB
              session 0 B
              txn-fingerprint-id-cache 0 B
          internal SQL executor 0 B
          internal SQL executor 0 B
          internal sql executor 0 B
          conn 105 KiB
          internal SQL executor 70 KiB
          internal SQL executor 60 KiB
          SQLStats 540 KiB
          SQLStats 0 B
      distsql 0 B
      server-cache-mon 0 B
      bulk-mon 0 B
          backup-mon 0 B
          backfill-mon 0 B
      pre-conn 105 KiB
      closed-session-cache 190 KiB
      timeseries-results 0 B
      timeseries-workers 0 B
      kv-mem 20 KiB
          rangefeed-monitor 0 B
          rangefeed-system-monitor 0 B
(30 rows)
```

There are a couple of additional minor improvements:
- we now include the short FlowID into the flow's memory monitor name.
Combined with the distsql_flows virtual table we'll be able to get the
stmt fingerprint for the remote flows running on a node.
- new `crdb_internal.humanize_bytes` builtin function is introduced.

Note that the corresponding `cluster_memory_monitors` virtual table is
not introduced out of caution. In particular, this would lead to RPCs
issued to all nodes in the cluster, and since each node can have on the
order of hundreds of thousands monitors, the response to each RPC could
have non-trivial network cost. We can revisit this decision later if we
find that a cluster level view of the memory monitors is desirable, but
for now a node level view seems like a big improvement on its own.

Addresses: #35097.
Fixes: #90551.

Release note (sql change): New internal virtual table
`crdb_internal.memory_monitors` is introduced. It exposes all of the
current reservations with the memory accounting system on a single node.
Access to the table requires VIEWACTIVITY or VIEWACTIVITYREDACTED
permissions.

97853: builtins: fix crdb_internal.hide_sql_constants array overload r=xinhaoz a=xinhaoz

Previously, erroring on parsing a stmt provided in one of the array elements to crdb_internal.hide_sql_constants would result in an error. This commit ensures that the empty string is returned for an unparseable stmt.

Epic: none

Release note: None

Co-authored-by: Kai Sun <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
craig bot pushed a commit that referenced this issue Mar 15, 2023
93209: cluster-ui: move database details endpoint to use sql-over-http r=THardy98 a=THardy98

Part of: #89429
Addresses: #90261

This change moves the existing database details endpoint to use sql-over-http.

The data for database details is populated using 7 queries, which fetch
the database's:
- ID
- grants
- tables
- replicas and regions
- index usage statistics
- zone configuration
- span statistics

This PR also changes the structure of the database details response.
Each field within the overlying database details response encapsulates
the response of one of the queries. Query failures are captured at the
top-level of the response and within the corresponding response field
for the query (scoping query failures to each transaction).

Additionally, we no longer fetch `missing_tables` as we are no longer fetching table stats per table, but in a single query.

**DEMO**
https://www.loom.com/share/3a51bf5a8f3c41089fc5acae9ff29fcf

Release note: None

98651: execinfra: explicitly pass the output RowReceiver to Processor.Run r=yuzefovich a=yuzefovich

**colflow: simplify the vectorized flow creator a bit**

This commit refactors the vectorized flow creator to directly embed two
structs that implement interfaces used by the creator. This allows for
those structs to not be allocated on the main code path. Also,
`flowCreatorHelper` no longer implements the `Releasable` interface.

Release note: None

**execinfra: explicitly pass the output RowReceiver to Processor.Run**

This commit modifies how the output of a processor is plumbed through.
Previously, we would pass it on the processor construction and store in
the `ProcessorBaseNoHelper` to later be used when calling
`execinfra.Run` method. However, the upcoming work on multiple active
portals will require updating the output when resuming the flow, so this
commit makes it so that we now pass the output explicitly into
`execinfra.Processor.Run` method. This will make the work on multiple
active portals cleaner, but also this change seems like a good one
independent of that because we're able to store all outputs in the
single place (`FlowBase.outputs`).

Epic: CRDB-17622

Release note: None

98678: cloudccl/gcp: use correct testing.T in subtests r=dt a=dt

Release note: none.
Epic: none.

Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: David Taylor <[email protected]>
craig bot pushed a commit that referenced this issue Mar 19, 2023
98326: cluster-ui: move table details endpoint to use sql-over-http r=maryliag a=THardy98

Part of: #89429
Addresses: #90264, #90268

This PR migrates our existing admin API table details endpoint to
use sql-over-http in cluster-ui. We change the structure of the table
details response, allowing each field to be a self-contained query
response. The exception here is the `stats` field, which is comprised of
a couple query responses. Changing the structure scopes errors at the
query-level, instead of the response level. This is important as it
allows us to still make use of partial data on the UI when only a subset
of the queries succeed.

The commit also coalesces the former table stats and table details API
into a single table details API. The only difference/movement here is
that the table details endpoint now fetches span statistics.

The data for table details is populated using 10 queries, which fetch
the table's:
- ID
- grants
- schema details (columns & indexes)
- create statement
- zone configuration statement
- zone configuration
- heuristics details (specifically, the timestamp at which table
  heuristics were last updated)
- span statistics
- index usage statistics
- replicas

This largely reflects like for like how we fetch data from the existing
admin api endpoint. We can improve upon this (i.e. remove bloating) in
follow up PRs.

**Short Demo**
https://www.loom.com/share/698c10c5752e439a9edb0ec2301771c1

Release note: None

98977: sql/parser: fix the numeric tenant_spec grammar r=cucaroach a=knz

Informs #95908.
Epic: CRDB-23559
Release note: None

Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker
Projects
None yet
Development

No branches or pull requests

2 participants