-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.1: storage: skip TestPebbleMetricEventListener #99734
Closed
jbowens
wants to merge
10,000
commits into
cockroachdb:release-22.1
from
jbowens:skip-test-TestPebbleMetricEventListener
Closed
release-22.1: storage: skip TestPebbleMetricEventListener #99734
jbowens
wants to merge
10,000
commits into
cockroachdb:release-22.1
from
jbowens:skip-test-TestPebbleMetricEventListener
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This reduces the amount of work. Informs cockroachdb#98904. Epic: None Release note: None
This patch allows tsdb to support tenant-sperated timeseries metrics. The changes can be summarized as: (1) Changes in the write path: In `pkg/status` we now store metrics corresponding to non-system tenants with their tenant IDs. This makes the `source` component of the ts key includes both node IDs and tenant IDs: ``` .../[NodeID]-[TenantID] ``` Note: the system tenant does not include the tenant ID to allow for backwards compatibility. (2) Changes in the query path: (2.1) as system tenant: If the request to ts server comes annotated as system tenant, we aggregate timeseries across all tenants in that metric name's keyspace. (2.2) as secondary tenant: When a secondary tenant queries the server, it is routed through tenant connect and checked for having tenant capabilites for viewing tsdb data. During this stage that request is annotated with the Tenant ID. When the system tenant ts server recevies this query with a Tenant ID set: - If no sources are requested as part of the query, it will aggregate across all sources that match the tenant ID. - If specific sources are specified along with the tenant ID, it will scan the keyspace for the NodeID and then filter out results that do not match with the TenantID. These changes ensure that the system tenant is able to have a view of metrics across all tenants and application tenants have access to their own metrics. These changes are all done entirely server-side so no client changes are needed. The client will be automatically served the timeseries data it has access to based on its tenant capability. Fixes: cockroachdb#96438. Release note (ui change): Timeseries metrics in db concole will show tenant-specific metrics. For the system tenant, these timeseries will be aggregated across all tenants. For a secondary tenant, only metrics belonging to that particular tenant will be shown.
Previously, both the stmt and txns fingerprints pages were using the same api response from the /statements. This was because we need the stmts response to build txns pages. Howevever having to fetch both types of stats can slow down the request. Now that we can specify the fetch_mode as part of the request, we can split the 2 into their own api calls and redux fields. Epic: none Part of: cockroachdb#97875 Release note: None
This commit adds limit and sort fields to the combined statements request. These params can be used to specify how much data is returned and the priority at which to return data (e.g. top-k). The current sort options are: - service latency - cpu time - p99 (statements only) - contention time - execution count Epic: none Part of: cockroachdb#97876 Part of: cockroachdb#97875 Release note: None
This commit changes the table used for the combined stats endpoint from the view combining persisted and in-memory stats to the view that is just a wrapper around the system table. Thus, we are now reading only persisted stats from the system table for the combined stats endpoint. This commit updates tests using the combined api to flush stats before using the api. Epic: None Release note (ui change): On the SQL Activity fingerprints pages, users will not see stats that have not yet been flushed to disk.
This commit adds new knobs to the sql stats fingerprint pages. Users can now specify a limit and sort priority with their sql stats fingerprints requests. Closes: cockroachdb#97876 Part of: cockroachdb#97875 Release note: None
Previously, the calculatation for the `% of runtime` column for the sql activity pages, was performed on the client app by summing the (execCount* avgAvcLat) from all statements returned. This was okay when we were returning a large number of stmts (20,000) but now that we have reduced that limit significantly (default 100, with a max provided option of 500), performing this calculation on the client side won't give us the full picture of total runtime of the workload in the requested time interval. This commit modifies the statements API to return the total estimated runtime of the workload so that we can continue to show the '% of runtime` column. We also now provide this as a server sort option, so that users can request the top-k stmts or txns by % of runtime. Epic: none Release note (ui change): Users can request top-k stmts by % runtime in sql activity fingerprints pages.
Create new Search Criteria component, and add to Statements and Transactions page. This commit also updates the UI, with the new position of filters and reset sql stats. Part Of cockroachdb#98891 Release note (ui change): Add Search Criteria to Statements and Transactions pages, updates UX with improvements.
97789: docs: Pass in Jira tickets in the body of product change issues r=nickvigilante a=nickvigilante Epic and issue references are now pointing to the associated Jira tickets. Fixes DOC-7019 Release note: None 98873: storage: add storage.ingest_as_flushable.enabled setting r=nicktrav a=jbowens Add a cluster setting controlling whether to allow lazy ingestion of sstables that overlap with a memtable. This cluster setting defaults to enabling lazy ingestion, but is only respected by Pebble if the cluster version is ratcheted appropriately high. Epic: None Close cockroachdb#97194. Release note: None 98874: opt: set udf CalledOnNullInput correctly for scalar UDFs r=DrewKimball a=DrewKimball Previously, the `UDFPrivate.CalledOnNullInput` field would be set to true for scalar UDFs, since strict behavior is handled using a CASE statement. However, this is confusing and it is also correct to just set `CalledOnNullInput` to the value implied by the function's definition. Informs CRDB-20535 Release note: None Co-authored-by: Nick Vigilante <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Drew Kimball <[email protected]>
Fixes a hint for some instances of an "unknown function" error that was not properly applied previously. Informs: cockroachdb#99002 Epic: None Release note: None
This adds two new system privileges: - `REPLICATION`: Allows the user to call the internal functions that produce a cross-cluster replication stream. - `MANAGETENANT`: Allows the user to create and manage tenants. A user with the MANAGETENANT privileges is now able to execute the following statements: - SHOW TENANT - SHOW TENANTS - CREATE TENANT - CREATE TENANT FROM REPLICATION STREAM - DROP TENANT (if it is part of an active stream) - ALTER TENANT A user with the REPLICATION privileges is able to call the following functions: - crdb_internal.start_replication_stream - crdb_internal.replication_stream_progress - crdb_internal.stream_partition - crdb_internal.replication_stream_spec - crdb_internal.complete_replication_stream Fixes cockroachdb#95425 Release note: None
…db#99062 98353: kvserver: make some cluster settings system only r=andrewbaptist a=kvoli Update cluster settings in the `kv/kvserver` pkg to be `SystemOnly`. Previously, there were many cluster settings which which were `TenantWritable` or `TenantReadOnly`. These settings, even if altered by a tenant have no effect. There are settings which are not updated, due due to tests relying on modifying the setting value using a non-system tenant. We ignore these in this commit and defer to cockroachdb#98723 for handling these. These settings are updated to be `SystemOnly`: ``` kv.bulk_io_write.max_rate kv.bulk_sst.max_allowed_overage kv.bulk_sst.target_size kv.closed_timestamp.follower_reads_enabled kv.log_range_and_node_events.enabled kv.range_split.by_load_enabled kv.range_split.load_cpu_threshold kv.range_split.load_qps_threshold kv.replica_stats.addsst_request_size_factor kv.replication_reports.interval server.shutdown.lease_transfer_wait ``` Resolves: cockroachdb#98347 Release note (ops change): Some KV server cluster settings are now system only. These settings could previously be written or read by tenants, however writing to these settings had no effect. 99037: kvserver: skip `TestStoreRangeSplitRaceUninitializedRHS` under race/deadlock r=erikgrinaker a=erikgrinaker The Raft groups are unable to maintain quorum when stressed under race/deadlock. Resolves cockroachdb#98840. Epic: none Release note: None 99052: sql: add `switchToAnotherPortal` signal for result consumer r=ZhouXing19 a=ZhouXing19 This PR is part of the effort to implement the multiple active portals. (Extracted from cockroachdb#96358) --- ### sql/settings: add sql.pgwire.multiple_active_portals.enabled cluster setting This commit is to add a non-public `sql.pgwire.multiple_active_portals.enabled` setting. This setting is only for a PREVIEWABLE feature. With it set to true, all non-internal portals with read-only queries without sub/post queries can be paused and resumed in an interleaving manner but are executed with local plan. --- ### sql: add switchToAnotherPortal signal for result consumer Previously, after pushing the result to the consumer, we only accept the following actions as the next step: 1. Pushing more data from the same source to the same consumer; 2. Drain or close the pipeline. This commit is to add another option: pause the current execution, and switch to the execution to another portal. I.e. when receiving an `ExecPortal` cmd but for another portal, we do nothing and return the control to the connExecutor. This allows us to execute different portals interleaving-ly. Epic: CRDB-17622 Release note (sql change): add a non-public `sql.pgwire.multiple_active_portals.enabled` setting. This setting is only for a PREVIEWABLE feature. With it set to true, all non-internal portals with read-only queries without sub/post queries can be paused and resumed in an interleaving manner but are executed with local plan. 99062: sql: deflake `TestTrace` r=yuzefovich a=erikgrinaker This has been seen to flake in CI: ``` === RUN TestTrace/ShowTraceForVectorized/TracingOff/node-1 trace_test.go:386: expected span: "session recording", got: "pendingLeaseRequest: requesting lease" trace_test.go:397: remaining span: "session recording" trace_test.go:397: remaining span: "sql query" trace_test.go:397: remaining span: "sql txn" trace_test.go:397: remaining span: "txn coordinator send" --- FAIL: TestTrace/ShowTraceForVectorized/TracingOff/node-1 (0.02s) ``` There was already an exception for this span, but with a `storage.` prefix. This patch removes the prefix, and makes it match on substrings. This flake has possibly been made worse with the introduction of a metamorphic setting to only use expiration-based leases in ecc931b. Resolves cockroachdb#98971. Epic: none Release note: None Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Jane Xing <[email protected]>
98833: kv: don't signal raft scheduler when already on scheduler goroutine r=erikgrinaker a=nvanbenschoten Informs cockroachdb#96800. This commit avoids unnecessary interactions with the Raft scheduler as a way to relieve pressure on the scheduler's mutex. In cockroachdb#96800 (comment), we found that calls into the scheduler in response to synchronously handled `MsgStorageApply`s were responsible for **17.8%** raftScheduler enqueue calls. These enqueue calls were unnecessary because they were performed while already running on the raft scheduler, so the corresponding local messages were already going to be flushed and processed before the scheduler is yielded. By recognizing when we are in these situations, we can omit the unnecessary scheduler interactions. ``` # benchmarking on AWS with instance stores # see cockroachdb#96800 for why this has little to no effect on GCP of AWS with EBS name old ops/sec new ops/sec delta kv0/enc=false/nodes=3/cpu=32 51.5k ± 2% 57.3k ± 3% +11.18% (p=0.008 n=5+5) name old p50(ms) new p50(ms) delta kv0/enc=false/nodes=3/cpu=32 2.40 ± 0% 2.10 ± 5% -12.50% (p=0.000 n=4+5) name old p99(ms) new p99(ms) delta kv0/enc=false/nodes=3/cpu=32 11.0 ± 0% 9.6 ± 4% -12.36% (p=0.000 n=4+5) ``` Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
Previously COPY would allow a wide range of syntax in the COPY TO substatement. Now like PG we limit it to a few things. PG grammar is: ``` PreparableStmt: SelectStmt | InsertStmt | UpdateStmt | DeleteStmt | MergeStmt ``` And now we do something similar. This prevents the wheels from coming off when RSG generates EXPLAIN's in the substatement for instance. Release note: none Epic: none
Release note: none. Epic: none.
Release note: none. Epic: none.
99031: ccl/multiregionccl: skip flaky regional_by_table test r=matthewtodd a=matthewtodd Part of cockroachdb#98020. Release note: None Co-authored-by: Matthew Todd <[email protected]>
98993: jobs: enhance the InfoStorage API r=adityamaru a=knz Needed for cockroachdb#98459. Epic: CRDB-23559 98998: kvserver: unskip TestBehaviorDuringLeaseTransfer r=shralex a=shralex Unskip this test, its not failing locally. Previous teamcity logs are no longer available. Epic: https://cockroachlabs.atlassian.net/browse/CRDB-21501 Fixes: cockroachdb#91948 Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: shralex <[email protected]>
Before this change, we could encounter internal errors while attempting to add result columns during a `CREATE OR REPLACE VIEW` if the number of columns in the new view was less than the number of columns in the old view. This led to an inconsistency with postgres, which would only return the error `cannot drop columns from view`. This PR moves the check comparing the number of columns before and after the view replacement earlier so that the correct error returns. Co-authored-by: [email protected] Fixes: cockroachdb#99000 Epic: None Release note (bug fix): Fixes an internal error that can occur when `CREATE OR REPLACE VIEW` replaces a view with fewer columns and another entity depended on the view.
98077: ts: add support for multitenant timeseries metrics r=aadityasondhi a=aadityasondhi This patch allows tsdb to support tenant-sperated timeseries metrics. The changes can be summarized as: (1) Changes in the write path: In `pkg/status` we now store metrics corresponding to non-system tenants with their tenant IDs. This makes the `source` component of the ts key includes both node IDs and tenant IDs: ``` .../[NodeID]-[TenantID] ``` Note: the system tenant does not include the tenant ID to allow for backwards compatibility. (2) Changes in the query path: (2.1) as system tenant: If the request to ts server comes annotated as system tenant, we aggregate timeseries across all tenants in that metric name's keyspace. (2.2) as secondary tenant: When a secondary tenant queries the server, it is routed through tenant connect and checked for having tenant capabilites for viewing tsdb data. During this stage that request is annotated with the Tenant ID. When the system tenant ts server recevies this query with a Tenant ID set: - If no sources are requested as part of the query, it will aggregate across all sources that match the tenant ID. - If specific sources are specified along with the tenant ID, it will scan the keyspace for the NodeID and then filter out results that do not match with the TenantID. These changes ensure that the system tenant is able to have a view of metrics across all tenants and application tenants have access to their own metrics. These changes are all done entirely server-side so no client changes are needed. The client will be automatically served the timeseries data it has access to based on its tenant capability. Fixes: cockroachdb#96438. Release note (ui change): Timeseries metrics in db concole will show tenant-specific metrics. For the system tenant, these timeseries will be aggregated across all tenants. For a secondary tenant, only metrics belonging to that particular tenant will be shown. 98815: server, ui: add sort and limit to sql activity pages, switch sql activity pages to read only from system table r=maryliag a=xinhaoz See individual commits. DB-Console: https://www.loom.com/share/73248244baa54cdf80389d0eec788447 DB-Console Details pages: https://www.loom.com/share/2d74c6cf082f4dcfb5e7f690166c2baf Co-authored-by: Aaditya Sondhi <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]> Co-authored-by: maryliag <[email protected]>
98897: server: fix tenant auth for status server r=knz,THardy98 a=dhartunian Previously, the authentication for gRPC endpoints that are exposed via HTTP on the tenant was not implemented correctly. Because the HTTP session is decoded into gRPC metadata, and that metadata was contained in the Context object passed through to the Tenant Connector, the username from the tenant could leak into the kv layer and be treated as an authenticated username. If that username happened to match one in the system tenant it would be accepted as valid. Additinally, some endpoints were missing their authentication code. This did not break functionality because a gRPC request without any metadata is treated as an internal request with admin permissions. *Warning*: If a request contains a validated username as part of gRPC metadata and that metadata is preserved as the request is handed down to the KV layer, it could be interpreted as a valid user on the system tenant and cause an escalation of privileges. This commit adds authentication to the HotRangesV2 endpoint and SpanStats endpoints which were missing it, and contains tests that ensure that the endpoints return errors when the user does not have the correct permissions. Epic: CRDB-12100 Release note: None 98958: sql,backupccl: set system table user ID columns to be NOT NULL r=rafiss,stevendanna a=andyyang890 This PR sets the user ID columns in system tables to be NOT NULL and when applicable, updates the `RESTORE` logic to account for the case where a backup may have been created before the user ID column was added. Part of cockroachdb#87079 Release note: None Co-authored-by: David Hartunian <[email protected]> Co-authored-by: Andy Yang <[email protected]>
This patch adds a missing case for the public role in the backfilling portion of the system.privileges user ID migration. Release note: None
This patch adds a missing case for the empty role in the backfilling portion of the system.database_role_settings user ID migration. Release note: None
98854: kvserver: shard Raft scheduler r=erikgrinaker a=erikgrinaker The Raft scheduler mutex can become very contended on machines with many cores and high range counts. This patch shards the scheduler by allocating ranges and workers to individual scheduler shards. By default, we create a new shard for every 16 workers, and distribute workers evenly. We spin up 8 workers per CPU core, capped at 96, so 16 is equivalent to 2 CPUs per shard, or a maximum of 6 shards. This significantly relieves contention at high core counts, while also avoiding starvation by excessive sharding. The shard size can be adjusted via `COCKROACH_SCHEDULER_SHARD_SIZE`. This results in a substantial performance improvement on high-CPU nodes: ``` name old ops/sec new ops/sec delta kv0/enc=false/nodes=3/cpu=4 7.71k ± 5% 7.93k ± 4% ~ (p=0.310 n=5+5) kv0/enc=false/nodes=3/cpu=8 15.6k ± 3% 14.8k ± 7% ~ (p=0.095 n=5+5) kv0/enc=false/nodes=3/cpu=32 43.4k ± 2% 45.0k ± 3% +3.73% (p=0.032 n=5+5) kv0/enc=false/nodes=3/cpu=96/cloud=aws 40.5k ± 2% 61.7k ± 1% +52.53% (p=0.008 n=5+5) kv0/enc=false/nodes=3/cpu=96/cloud=gce 35.6k ± 4% 44.5k ± 0% +24.99% (p=0.008 n=5+5) name old p50 new p50 delta kv0/enc=false/nodes=3/cpu=4 21.2 ± 6% 20.6 ± 3% ~ (p=0.397 n=5+5) kv0/enc=false/nodes=3/cpu=8 10.5 ± 0% 11.2 ± 8% ~ (p=0.079 n=4+5) kv0/enc=false/nodes=3/cpu=32 4.16 ± 1% 4.00 ± 5% ~ (p=0.143 n=5+5) kv0/enc=false/nodes=3/cpu=96/cloud=aws 3.00 ± 0% 2.00 ± 0% ~ (p=0.079 n=4+5) kv0/enc=false/nodes=3/cpu=96/cloud=gce 4.70 ± 0% 4.10 ± 0% -12.77% (p=0.000 n=5+4) name old p95 new p95 delta kv0/enc=false/nodes=3/cpu=4 61.6 ± 5% 60.8 ± 3% ~ (p=0.762 n=5+5) kv0/enc=false/nodes=3/cpu=8 28.3 ± 4% 30.4 ± 0% +7.34% (p=0.016 n=5+4) kv0/enc=false/nodes=3/cpu=32 7.98 ± 2% 7.60 ± 0% -4.76% (p=0.008 n=5+5) kv0/enc=false/nodes=3/cpu=96/cloud=aws 12.3 ± 2% 6.8 ± 0% -44.72% (p=0.000 n=5+4) kv0/enc=false/nodes=3/cpu=96/cloud=gce 10.2 ± 3% 8.9 ± 0% -12.75% (p=0.000 n=5+4) name old p99 new p99 delta kv0/enc=false/nodes=3/cpu=4 89.8 ± 7% 88.9 ± 6% ~ (p=0.921 n=5+5) kv0/enc=false/nodes=3/cpu=8 46.1 ± 0% 48.6 ± 5% +5.47% (p=0.048 n=5+5) kv0/enc=false/nodes=3/cpu=32 11.5 ± 0% 11.0 ± 0% -4.35% (p=0.008 n=5+5) kv0/enc=false/nodes=3/cpu=96/cloud=aws 14.0 ± 3% 12.1 ± 0% -13.32% (p=0.008 n=5+5) kv0/enc=false/nodes=3/cpu=96/cloud=gce 14.3 ± 3% 13.1 ± 0% -8.55% (p=0.000 n=4+5) ``` The basic cost of enqueueing ranges in the scheduler (without workers or contention) only increases slightly in absolute terms, thanks to `raftSchedulerBatch` pre-sharding the enqueued ranges: ``` name old time/op new time/op delta SchedulerEnqueueRaftTicks/collect=true/ranges=10/workers=8-24 457ns ± 2% 564ns ± 2% +23.36% (p=0.001 n=7+7) SchedulerEnqueueRaftTicks/collect=true/ranges=10/workers=16-24 461ns ± 3% 563ns ± 2% +22.14% (p=0.000 n=8+8) SchedulerEnqueueRaftTicks/collect=true/ranges=10/workers=32-24 459ns ± 2% 591ns ± 2% +28.63% (p=0.000 n=8+8) SchedulerEnqueueRaftTicks/collect=true/ranges=10/workers=64-24 455ns ± 0% 776ns ± 5% +70.60% (p=0.001 n=6+8) SchedulerEnqueueRaftTicks/collect=true/ranges=10/workers=128-24 456ns ± 2% 1058ns ± 1% +132.13% (p=0.000 n=8+8) SchedulerEnqueueRaftTicks/collect=true/ranges=100000/workers=8-24 7.15ms ± 1% 8.18ms ± 1% +14.33% (p=0.000 n=8+8) SchedulerEnqueueRaftTicks/collect=true/ranges=100000/workers=16-24 7.13ms ± 1% 8.18ms ± 1% +14.77% (p=0.000 n=8+8) SchedulerEnqueueRaftTicks/collect=true/ranges=100000/workers=32-24 7.12ms ± 2% 7.86ms ± 1% +10.30% (p=0.000 n=8+8) SchedulerEnqueueRaftTicks/collect=true/ranges=100000/workers=64-24 7.20ms ± 1% 7.11ms ± 1% -1.27% (p=0.001 n=8+8) SchedulerEnqueueRaftTicks/collect=true/ranges=100000/workers=128-24 7.12ms ± 2% 7.16ms ± 3% ~ (p=0.721 n=8+8) ``` Furthermore, on an idle 24-core 3-node cluster with 50.000 unquiesced ranges, this reduced CPU usage from 12% to 10%. Resolves cockroachdb#98811. Touches cockroachdb#96800. Hat tip to `@nvanbenschoten` for the initial prototype. Epic: none Release note (performance improvement): The Raft scheduler is now sharded to relieve contention during range Raft processing, which can significantly improve performance at high CPU core counts. Co-authored-by: Erik Grinaker <[email protected]>
Epic: none Release note: None
98225: ui: show normalized CPU Usage metric on Node Map r=koorosh a=koorosh Before, Node map (on Overview page) displayed current system and user CPU usage that didn't represent the same data as CPU Percent metric on Metrics page. Now, Node Map displays the same metric to provide users consistent information. Release note (admin ui, bug fix): show normalized CPU usage on Node Map. Resolve: cockroachdb#87664 98985: storage: scan local keyspace in `TestMVCCHistories` r=erikgrinaker a=erikgrinaker This patch processes the local keyspace along with the user keyspace in `TestMVCCHistories`, which is useful for MVCC stats tests of system keys (e.g. `SysBytes`). This was motivated by tracking down an observed discrepancy in `SysBytes`, but that hasn't borne fruit yet. Touches cockroachdb#93896. Epic: none Release note: None 99050: ccl/sqlproxyccl/acl: fix TestParsingErrorHandling flake r=adityamaru,jaylim-crl a=pjtatlow TestParsingErrorHandling asserts that the error count metric is updated correctly when reading a file fails or succeeds, and the files are checked at a regular interval. For the tests that interval is set to 100ms, and we waited 200ms to ensure the metric would be updated, but that seems to not be reliable. This change increases the wait to 500ms which should ensure the file is re-read before we check the value of the error metics. Fixes cockroachdb#98839 Co-authored-by: Andrii Vorobiov <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: PJ Tatlow <[email protected]>
This code change adds a `binary_version` column to the instances table. This is done by adding the column to the bootstrap schema for system.sql_instances, and piggy-backing on the existing code for the V23_1_SystemRbrReadNew migration that overwrites the live schema for this table by the bootstrap copy. This redefinition of the meaning of the V23_1_SystemRbrReadNew is backward-incompatible and is only possible because this commit is merged before the v23.1 branch is cut. Release note: None Epic: CRDB-20860
99051: admin: statement diagnostics uses correct auth helpers r=knz a=dhartunian Previously, the statement diagnostics HTTP handler was initialized using the incorrect `ctx` value. This caused the HTTP request context to not be correctly handed down to the handler. Furthermore, the call to `userFromIncomingRPCContext` was incorrect in this scenario since it relied on a gRPC context being populated with HTTP session information, which did not exist. That code sets a `root` user when context is missing because the gRPC handlers are used for inter-node communication *and* HTTP with the DB console and external tools. This commit attaches the request context to the diagnostics bundle handler correctly, and amends the authorization code to use `userFromHTTPAuthInfoContext` which correctly reads the session cookie info from the request (like many `api/v2` handlers do since those exist outside the gRPC infrastructure). Resolves cockroachdb#99049 Epic: None Release note (security update): Previously, users could gain unauthorized access to statement diagnostic bundles they did not create if they requested the bundle through an HTTP request to `/_admin/v1/stmtbundle/<id>` and correctly guessed its (non-secret) ID. This change locks down this endpoint behind the usual SQL gating that correctly uses the SQL user in the HTTP session as identified by their cookie. Co-authored-by: David Hartunian <[email protected]>
98906: backupccl: move openSSTs call into the restore workers r=lidorcarmel a=lidorcarmel Currently we see openSSTs being a bottleneck during restore when using more than 4 workers. This patch moves the openSSTs call into the worker itself, so that this work can be parallelized. This is needed for later PR which will increase the number of workers. Also, this change simplifies the code a bit and makes it easier to implement cockroachdb#93324, because in that PR we want to produce a few partial SSTs that need to be processed serially. Before this patch it wasn't trivial to make sure that the N workers will not process those partial SSTs in the wrong order, and with this patch each worker will process a single mergedSST, and therefore can serialize the partial SSTs created from that mergedSST. Tested by running a roachtest (4 nodes, 8 cores) with and without this change. The fixed version was faster: 80MB/s/node vs 60 but some of it is noise, we do expect a perf improvement when using more workers and other params tuned, which is the next step. Informs: cockroachdb#98015 Epic: CRDB-20916 Release note: None Co-authored-by: Lidor Carmel <[email protected]>
This change removes the following log spam: ``` could not run claimed job 102: no resumer is available for AUTO CONFIG RUNNER ``` Release note: None
jbowens
changed the title
storage: skip TestPebbleMetricEventListener
release-22.1: storage: skip TestPebbleMetricEventListener
Mar 27, 2023
jbowens
requested review from
mgartner,
rhu713 and
samiskin
and removed request for
a team
March 27, 2023 21:34
Sorry for the pings; disastrous skip-test with the wrong base. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Informs #97843.
Release note: None.
Release justification: Non-production code changes.