Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: enable resumption of a flow for pausable portals #99173

Merged
merged 3 commits into from
Mar 22, 2023

Conversation

ZhouXing19
Copy link
Collaborator

@ZhouXing19 ZhouXing19 commented Mar 21, 2023

This PR is part of the implementation of multiple active portals. (Extracted from #96358)

We now introduce a Resume() method for flow, and when a pausable portal is being re-executed, rather than generating a new flow, we resume the persisted flow to continue the previous execution.


sql: add telemetry MultipleActivePortalCounter

This commit added a telemetry counter MultipleActivePortalCounter that would
be incremented each time the cluster setting
sql.pgwire.multiple_active_portals.enabled is set to true


sql: add Resume method for flowinfra.Flow and execinfra.Processor

For pausable portals, each execution needs to resume the processor with new
output receiver. We don't need to restart the processors, and this Resume()
step can be called many times after Run() is called.


sql: reuse flow for pausable portal

To execute portals in an interleaving manner, we need to persist the flow and
queryID so that we can continue fetching the result when we come back to the same
portal.

We now introduce pauseInfo field in sql.PreparedPortal that stores this
metadata. It's set during the first-time execution of an engine, and all
following executions will reuse the flow and the queryID. This also implies that
these resources should not be cleaned up with the end of each execution.
Implementation for the clean-up steps is included in the next commit.

Also, in this commit we hang a *PreparedPortal to the planner, and how it is
set can be seen in the next commit as well.

Epic: CRDB-17622

Release note: None

@ZhouXing19 ZhouXing19 requested review from yuzefovich and a team March 21, 2023 23:13
@ZhouXing19 ZhouXing19 requested review from a team as code owners March 21, 2023 23:13
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ZhouXing19
Copy link
Collaborator Author

@yuzefovich since vectorizedFlow.Resume() will always call f.FlowBase.Resume(recv), maybe we don't need to have resumeCtx for vectorizedFlow?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few more comments, and it seems like the linter is unhappy about some of the unused stuff.

since vectorizedFlow.Resume() will always call f.FlowBase.Resume(recv), maybe we don't need to have resumeCtx for vectorizedFlow?

Yeah, sounds good.

Reviewed 3 of 3 files at r1, 11 of 11 files at r2, 6 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ZhouXing19)


pkg/sql/conn_executor.go line 1077 at r1 (raw file):

	ex.extraTxnState.createdSequences = make(map[descpb.ID]struct{})

	enableMultipleActivePortals.SetOnChange(&s.cfg.Settings.SV, func(ctx context.Context) {

I know I was the one who suggested to put this into newConnExecutor, but this is incorrect. newConnExecutor is called for each new session which results in adding a new callback to this cluster setting for each session, and we don't want that. We want to add a callback only once per SQL server, so ideally we'd do so when evaluating server.newSQLServer.

Perhaps we should define a new exported method next to where enableMultipleActivePortals is defined, and that method would be called by server.newSQLServer to add the callback. Another alternative is to export enableMultipleActivePortals itself so that we'd call SetOnChange directly from server/server_sql.go. Probably I like the latter option the most.


pkg/sql/colflow/vectorized_flow.go line 299 at r2 (raw file):

				Err: errors.AssertionFailedf(
					"batchFlowCoordinator should be nil for vectorizedFlow",
				)})

nit: we should call recv.ProducerDone here.


pkg/sql/physicalplan/aggregator_funcs_test.go line 87 at r3 (raw file):

		t.Fatal(err)
	}
	flow.Run(ctx, false /* noWait */)

nit: this change belongs in the second commit.

@ZhouXing19 ZhouXing19 requested a review from a team as a code owner March 22, 2023 02:52
This commit added a telemetry counter `MultipleActivePortalCounter` that would
be incremented each time the cluster setting
`sql.pgwire.multiple_active_portals.enabled` is set to true

Release note: None
For pausable portals, each execution needs to resume the processor with new
output receiver. We don't need to restart the processors, and this `Resume()`
step can be called many times after `Run()` is called.

Release note: None
To execute portals in an interleaving manner, we need to persist the flow and
queryID so that we can _continue_ fetching the result when we come back to the same
portal.

We now introduce `pauseInfo` field in `sql.PreparedPortal` that stores this
metadata. It's set during the first-time execution of an engine, and all
following executions will reuse the flow and the queryID. This also implies that
these resources should not be cleaned up with the end of each execution.
Implementation for the clean-up steps is included in the next commit.

Also, in this commit we hang a `*PreparedPortal` to the planner, and how it is
set can be seen in the next commit as well.

Release note: None
Copy link
Collaborator Author

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the linter is unhappy about some of the unused stuff.

I removed fields that are unused for now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/conn_executor.go line 1077 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I know I was the one who suggested to put this into newConnExecutor, but this is incorrect. newConnExecutor is called for each new session which results in adding a new callback to this cluster setting for each session, and we don't want that. We want to add a callback only once per SQL server, so ideally we'd do so when evaluating server.newSQLServer.

Perhaps we should define a new exported method next to where enableMultipleActivePortals is defined, and that method would be called by server.newSQLServer to add the callback. Another alternative is to export enableMultipleActivePortals itself so that we'd call SetOnChange directly from server/server_sql.go. Probably I like the latter option the most.

Yup, I took the export solution.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Also, we'll need to backport this to 23.1, so add the corresponding label and then request an approval in release-backports slack channel from Rafi (I'm not a TL).

Reviewed 20 of 20 files at r4, 13 of 13 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ZhouXing19)

@ZhouXing19 ZhouXing19 added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Mar 22, 2023
@ZhouXing19
Copy link
Collaborator Author

TFTR!!

bors r=yuzefovich

@craig
Copy link
Contributor

craig bot commented Mar 22, 2023

Build succeeded:

@craig craig bot merged commit 704f5ad into cockroachdb:master Mar 22, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 22, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

craig bot pushed a commit that referenced this pull request Apr 7, 2023
99663: sql: update connExecutor logic for pausable portals r=ZhouXing19 a=ZhouXing19

This PR replaces #96358 and is part of the initial implementation of multiple active portals.

----

This PR is to add limited support for multiple active portals. Now portals satisfying all following restrictions can be paused and resumed (i.e., with other queries interleaving it):

1. Not an internal query;
2. Read-only query;
3. No sub-queries or post-queries.

And such a portal will only have the statement executed with a _non-distributed_ plan. 

This feature is gated by a session variable `multiple_active_portals_enabled`. When it's set `true`, all portals that satisfy the restrictions above will automatically become "pausable" when being created via the pgwire `Bind` stmt. 

The core idea of this implementation is 
1. Add a `switchToAnotherPortal` status to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return the control to the connExecutor. (#99052)
2. Persist `flow` `queryID` `span` and `instrumentationHelper` for the portal, and reuse it when we re-execute a portal. This is to ensure we _continue_ the fetching rather than starting all over. (#99173)
3. To enable 2, we need to delay the clean-up of resources till we close the portal. For this we introduced the stacks of cleanup functions. (This PR)

Note that we kept the implementation of the original "un-pausable" portal, as we'd like to limit this new functionality only to a small set of statements. Eventually some of them should be replaced (e.g. the limitedCommandResult's lifecycle) with the new code. 

Also, we don't support distributed plan yet, as it involves much more complicated changes. See `Start with an entirely local plan` section in the [design doc](https://docs.google.com/document/d/1SpKTrTqc4AlGWBqBNgmyXfTweUUsrlqIaSkmaXpznA8/edit). Support for this will come as a follow-up.

Epic: CRDB-17622

Release note (sql change): initial support for multiple active portals. Now with session variable `multiple_active_portals_enabled` set to true,  portals satisfying all following restrictions can be executed in an interleaving manner:  1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. And such a portal will only have the statement executed with an entirely local plan. 





99947: ui: small fixes to DB Console charts shown for secondary tenants r=dhartunian a=abarganier

#97995 updated the
DB Console to filter out KV-specific charts from the metrics page
when viewing DB Console as a secondary application tenant.

The PR missed a couple small details. This patch cleans those
up with the following:

- Removes KV latency charts for app tenants
- Adds a single storage graph for app tenants showing livebytes
- Removes the "Capacity" chart on the Overview dashboard for app
  tenants

Release note: none

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-12100

NB: Please only review the final commit. 1st commit is being reviewed separately @ #99860

100188: changefeedccl: pubsub sink refactor to batching sink r=rickystewart a=samiskin

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a followup to #99086 which moves the Pubsub sink to the batching sink framework.

The changes involve:
1. Moves the Pubsub code to match the `SinkClient` interface, moving to using the lower level v1 pubsub API that lets us publish batches manually
3. Removing the extra call to json.Marshal
4. Moving to using the `pstest` package for validating results in unit tests
5. Adding topic handling to the batching sink, where batches are created per-topic
6. Added a pubsub_sink_config since it can now handle Retry and Flush config settings
7. Added metrics even to the old pubsub for the purpose of comparing the two versions

At default settings, this resulted in a peak of 90k messages per second on a single node with throughput at 27.6% cpu usage, putting it at a similar level to kafka.

Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds):
<img width="637" alt="Screenshot 2023-03-30 at 3 38 25 PM" src="https://user-images.githubusercontent.com/6236424/229863386-edaee27d-9762-4806-bab6-e18b8a6169d6.png">

Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them):
<img width="642" alt="Screenshot 2023-04-04 at 12 53 45 PM" src="https://user-images.githubusercontent.com/6236424/229863507-1883ea45-d8ce-437b-9b9c-550afec68752.png">

In the following graphs from the cloud console, where v1 was ran followed by v2, you can see how the main reason v1 was slow was that it wasn't able to batch different keys together.
<img width="574" alt="Screenshot 2023-04-04 at 12 59 51 PM" src="https://user-images.githubusercontent.com/6236424/229864083-758c0814-d53c-447e-84c3-471cf5d56c44.png">

Publish requests remained the same despite way more messages in v2
<img width="1150" alt="Screenshot 2023-04-04 at 1 46 51 PM" src="https://user-images.githubusercontent.com/6236424/229875314-6e07177e-62c4-4c15-b13f-f75e8143e011.png">



Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.

100620: pkg/server: move DataDistribution to systemAdminServer r=dhartunian a=abarganier

The DataDistribution endpoint reports replica counts by database and table. When it was built, it operated off the assumption that a range would only ever contain a single table's data within.

Now that we have coalesced ranges, a single range can span multiple tables. Unfortunately, the DataDistribution endpoint does not take this fact into account, meaning it reports garbled and inaccurate data, unless the `spanconfig.storage_coalesce_adjacent.enabled` setting is set to false (see #98820).

For secondary tenants, ranges are *always* coalesced, so this endpoint in its current state could never report meaningful data for a tenant.

Given all of this, we have decided to make this endpoint only available for the system tenant. This patch
accomplishes this by moving the endpoint away from the adminServer and into the systemAdminServer, making it effectively unimplemented for secondary tenants.

Release note: none

Informs: #97942

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants