Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: reduce allocations in ProcOutputHelper.ProcessRow in some cases #22462

Closed
asubiotto opened this issue Feb 7, 2018 · 5 comments
Closed
Assignees
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Milestone

Comments

@asubiotto
Copy link
Contributor

In distsql queries, processors use a ProcOutputHelper to output rows to the next stage. At this output stage, a new row is allocated and this new row is sent over. Experimentation has shown that these allocations are a significant overhead (#20621 (comment)).

The reason for these allocations lies in our use of RowChannels for asynchronous communication between processors. The upstream processor does not know when the downstream processor is done with the row, so it must allocate a new row in order to not reuse memory that the downstream processor will possibly read at a later point.

However, with work happening on RowChannel elision (#20550) and thus synchronous communication between processors, it is now possible to avoid these allocations between some processors as the ownership of the memory is clearly defined.

@asubiotto asubiotto added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Feb 7, 2018
@asubiotto asubiotto added this to the 2.0 milestone Feb 7, 2018
@jordanlewis jordanlewis modified the milestones: 2.0, 2.1 Feb 22, 2018
@jordanlewis
Copy link
Member

We need to clear up the contract around Next(). It's unclear to me whether any processor currently returns memory from Next() that it expects to then modify. If no processor does that, then we can avoid these allocations.

Processors that do expect to be able to keep their output should be the only ones that have to perform that copy.

@andreimatei
Copy link
Contributor

I don't understand the original comment about the RowChannel - when is passing along a row by a processor no bueno?

@asubiotto
Copy link
Contributor Author

Consider a tableReader hooked up to a RowChannel. The RowFetcher's NextRow semantics are that a row is only valid until the next call. The tableReader gets its first row, writes asynchronously to the RowChannel. Before the downstream processor can call Next() on the RowChannel, the tableReader gets a second row, which overwrites the row written to the RowChannel.

@andreimatei
Copy link
Contributor

If that's what the tableReader does, then it's breaking the contract of RowReceiver.Push()

@asubiotto
Copy link
Contributor Author

It's not, because it performs an allocation. I was just expanding on the original RowChannel comment as to why allocations are necessary in that scenario.

@jordanlewis jordanlewis assigned jordanlewis and unassigned rjnn Apr 11, 2018
craig bot pushed a commit that referenced this issue Jun 4, 2018
24589: distsqlrun: don't allocate between fused processors r=jordanlewis a=jordanlewis

distsqlrun: don't allocate between fused processors

Previously, `ProcOutputHelper.ProcessRow` (and, by extension, all
`RowSource.Next` implementations) always allocated a fresh
`EncDatumRow`. This was wasteful - not every processor needs to be able
to hold a reference to the output of `RowSource.Next`.

Now, `ProcessRow` never allocates a fresh `EncDatumRow`, and the
contract of `RowSource.Next` has been changed to say that it's not valid
to hang on to a row returned by `Next` past the next call to `Next`.
Processors that need to hold on to a row from their upstreams have been
modified to make an explicit copy to achieve this safely.

Finally, a new `copyingRowReceiver` is introduced that makes a copy of
every row that is `Push`'d to it. A `copyingRowReceiver` is inserted
before every router, since routers all expect that their inputs will be
immutable. This preserves the safety of sending outputs of
`RowSource.Next`, which aren't safe to hold on to, to routers, which
expect rows that *are* safe to hold on to.

Release note: None

Fixes #22462.
Fixes #24452.

26355: libroach: fix excessive compactions performed by DBCompactRange r=bdarnell,tschottdorf a=petermattis

Fix excessive compactions from `DBCompactRange` due to mishandling of
the first and last ranges to compact. When a non-empty start or end key
is specified, DBCompactRange was previously calling
`rocksdb::DB::CompactRange` with a `null` start/end key resulting in
compacting from the beginning (or to the end) of the entire key space.

See #24029

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
@craig craig bot closed this as completed in #24589 Jun 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

No branches or pull requests

4 participants