Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: clean up meta components handling #62221

Merged
merged 5 commits into from
Apr 22, 2021

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Mar 19, 2021

colexec: plumb statsCollectors directly in most cases

This commit is mostly a mechanical change which replaces getStats
functions with using the stats collectors directly for the root
materializer and the hash routers. The outbox still expects a callback
because of the need to append flow-level stats on the last outbox.

Additionally, the drain helper of the materializer has been updated so
that the stats are retrieved before the metadata sources are drained.
This will be an invariant enforced by the checkers in the follow-up
commits.

Release note: None

colexecargs: tightly couple root op with metadata sources and closers

This commit introduces a utility struct OpWithMetaInfo that has a root
operator as well as the metadata sources and closers present in the tree
handling of which (draining and closing, respectively) hasn't been
claimed yet. The struct will be the same (and will replace)
opDAGWithMetaSources and SynchronizerInput by the follow-up commit.

The motivation behind this change is that we want to attribute stats
collectors, metadata sources, and - why not - closers to the tree of
operators in which they are present.

Release note: None

colexec: clean up stats collection from wrapped processors

Previously, if we had wrapped row-execution processors in the vectorized
flow, during the stats collection two ComponentStats objects would get
emitted for each wrapped processor - one by the processor itself and
another by the vectorized stats collector that is wrapping the
columnarizer. Both stats objects would have the same ComponentID, so
they would get Unioned. This commit cleans up the situation by making
the columnarizer "hijack" the ExecStatsForTrace function from the
wrapped processor.

This required a bit of plumbing, namely, we need to:

  • track the "root" columnarizer separately in NewColOperator call so
    that later the vectorized stats collectors could augment the
    ComponentStats object created by the wrapped processor with additional
    information
  • provide an interface for the columnarizer to hijack the stats
    function.

Release note: None

colexec: finish stats collection refactor

This commit plumbs the stats collectors into OpWithMetaInfo object
which allows us to get rid of separate opDAGWithMetaSources and
SynchronizerInput structs as well as more clealy define the ownership
of the stats collectors. Now all components can drain their input trees
are also responsible for collecting the stats, if applicable, before
draining. This invariant will be enforced by the follow-up commit.

Notable changes are to the synchronizers - now all of them (ordered,
serial unordered, parallel unordered) will be collecting stats. This
required creating a tracing span on the first call to Next so that we
had a way to propagate stats. That a-bit-hacky logic will be removed
once the Operator interface is updated to take in the context object
in Init.

Release note: None

colflow: enforce stats collection before drain meta

This commit introduces a dummy vectorized stats collector that is also
a metadata source that simply checks that GetStats has been called
before DrainMeta and returns an error as a metadata if not. This stats
invariants checker is now planned whenever util.CrdbTestBuild is true
(which is when TAGS=crdb_test was specified).

Release note: None

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Mar 19, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the stats-refactor branch 8 times, most recently from b8b1c89 to 6dd24eb Compare March 19, 2021 16:55
@yuzefovich yuzefovich changed the title WIP colflow: extract vectorized stats collectors for better tracking colexec: clean up meta components handling Mar 19, 2021
@yuzefovich yuzefovich force-pushed the stats-refactor branch 3 times, most recently from ab83dec to ac784f3 Compare March 19, 2021 19:18
@yuzefovich yuzefovich requested review from asubiotto and a team March 19, 2021 19:19
@yuzefovich
Copy link
Member Author

I think this is RFAL.

A more detailed motivation behind this change is here, but the gist of the change is introducing the following struct

type OpWithMetaInfo struct {
  Root colexecop.Operator
  StatsCollectors []colexecop.VectorizedStatsCollector
  MetadataSources execinfrapb.MetadataSources
  ToClose colexecop.Closers
}

to tightly couple stats collectors and metadata sources which simplifies the tracking of which components belong to which input tree. Closers are added because why not. The only component that doesn't use this new struct is an outbox because it requires special handling of stats collection in order to emit the flow-level stats on the last outbox on a remote node.

@yuzefovich yuzefovich force-pushed the stats-refactor branch 3 times, most recently from 8eca6e2 to c8c20c2 Compare March 22, 2021 23:43
@yuzefovich yuzefovich removed the request for review from asubiotto April 9, 2021 22:33
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Apr 9, 2021
@yuzefovich yuzefovich requested review from RaduBerinde and rytaft April 9, 2021 23:06
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but there is a lot here I'm not familiar with. Maybe @jordanlewis can take a look as well.

Reviewed 17 of 41 files at r2, 22 of 29 files at r3, 24 of 25 files at r4, 2 of 2 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)


pkg/sql/colexec/columnarizer.go, line 48 at r3 (raw file):

// chunk into a coldata.Batch column by column.
type Columnarizer struct {
	execinfra.ProcessorBase

Unrelated, but I find it strange that these are embedding ProcessorBase (given that they're not processors). It may pay off to think about what functionality exactly do they need and if we can extract that out into something smaller.


pkg/sql/colexec/columnarizer.go, line 159 at r3 (raw file):

		c.input.Start(c.ctx)
		c.initStatus = colexecop.OperatorInitialized
		if pb, ok := c.input.(execinfra.ExecStatsForTraceHijacker); ok {

[nit] I know that pb here is from "processor base", but it's kind of cryptic


pkg/sql/colexec/materializer.go, line 123 at r1 (raw file):

	if len(d.statsCollectors) > 0 {
		// If statsCollectors is non-nil, then the drainHelper is responsible
		// for attaching the execution statistics to the span, yet we don't get

I have trouble understanding this comment. What does "we don't get the recording from the span" mean?


pkg/sql/colflow/stats.go, line 243 at r5 (raw file):

}

func maybeAddStatsInvariantChecker(op *colexecargs.OpWithMetaInfo) {

[nit] could use a short comment


pkg/sql/rowexec/aggregator.go, line 157 at r3 (raw file):

// interface.
func (ag *aggregatorBase) HijackExecStatsForTrace() func() *execinfrapb.ComponentStats {
	return ag.ProcessorBase.HijackExecStatsForTrace()

Why do we need these wrappers, don't we inherit the function from ProcessorBase anyway?

@yuzefovich yuzefovich force-pushed the stats-refactor branch 2 times, most recently from 56376ae to 2aa4dca Compare April 16, 2021 22:32
@yuzefovich yuzefovich force-pushed the stats-refactor branch 2 times, most recently from 406d02e to 6ddba03 Compare April 16, 2021 22:55
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! @jordanlewis could you take a look from higher level perspective?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/colexec/columnarizer.go, line 48 at r3 (raw file):

Previously, RaduBerinde wrote…

Unrelated, but I find it strange that these are embedding ProcessorBase (given that they're not processors). It may pay off to think about what functionality exactly do they need and if we can extract that out into something smaller.

Fair point, left a TODO to think about it.


pkg/sql/colexec/materializer.go, line 123 at r1 (raw file):

Previously, RaduBerinde wrote…

I have trouble understanding this comment. What does "we don't get the recording from the span" mean?

Updated the comment, let me know if it makes sense.


pkg/sql/rowexec/aggregator.go, line 157 at r3 (raw file):

Previously, RaduBerinde wrote…

Why do we need these wrappers, don't we inherit the function from ProcessorBase anyway?

Hm, when I was writing this code originally, I ran into some issues (namely, a processor (hidden behind RowSource interface) couldn't be converted to ExecStatsForTraceHijacker - I assumed it was because all processors embed ProcessorBase by value, by the method implementation must have a pointer receiver so that the ProcessorBase is updated accordingly when execStats func is hijacked. As a result, I was running in incomplete EXPLAIN ANALYZE diagrams.

However, now I removed these wrappers, and things seem to be fine. Not sure what changed, but I've removed them.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/rowexec/aggregator.go, line 157 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, when I was writing this code originally, I ran into some issues (namely, a processor (hidden behind RowSource interface) couldn't be converted to ExecStatsForTraceHijacker - I assumed it was because all processors embed ProcessorBase by value, by the method implementation must have a pointer receiver so that the ProcessorBase is updated accordingly when execStats func is hijacked. As a result, I was running in incomplete EXPLAIN ANALYZE diagrams.

However, now I removed these wrappers, and things seem to be fine. Not sure what changed, but I've removed them.

If A embeds Base, *A implements any interface that *Base implements (and A does not) https://play.golang.org/p/l8THFK7DW-y

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jordanlewis please give this at least a high level review.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/rowexec/aggregator.go, line 157 at r3 (raw file):

Previously, RaduBerinde wrote…

If A embeds Base, *A implements any interface that *Base implements (and A does not) https://play.golang.org/p/l8THFK7DW-y

Makes sense, thanks!

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 29 files at r3, 64 of 66 files at r6, 41 of 41 files at r7, 29 of 30 files at r8, 40 of 40 files at r9, 17 of 17 files at r10.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/colexecop/operator.go, line 347 at r8 (raw file):

}

// VectorizedStatsCollector is the common interface implemented by collectors.

nit: it is not clear at least to me what "Collectors" are from this context, a better comment would help. Also perhaps some commentary on GetStats - is it the stats just for the operator, or for its children, etc?


pkg/sql/colflow/stats.go, line 247 at r10 (raw file):

// specified. See comment on statsInvariantChecker for the kind of invariants
// checked.
func maybeAddStatsInvariantChecker(op *colexecargs.OpWithMetaInfo) {

Nice, I like this idea

This commit is mostly a mechanical change which replaces `getStats`
functions with using the stats collectors directly for the root
materializer and the hash routers. The outbox still expects a callback
because of the need to append flow-level stats on the last outbox.

Additionally, the drain helper of the materializer has been updated so
that the stats are retrieved before the metadata sources are drained.
This will be an invariant enforced by the checkers in the follow-up
commits.

Release note: None
This commit introduces a utility struct `OpWithMetaInfo` that has a root
operator as well as the metadata sources and closers present in the tree
handling of which (draining and closing, respectively) hasn't been
claimed yet. The struct will be the same (and will replace)
`opDAGWithMetaSources` and `SynchronizerInput` by the follow-up commit.

The motivation behind this change is that we want to attribute stats
collectors, metadata sources, and - why not - closers to the tree of
operators in which they are present.

Release note: None
Previously, if we had wrapped row-execution processors in the vectorized
flow, during the stats collection two `ComponentStats` objects would get
emitted for each wrapped processor - one by the processor itself and
another by the vectorized stats collector that is wrapping the
columnarizer. Both stats objects would have the same ComponentID, so
they would get Unioned. This commit cleans up the situation by making
the columnarizer "hijack" the `ExecStatsForTrace` function from the
wrapped processor.

This required a bit of plumbing, namely, we need to:
- track the "root" columnarizer separately in `NewColOperator` call so
that later the vectorized stats collectors could augment the
ComponentStats object created by the wrapped processor with additional
information
- provide an interface for the columnarizer to hijack the stats
function.

Release note: None
This commit plumbs the stats collectors into `OpWithMetaInfo` object
which allows us to get rid of separate `opDAGWithMetaSources` and
`SynchronizerInput` structs as well as more clealy define the ownership
of the stats collectors. Now all components can drain their input trees
are also responsible for collecting the stats, if applicable, before
draining. This invariant will be enforced by the follow-up commit.

Notable changes are to the synchronizers - now all of them (ordered,
serial unordered, parallel unordered) will be collecting stats. This
required creating a tracing span on the first call to `Next` so that we
had a way to propagate stats. That a-bit-hacky logic will be removed
once the `Operator` interface is updated to take in the context object
in `Init`.

Release note: None
This commit introduces a dummy vectorized stats collector that is also
a metadata source that simply checks that `GetStats` has been called
before `DrainMeta` and returns an error as a metadata if not. This stats
invariants checker is now planned whenever `util.CrdbTestBuild` is true
(which is when `TAGS=crdb_test` was specified).

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/colexecop/operator.go, line 347 at r8 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: it is not clear at least to me what "Collectors" are from this context, a better comment would help. Also perhaps some commentary on GetStats - is it the stats just for the operator, or for its children, etc?

Done.


pkg/sql/colflow/stats.go, line 247 at r10 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Nice, I like this idea

:)

@craig
Copy link
Contributor

craig bot commented Apr 22, 2021

Build succeeded:

@craig craig bot merged commit a2b80cb into cockroachdb:master Apr 22, 2021
@yuzefovich yuzefovich deleted the stats-refactor branch April 22, 2021 05:45
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 22, 2022
This commit simplifies the way we track all of the `colexecop.Closer`s
we need to close. Previously, we would track them using `OpWithMetaInfo`
and then many operators would be responsible for closing the components
in their input tree. This commit makes it so that we close them during
the flow cleanup. This is ok because we know that all concurrent
goroutines will have exited by the time `Cleanup` is called since we do
so only after `Flow.Wait` returns (which blocks).

The decision to put closers into `OpWithMetaInfo` was made in cockroachdb#62221
with justification "why not" that I provided. Now I have the answer to
why we should not do it. Unlike the stats collection and metadata
draining (which - at least currently - must happen in the "owner"
goroutines), for closers we have a convenient place to close them at the
flow level. The contract is such that the closure must occur eventually,
but it doesn't matter much in which goroutine it's done (as long as there
is no race) and whether it is a bit delayed. (The only downside I see is
that the tracing spans are finished with a delay in comparison to when the
relevant operations are actually done, but this has already been pretty
bad, and this commit makes things only slightly worse. This "delayed"
finish shows up as "over-extended" span when viewing traces via jaeger.)

As a result of this refactor, the assertion that all closers are closed
seems redundant - we'd effectively be asserting only that a single
method is called, thus the assertion is removed. This commit also
allowed to remove some of the complexity around `ExplainVec`
implementation (we no longer need to tie the cleanup to closing of the
corresponding planNode).

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Nov 22, 2022
This commit simplifies the way we track all of the `colexecop.Closer`s
we need to close. Previously, we would track them using `OpWithMetaInfo`
and then many operators would be responsible for closing the components
in their input tree. This commit makes it so that we close them during
the flow cleanup. This is ok because we know that all concurrent
goroutines will have exited by the time `Cleanup` is called since we do
so only after `Flow.Wait` returns (which blocks).

The decision to put closers into `OpWithMetaInfo` was made in cockroachdb#62221
with justification "why not" that I provided. Now I have the answer to
why we should not do it. Unlike the stats collection and metadata
draining (which - at least currently - must happen in the "owner"
goroutines), for closers we have a convenient place to close them at the
flow level. The contract is such that the closure must occur eventually,
but it doesn't matter much in which goroutine it's done (as long as there
is no race) and whether it is a bit delayed. (The only downside I see is
that the tracing spans are finished with a delay in comparison to when the
relevant operations are actually done, but this has already been pretty
bad, and this commit makes things only slightly worse. This "delayed"
finish shows up as "over-extended" span when viewing traces via jaeger.)

As a result of this refactor, the assertion that all closers are closed
seems redundant - we'd effectively be asserting only that a single
method is called, thus the assertion is removed. This commit also
allowed to remove some of the complexity around `ExplainVec`
implementation (we no longer need to tie the cleanup to closing of the
corresponding planNode).

Release note: None
craig bot pushed a commit that referenced this pull request Dec 8, 2022
91971: colflow: track closers by the vectorized flow r=yuzefovich a=yuzefovich

This commit simplifies the way we track all of the `colexecop.Closer`s
we need to close. Previously, we would track them using `OpWithMetaInfo`
and then many operators would be responsible for closing the components
in their input tree. This commit makes it so that we close them during
the flow cleanup. This is ok because we know that all concurrent
goroutines will have exited by the time `Cleanup` is called since we do
so only after `Flow.Wait` returns (which blocks).

The decision to put closers into `OpWithMetaInfo` was made in #62221
with justification "why not" that I provided. Now I have the answer to
why we should not do it. Unlike the stats collection and metadata
draining (which - at least currently - must happen in the "owner"
goroutines), for closers we have a convenient place to close them at the
flow level. The contract is such that the closure must occur eventually,
but it doesn't matter much in which goroutine it's done (as long as there
is no race) and whether it is a bit delayed. (The only downside I see is
that the tracing spans are finished with a delay in comparison to when the
relevant operations are actually done, but this has already been pretty
bad, and this commit makes things only slightly worse. This "delayed"
finish shows up as "over-extended" span when viewing traces via jaeger.)

As a result of this refactor, the assertion that all closers are closed
seems redundant - we'd effectively be asserting only that a single
method is called, thus the assertion is removed. This commit also
allowed to remove some of the complexity around `ExplainVec`
implementation (we no longer need to tie the cleanup to closing of the
corresponding planNode).

Epic: None

Release note: None

93077: opt: fix optbuilder partial index comment r=mgartner a=mgartner

Epic: None

Release note: None

93175: kv: use correct sequence number when scanning for conflicting intents r=nvanbenschoten a=arulajmani

A read only request scans the lock table before it can proceed with dropping latches. It can only evaluate if no conflicting intents are found. While doing so, it also determines if the MVCC scan evaluation needs to consult intent history (by using the interleaved iterator).

The MVCC scan evaluation needs to consult intent history if we discover an intent by the transaction performing the read operation at a higher sequence number or a higher timestamp. The correct sequence numbers to compare here are those on the `BatchRequest`, and not on the transaction. Before this patch, we were using the sequence number on the transaction, which could lead us to wrongly conclude that the use of an intent interleaving iterator wasn't required.

Specifically, if the batch of the following construction was retried on the server:

```
b.Scan(a, e)
b.Put(b, "value")
```

The scan would end up (erroneously) reading "value" at key b.

As part of this patch, I've also renamed `ScanConflictingIntents` to `ScanConflictingIntentsForDroppingLatchesEarly` -- the function isn't as generalized as the old name would suggest.

Closes #92217
Closes #92189

Release note: None

93225: multitenant: check query rows result in admin function tests r=knz a=ecwall

Informs CRDB-16746

Some admin functions return error rows in the response instead of an error itself. Update related tests to check these rows.

Release note: None

93263: docs issue generation: Update TC configuration to use new parameters r=nickvigilante a=nickvigilante

Part of #72910

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Nick Vigilante <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants