sql: audit all processors to make their closure bullet-proof #91969

Merged 1 commit on Nov 22, 2022

yuzefovich
Member

@yuzefovich yuzefovich commented Nov 16, 2022

This commit replaces all usages of the `ProcessorBaseNoHelper.Ctx` field
with calls to the newly introduced `Ctx()` method, which returns
a background context if the processor hasn't been started. With this
change, all processors now respect the contract of the
`colexecop.Closer` interface, which says that `Close` must be safe to
call even if `Init` hasn't been performed (for processors, this means
that `Columnarizer.Init` wasn't called, so `Processor.Start` wasn't
either).

Initially, I attempted to fix this in #91446 by putting the protection
into the columnarizer, but that broke some assumptions, since we
wouldn't close all the closers that we expected to (in particular, the
materializer that is the input to the wrapped row-by-row processor
wouldn't be closed). This commit takes a different approach and should
fix the issue for good without introducing any flakiness.

As a result, this commit fixes a rarely hit issue where the aggregator
and the zigzag joiner attempt to log, using a nil context, when they are
closed without having been started (which we occasionally see in Sentry
reports). The issue is quite rare, so no release note seems appropriate.
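A minimal sketch of the failure mode, assuming a hypothetical `logMsg` helper that reads a value from its context (this is not CockroachDB's `log` package) and a hypothetical `aggregatorLike` processor that logs on close. Calling a method on a nil `context.Context` interface panics, so logging with the raw unset field fails while going through a `Ctx()`-style accessor does not:

```go
package main

import (
	"context"
	"fmt"
)

// logTagsKey is a hypothetical context key a logger might look up.
type logTagsKey struct{}

// logMsg is a hypothetical context-aware logger: it reads from ctx, which
// panics with a nil pointer dereference if ctx is a nil interface.
func logMsg(ctx context.Context, msg string) {
	_ = ctx.Value(logTagsKey{})
	fmt.Println(msg)
}

// aggregatorLike is a hypothetical processor that logs when it is closed.
type aggregatorLike struct {
	ctx context.Context // nil until the processor is started
}

// Ctx falls back to context.Background() when the processor wasn't started.
func (a *aggregatorLike) Ctx() context.Context {
	if a.ctx == nil {
		return context.Background()
	}
	return a.ctx
}

// closeWithField logs using the raw field (the old pattern).
func (a *aggregatorLike) closeWithField() { logMsg(a.ctx, "closing (field)") }

// closeWithMethod logs using Ctx() (the new pattern).
func (a *aggregatorLike) closeWithMethod() { logMsg(a.Ctx(), "closing (method)") }

// panics reports whether f panics, recovering the panic if it does.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	a := &aggregatorLike{} // never started
	fmt.Println("field panics:", panics(a.closeWithField))
	fmt.Println("method panics:", panics(a.closeWithMethod))
}
```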

Fixes: #84902.
Fixes: #91845.

Release note: None

@yuzefovich yuzefovich requested a review from cucaroach November 16, 2022 02:53
@yuzefovich yuzefovich requested review from a team as code owners November 16, 2022 02:53
@yuzefovich yuzefovich requested review from msbutler and ajwerner and removed request for a team November 16, 2022 02:53
Collaborator

@stevendanna stevendanna left a comment

Nice.

Collaborator

@rytaft rytaft left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)


pkg/sql/execinfra/processorsbase.go line 361 at r1 (raw file):

```go
	// NOTE: if StartInternal() hasn't been called, this will be nil, so
	// consider using EnsureCtx() instead.
	Ctx  context.Context
```

Could we unexport this to enforce calling `EnsureCtx()`?

Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)


pkg/sql/execinfra/processorsbase.go line 361 at r1 (raw file):

We could, and it would fix this nil pointer error for good, but for some reason I'm slightly hesitant about making such a change; it "feels" wrong to me, though I'm not sure why :) Probably because it would "pollute" the code a bit in places where we access the context and know that it is non-nil, which is pretty much everywhere except for "closing". The problem in the "closing" scenario only occurs when row-by-row processors are wrapped into vectorized flows (because the two engines use different interfaces).

I don't have a strong opinion though, so if you think it's worth it, I'd be fine with making such a change.

@rytaft
Collaborator

rytaft commented Nov 16, 2022

pkg/sql/execinfra/processorsbase.go line 361 at r1 (raw file):

I also don't feel too strongly, but if you make this change, it might feel less polluting to rename the function from `EnsureCtx()` to just `Ctx()`, since it would be the only way to access the context. But I defer to you on whether that makes sense.

@yuzefovich yuzefovich force-pushed the fix-closers branch 2 times, most recently from 3360813 to 3877149 November 17, 2022 05:18
Contributor

@cucaroach cucaroach left a comment

Is it still a long-term goal to not store the context at all? Just curious: is the reason we haven't done that the code churn it would require, or something else?

Reviewed 22 of 22 files at r1, 4 of 28 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)

Member Author

@yuzefovich yuzefovich left a comment

No, I don't think that we want to remove the context from the processors or from the operators (there is more context in this comment). The place where I think we ought to remove the context from is `eval.Context`, because that object doesn't have a clear lifetime and is passed around through many different layers; processors and operators don't have such issues.

Could someone take another look? I replaced all `Ctx` accesses with `Ctx()` calls. Most changes were mechanical; the only interesting ones were in `columnarizer.go` and `processors_test.go`.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)


pkg/sql/execinfra/processorsbase.go line 361 at r1 (raw file):

Ok, I decided to implement this suggestion.

Collaborator

@rytaft rytaft left a comment

:lgtm:

Reviewed 1 of 22 files at r1, 28 of 28 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

@yuzefovich
Member Author

TFTRs!

bors r+

@craig
Contributor

craig bot commented Nov 22, 2022

Build failed (retrying...):

@yuzefovich
Member Author

Needs a rebase.

bors r-

@craig
Contributor

craig bot commented Nov 22, 2022

Canceled.

@yuzefovich
Member Author

bors r+

@craig
Contributor

craig bot commented Nov 22, 2022

Build succeeded:

@craig craig bot merged commit 7ffaece into cockroachdb:master Nov 22, 2022
@blathers-crl

blathers-crl bot commented Nov 22, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 19f3386 to blathers/backport-release-22.1-91969: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from 19f3386 to blathers/backport-release-22.2-91969: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
