Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support spilling to disk for bufferNode #63900

Merged
merged 1 commit into from
Apr 22, 2021

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Apr 20, 2021

This commit refactors several planNodes that need to buffer rows to
use a disk-backed row container instead of pure in-memory one. In order
to achieve that a couple of light wrappers are introduced on top of the
corresponding container and an iterator over it.

Still, one - probably important - case is not fixed properly: namely, if
a subquery is executed in AllRows or AllRowsNormalized mode, then we
first buffer the rows into the disk-backed container only to materialize
it later into a single tuple. Addressing this is left as a TODO.

Fixes: #62301.
Fixes: #62674.

Release note (sql change): CockroachDB now should be more stable when
executing queries with subqueries producing many rows (previously we
could OOM crash and now we will use the temporary disk storage).

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Apr 20, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the spill-buffer-node branch 2 times, most recently from 0369ee0 to 3e5081c Compare April 20, 2021 02:56
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Apr 20, 2021
@yuzefovich yuzefovich marked this pull request as ready for review April 20, 2021 03:38
@yuzefovich yuzefovich requested review from rytaft and a team April 20, 2021 03:38
@yuzefovich
Copy link
Member Author

Kicked off 20 runs of tpcdsvec (re #62301), and they finished successfully with this commit.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/apply_join.go, line 124 at r1 (raw file):

			for {
				var rrow tree.Datums
				if len(a.rightTypes) != 0 {

how will we know when the rows are exhausted if there are no columns?


pkg/sql/create_stats.go, line 522 at r1 (raw file):

		planCtx := dsp.NewPlanningCtx(ctx, evalCtx, nil /* planner */, txn, true /* distribute */)
		if err := dsp.planAndRunCreateStats(
			ctx, evalCtx, planCtx, txn, r.job, NewRowResultWriter(nil /* rowContainer */),

how does this change without a row container? Shouldn't we pass a rowContainerHelper?

This commit refactors several `planNode`s that need to buffer rows to
use a disk-backed row container instead of pure in-memory one. In order
to achieve that a couple of light wrappers are introduced on top of the
corresponding container and an iterator over it.

Still, one - probably important - case is not fixed properly: namely, if
a subquery is executed in AllRows or AllRowsNormalized mode, then we
first buffer the rows into the disk-backed container only to materialize
it later into a single tuple. Addressing this is left as a TODO.

Release note (sql change): CockroachDB now should be more stable when
executing queries with subqueries producing many rows (previously we
could OOM crash and now we will use the temporary disk storage).
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)


pkg/sql/apply_join.go, line 124 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

how will we know when the rows are exhausted if there are no columns?

Nice catch! In this scenario previously we would have entered an infinite loop. I added a test and fixed the problem (this also required updating the mem row container code slightly).


pkg/sql/create_stats.go, line 522 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

how does this change without a row container? Shouldn't we pass a rowContainerHelper?

sampleAggregator can only produce metadata objects, so we don't need an actual row container here. Added a comment.

@craig
Copy link
Contributor

craig bot commented Apr 21, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Apr 22, 2021

Build succeeded:

@craig craig bot merged commit bb86f86 into cockroachdb:master Apr 22, 2021
@yuzefovich yuzefovich deleted the spill-buffer-node branch April 22, 2021 01:15
mgartner added a commit to mgartner/cockroach that referenced this pull request May 20, 2021
This commit fixes a bug introduced in cockroachdb#63900 which causes execution of
semi and anti apply-joins to panic. For a each row on the left side of
the apply-join, rows are fetched for the right side of the join and
added to an iterator. For semi and anti apply-joins, the right rows are
only consumed until a match is found. These right rows were not being
cleared for the next successive left row. This caused a panic when the
apply-join predicate would be applied on a `nil` left row during the
next call to `applyJoinNode.Next`; the next left row is only fetched if
the right row iterator has been cleared.

Fixes cockroachdb#65040

Release note (bug fix): A bug that caused panics while executing semi
and anti apply-joins has been fixed. This bug was present since version
21.2.0-alpha.
mgartner added a commit to mgartner/cockroach that referenced this pull request May 20, 2021
This commit fixes a bug introduced in cockroachdb#63900 which causes execution of
semi and anti apply-joins to panic. For a each row on the left side of
the apply-join, rows are fetched for the right side of the join and
added to an iterator. For semi and anti apply-joins, the right rows are
only consumed until a match is found. These right rows were not being
cleared for the next successive left row. This caused a panic when the
apply-join predicate would be applied on a `nil` left row during the
next call to `applyJoinNode.Next`; the next left row is only fetched if
the right row iterator has been cleared.

Fixes cockroachdb#65040

There is no release note because this bug is only present in
21.2.0-alpha, which has not yet been released.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this pull request May 20, 2021
This commit fixes a bug introduced in cockroachdb#63900 which causes execution of
semi and anti apply-joins to panic. For a each row on the left side of
the apply-join, rows are fetched for the right side of the join and
added to an iterator. For semi and anti apply-joins, the right rows are
only consumed until a match is found. These right rows were not being
cleared for the next successive left row. This caused a panic when the
apply-join predicate would be applied on a `nil` left row during the
next call to `applyJoinNode.Next`; the next left row is only fetched if
the right row iterator has been cleared.

Fixes cockroachdb#65040

There is no release note because this bug is only present in
21.2.0-alpha, which has not yet been released.

Release note: None
craig bot pushed a commit that referenced this pull request May 20, 2021
65375: sql: add support for expression-based indexes with CREATE INDEX r=mgartner a=mgartner

#### sql: add experimental_enable_expression_based_indexes session setting

This commit adds a session setting that will eventually enable users to
create expression-based indexes. The setting will be removed when
expression-based indexes are fully supported.

Release note: None

#### sql: add support for expression-based indexes with CREATE INDEX

This commit adds basic support for creating expression-based indexes
with a `CREATE INDEX` statement. An expression-based index is syntactic
sugar for an index on a virtual computed column. Creating an
expression-based index will automatically created a hidden virtual
column with the given expression. If a virtual column with the given
expression already exists, that column is used rather than creating a
new one.

Future work includes supporting expression-based indexes in
`CREATE TABLE` and making error messages related to these indexes more
user-friendly.

There is no release note because expression-based indexes are not
enabled by default. They require the
`experimental_enable_expression_based_indexes` session setting until
they are fully supported.

Release note: None


65524: sql: clear right rows correctly during apply-join r=mgartner a=mgartner

This commit fixes a bug introduced in #63900 which causes execution of
semi and anti apply-joins to panic. For a each row on the left side of
the apply-join, rows are fetched for the right side of the join and
added to an iterator. For semi and anti apply-joins, the right rows are
only consumed until a match is found. These right rows were not being
cleared for the next successive left row. This caused a panic when the
apply-join predicate would be applied on a `nil` left row during the
next call to `applyJoinNode.Next`; the next left row is only fetched if
the right row iterator has been cleared.

Fixes #65040

There is no release note because this bug is only present in
21.2.0-alpha, which has not yet been released.

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: implement disk spilling for bufferNode roachtest: tpcdsvec failed
3 participants