Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.2: implement generic query plans #128100

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Aug 1, 2024

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release


Release justification: New feature gated behind session setting.

@mgartner mgartner requested review from a team and yuzefovich and removed request for a team August 1, 2024 17:05
@mgartner mgartner requested review from a team as code owners August 1, 2024 17:05
@mgartner mgartner requested review from dhartunian and removed request for a team August 1, 2024 17:05
Copy link

blathers-crl bot commented Aug 1, 2024

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious
    issues
    or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined
    here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning areas TL. For more information as to how that review should be conducted, please consult the backport
    policy
    .
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack
    channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this
backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Aug 1, 2024
Copy link

blathers-crl bot commented Aug 1, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@mgartner mgartner requested a review from rytaft August 1, 2024 17:05
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner mgartner force-pushed the backport23.2-125708-125795-125808-125871-126528-126863-125814-127214-127897 branch from 4d67acd to 0257269 Compare August 2, 2024 18:11
@dhartunian dhartunian removed their request for review August 7, 2024 20:26
@rafiss rafiss removed the request for review from a team August 21, 2024 20:23
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 8 of 8 files at r1, 7 of 7 files at r2, 5 of 5 files at r3, 1 of 1 files at r4, 6 of 6 files at r5, 1 of 1 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, 22 of 22 files at r9, 7 of 7 files at r10, 1 of 1 files at r11, 5 of 5 files at r12, 18 of 18 files at r13, 3 of 3 files at r14, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @yuzefovich)

@mgartner
Copy link
Collaborator Author

I've added one of the commits from #129050.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 8 of 8 files at r1, 7 of 7 files at r2, 5 of 5 files at r3, 1 of 1 files at r4, 6 of 6 files at r5, 1 of 1 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, 22 of 22 files at r9, 7 of 7 files at r10, 1 of 1 files at r11, 5 of 5 files at r12, 18 of 18 files at r13, 3 of 3 files at r14, 10 of 10 files at r15, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)

@mgartner mgartner force-pushed the backport23.2-125708-125795-125808-125871-126528-126863-125814-127214-127897 branch 2 times, most recently from aebd135 to 615c1d5 Compare September 5, 2024 14:45
The `plan_cache_mode` session setting has been added. It allows for
three options: `force_custom_plan`, `force_generic_plan`, and `auto`.
The session currently has no effect. In future commits it will control
how the system chooses between using a custom or generic query plan.

Release note: None
The `ConvertSelectWithPlaceholdersToJoin` exploration rule has been
added which enables optimizations to apply to query plans where
placeholder values are not known. See the rule's comments for more
details. The rule does not currently apply to any query plans because it
only matches Select expressions with filters that have placeholders, and
we currently replace all placeholders with constant values before
exploration. It will be used in the future when we enable exploration of
generic query plans.

Release note: None
…ldersToJoin

Join reordering and merge join exploration is now disabled for joins
created by the `ConvertSelectWithPlaceholdersToJoin` rule. The rule's
main goal is to assist in generating lookup joins. This changes reduces
unnecessary exploration.

Release note: None
This commit adds some stats tests with queries containing placeholders.

Release note: None
Stats estimates for filters of the form `col = $1` have been improved.
While histograms cannot be filtered without knowing the placeholder
value, we do know that `col` is limited to a one distinct value. This
improves the estimate over the simple 1/3 "unknown selectivity" that was
applied for these filters before this commit.

This changed required a modification to `TestCopyAndReplace` because it
made it such that a merge-join was no longer selected when optimizing
the query with placeholders. Now the test checks that `CopyAndReplace`
works correctly when copying a fully optimized memo with a parameterized
lookup-join. I've confirmed that the test still catches the bug
fixed in the original PR that added the test, cockroachdb#35020.

Release note: None
Tests for generic query plans with partial indexes have been added. No
code changes are needed in order for a query with a filter like
`col = $1` to utilize an index on `(col) WHERE col IS NOT NULL`.

Release note: None
The logic for picking a prepared memo has been moved to a function,
`fetchPreparedMemoLegacy`. This makes it easy in the following commit
to fallback to this logic based on the `plan_cache_mode` setting.

Release note: None
The `force_generic_plan` option for the session setting
`plan_cache_mode` has been implemented. Under this setting, prepared
statements will reuse a fully optimized, generic query plan, if a
non-stale one exists. If such a plan does not exist or is stale, a new
generic query plan will be built.

The term "generic query plan" is used in user-facing contexts to
represent any plan that is fully optimized without knowledge of specific
placeholder values nor folding stable expressions. Therefore, all of the
following are considered "generic":

  1. A plan for a query with no placeholders nor fold-able stable
     expressions.
  2. A plan utilizing the placeholder fast-path.
  3. A plan fully optimized without replacing placeholders with values.

However, only (3) is currently stored in the
`PreparedStatement.GenericMemo` field. This was a deliberate decision
that allows falling back to the original prepared statement logic when
switching between `force_custom_plan` and `force_generic_plan`. This
will make backports with this commit less risky. My goal, in a future
commit, is to put all generic query plans in the `GenericMemo` field to
reduce the confusion between the user-facing definition of "generic" and
the internal logic.

This commit does not change the behavior of the query cache. Fully
optimized plans for statements without placeholders or placeholder
fast-path plans are still cached in the query cache. Generic query plans
optimized in the presence of placeholders are not cached.

Release note (sql change): The session setting
`plan_cache_mode=force_generic_plan` can now be used to force prepared
statements to use query plans that are optimized once and reused in
future executions without re-optimization, as long as it does not become
stale due to schema changes or a collection of new table statistics. The
setting takes effect during `EXECUTE` commands. `EXPLAIN ANALYZE`
includes a "plan type" field. If a generic query plan is optimized for
the current execution, the "plan type" will be "generic, re-optimized".
If a generic query plan is reused for the current execution without
performing optimization, the "plan type" will be "generic, reused".
Otherwise, the "plan type" will be "custom".
The `plan_cache_mode=auto` session setting has been implemented which
instructs the database to pick generic query plans for prepared
statements when they have a cost less than or equal to custom plans.

The current strategy matches Postgres's strategy quite closely. Under
`plan_cache_mode=auto`, custom query plans will be generated for the
first five executions of a prepared statement. On the sixth execution, a
generic query plan will be generated. If the cost of the generic query
plan is less than the average cost of the custom plans (plus some
overhead for re-optimization), then the generic query plan will be used.

Release note (sql change): The session setting `plan_cache_mode=auto`
can now be used to instruct the system to automatically determine
whether to use "custom" or "generic" query plans for the execution of a
prepared statement. Custom query plans are optimized on every execution,
while generic plans are optimized once and reused on future executions
as-is. Generic query plans are beneficial in cases where query
optimization contributes significant overhead to the total cost of
executing a query.
Prior to this commit, the `no-stable-folds` directive only worked with
the "norm", "exprnorm", and "expropt" directives. It now also works with
the "opt" directive. This is required for testing generic query plans
where plans must be fully optimized without folding stable expressions.

Release note: None
The `ConvertSelectWithPlaceholdersToJoin` rule has been renamed to
`GenerateParameterizedJoin` and modified to also operate on stable
expressions. Stable expressions cannot be folded in generic query plans
because their value can only be determined at execution time. By
transforming a Select with a stable filter expression into a Join with a
Values input, the optimizer can potentially plan a lookup join with
similar performance characteristics to a constrained scan that would be
planned if the stable expression could be folded.

Release note: None
@mgartner mgartner force-pushed the backport23.2-125708-125795-125808-125871-126528-126863-125814-127214-127897 branch from 615c1d5 to 183ad72 Compare September 5, 2024 16:27
The exploration rule `GenerateParameterizedJoin` is now disabled when
the `plan_cache_mode` session setting is set to `force_custom_plan`.
This prevents possible regressions in plans for this mode in the rare
case that a stable function is not folded during normalization (due to
an error encountered during folding). In most cases, the check for
placeholders and stable functions in `GenerateParameterizedJoin` is
sufficient for preventing the rule from firing when a generic query plan
is not being built—before optimizing a custom plan placeholders are
always replaced with constants and stable functions are usually folded
to constants.

Release note: None
@mgartner mgartner force-pushed the backport23.2-125708-125795-125808-125871-126528-126863-125814-127214-127897 branch from 183ad72 to 8e980f3 Compare September 5, 2024 16:39
@mgartner mgartner merged commit ee2bd8b into cockroachdb:release-23.2 Sep 5, 2024
5 of 6 checks passed
@mgartner mgartner deleted the backport23.2-125708-125795-125808-125871-126528-126863-125814-127214-127897 branch September 5, 2024 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants