Storage: optimize the overhead of late materialization #8730

Closed
wants to merge 3 commits into from

Conversation

Lloyd-Pottiger
Contributor

@Lloyd-Pottiger Lloyd-Pottiger commented Jan 24, 2024

What problem does this PR solve?

Issue Number: close #8641

Problem Summary:

What is changed and how it works?

Previously, each call to LateMaterializationBlockInputStream::read() called filter_column_stream->read() only once. When the filter eliminated most rows, the returned block contained only a few rows.

Now, each call to LateMaterializationBlockInputStream::read() calls filter_column_stream->read() repeatedly, until the returned block would exceed max_block_rows. This has two benefits:

  1. rest_column_stream->readWithFilter is called far less often, and each call reads a larger batch.
  2. Each returned block is much larger, so the upper operators can batch better as well.
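
The batching idea can be illustrated with a small self-contained sketch. This is not the code in this PR: BatchedReader, blocks, next and rest_reads are hypothetical names, the filter is an ordinary predicate over ints, and the sketch assumes the row count of the next block is known before it is read.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the filter-column stream plus the batching loop;
// none of these names come from the TiFlash code base.
struct BatchedReader
{
    std::vector<std::vector<int>> blocks; // filter-column blocks in stream order
    std::size_t next = 0; // index of the next unread block
    std::size_t rest_reads = 0; // simulated number of rest-column reads

    // Returns the rows that pass `pred`, gathered from as many input blocks as
    // fit under `max_block_rows`. An empty result means the stream is exhausted.
    std::vector<int> read(std::size_t max_block_rows, const std::function<bool(int)> & pred)
    {
        assert(max_block_rows > 0);
        std::vector<int> out;
        while (next < blocks.size())
        {
            const auto & block = blocks[next];
            // Assumption: the size of the next block is known before reading it.
            // Stop if the next block, were all its rows to pass, would push the
            // output past the cap; keep consuming while the output is still empty.
            if (!out.empty() && out.size() + block.size() > max_block_rows)
                break;
            ++next;
            for (int v : block)
                if (pred(v))
                    out.push_back(v);
        }
        if (!out.empty())
            ++rest_reads; // one batched rest-column read per returned block
        return out;
    }
};
```

With three input blocks of four rows each, a filter that keeps one row per block, and a cap of 8, a single read() in this sketch returns all three surviving rows and counts one rest-column read instead of three.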

Other changes:

  1. remove RowKeyOrderedBlockInputStream, and use ConcatSkippableBlockInputStream instead.
  2. remove the virtual function skipNextBlock and related unit tests

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Contributor

ti-chi-bot bot commented Jan 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from lloyd-pottiger. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the release-note-none, do-not-merge/needs-triage-completed, and size/XXL labels and removed the do-not-merge/needs-triage-completed label Jan 24, 2024
@pingcap pingcap deleted a comment from sre-bot Jan 24, 2024
@Lloyd-Pottiger
Contributor Author

/run-build-release

@sre-bot
Collaborator

sre-bot commented Jan 25, 2024

@Lloyd-Pottiger Lloyd-Pottiger force-pushed the optimize-lm branch 2 times, most recently from f04fcf0 to d9aeb5b Compare February 1, 2024 03:35
@Lloyd-Pottiger
Contributor Author

/run-build-release

@sre-bot
Collaborator

sre-bot commented Feb 1, 2024

@Lloyd-Pottiger
Contributor Author

/run-build-release

@sre-bot
Collaborator

sre-bot commented Feb 2, 2024

Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
@Lloyd-Pottiger
Contributor Author

/run-all-tests

@Lloyd-Pottiger
Contributor Author

/run-build-release

@sre-bot
Collaborator

sre-bot commented Feb 4, 2024

@ti-chi-bot ti-chi-bot bot added the needs-rebase label Feb 13, 2024
Contributor

ti-chi-bot bot commented Feb 13, 2024

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


auto rest_column_stream
= std::dynamic_pointer_cast<ConcatSkippableBlockInputStream<false>>(rest_column_stable_stream);
rest_column_stream->appendChild(rest_column_delta_stream, segment_snap->delta->getRows());
Member

Use RUNTIME_CHECK to check if rest_column_stream is null for potential cast failure?
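
A check of this kind could look roughly like the following self-contained sketch. RUNTIME_CHECK itself is not defined here, so a hypothetical runtimeCheck function stands in for it; Stream, ConcatStream, OtherStream and castToConcat are likewise illustrative names, not the types touched by this PR.

```cpp
#include <memory>
#include <stdexcept>

// Illustrative stream hierarchy; these are not the TiFlash classes.
struct Stream
{
    virtual ~Stream() = default;
};

struct ConcatStream : Stream
{
    int children = 0;
    void appendChild() { ++children; }
};

struct OtherStream : Stream
{
};

// Hypothetical stand-in for a runtime check macro: throws when `cond` is false.
inline void runtimeCheck(bool cond, const char * msg)
{
    if (!cond)
        throw std::logic_error(msg);
}

// Downcasts and fails loudly instead of returning a null pointer that would be
// dereferenced by the next call.
inline std::shared_ptr<ConcatStream> castToConcat(const std::shared_ptr<Stream> & stream)
{
    auto concat = std::dynamic_pointer_cast<ConcatStream>(stream);
    runtimeCheck(concat != nullptr, "stream is not a ConcatStream");
    return concat;
}
```

std::dynamic_pointer_cast returns an empty shared_ptr when the dynamic type does not match, so checking the result before calling appendChild turns a potential null dereference into an explicit error.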

Labels
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
release-note-none: Denotes a PR that doesn't merit a release note.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CH-benCHmark query performance regression when enable late materialization
3 participants