Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix](merge-on-write) Fix MergeIndexDeleteBitmapCalculator::calculate_one() coredump #44284

Merged
merged 2 commits into from
Nov 20, 2024

Conversation

bobhan1
Copy link
Contributor

@bobhan1 bobhan1 commented Nov 19, 2024

What problem does this PR solve?

Problem Summary:

MergeIndexDeleteBitmapCalculatorContext::get_current_key() may return non-OK status when encounter memory allocation failure, which makes MergeIndexDeleteBitmapCalculatorContext::Comparator::operator() returns incorrect result and break some assumptions during the process of multiway merging, which leads to coredump.

 1# 0x00007F507D0B3520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill at ./nptl/pthread_kill.c:89
 3# raise at ../sysdeps/posix/raise.c:27
 4# abort at ./stdlib/abort.c:81
 5# 0x000055E3A805DD7D in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 6# 0x000055E3A805047A in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 7# google::LogMessage::SendToLog() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 8# google::LogMessage::Flush() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 9# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
10# doris::MergeIndexDeleteBitmapCalculatorContext::seek_at_or_after(doris::Slice const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
11# doris::MergeIndexDeleteBitmapCalculator::calculate_one(doris::RowLocation&) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/delete_bitmap_calculator.cpp:197
12# doris::MergeIndexDeleteBitmapCalculator::calculate_all(std::shared_ptr) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
13# doris::Tablet::calc_delete_bitmap_between_segments(std::shared_ptr, std::vector, std::allocator > > const&, std::shared_ptr) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:4075
14# doris::Tablet::update_delete_bitmap_without_lock(std::shared_ptr const&, std::vector, std::allocator > > const*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:3468
15# doris::Tablet::revise_tablet_meta(std::vector, std::allocator > > const&, std::vector, std::allocator > > const&, bool) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:415
16# doris::EngineCloneTask::_finish_incremental_clone(doris::Tablet*, std::shared_ptr const&, long) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:795
17# doris::EngineCloneTask::_finish_clone(doris::Tablet*, std::__cxx11::basic_string, std::allocator > const&, long, bool) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
18# doris::EngineCloneTask::_do_clone() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
19# doris::EngineCloneTask::execute() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:159
20# doris::clone_callback(doris::StorageEngine&, doris::TMasterInfo const&, doris::TAgentTaskRequest const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
21# std::_Function_handler(doris::TAgentTaskRequest const&) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
22# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:551
23# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:499
24# start_thread at ./nptl/pthread_create.c:442
25# 0x00007F507D197850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

dataroaring
dataroaring previously approved these changes Nov 19, 2024
Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bobhan1
Copy link
Contributor Author

bobhan1 commented Nov 19, 2024

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 19, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 19, 2024
@bobhan1
Copy link
Contributor Author

bobhan1 commented Nov 19, 2024

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.02% (9898/26035)
Line Coverage: 29.20% (82788/283541)
Region Coverage: 28.33% (42525/150085)
Branch Coverage: 24.89% (21556/86594)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5dd10afe3e3b3c41dd7e696150670637cd50fa16_5dd10afe3e3b3c41dd7e696150670637cd50fa16/report/index.html

@bobhan1
Copy link
Contributor Author

bobhan1 commented Nov 20, 2024

run cloud_p0

Copy link
Contributor

@zhannngchen zhannngchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 20, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhannngchen zhannngchen merged commit 1601d75 into apache:master Nov 20, 2024
33 of 36 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 20, 2024
…e_one()` coredump (#44284)

### What problem does this PR solve?

Problem Summary:

`MergeIndexDeleteBitmapCalculatorContext::get_current_key()` may return
non-OK status when encounter memory allocation failure, which makes
`MergeIndexDeleteBitmapCalculatorContext::Comparator::operator()`
returns incorrect result and break some assumptions during the process
of multiway merging, which leads to coredump.

```
 1# 0x00007F507D0B3520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill at ./nptl/pthread_kill.c:89
 3# raise at ../sysdeps/posix/raise.c:27
 4# abort at ./stdlib/abort.c:81
 5# 0x000055E3A805DD7D in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 6# 0x000055E3A805047A in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 7# google::LogMessage::SendToLog() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 8# google::LogMessage::Flush() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 9# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
10# doris::MergeIndexDeleteBitmapCalculatorContext::seek_at_or_after(doris::Slice const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
11# doris::MergeIndexDeleteBitmapCalculator::calculate_one(doris::RowLocation&) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/delete_bitmap_calculator.cpp:197
12# doris::MergeIndexDeleteBitmapCalculator::calculate_all(std::shared_ptr) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
13# doris::Tablet::calc_delete_bitmap_between_segments(std::shared_ptr, std::vector, std::allocator > > const&, std::shared_ptr) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:4075
14# doris::Tablet::update_delete_bitmap_without_lock(std::shared_ptr const&, std::vector, std::allocator > > const*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:3468
15# doris::Tablet::revise_tablet_meta(std::vector, std::allocator > > const&, std::vector, std::allocator > > const&, bool) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:415
16# doris::EngineCloneTask::_finish_incremental_clone(doris::Tablet*, std::shared_ptr const&, long) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:795
17# doris::EngineCloneTask::_finish_clone(doris::Tablet*, std::__cxx11::basic_string, std::allocator > const&, long, bool) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
18# doris::EngineCloneTask::_do_clone() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
19# doris::EngineCloneTask::execute() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:159
20# doris::clone_callback(doris::StorageEngine&, doris::TMasterInfo const&, doris::TAgentTaskRequest const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
21# std::_Function_handler(doris::TAgentTaskRequest const&) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
22# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:551
23# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:499
24# start_thread at ./nptl/pthread_create.c:442
25# 0x00007F507D197850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
github-actions bot pushed a commit that referenced this pull request Nov 20, 2024
…e_one()` coredump (#44284)

### What problem does this PR solve?

Problem Summary:

`MergeIndexDeleteBitmapCalculatorContext::get_current_key()` may return
non-OK status when encounter memory allocation failure, which makes
`MergeIndexDeleteBitmapCalculatorContext::Comparator::operator()`
returns incorrect result and break some assumptions during the process
of multiway merging, which leads to coredump.

```
 1# 0x00007F507D0B3520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill at ./nptl/pthread_kill.c:89
 3# raise at ../sysdeps/posix/raise.c:27
 4# abort at ./stdlib/abort.c:81
 5# 0x000055E3A805DD7D in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 6# 0x000055E3A805047A in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 7# google::LogMessage::SendToLog() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 8# google::LogMessage::Flush() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 9# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
10# doris::MergeIndexDeleteBitmapCalculatorContext::seek_at_or_after(doris::Slice const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
11# doris::MergeIndexDeleteBitmapCalculator::calculate_one(doris::RowLocation&) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/delete_bitmap_calculator.cpp:197
12# doris::MergeIndexDeleteBitmapCalculator::calculate_all(std::shared_ptr) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
13# doris::Tablet::calc_delete_bitmap_between_segments(std::shared_ptr, std::vector, std::allocator > > const&, std::shared_ptr) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:4075
14# doris::Tablet::update_delete_bitmap_without_lock(std::shared_ptr const&, std::vector, std::allocator > > const*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:3468
15# doris::Tablet::revise_tablet_meta(std::vector, std::allocator > > const&, std::vector, std::allocator > > const&, bool) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:415
16# doris::EngineCloneTask::_finish_incremental_clone(doris::Tablet*, std::shared_ptr const&, long) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:795
17# doris::EngineCloneTask::_finish_clone(doris::Tablet*, std::__cxx11::basic_string, std::allocator > const&, long, bool) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
18# doris::EngineCloneTask::_do_clone() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
19# doris::EngineCloneTask::execute() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:159
20# doris::clone_callback(doris::StorageEngine&, doris::TMasterInfo const&, doris::TAgentTaskRequest const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
21# std::_Function_handler(doris::TAgentTaskRequest const&) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
22# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:551
23# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:499
24# start_thread at ./nptl/pthread_create.c:442
25# 0x00007F507D197850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
hello-stephen pushed a commit that referenced this pull request Nov 20, 2024
…or::calculate_one()` coredump #44284 (#44330)

Cherry-picked from #44284

Co-authored-by: bobhan1 <[email protected]>
dataroaring pushed a commit that referenced this pull request Nov 26, 2024
…or::calculate_one()` coredump #44284 (#44328)

Cherry-picked from #44284

Co-authored-by: bobhan1 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants