Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(runtime-params-estimator): disable rocksdb thread in it #3091

Merged
merged 20 commits into from
Aug 11, 2020

Conversation

ailisp
Copy link
Member

@ailisp ailisp commented Aug 6, 2020

Thanks to an external contributor, we're able to use all export apis to disable rocksdb threads in runtime param estimator

Test Plan

Observe only one thread during run runtime params estimator
image

@gitpod-io
Copy link

gitpod-io bot commented Aug 6, 2020

@ailisp ailisp marked this pull request as ready for review August 6, 2020 19:49
@ailisp
Copy link
Member Author

ailisp commented Aug 6, 2020

Seems parameter estimator get stuck in the middle after turn off all threads, probably due to there's limit of how big the state it can have. will see if i can increase the bar to proceed.

Update: have some little clue with debug print and lldb, it is not blocking by rocksdb writing limits (have increased all of them to unlimited or very high). Looking at debug print, measure_actions never returns. Looking at lldb after it stucks, rockdb wait for a ConditionVar in the rocksdb destructor (but since there's no other thread there's no way to make progress), so avoid drop RuntimeTestbed to avoid trigger rocksdb destructor might help.
Then I have a look at rocksdb source code of DBImpl::CloseHelper, there are a few cond vars there, need add more print there to figure out what is blocked. I guess there's inconsistency in rocksdb options, although turn off all threads, there's other option make it think it need wait for some background work done before exit.

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7f96886a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff7fa2756e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x00000001036f355f runtime-params-estimator`rocksdb::port::CondVar::Wait() + 15
    frame #3: 0x00000001036aa677 runtime-params-estimator`rocksdb::InstrumentedCondVar::Wait() + 167
    frame #4: 0x0000000103561e44 runtime-params-estimator`rocksdb::DBImpl::CloseHelper() + 404
    frame #5: 0x00000001035629e9 runtime-params-estimator`rocksdb::DBImpl::~DBImpl() + 57
    frame #6: 0x00000001035633fe runtime-params-estimator`rocksdb::DBImpl::~DBImpl() + 14
    frame #7: 0x000000010352f767 runtime-params-estimator`rocksdb_close + 23
    frame #8: 0x000000010318bce8 runtime-params-estimator`core::ptr::drop_in_place::hb76b33f569232c9f + 520
    frame #9: 0x000000010318bdcf runtime-params-estimator`core::ptr::drop_in_place::hd364f2154be9f2ec + 15
    frame #10: 0x000000010318c130 runtime-params-estimator`alloc::sync::Arc$LT$T$GT$::drop_slow::h11db2b232d66dcbc + 32
    frame #11: 0x000000010318c2bf runtime-params-estimator`alloc::sync::Arc$LT$T$GT$::drop_slow::h3ce12527033b0af5 + 31
    frame #12: 0x0000000103346286 runtime-params-estimator`core::ptr::drop_in_place::hf435a7491c6c6bba + 54

Update: work around by set thread to 4 at drop rocksdb, i think it's okay to do that since metric recording is already finished at that time

@codecov
Copy link

codecov bot commented Aug 7, 2020

Codecov Report

Merging #3091 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3091   +/-   ##
=======================================
  Coverage   87.46%   87.46%           
=======================================
  Files         212      212           
  Lines       41452    41452           
=======================================
  Hits        36256    36256           
  Misses       5196     5196           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d20343c...30666e5. Read the comment docs.

Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why couldn't we only use it for parameter estimator, i.e, putting this change behind some feature flag?

runtime/runtime-params-estimator/src/cases.rs Outdated Show resolved Hide resolved
@evgenykuzyakov
Copy link
Collaborator

Why couldn't we only use it for parameter estimator, i.e, putting this change behind some feature flag?

Actually, I agree with @bowenwang1996 on this. But it also can be useful not only for param estimator, so some generic feature name will be good

@ailisp ailisp requested a review from bowenwang1996 August 10, 2020 18:24
Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should only use the our fork of rocksdb when the single thread feature is enabled.

@@ -10,7 +10,7 @@ log = "0.4"
failure = "0.1"
failure_derive = "0.1"
lazy_static = "1.4"
rocksdb = "0.14"
rocksdb = { git = "https://github.com/nearprotocol/rust-rocksdb", branch="disable-thread" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It use facebook/rocksdb#7191 (merged, but rust-rocksdb haven't bump version) and rust-rocksdb/rust-rocksdb#448 (approved, but need to wait next time rust-rocksdb bump version). Our fork is equal to rust-rocksdb/rust-rocksdb#448

opts.set_max_total_wal_size(1 * 1024 * 1024 * 1024);
#[cfg(not(feature = "single_thread_rocksdb"))]
{
println!("Use background threads in rocksdb");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this log message or convert it into a proper tracing::info/debug/trace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we didn't log anything in db config module in neard, so I'll remove it to be consistent

env.set_high_priority_background_threads(0);
env.set_low_priority_background_threads(0);
env.set_background_threads(0);
println!("Disabled all background threads in rocksdb");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Copy link
Member Author

@ailisp ailisp Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep this one to as print. This is only showup in param estimator, which only used print, one log line between the raw prints looks not good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants