Background schedule pool fixed #1722
Conversation
Are you guys still interested in this pull request? Just asking, because it now conflicts with master, and if you don't care about it anymore I will close it.
I'm interested in this modification. For example, if you run a quite loaded server under ASan, it will complain about "thread limit exceeded" after a few days of running, and your modification will fix that. But it's rather hard to implement it without breaking something, and @ztlpn had a few complaints about this modification. Let @ztlpn remind us of those complaints and we will continue.
OK, great! I'll wait for your feedback, guys!
In fact this patch will not help, because the root cause is that several threads are created for executing each query, not the threads doing replication (there can be a lot of them, but at least their number is stationary).
@@ -193,6 +193,9 @@ StorageReplicatedMergeTree::StorageReplicatedMergeTree(
     shutdown_event(false), part_check_thread(*this),
     log(&Logger::get(database_name + "." + table_name + " (StorageReplicatedMergeTree)"))
 {
+    initMergeSelectSession();
+    merge_selecting_handle = context_.getSchedulePool().addTask("StorageReplicatedMergeTree", [this] { mergeSelectingThread(); });
The task name is misleading; better to name it something like StorageReplicatedMergeTree::mergeSelectingThread.
Are you talking about the StorageReplicatedMergeTree name? I used the same name as in the logs and also as the class name. I will change it to "mergeSelectingThread", or better, "mergeSelectingTask". Please confirm.
You are talking about StorageReplicatedMergeTree name?
Yes. In contrast to, say, ReplicatedMergeTreePartCheckThread, StorageReplicatedMergeTree is a huge class, and it is not evident from the current name what part of the functionality this task is responsible for.
I like StorageReplicatedMergeTree::mergeSelectingTask better than just mergeSelectingTask, because it more directly points at the place in the code where this task resides. Also, it may be a good idea to add the table name, just as is done for the logger name.
Anyway, I've noticed that the names are only used to log slow executions, so being verbose probably won't hurt.
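For illustration, the registration from the diff above could carry such a verbose name like this (a sketch, not the actual commit; the addTask call and merge_selecting_handle are as shown in the diff, only the name string changes):

/// Sketch: a task name qualified with the class and the table name,
/// mirroring the logger name built in the constructor above.
merge_selecting_handle = context_.getSchedulePool().addTask(
    database_name + "." + table_name + " (StorageReplicatedMergeTree::mergeSelectingTask)",
    [this] { mergeSelectingThread(); });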
OK, this point is clear. I'll change this.
Fixed in commit 6629b03
👍
std::function<std::pair<String, String>(const MergeTreeData::DataPartPtr &, const MergeTreeData::DataPartPtr &)> merge_sel_merging_predicate_args_to_key;
std::chrono::steady_clock::time_point merge_sel_now;
std::unique_ptr<CachedMergingPredicate<std::pair<std::string, std::string>>> merge_sel_cached_merging_predicate;
std::function<bool(const MergeTreeData::DataPartPtr &, const MergeTreeData::DataPartPtr &)> merge_sel_can_merge;
The fact that we need a lot of fields here with the common prefix suggests that we should encapsulate the context of the merge selecting task in its own class.
Yes, true. I'll make this change.
@ztlpn Looking at the code, this seems not so trivial. merge_sel_uncached_merging_predicate and merge_sel_can_merge are functions whose implementations use a lot of members of StorageReplicatedMergeTree.
How do you prefer to deal with this? I want to avoid passing the new context class a reference to StorageReplicatedMergeTree and making them friend classes (as I saw in a lot of places in the code), because that violates encapsulation, and the code won't be better than it is now.
I think even a separate class that is a friend of StorageReplicatedMergeTree is better in terms of encapsulation. Yes, it can access private members of StorageReplicatedMergeTree, but StorageReplicatedMergeTree can't access its private members! As opposed to the current solution where everybody can access anything.
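A sketch of the shape being proposed (hypothetical; the class name matches the one chosen later in this thread, and friendship would be granted by a friend declaration inside StorageReplicatedMergeTree):

class StorageReplicatedMergeTree;

/// Hypothetical sketch, not the actual patch: the task's state moves into its
/// own class, which StorageReplicatedMergeTree declares as a friend.
class ReplicatedMergeTreeMergeSelectingThread
{
public:
    explicit ReplicatedMergeTreeMergeSelectingThread(StorageReplicatedMergeTree & storage_)
        : storage(storage_) {}

    void run();  /// body of the former mergeSelectingThread()

private:
    StorageReplicatedMergeTree & storage;  /// may reach storage internals via friendship
    /// The former merge_sel_* fields move here and lose their prefix.
};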
Should I add the new class StorageReplicatedMergeTreeContext into the same header, StorageReplicatedMergeTree.h, or do you want a separate file?
OK, no problem. Even if I think that calling all of these *Thread is no longer accurate :) better would be *Task or something, since there is no longer any thread there. Of course I can't rename all the existing ones now, because I would shoot myself in both legs: merges with other branches are a headache.
Well, in some sense these tasks are still sequential threads of execution, just not represented as OS threads... And I agree that renaming them now would be painful.
OK, I will make some time these days to fix this (from what I can see, the last remaining task). I will call the new class ReplicatedMergeTreeMergeSelectingThread and place it in the Storages/MergeTree dir.
I also need to bring the branch up to date with master again; I see an ugly conflict there and need to see how to fix it.
Hello,
In commit f811da7 you can find the version of the fix where all merge_sel_* members are moved into ReplicatedMergeTreeMergeSelectingThread.
What I haven't done yet is move this class into its own file. Here I have a problem that I can fix in several ways, and I'm not sure which you prefer.
The problem is that canMergePartsAccordingToZooKeeperInfo is also used at line 2456, and this function is local to this file only.
What I can do is move canMergePartsAccordingToZooKeeperInfo, partsWillNotBeMergedOrDisabled, and CachedMergingPredicate into the new file and then either:
1) make the first function a member of ReplicatedMergeTreeMergeSelectingThread, so it can be used from StorageReplicatedMergeTree as well, or
2) create in ReplicatedMergeTreeMergeSelectingThread another version of can_merge, similar to the one in StorageReplicatedMergeTree at line 2456.
Please review this commit and also let me know how to proceed: one of the above options, or another idea.
Silviu
@ztlpn Any input on this? I want to finish it and then merge with master, since a lot of conflicts are already accumulating.
if (check_period_ms > static_cast<Int64>(storage.data.settings.check_delay_period) * 1000)
    check_period_ms = storage.data.settings.check_delay_period * 1000;

storage.queue_updating_task_handle = storage.context.getSchedulePool().addTask("queue_updating_task_handle", [this] { storage.queueUpdatingThread(); });
The task is inconsistently named (after the variable name, not the function name).
I'll change it to queueUpdatingThread. Please confirm.
StorageReplicatedMergeTree::queueUpdatingTask
(possibly with the table name)
OK this is clear!
Fixed in commit 6629b03
👍
if (min_time > current_time)
{
    wakeup_event.wait_for(lock, std::chrono::microseconds(min_time - current_time));
    continue;
This pattern is a bit strange - after the thread is awakened, it will be granted delayed_tasks_lock, then it will release it, only to try and immediately reacquire it on line 230.
But probably this doesn't matter for correctness.
You are right that in this case the lock is released and then reacquired. But we cannot just schedule the task: maybe it was removed from the queue, and for this reason we go through the checks again.
I need to think about whether calling the active() method on the task is enough, or whether there are other implications.
Yes, we need to go through all the checks after returning from the wait() method (if only for the possibility of a spurious wakeup).
My point was that it is not necessary to release the lock and immediately re-acquire it to check the condition: the lock is already held after we return from the wait() method.
@ztlpn I'm afraid it's not clear to me how you want this fixed: I mean, how can I go through all the checks while not releasing the lock that is already acquired?
Can you provide some hints?
Possible solution:
Here is the current code (in pseudocode):

while (!shutdown)
{
    {
        acquire lock;
        if (shutdown)
            break;
        if (no tasks)
        {
            condition.wait();
            continue;
        }
    }
    schedule task;
}

Every time we do continue; we release the lock only to immediately re-acquire it.
Now what if we add a second loop:

while (!shutdown)
{
    {
        acquire lock;
        while (!shutdown)
        {
            if (no tasks)
            {
                condition.wait();
                continue;
            }
            else
                break;
        }
    }
    schedule task;
}

This way we still hold the lock after continuing, and we execute the inner loop until we get a task to schedule. The lock is released only to wait on the condition.
What do you think?
BTW, there is a bug in the exit predicate (line 265): it should be if (shutdown), not if (!shutdown)...
fixed in 7d6268c
Do you see any issue with this implementation?

void BackgroundSchedulePool::delayExecutionThreadFunction()
{
    setThreadName("BckSchPoolDelay");

    while (!shutdown)
    {
        TaskHandle task;
        bool found = false;

        {
            std::unique_lock lock(delayed_tasks_lock);

            while (!shutdown)
            {
                Poco::Timestamp min_time;

                if (!delayed_tasks.empty())
                {
                    auto t = delayed_tasks.begin();
                    min_time = t->first;
                    task = t->second;
                }

                if (!task)
                {
                    wakeup_cond.wait(lock);
                    continue;
                }

                Poco::Timestamp current_time;

                if (min_time > current_time)
                {
                    wakeup_cond.wait_for(lock, std::chrono::microseconds(min_time - current_time));
                    continue;
                }
                else
                {
                    /// We have a task ready for execution.
                    found = true;
                    break;
                }
            }
        }

        if (found)
            task->schedule();
    }
}
Looks good.
Fixed in 0a05769
if (!task)
{
    wakeup_event.wait(lock);
There is a small possibility of deadlock on shutdown here. If this thread checks the shutdown variable and then sleeps for some reason before waiting on the condition, it can miss the notification from the destructor (line 142) and wait forever.
The wakeup_event field is misnamed: it is a std::condition_variable, not a Poco::Event (which has different semantics and saves notifications even when no one is waiting for them).
We can fix the deadlock by adding std::lock_guard lock(delayed_tasks_lock); in the destructor before shutdown = true. What do you think?
That would still leave a tiny window for deadlock, because the shutdown flag is not checked under the lock in the delayExecutionThreadFunction loop.
Right, we can add a check inside the lock as well, on the first line.
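A minimal sketch of the idiom being agreed on here, assuming the member names from the snippets above (the destructor is shown without the thread join it would also need):

#include <condition_variable>
#include <mutex>

struct PoolShutdownSketch
{
    std::mutex delayed_tasks_lock;
    std::condition_variable wakeup_cond;
    bool shutdown = false;

    ~PoolShutdownSketch()
    {
        {
            /// Flip the flag under the lock, so a waiter cannot test
            /// `shutdown` and then miss the notification.
            std::lock_guard<std::mutex> lock(delayed_tasks_lock);
            shutdown = true;
        }
        wakeup_cond.notify_all();
    }

    void waiterLoop()
    {
        std::unique_lock<std::mutex> lock(delayed_tasks_lock);
        while (!shutdown)           /// re-checked under the lock on every wakeup
            wakeup_cond.wait(lock); /// releases the lock only while sleeping
    }
};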
Fixed in commit 6629b03
👍
void BackgroundSchedulePool::scheduleDelayedTask(const TaskHandle & task, size_t ms, std::lock_guard<std::mutex>&) |
Small deviation from the style guide (the space before the & ;))
Also, I think the code is more readable when the lock argument is explicitly named to document exactly which mutex needs to be locked.
This is a change made by @alexey-milovidov in his first review. I can add a variable name there, but it will be unused inside the method. I think this is the reason he chose not to put a name.
Ah yes, the variable will be unused. Its name can be put in a comment though:
, std::lock_guard<std::mutex> & /* schedule_mutex_lock */)
OK, this point is clear!
Fixed in commit 6629b03
👍
@@ -444,6 +462,11 @@ bool ZooKeeper::exists(const std::string & path, Stat * stat_, const EventPtr &
     return existsWatch(path, stat_, callbackForEvent(watch));
 }

+bool ZooKeeper::exists(const std::string & path, Stat * stat, const TaskHandlePtr & watch)
I don't think this class should know about TaskHandles. Can we use the generic callback interface (with TaskHandles converted to callbacks outside the ZooKeeper class)?
This is a bit tricky. If we want a uniform implementation, we need two implementations for that interface: one using EventPtr and one using TaskHandlePtr, because both functions do the same thing and only signal completion via a different method.
What do you think?
We should have two impls for each method: one with EventPtr and one with WatchCallback (the generic callback interface). Clients of BackgroundSchedulePool that want their tasks to be scheduled when a watch fires should create WatchCallbacks themselves. This way ZooKeeper and BackgroundSchedulePool can be decoupled.
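Client-side, that glue might look something like this (a sketch under assumptions: the exact WatchCallback parameter list is whatever zkutil declares, hence the variadic lambda; queue_updating_task_handle is one of the handles from this PR):

/// Hypothetical glue: the storage turns its task handle into a generic watch
/// callback, so ZooKeeper never needs to know about TaskHandles.
auto watch_callback = [task_handle = queue_updating_task_handle](auto &&... /* watch event args */)
{
    task_handle->schedule();  /// re-run the task when the watch fires
};

zookeeper->existsWatch(path, stat, watch_callback);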
OK, noted!
Fixed in commit a2dc16a
👍
CurrentMetrics::Increment metric_increment{CurrentMetrics::BackgroundSchedulePoolTask};

Stopwatch watch;
function();
As the scheduled flag is false here, we can schedule this task once more from another thread. If a thread in the pool is available, it will immediately try to execute the task but will get stuck waiting for the exec_mutex. That is good, because we don't want to execute the function more than once at any point in time.
But it is bad from the performance point of view, because the executing task will occupy not one but two threads in the pool, which are in short supply. We should try to minimize waiting in BackgroundSchedulePool::threadFunction().
Any idea how we can do this safely?
The problem is that the pool has no idea whether the task is currently executing.
If we think of a task more as a state machine... Here are the current states: Idle, Scheduled, Executing, Delayed. We want to go from Executing to Scheduled, and we need to do it without waiting (that was the problem with the first version). But if we transition the task to the Scheduled state, it will be immediately picked up for execution by another thread that will then get stuck on the exec_mutex (the current problem).
So a possible path to the solution is to add one more state, ExecutingScheduled (maybe just another flag), and to check after executing the task whether it was scheduled once more in the meantime. If it was, execute it again right away (possibly in the same thread!).
The same goes for the situation when we delay the currently executing task by some small amount and this delay expires before the task finishes.
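A minimal model of that extra-flag idea (member names are illustrative, not the actual patch that followed):

#include <functional>
#include <mutex>

/// Sketch: schedule() during execution only records the request instead of
/// enqueueing, and execute() re-runs the body in the same pool thread if the
/// flag was set in the meantime, so no second thread blocks on exec_mutex.
struct TaskSketch
{
    std::function<void()> function;
    std::mutex mutex;
    bool executing = false;
    bool scheduled_again = false;  /// the "ExecutingScheduled" state

    /// Returns true when the caller should actually enqueue the task.
    bool schedule()
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (executing)
        {
            scheduled_again = true;  /// remember, but don't occupy a second thread
            return false;
        }
        return true;
    }

    void execute()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            executing = true;
        }
        while (true)
        {
            function();  /// the task body, run without holding the mutex

            std::lock_guard<std::mutex> lock(mutex);
            if (!scheduled_again)
            {
                executing = false;  /// back to Idle
                return;
            }
            scheduled_again = false;  /// run once more in this same thread
        }
    }
};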
Should be fixed in 24de8d6
👍
Please rebase the branch or merge the latest stable into it, so that the changes are easily testable.
Hello @ztlpn, I'll try to rebase onto the latest master, but there are conflicts in several places that are not straightforward to fix. I first need to review what changes were made in the meantime and then work on this to make sure I won't break something. Last time I tried, I failed :) maybe I was too tired. I'll try again next week. Unfortunately, this is the problem when pull requests stay without review for a long time.
@ztlpn How many threads are created for each insert and select query? Is it a fixed number, or does it depend on the query?
Depends on the query and settings. Typically, to execute a SELECT query ClickHouse will create around max_threads new threads that will then immediately exit. But if it is a Distributed query, a thread is created for each shard, and that can be hundreds. This is not considered a big problem, because thread creation on Linux is very efficient. But these threads specifically are the reason ASan hits its thread limits (and not the replication threads). A possible fix is to use threads from some other thread pool with elastic resizing and a big upper thread limit for query execution. That's a separate issue though :)
Commits c90c8db to a2dc16a.
@ztlpn Now the branch no longer has conflicts with master.
…threads in the pool which are in short supply.
@alexey-milovidov @ztlpn: Except for two points where I added some comments, because it is not clear enough to me how you expect them to be implemented, all should be fixed and conflict-free with master. Once you clarify the other two, I'll fix them and run a load test and some functionality tests again.
@@ -163,7 +164,7 @@ class ReplicatedMergeTreeQueue
      * If next_update_event != nullptr, will call this event when new entries appear in the log.
      * Returns true if new entries have been received.
      */
-    bool pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper, zkutil::EventPtr next_update_event);
+    bool pullLogsToQueue(zkutil::ZooKeeperPtr zookeeper, BackgroundSchedulePool::TaskHandle next_update_event);
Should be next_update_task.
Or _task_handle. Both are ok as long as you name all of them consistently (currently _handle is also used in a couple of places).
Fixed in 31874ed
What about merge_selecting_handle?
Fixed in 5099284
Oops, you misunderstood me :) next_update_task_handle is perfectly OK; I was talking about the StorageReplicatedMergeTree member variable, which is inconsistently named.
I won't comment several times if there is a common issue in several separate places in the code.
I reverted the previous commit. You mean to replace merge_selecting_handle with merge_selecting_task_handle?
Yes
Fixed in 1418e33
👍
@@ -161,9 +160,11 @@ class ZooKeeper
     int32_t tryRemoveEphemeralNodeWithRetries(const std::string & path, int32_t version = -1, size_t * attempt = nullptr);

     bool exists(const std::string & path, Stat * stat = nullptr, const EventPtr & watch = nullptr);
+    bool exists(const std::string & path, Stat * stat, const WatchCallback & watch_callback);
There is no need for this method, as it is the same as existsWatch(). The reason for the -Watch suffix is that overload resolution becomes somewhat difficult, because there is the other method with a lot of default parameters.
Anyway, the interface for get(), tryGet() and exists() should be consistent. So you probably should add and use the getWatch() method, delete this exists() overload, and use the existsWatch() method.
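The resulting surface would presumably look like this (a sketch; only exists()/existsWatch() are confirmed in this thread, the get()/getWatch() pair is the analogous extension being requested):

/// Sketch of the consistent watch interface: a default-parameter overload
/// taking EventPtr, plus a -Watch variant taking the generic callback.
bool exists(const std::string & path, Stat * stat = nullptr, const EventPtr & watch = nullptr);
bool existsWatch(const std::string & path, Stat * stat, const WatchCallback & watch_callback);

std::string get(const std::string & path, Stat * stat = nullptr, const EventPtr & watch = nullptr);
std::string getWatch(const std::string & path, Stat * stat, const WatchCallback & watch_callback);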
Fixed in 4361df9. Here we need to take into account that libzookeeper will be removed soon, and the new implementation that @alexey-milovidov did has a slightly different interface. We will see what conflicts we get on merge.
Also, for consistency, getWatch() instead of get().
Yes, there will be some conflicts. Who knows which pull will get merged first :)
Fixed in 438121e
👍
…task_handle to merge_selecting_handle)
…task_handle to merge_selecting_handle) (reverted from commit 5099284)
# Conflicts:
#	dbms/src/Common/ZooKeeper/LeaderElection.h
#	dbms/src/Common/ZooKeeper/ZooKeeper.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.h
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreePartCheckThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp
#	dbms/src/Storages/StorageReplicatedMergeTree.cpp
Branch merged with master.
This is almost ready for merge. Alexey @ztlpn is doing the final reviews.
…-pool-fix
# Conflicts:
#	dbms/src/Storages/StorageReplicatedMergeTree.cpp
Great! I merged with master again to fix all conflicts.
…-pool-fix
# Conflicts:
#	dbms/src/Common/ZooKeeper/LeaderElection.h
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp
#	dbms/src/Storages/MergeTree/ReplicatedMergeTreePartCheckThread.cpp
#	dbms/src/Storages/StorageReplicatedMergeTree.cpp
@ztlpn @alexey-milovidov From what I see, ReplicatedMergeTreeCleanupThread is called too often, even if the table is idle and no read or write has happened. This task is not triggered by any action that can create data that needs to be cleaned; it simply reschedules itself at a regular interval. This takes a lot of CPU, and I think it's better to change it to be triggered only when actions that can generate obsolete data in a table take place.
That's correct, it will be much better.
The same problem exists for merge_selecting_task_handle, which on idle tables reschedules itself every 5 seconds (MERGE_SELECTING_SLEEP_MS) because there is nothing to merge. And another one is ReplicatedMergeTreeRestartingThread, which checks whether the ZooKeeper session has expired; even this one reschedules itself very aggressively. @alexey-milovidov, now that you have rewritten the ZooKeeper implementation, is there no other method to detect when the session has expired without polling?
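The direction being proposed, as a sketch (the hook name and call site are hypothetical; task_handle->schedule() is the pool API already used elsewhere in this PR):

/// Hypothetical hook: instead of the cleanup task unconditionally rescheduling
/// itself every interval, it is kicked only from code paths that can actually
/// produce obsolete data (e.g. when a new part is committed).
void StorageReplicatedMergeTree::onNewPartCommitted()
{
    cleanup_task_handle->schedule();  /// wake the cleanup task on demand
}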
Hello,
I fixed all the observations @ztlpn made on #1689.
Also, the bug in replication that he found was a regression caused by my previous fix: #1649
So the changes between the initial version and this one are:
Basically, I added a new mutex (exec_mutex) that synchronizes the execute function, to make sure that only one execute can take place at a time; it is also used when we deactivate the task, to make sure that we wait for any in-progress execute. The other mutex (schedule_mutex) is used only to synchronize the deactivated and scheduled properties. Also, because of these changes, there is no longer any need for a recursive mutex.
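A minimal model of the two-mutex arrangement just described (the names exec_mutex, schedule_mutex, deactivated and scheduled come from the comment above; everything else is illustrative, and the real patch differs in detail):

#include <mutex>

struct TaskInfoSketch
{
    std::mutex exec_mutex;      /// serializes executions of the task body
    std::mutex schedule_mutex;  /// guards the two flags below
    bool deactivated = false;
    bool scheduled = false;

    void deactivate()
    {
        /// Acquiring exec_mutex waits for any in-progress execute() to finish,
        /// so after deactivate() returns, the body is guaranteed not to run.
        std::lock_guard<std::mutex> exec_lock(exec_mutex);
        std::lock_guard<std::mutex> lock(schedule_mutex);
        deactivated = true;
        scheduled = false;
    }

    bool schedule()
    {
        std::lock_guard<std::mutex> lock(schedule_mutex);
        if (deactivated || scheduled)
            return false;  /// already queued, or permanently stopped
        scheduled = true;
        return true;       /// caller enqueues the task into the pool
    }
};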
I tested that the bug @ztlpn reported is fixed, and also the deadlock I found. I'll run more tests on the replication process in the following days and update the status here. In the meantime, please review and come up with feedback, if any.
Silviu