-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TiFlash crash in AggregateFunctionSumAddImpl::add #2607
Comments
in HashMap.h:200, this->size=0, but it iteratering. template <typename Func>
void ALWAYS_INLINE mergeToViaEmplace(Self & that, Func && func)
{
for (auto it = this->begin(), end = this->end(); it != end; ++it)
{
typename Self::LookupResult res_it;
bool inserted;
that.emplace(Cell::getKey(it->getValue()), res_it, inserted, it.getHash());
func(res_it->getMapped(), it->getMapped(), inserted);
}
} |
the root cause is a race in function scheduleThreadForNextBucket, member variable void scheduleThreadForNextBucket()
{
++max_scheduled_bucket_num;
if (max_scheduled_bucket_num >= NUM_BUCKETS)
return;
parallel_merge_data->pool.schedule(
ThreadFactory(true, "MergingAggregtd").newJob([this]{ thread(max_scheduled_bucket_num); }));
} then |
this bug is imported in #2496, it added the ThreadFactory, change the call model from ClickHouse do some similar fix in ClickHouse/ClickHouse#21818. |
amazing finding ... |
however, after I apply #2617, there is another problem the deeper reason is in need furthur investigation. |
after I update the tidb version to keep compatible with tiflash, this error disappeared. |
full stack 2021.08.13 16:58:58.474323 [ 17019 ] <Error> BaseDaemon: (from thread 17079) Received signal Segmentation fault (11).
2021.08.13 16:58:58.474332 [ 17019 ] <Error> BaseDaemon: Address: NULL pointer.
2021.08.13 16:58:58.474339 [ 17019 ] <Error> BaseDaemon: Access: read.
2021.08.13 16:58:58.474347 [ 17019 ] <Error> BaseDaemon: Address not mapped to object.
2021.08.13 16:58:58.482853 [ 17019 ] <Error> BaseDaemon: 0. bin/tiflash/tiflash(DB::AggregateFunctionCount::merge(char*, char const*, DB::Arena*) const+0x1) [0x6884921]
2021.08.13 16:58:58.482894 [ 17019 ] <Error> BaseDaemon: 1. bin/tiflash/tiflash(void DB::Aggregator::mergeDataImpl<DB::AggregationMethodSerialized<TwoLevelHashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef, void>, HashTableNoState>, DefaultHash<StringRef, void>, TwoLevelHashTableGrower<8ul>, Allocator<true>, HashMapTable> >, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef, void>, HashTableNoState>, DefaultHash<StringRef, void>, TwoLevelHashTableGrower<8ul>, Allocator<true> > >(HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef, void>, HashTableNoState>, DefaultHash<StringRef, void>, TwoLevelHashTableGrower<8ul>, Allocator<true> >&, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef, void>, HashTableNoState>, DefaultHash<StringRef, void>, TwoLevelHashTableGrower<8ul>, Allocator<true> >&, DB::Arena*) const+0x270) [0x7f65180]
2021.08.13 16:58:58.482914 [ 17019 ] <Error> BaseDaemon: 2. bin/tiflash/tiflash(void DB::Aggregator::mergeBucketImpl<DB::AggregationMethodSerialized<TwoLevelHashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef, void>, HashTableNoState>, DefaultHash<StringRef, void>, TwoLevelHashTableGrower<8ul>, Allocator<true>, HashMapTable> > >(std::vector<std::shared_ptr<DB::AggregatedDataVariants>, std::allocator<std::shared_ptr<DB::AggregatedDataVariants> > >&, int, DB::Arena*) const+0x9c) [0x7f653ac]
2021.08.13 16:58:58.482946 [ 17019 ] <Error> BaseDaemon: 3. bin/tiflash/tiflash(DB::MergingAndConvertingBlockInputStream::thread(int)+0xc9e) [0x7f91b6e]
2021.08.13 16:58:58.482957 [ 17019 ] <Error> BaseDaemon: 4. bin/tiflash/tiflash(ThreadPool::worker()+0x167) [0x8192297]
2021.08.13 16:58:58.482966 [ 17019 ] <Error> BaseDaemon: 5. bin/tiflash/tiflash() [0x8e371bf]
2021.08.13 16:58:58.482983 [ 17019 ] <Error> BaseDaemon: 6. /lib64/libpthread.so.0(+0x7dd5) [0x7fd7fd995dd5] |
log
reproduce:
tiup playground nightly tiup bench tpch prepare --tiflash --analyze mysql -h127.0.0.1 -P4000 -Dtest -uroot -e"source ~/tmp/q18.sql"
The text was updated successfully, but these errors were encountered: