Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CreateColumnFamily API incompatible with latest versions #3609

Closed
vpallipadi opened this issue Mar 14, 2018 · 20 comments
Closed

CreateColumnFamily API incompatible with latest versions #3609

vpallipadi opened this issue Mar 14, 2018 · 20 comments
Assignees

Comments

@vpallipadi
Copy link

While trying to migrate from 5.3.4 (and older releases) to 5.10.4, we noticed intermittent crash with

pure virtual method called
terminate called without an active exception

while dropping the column family and freeing the cfh.

Narrowing this down, we were able to reproduce this with a simple unit test below, which passes with 5.3.4 and fails with 5.10.4.

TEST_F(ColumnFamilyTest, CreateDropAndDestroy) {
ColumnFamilyHandle* cfh;
Open();
ASSERT_OK(db_->CreateColumnFamily(ColumnFamilyOptions(), "yoyo", &cfh));
ASSERT_OK(db_->Put(WriteOptions(), cfh, "foo", "bar"));
ASSERT_OK(db_->Flush(FlushOptions(), cfh));
ASSERT_OK(db_->DropColumnFamily(cfh));
ASSERT_OK(db_->DestroyColumnFamilyHandle(cfh));
}

Expected behavior

Success.

Actual behavior

Crash with pure virtual method called.

Steps to reproduce the behavior

The above test does work with this change to the test.

  • ASSERT_OK(db_->CreateColumnFamily(ColumnFamilyOptions(), "yoyo", &cfh));
  • ColumnFamilyOptions cfo;
  • ASSERT_OK(db_->CreateColumnFamily(cfo, "yoyo", &cfh));

@sachja

@siying
Copy link
Contributor

siying commented Mar 14, 2018

@vpallipadi can you identify which virtual method called, or the callstack?

@vpallipadi
Copy link
Author

#0 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7ff9e4c20cc9] ?? ??:0
#1 /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7ff9e4c240d8] ?? ??:0
#2 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155) [0x7ff9e552b535] ?? ??:0
#3 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6) [0x7ff9e55296d6] ?? ??:0
#4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703) [0x7ff9e5529703] ?? ??:0
#5 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5f1bf) [0x7ff9e552a1bf] ?? ??:0
#6 ./external_sst_file_import_test() [0x6febb7] rocksdb::BlockBasedTable::Close() /home/cohesity/src/rocksdb-5.10.4-new/table/block_based_table_reader.cc:2357
#7 ./external_sst_file_import_test() [0x6fec1e] rocksdb::BlockBasedTable::~BlockBasedTable() /home/cohesity/src/rocksdb-5.10.4-new/table/block_based_table_reader.cc:63
#8 ./external_sst_file_import_test() [0x6feef1] rocksdb::BlockBasedTable::~BlockBasedTable() /home/cohesity/src/rocksdb-5.10.4-new/table/block_based_table_reader.cc:64
#9 ./external_sst_file_import_test() [0x57e118] rocksdb::LRUHandle::Free() /home/cohesity/src/rocksdb-5.10.4-new/./cache/lru_cache.h:110
#10 ./external_sst_file_import_test() [0x659799] rocksdb::TableCache::Evict(rocksdb::Cache*, unsigned long) /home/cohesity/src/rocksdb-5.10.4-new/db/table_cache.cc:475
#11 ./external_sst_file_import_test() [0x5f6203] rocksdb::DBImpl::PurgeObsoleteFiles(rocksdb::JobContext const&, bool) /home/cohesity/src/rocksdb-5.10.4-new/db/db_impl_files.cc:484
#12 ./external_sst_file_import_test() [0x598802] rocksdb::JobContext::Clean() /home/cohesity/src/rocksdb-5.10.4-new/./db/job_context.h:159 (discriminator 2)
#13 ./external_sst_file_import_test() [0x598cf1] rocksdb::ColumnFamilyHandleImpl::~ColumnFamilyHandleImpl() /home/cohesity/src/rocksdb-5.10.4-new/db/column_family.cc:68

@siying
Copy link
Contributor

siying commented Mar 14, 2018

Can you try whether this patch can fix it?

diff --git a/db/column_family.cc b/db/column_family.cc
index e6518df..274001c 100644
--- a/db/column_family.cc
+++ b/db/column_family.cc
@@ -53,6 +53,9 @@ ColumnFamilyHandleImpl::~ColumnFamilyHandleImpl() {
 #endif  // ROCKSDB_LITE
     // Job id == 0 means that this is not our background process, but rather
     // user thread
+    // Need to hold some shared pointers owned by the initial_cf_options
+    // before final cleaning up finishes.
+    ColumnFamilyOptions initial_cf_options_copy = cfd_->initial_cf_options();
     JobContext job_context(0);
     mutex_->Lock();
     if (cfd_->Unref()) {
diff --git a/db/column_family.h b/db/column_family.h
index ce1fd47..b07dbb5 100644
--- a/db/column_family.h
+++ b/db/column_family.h
@@ -343,6 +343,8 @@ class ColumnFamilyData {

   bool initialized() const { return initialized_.load(); }

+  const ColumnFamilyOptions& initial_cf_options() { return initial_cf_options_; }
+
   Env::WriteLifeTimeHint CalculateSSTWriteHint(int level);

  private:

@vpallipadi
Copy link
Author

Yes. That works.

@siying
Copy link
Contributor

siying commented Mar 14, 2018

I'm sending a PR: #3610

@siying
Copy link
Contributor

siying commented Mar 14, 2018

To work it around, you can keep your ColumnFamilyOptions all the time.

@sachja
Copy link

sachja commented Mar 15, 2018

Do we need a similar fix for db_options ? I get the following crash in a test while calling delete on the db pointer.

PC: @ 0x7fbd6bddb13f __GI_raise
*** SIGABRT (@0x3e80000443d) received by PID 17469 (TID 0x7fbd6a49cb80) from PID 17469; stack trace: ***
@ 0x7fbd6e2ddbdd google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fbd6c60ead0 (unknown)
@ 0x7fbd6bddb13f __GI_raise
@ 0x7fbd6bddc56a __GI_abort
@ 0x18e55bf cohesity::misc::(anonymous namespace)::FailureHandler::FailureFunction()
@ 0x7fbd6c60c119 __pthread_once_slow
@ 0x18e20a2 cohesity::misc::(anonymous namespace)::FailureHandler::ExecuteFailureFunction()
@ 0x7fbd6e2cff4d google::LogMessage::Fail()
@ 0x7fbd6e2d1ded google::LogMessage::SendToLog()
@ 0x7fbd6e2cfa87 google::LogMessage::Flush()
@ 0x7fbd6e2d28b9 google::LogMessageFatal::~LogMessageFatal()
@ 0x18e23cc _ZZN8cohesity4misc12_GLOBAL__N_18InitOnceEvENUlvE_4_FUNEv
@ 0x7fbd6c508d36 __cxxabiv1::__terminate()
@ 0x7fbd6c508d81 std::terminate()
@ 0x7fbd6c50984f __cxa_pure_virtual
@ 0x7fbd6de7be4c rocksdb::ShardedCache::Erase()
@ 0x7fbd6e00f8e0 rocksdb::BlockBasedTable::Close()
@ 0x7fbd6de79b56 rocksdb::LRUCacheShard::ApplyToAllCacheEntries()
@ 0x7fbd6de7bac0 rocksdb::ShardedCache::ApplyToAllCacheEntries()
@ 0x7fbd6df75a64 rocksdb::VersionSet::~VersionSet()
@ 0x7fbd6ded8b47 rocksdb::DBImpl::~DBImpl()
@ 0x7fbd6ded91b1 rocksdb::DBImpl::~DBImpl()

@vpallipadi
Copy link
Author

This is the column_family_test that fails with above trace that Sachin posted.

TEST_F(ColumnFamilyTest, CreateDropAndDestroy) {
ColumnFamilyHandle* cfh;
Open();
ASSERT_OK(db_->CreateColumnFamily(ColumnFamilyOptions(), "yoyo", &cfh));
ASSERT_OK(db_->Put(WriteOptions(), cfh, "foo", "bar"));
ASSERT_OK(db_->Flush(FlushOptions(), cfh));
ASSERT_OK(db_->DisableFileDeletions());
ASSERT_OK(db_->DropColumnFamily(cfh));
ASSERT_OK(db_->DestroyColumnFamilyHandle(cfh));
}

Again, this test works fine on 5.3.4

@siying
Copy link
Contributor

siying commented Mar 16, 2018

@vpallipadi what's the difference between this one and the one you pasted previously?

@siying
Copy link
Contributor

siying commented Mar 16, 2018

@vpallipadi it's very likely that DB destructor has some bug too. The destructing dependencies are more complicated them. But I can't reproduce with this simple test:

TEST_F(DBTest2, OpenDBAndDestoryOption) {
  Close();

  std::unique_ptr<Options> options(new Options());
  ASSERT_OK(DB::Open((*options.get()), dbname_, &db_));
  options.reset();

  ASSERT_OK(Put("foo", "bar"));
  ASSERT_OK(Flush());
  ASSERT_EQ("bar", Get("foo"));
  delete db_;
  db_ = nullptr;

  EXPECT_OK(DestroyDB(dbname_, Options()));

}

If you can find a unit test to reproduce it. I'll dig more and fix it.

@vpallipadi
Copy link
Author

This simple test added to column_family_test reproduces the problem on 5.10.4.

TEST_F(ColumnFamilyTest, CreateDropAndDestroy) {
ColumnFamilyHandle* cfh;
Open();
ColumnFamilyOptions cfo;
ASSERT_OK(db_->CreateColumnFamily(cfo, "yoyo", &cfh));
ASSERT_OK(db_->Put(WriteOptions(), cfh, "foo", "bar"));
ASSERT_OK(db_->Flush(FlushOptions(), cfh));
ASSERT_OK(db_->DisableFileDeletions());
ASSERT_OK(db_->DropColumnFamily(cfh));
ASSERT_OK(db_->DestroyColumnFamilyHandle(cfh));
}

Only difference between this one and previous one is use of DisableFileDeletions().
Once we add it, we see the failure in DB destructor.

@toktarev
Copy link

Is there some RocksDB release where this issue is resolved ?

@ajkr ajkr self-assigned this Apr 24, 2018
@ajkr
Copy link
Contributor

ajkr commented Apr 24, 2018

It's not fixed. Looking now.

@ajkr
Copy link
Contributor

ajkr commented Apr 24, 2018

I believe the problem is that table readers from dropped column families stick around in table cache since they're not deletable. Meanwhile, their ColumnFamilyOptions has already been removed along with its references to objects like BlockBasedTableFactory, which owns a BlockBasedTableOptions. Let me see if forcing eviction during DropColumnFamily can help.

@vpallipadi
Copy link
Author

Anything we can do to help resolve this one. We are seeing this pretty consistently in our usage. Latest stack trace looks something like this.

@     0x7fe62d098d36 __cxxabiv1::__terminate()
@     0x7fe62d098d81 std::terminate()
@     0x7fe62d09984f __cxa_pure_virtual
@     0x7fe62d3c9fdc rocksdb::ShardedCache::Erase()
@     0x7fe62d55f080 rocksdb::BlockBasedTable::Close()
@     0x7fe62d55f11d rocksdb::BlockBasedTable::~BlockBasedTable()
@     0x7fe62d55f321 rocksdb::BlockBasedTable::~BlockBasedTable()
@     0x7fe62d3c861b rocksdb::LRUCacheShard::Erase()
@     0x7fe62d4b18d9 rocksdb::TableCache::Evict()
@     0x7fe62d43f0b5 rocksdb::DBImpl::PurgeObsoleteFiles()
@     0x7fe62d436675 rocksdb::DBImpl::BackgroundCallCompaction()
@     0x7fe62d43675a rocksdb::DBImpl::BGWorkCompaction()
@     0x7fe62d5c6796 rocksdb::ThreadPoolImpl::Impl::BGThread()
@     0x7fe62d5c68ed rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper()
@     0x7fe62d0c3aef execute_native_thread_routine
@     0x7fe62d791434 start_thread
@     0x7fe62ca2204f __GI___clone

@vpallipadi
Copy link
Author

Looking through the logs, this is the sequence of events that causes the reference to freed object.

// eptr error was hit when trying to purge these files after a background compaction.
__for_range = std::vector of length 3, capacity 3 = {{file_name = "/2281663.sst", path_id = 0},

{file_name = "/2281658.sst", path_id = 0}
, {file_name = "/2184018.sst", path_id = 0}}

// From rocksdb log, two of these files (2281658 and 2184018) are form cf that is still alive. The third one is from cfh 285_123, which had following recent changes.

// Flush
2018/05/14-15:39:03.266242 7f42ff29c700 (Original Log Time 2018/05/14-15:39:03.266040) [db/db_impl_compaction_flush.cc:1276] Calling FlushMemTableToOutputFile with column family [285_123], flush slots available 6, compaction slots available 4, flush slots scheduled 1, compaction slots scheduled 1
2018/05/14-15:39:03.266246 7f42ff29c700 [db/flush_job.cc:270] [285_123] [JOB 10634] Flushing memtable with next log file: 2281662
2018/05/14-15:39:03.266264 7f42ff29c700 EVENT_LOG_v1

{"time_micros": 1526337543266257, "job": 10634, "event": "flush_started", "num_memtables": 1, "num_entries": 25123, "num_deletes": 10040, "memory_usage": 5209192}
2018/05/14-15:39:03.266268 7f42ff29c700 [db/flush_job.cc:299] [285_123] [JOB 10634] Level-0 flush table #2281663: started
2018/05/14-15:39:03.287627 7f42ff29c700 EVENT_LOG_v1 {"time_micros": 1526337543287608, "cf_name": "285_123", "job": 10634, "event": "table_file_creation", "file_number": 2281663, "file_size": 908818, "table_properties": {"data_size": 893192, "index_size": 2269, "filter_size": 12505, "raw_key_size": 910511, "raw_average_key_size": 42, "raw_value_size": 746175, "raw_average_value_size": 34, "num_data_blocks": 47, "num_entries": 21391, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "10040", "kMergeOperands": "0"}}
2018/05/14-15:39:03.287668 7f42ff29c700 [db/flush_job.cc:339] [285_123] [JOB 10634] Level-0 flush table #2281663: 908818 bytes OK
2018/05/14-15:39:03.288569 7f42ff29c700 (Original Log Time 2018/05/14-15:39:03.287757) [db/memtable_list.cc:374] [285_123] Level-0 commit table #2281663 started
2018/05/14-15:39:03.288575 7f42ff29c700 (Original Log Time 2018/05/14-15:39:03.287959) [db/memtable_list.cc:397] [285_123] Level-0 commit table #2281663: memtable #1 done

// CFM Drop
2018/05/14-15:39:04.887893 3498700 [db/db_impl.cc:1419] Dropped column family with id 25809
2018/05/14-15:39:04.999376 3498700 EVENT_LOG_v1

{"time_micros": 1526337544999371, "job": 0, "event": "table_file_deletion", "file_number": 2281204}
2018/05/14-15:39:04.999691 3498700 EVENT_LOG_v1

{"time_micros": 1526337544999689, "job": 0, "event": "table_file_deletion", "file_number": 2280640}
2018/05/14-15:39:04.999925 3498700 EVENT_LOG_v1

{"time_micros": 1526337544999924, "job": 0, "event": "table_file_deletion", "file_number": 2280028}
2018/05/14-15:39:05.000206 3498700 EVENT_LOG_v1

{"time_micros": 1526337545000205, "job": 0, "event": "table_file_deletion", "file_number": 2279379}
2018/05/14-15:39:05.000481 3498700 EVENT_LOG_v1

{"time_micros": 1526337545000480, "job": 0, "event": "table_file_deletion", "file_number": 2278782}
2018/05/14-15:39:05.000530 3498700 EVENT_LOG_v1

{"time_micros": 1526337545000528, "job": 0, "event": "table_file_deletion", "file_number": 2278268}
2018/05/14-15:39:05.007617 3498700 EVENT_LOG_v1

{"time_micros": 1526337545007613, "job": 0, "event": "table_file_deletion", "file_number": 2278260}
2018/05/14-15:39:05.053192 3498700 EVENT_LOG_v1

{"time_micros": 1526337545053185, "job": 0, "event": "table_file_deletion", "file_number": 2241786}

// However, file 2281663 did not get deleted along with the drop. It tried to delete it immediately afterwards and that caused the crash.

@ajkr ajkr assigned riversand963 and unassigned ajkr May 18, 2018
@riversand963
Copy link
Contributor

riversand963 commented May 21, 2018

This commit provides a workaround for the issue, and I cannot reproduce the failure on current master. @ajkr @vpallipadi

Update:
It still fails the test case posted by @sachja . Looking into it now.

@toktarev
Copy link

THe problem is that there is not release with this commit

@riversand963
Copy link
Contributor

PR 3888 should fix both cases, and we will discuss how to make this change available to you. Feel free to provide suggestion and comment. Thanks!

facebook-github-bot pushed a commit that referenced this issue May 25, 2018
Summary:
Please refer to earlier discussion in [issue 3609](#3609).
There was also an alternative fix in [PR 3888](#3888), but the proposed solution requires complex change.

To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed.
Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised.
Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion.

Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler.

Test plan
```
$ make -j16
$ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion
```

Expected behavior:
All tests should pass.
Closes #3898

Differential Revision: D8149421

Pulled By: riversand963

fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438
riversand963 added a commit that referenced this issue May 25, 2018
Summary:
Please refer to earlier discussion in [issue 3609](#3609).
There was also an alternative fix in [PR 3888](#3888), but the proposed solution requires complex change.

To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed.
Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised.
Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion.

Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler.

Test plan
```
$ make -j16
$ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion
```

Expected behavior:
All tests should pass.
Closes #3898

Differential Revision: D8149421

Pulled By: riversand963

fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438
riversand963 added a commit that referenced this issue May 25, 2018
Summary:
Please refer to earlier discussion in [issue 3609](#3609).
There was also an alternative fix in [PR 3888](#3888), but the proposed solution requires complex change.

To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed.
Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised.
Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion.

Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler.

Test plan
```
$ make -j16
$ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion
```

Expected behavior:
All tests should pass.
Closes #3898

Differential Revision: D8149421

Pulled By: riversand963

fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438
@riversand963
Copy link
Contributor

The PR fixing this issue has now been merged back to 5.13.fb. You can try it out and feel free to let us know how it goes. Thanks!
cc @vpallipadi @toktarev @sachja

ns-codereview pushed a commit to couchbase/kv_engine that referenced this issue Jun 15, 2018
In RocksDBKVStore we declare the following member variables in the
following order:

- std::unique_ptr<rocksdb::DB> rdb;
- rocksdb::ColumnFamilyOptions defaultCFOptions;
- rocksdb::ColumnFamilyOptions seqnoCFOptions;
- std::shared_ptr<rocksdb::Cache> blockCache.

The member declaration order leads to the inverse destruction sequence
'blockCache->seqnoCFOptions->defaultCFOptions->rdb'.
At first look, the issue seems the following:
- rdb (in some internal structure) keeps a rocksdb::Cache& to
  *blockCache (note that a Ref& does not incrememnt the ref-count of
  a shared_ptr)
- given the destruction order, blockCache is releases (ref-count=0)
- rdb has still rocksdb::Cache&, which is now referring to released
  memory
When a method of rocksdb::Cache& is called, then we have the 
'pure virtual' error.

A couple of open RocksDB issues have been identified as possible cause:
- facebook/rocksdb#3609
- facebook/rocksdb#3534

As a temporary workaround we destroy 'rdb' as first step in the
destruction sequence.

Change-Id: I1abd52ffc55c3d8ac41e072b3097541df1d37532
Reviewed-on: http://review.couchbase.org/91010
Reviewed-by: Dave Rigby <[email protected]>
Tested-by: Build Bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants
@toktarev @ajkr @siying @sachja @riversand963 @vpallipadi and others