Missing Table for Tablet on 2.2->2.3 Upgrade #6080

Open
nspiegelberg opened this issue Oct 16, 2020 · 1 comment

nspiegelberg commented Oct 16, 2020

A user hit a fatal error while upgrading from yugabyte-2.2.2.0 to yugabyte-2.3.2.0 on an existing data directory. On startup, the master fails to load the sys catalog:

./bin/yb-ctl start --data_dir=/path-to-datadir/yugabyte_datadir --timeout-processes-running-sec 1000 --timeout-yb-admin-sec 300
Starting cluster with base directory /path-to-datadir/yugabyte_datadir
Waiting for cluster to be ready.
Viewing file /path-to-datadir/yugabyte_datadir/node-2/disk-1/master.err:
F1016 09:19:44.019273 9239 catalog_manager.cc:695] Failed to load sys catalog: Corruption (yb/master/catalog_loaders.cc:157): Failed while visiting tablets in sys catalog: Missing table for tablet: : 04c4de3e6b5f47f3990f0a5f1218fe37
Fatal failure details written to /path-to-datadir/yugabyte_datadir/node-2/disk-1/yb-data/master/logs/yb-master.FATAL.details.2020-10-16T09_19_44.pid8982.txt
F20201016 09:19:44 ../../src/yb/master/catalog_manager.cc:695] Failed to load sys catalog: Corruption (yb/master/catalog_loaders.cc:157): Failed while visiting tablets in sys catalog: Missing table for tablet: : 04c4de3e6b5f47f3990f0a5f1218fe37
@ 0x7f23874891fc yb::LogFatalHandlerSink::send()
@ 0x7f2386669376 google::LogMessage::SendToLog()
@ 0x7f23866667da google::LogMessage::Flush()
@ 0x7f23866698a9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f2391c7b3d7 yb::master::CatalogManager::LoadSysCatalogDataTask()
@ 0x7f2387523814 yb::ThreadPool::DispatchThread()
@ 0x7f238751ffef yb::Thread::SuperviseThread()
@ 0x7f2382c48694 start_thread
@ 0x7f238238541d __clone
@ (nil) (unknown)
*** Check failure stack trace: ***
@ 0x7f23874875e1 yb::(anonymous namespace)::DumpStackTraceAndExit()
@ 0x7f2386666c8d google::LogMessage::Fail()
@ 0x7f2386668dfd google::LogMessage::SendToLog()
@ 0x7f23866667da google::LogMessage::Flush()
@ 0x7f23866698a9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f2391c7b3d7 yb::master::CatalogManager::LoadSysCatalogDataTask()
@ 0x7f2387523814 yb::ThreadPool::DispatchThread()
@ 0x7f238751ffef yb::Thread::SuperviseThread()
@ 0x7f2382c48694 start_thread
@ 0x7f238238541d __clone
@ (nil) (unknown)
Error: Failed to load sys catalog: Corruption (yb/master/catalog_loaders.cc:157): Failed while visiting tablets in sys catalog: Missing table for tablet: : 04c4de3e6b5f47f3990f0a5f1218fe37

nspiegelberg self-assigned this Oct 16, 2020

nspiegelberg (Contributor, Author) commented:

Repro'd:

yugabyte=# create database bogdan with colocated=true;
yugabyte=# \c bogdan
You are now connected to database "bogdan" as user "yugabyte".
bogdan=# create table foo (k int primary key);
CREATE TABLE
bogdan=# drop table foo;
DROP TABLE

Then restart the cluster. The master fails with the same error:

F1016 22:02:47.142489 23396 catalog_manager.cc:695] Failed to load sys catalog: Corruption (yb/master/catalog_loaders.cc:157): Failed while visiting tablets in sys catalog: Missing table for tablet: : 2e9bbcdd7a4840ba83a4f1f41a1985c7

nspiegelberg added a commit that referenced this issue Oct 28, 2020
Summary:
When adding an optimization to avoid loading deleted tables, we missed an edge case where a
colocated tablet could have a deleted table inside it.  In this case, we get into a crash loop
because the code assumes it's the result of data corruption.  We should instead skip over these
tables and continue loading the rest of the schema.

Test Plan: PgMiniTest.DropAllTablesInColocatedDB

Reviewers: jason, bogdan

Reviewed By: bogdan

Subscribers: mihnea, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D9690
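
The behavior change described in the summary above can be illustrated with a minimal, self-contained C++ sketch. This is not the actual code in catalog_loaders.cc; `TabletEntry`, `LoadResult`, `LoadTablets`, and the `skip_missing_tables` switch are hypothetical names invented for illustration. It assumes a simplified model in which deleted tables are simply absent from the set of loaded tables, and a colocated tablet lists the ids of the tables it hosts.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a tablet record in the sys catalog.
// A colocated tablet may reference several tables.
struct TabletEntry {
  std::string id;
  std::vector<std::string> table_ids;
};

// Hypothetical result type for the sketch.
struct LoadResult {
  bool ok = true;
  std::string error;
  std::vector<std::string> loaded_tablets;
  std::vector<std::string> skipped_tables;
};

// `live_tables` models the tables that were loaded; deleted tables are
// assumed to be missing from it. With skip_missing_tables == false the
// loader treats a missing table as corruption and stops (the crash-loop
// behavior). With skip_missing_tables == true it skips the missing table
// and keeps loading the rest.
LoadResult LoadTablets(const std::set<std::string>& live_tables,
                       const std::vector<TabletEntry>& tablets,
                       bool skip_missing_tables) {
  LoadResult result;
  for (const auto& tablet : tablets) {
    for (const auto& table_id : tablet.table_ids) {
      if (live_tables.count(table_id) > 0) {
        continue;  // Table is known; nothing to do.
      }
      if (!skip_missing_tables) {
        result.ok = false;
        result.error = "Missing table for tablet: " + tablet.id;
        return result;
      }
      // Assumed to be a dropped table inside a colocated tablet.
      result.skipped_tables.push_back(table_id);
    }
    result.loaded_tablets.push_back(tablet.id);
  }
  return result;
}
```

Calling `LoadTablets` with a tablet that references one live and one dropped table returns an error in the strict mode and loads the tablet in the lenient mode, which mirrors the difference between the crash loop and the fix at a very coarse level.
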
nspiegelberg added a commit that referenced this issue Oct 29, 2020
…restart

Summary:
When adding an optimization to avoid loading deleted tables, we missed an edge case where a
colocated tablet could have a deleted table inside it.  In this case, we get into a crash loop
because the code assumes it's the result of data corruption.  We should instead skip over these
tables and continue loading the rest of the schema.

Test Plan: Jenkins: rebase: 2.3

Reviewers: jason, bogdan

Reviewed By: bogdan

Subscribers: ybase, mihnea

Differential Revision: https://phabricator.dev.yugabyte.com/D9780
nspiegelberg added a commit that referenced this issue Oct 30, 2020
Summary:
Added a new PG mini test for handling missing tables on master restart.  After it landed, I
noticed that ASAN builds were failing with "SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../../src/yb/yql/pgwrapper/pg_mini_test_base.cc:35:19".  That's when I noticed that most of the
PgMini tests are disabled in ASAN because of random failures in the base test scaffolding (like this one).

Test Plan: PgMiniTest.DropAllTablesInColocatedDB

Reviewers: sergei, jason, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D9796
nspiegelberg added a commit that referenced this issue Nov 14, 2020
Summary:
DropAllTablesInColocatedDB has been failing periodically on master.  CREATE TABLE
was occasionally failing after master failover because it was racing with TabletServer heartbeat
onlining.  Added a wait to the test to verify that all TabletServers are active in the Master before
continuing.  Also fixed 2 other invocations that had the same anti-pattern.

Test Plan: PgMiniTest.DropAllTablesInColocatedDB -n 100

Reviewers: bogdan, jason

Reviewed By: jason

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D9895
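
The "wait until all TabletServers are active" pattern from the summary above can be sketched as a generic poll-with-deadline helper. This is only an illustration under stated assumptions: `WaitUntil` and `FakeMaster` are hypothetical names, not the helpers used in the actual test, and the real check against the Master is replaced by a simple counter.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical helper: polls `condition` until it returns true or `timeout`
// elapses. Returns true if the condition was met before the deadline.
bool WaitUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll_interval) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (condition()) {
      return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(poll_interval);
  }
}

// Hypothetical stand-in for the master's view of registered tablet servers.
struct FakeMaster {
  std::size_t expected_tservers;
  std::size_t live_tservers;
  bool AllTabletServersLive() const {
    return live_tservers >= expected_tservers;
  }
};
```

A test would call `WaitUntil` with a condition such as `master.AllTabletServersLive()` after failover and only issue CREATE TABLE once it returns true, instead of assuming the tablet servers have already re-registered.
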