Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initdb fails in debug build on CentOS #1291

Closed
ttyusupov opened this issue May 1, 2019 · 2 comments
Closed

initdb fails in debug build on CentOS #1291

ttyusupov opened this issue May 1, 2019 · 2 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL)

Comments

@ttyusupov
Copy link
Contributor

./yb_build.sh --sj --targets "yb-master yb-tserver postgres yb-admin"
./bin/yb-ctl start
2019-05-01 14:39:07,419 INFO: Waiting until we have enough tablet servers (currently 1)
2019-05-01 14:39:11,817 INFO: Waiting until we have enough tablet servers (currently 1)
2019-05-01 14:39:12,038 INFO: Running initdb to initialize PostgreSQL metadata in the YugaByte cluster. This may take up to a minute.
2019-05-01 14:40:41,461 INFO: initdb log file (from /home/centos/yugabyte-data/initdb.log):
2019-05-01 14:40:41,461 INFO: The files belonging to this database system will be owned by user "centos".
2019-05-01 14:40:41,461 INFO: This user must also own the server process.
2019-05-01 14:40:41,461 INFO:
2019-05-01 14:40:41,461 INFO: The database cluster will be initialized with locale "C".
2019-05-01 14:40:41,462 INFO: The default database encoding has accordingly been set to "SQL_ASCII".
2019-05-01 14:40:41,462 INFO: The default text search configuration will be set to "english".
2019-05-01 14:40:41,462 INFO:
2019-05-01 14:40:41,462 INFO: Data page checksums are disabled.
2019-05-01 14:40:41,462 INFO:
2019-05-01 14:40:41,462 INFO: creating directory /home/centos/yugabyte-data/yb-ctl_tmp_pg_data_d85e51b40a2174b6bb35e66513fb8871 ... ok
2019-05-01 14:40:41,462 INFO: creating subdirectories ... ok
2019-05-01 14:40:41,462 INFO: selecting default max_connections ... 100
2019-05-01 14:40:41,462 INFO: selecting default shared_buffers ... 128MB
2019-05-01 14:40:41,462 INFO: selecting dynamic shared memory implementation ... posix
2019-05-01 14:40:41,462 INFO: creating configuration files ... ok
2019-05-01 14:40:41,462 INFO: running bootstrap script ... I0501 14:39:12.895831  1228 ybc_util.cc:59] Setting flag pggate_master_addresses to the value of the env var FLAGS_pggate_master_addresses: 127.0.0.1:7100
2019-05-01 14:40:41,462 INFO: I0501 14:39:12.897752  1228 mem_tracker.cc:241] MemTracker: hard memory limit is 12.349309 GB
2019-05-01 14:40:41,462 INFO: I0501 14:39:12.897783  1228 mem_tracker.cc:243] MemTracker: soft memory limit is 10.496913 GB
2019-05-01 14:40:41,462 INFO: I0501 14:39:12.899215  1230 async_initializer.cc:77] Starting to init ybclient
2019-05-01 14:40:41,462 INFO: I0501 14:39:12.912180  1230 async_initializer.cc:83] Successfully built ybclient
2019-05-01 14:40:41,462 INFO: W0501 14:39:13.951544  1236 outbound_call.cc:341] RPC callback for RPC call yb.master.MasterService.GetTableLocations -> { remote: 127.0.0.1:7100 idx: 0 protocol: 0x00007ff54267c338 -> tcp } , state=FINISHED_SUCCESS. blocked reactor thread for 88335us
2019-05-01 14:40:41,462 INFO: W0501 14:40:41.087746  1236 tablet_rpc.cc:320] Timed out (yb/rpc/outbound_call.cc:436): Failed Read(tablet: 00000000000000000000000000000000, num_ops: 1, num_attempts: 47, txn: 00000000-0000-0000-0000-000000000000) to tablet 00000000000000000000000000000000 on tablet server { uuid: dd0d97df2897493fb8c454947e446efa private: [host: "127.0.0.1" port: 7100] cloud_info: placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" after 47 attempt(s): Read RPC to 127.0.0.1:7100 timed out after 1.162s
2019-05-01 14:40:41,462 INFO: E0501 14:40:41.088639  1228 elog.c:436] Stack trace for a FATAL log message
2019-05-01 14:40:41,462 INFO:     @     0x7ff54c68a114  YBCLogImpl (src/yb/util/ybc_util.cc:111)
2019-05-01 14:40:41,462 INFO:     @           0xa83756  errfinish (src/postgres/src/backend/utils/error/elog.c:436)
2019-05-01 14:40:41,462 INFO:     @           0xa87046  pg_re_throw (src/postgres/src/backend/utils/error/elog.c:1783)
2019-05-01 14:40:41,463 INFO:     @           0x58aa5b  ybcinbuild (src/postgres/src/backend/access/ybc/ybcin.c:80)
2019-05-01 14:40:41,463 INFO:     @           0x59f3fd  index_build (src/postgres/src/backend/catalog/index.c:2346)
2019-05-01 14:40:41,463 INFO:     @           0x591289  build_indices (src/postgres/src/backend/bootstrap/bootstrap.c:1140)
2019-05-01 14:40:41,463 INFO:     @           0x58c0ab  boot_yyparse (postgres/src/backend/bootstrap/bootparse.y:516)
2019-05-01 14:40:41,463 INFO:     @           0x58faa5  BootstrapModeMain (src/postgres/src/backend/bootstrap/bootstrap.c:535)
2019-05-01 14:40:41,463 INFO:     @           0x58f93a  AuxiliaryProcessMain (src/postgres/src/backend/bootstrap/bootstrap.c:442)
2019-05-01 14:40:41,463 INFO:     @           0x77eaab  PostgresServerProcessMain (src/postgres/src/backend/main/main.c:226)
2019-05-01 14:40:41,463 INFO:     @           0x77ee8a
2019-05-01 14:40:41,463 INFO:     @     0x7ff54b543824  __libc_start_main (csu/libc-start.c:289)
2019-05-01 14:40:41,463 INFO:     @           0x48bb38  (unknown) (sysdeps/x86_64/start.S:118)
2019-05-01 14:40:41,463 INFO:     @ 0xffffffffffffffff
@ndeodhar ndeodhar added the area/ysql Yugabyte SQL (YSQL) label May 1, 2019
@d-uspenskiy
Copy link
Contributor

It is possible to increase timeout for RPC as a workaround solution

export FLAGS_retryable_rpc_single_call_timeout_ms=3000

@d-uspenskiy
Copy link
Contributor

Another workaround solution is to use fastdebug build instead of debug

./yb_build.sh fastdebug --sj

yugabyte-ci pushed a commit that referenced this issue May 28, 2019
Summary:
`initdb` process on the tablet server reads data from postgres system table with thousands rows hosted in
the master.
All these rows are fetched in context of single RPC with `DocRowwiseIterator`.
Timeout of single RPC call is set to 2500msec and in `debug` build this timeout is not enough.

```
W0517 20:01:22.840813 19574 yb_rpc.cc:361] Call yb.tserver.TabletServerService.Read 127.0.0.1:41018 => 127.0.0.1:7100 (request call id 5035) took 2514ms (client timeout 2499ms)
```

After several RPC retries the whole read operations fails with timeout

```
W0501 14:40:41.087746  1236 tablet_rpc.cc:320] Timed out (yb/rpc/outbound_call.cc:436): Failed Read(tablet: 00000000000000000000000000000000, num_ops: 1, num_attempts: 47, txn: 00000000-0000-0000-0000-000000000000) to tablet 00000000000000000000000000000000 on tablet server { uuid: dd0d97df2897493fb8c454947e446efa private: [host: "127.0.0.1" port: 7100] cloud_info: placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" after 47 attempt(s): Read RPC to 127.0.0.1:7100 timed out after 1.162s
```

Debug build uses no optimization, but in case of using `-O1` optimization
`DocRowwiseIterator::HasNext` method works ~9 times faster and RPC takes no more than 250msecs

Test Plan:
- rebuild YB in debug mode `/yb_build.sh --sj`
- check cluster starts `./bin/yb-ctl start`

Reviewers: timur, mikhail

Reviewed By: mikhail

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D6660
ttyusupov added a commit that referenced this issue Jul 16, 2021
Summary:
Some time ago we had optimizations disabled for debug build type, but it was enabled during fix of #1291: f710367 ( https://phabricator.dev.yugabyte.com/D6660 ). Now we no longer have `retryable_rpc_single_call_timeout_ms` flag, also optimizations in debug build make it harder to investigate issues because of optimized stack traces and variables. So, we can disable these optimizations again to make debugging easier.

Before (note <optimized out> values that are not available for debugging):
```
(gdb) bt
#0  0x00007f97990bfa6b in raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x00007f97a45268b9 in AddHash (num_probes=6, total_bits=523776, num_lines=<optimized out>, data=0x2bdc000 "", h=4266458700) at ../../src/yb/rocksdb/util/bloom.cc:66
#2  rocksdb::(anonymous namespace)::FixedSizeFilterBitsBuilder::AddKey (this=<optimized out>, key=...) at ../../src/yb/rocksdb/util/bloom.cc:463
#3  0x00007f97a44f0818 in rocksdb::FixedSizeFilterBlockBuilder::AddKey (this=this@entry=0x1c87a40, key=...) at ../../src/yb/rocksdb/table/fixed_size_filter_block.cc:97
#4  0x00007f97a44f08b0 in rocksdb::FixedSizeFilterBlockBuilder::Add (this=0x1c87a40, key=...) at ../../src/yb/rocksdb/table/fixed_size_filter_block.cc:91
#5  0x00007f97a44cf295 in rocksdb::BlockBasedTableBuilder::Add (this=0x1cd1c00, key=..., value=...) at ../../src/yb/rocksdb/table/block_based_table_builder.cc:468
#6  0x00007f97a439b9a4 in rocksdb::BuildTable (dbname=..., env=0x7f97a48a2c00 <rocksdb::Env::Default()::default_env>, ioptions=..., env_options=..., table_cache=0x1ccf740, iter=0x7f97969858f8, meta=0x7f97969864a0, internal_comparator=
    std::shared_ptr<const rocksdb::InternalKeyComparator> (use count 3, weak count 0) = {...}, int_tbl_prop_collector_factories=std::vector of length 1, capacity 1 = {...}, column_family_id=0,
    snapshots=std::vector of length 0, capacity 0, earliest_write_conflict_snapshot=72057594037927935, compression=rocksdb::kSnappyCompression, compression_opts=..., paranoid_file_checks=false, internal_stats=0x1d40200,
    boundary_values_extractor=0x1cd03f0, io_priority=rocksdb::Env::IO_HIGH, table_properties=0x7f9796987000) at ../../src/yb/rocksdb/db/builder.cc:160
#7  0x00007f97a4444ced in rocksdb::FlushJob::WriteLevel0Table (this=this@entry=0x7f9796986f40, mems=..., edit=0x1d44268, meta=meta@entry=0x7f97969864a0) at ../../src/yb/rocksdb/db/flush_job.cc:290
#8  0x00007f97a444669c in rocksdb::FlushJob::Run (this=this@entry=0x7f9796986f40, file_meta=file_meta@entry=0x7f9796986d00) at ../../src/yb/rocksdb/db/flush_job.cc:191
#9  0x00007f97a43fb5ba in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x1d24000, cfd=cfd@entry=0x1a7b000, mutable_cf_options=..., made_progress=made_progress@entry=0x7f9796987f47,
    job_context=job_context@entry=0x7f9796987d70, log_buffer=0x7f9796987480) at ../../src/yb/rocksdb/db/db_impl.cc:1873
#10 0x00007f97a43fc505 in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x1d24000, made_progress=made_progress@entry=0x7f9796987f47, job_context=job_context@entry=0x7f9796987d70, log_buffer=log_buffer@entry=0x7f9796987480,
    cfd=0x1a7b000, cfd@entry=0x0) at ../../src/yb/rocksdb/db/db_impl.cc:3202
#11 0x00007f97a4406cb3 in rocksdb::DBImpl::BackgroundCallFlush (this=this@entry=0x1d24000, cfd=cfd@entry=0x0) at ../../src/yb/rocksdb/db/db_impl.cc:3276
#12 0x00007f97a4406f6d in rocksdb::DBImpl::BGWorkFlush (db=db@entry=0x1d24000) at ../../src/yb/rocksdb/db/db_impl.cc:3132
#13 0x00007f97a4540875 in rocksdb::ThreadPool::BGThread (this=0x1adeb60, thread_id=0) at ../../src/yb/rocksdb/util/thread_posix.cc:126
#14 0x00007f97a4540899 in operator() (__closure=<optimized out>) at ../../src/yb/rocksdb/util/thread_posix.cc:165
#15 std::_Function_handler<void(), rocksdb::ThreadPool::StartBGThreads()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:1871
#16 0x00007f979d804626 in operator() (this=0x1cbdc78) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#17 yb::Thread::SuperviseThread (arg=0x1cbdc20) at ../../src/yb/util/thread.cc:771
#18 0x00007f97990b7694 in start_thread (arg=0x7f9796990700) at pthread_create.c:333
#19 0x00007f9798df941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```

After:
```
#0  0x00007f1a9109ba6b in raise (sig=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x00007f1a9edd7343 in rocksdb::(anonymous namespace)::AddHash (h=4266458700, data=0x2d58000 "", num_lines=1023, total_bits=523776, num_probes=6) at ../../src/yb/rocksdb/util/bloom.cc:66
#2  0x00007f1a9edd87c8 in rocksdb::(anonymous namespace)::FixedSizeFilterBitsBuilder::AddKey (this=0x2d30040, key=...) at ../../src/yb/rocksdb/util/bloom.cc:463
#3  0x00007f1a9ed8e757 in rocksdb::FixedSizeFilterBlockBuilder::AddKey (this=0x1ef22d0, key=...) at ../../src/yb/rocksdb/table/fixed_size_filter_block.cc:97
#4  0x00007f1a9ed8e6eb in rocksdb::FixedSizeFilterBlockBuilder::Add (this=0x1ef22d0, key=...) at ../../src/yb/rocksdb/table/fixed_size_filter_block.cc:91
#5  0x00007f1a9ed5b2ea in rocksdb::BlockBasedTableBuilder::Add (this=0x1e4dc00, key=..., value=...) at ../../src/yb/rocksdb/table/block_based_table_builder.cc:468
#6  0x00007f1a9eb80bd4 in rocksdb::BuildTable (dbname="/tmp/yb_tests__2020-12-14T18_15_29__23449.18214.17918/mytestdb-814110369", env=0x7f1a9f35fe40 <rocksdb::Env::Default()::default_env>, ioptions=..., env_options=...,
    table_cache=0x1e4b740, iter=0x7f1a8e961858, meta=0x7f1a8e962480, internal_comparator=std::shared_ptr<const rocksdb::InternalKeyComparator> (use count 3, weak count 0) = {...},
    int_tbl_prop_collector_factories=std::vector of length 1, capacity 1 = {...}, column_family_id=0, snapshots=std::vector of length 0, capacity 0, earliest_write_conflict_snapshot=72057594037927935,
    compression=rocksdb::kSnappyCompression, compression_opts=..., paranoid_file_checks=false, internal_stats=0x1ebc200, boundary_values_extractor=0x1e4c3f0, io_priority=rocksdb::Env::IO_HIGH, table_properties=0x7f1a8e962f70)
    at ../../src/yb/rocksdb/db/builder.cc:160
#7  0x00007f1a9ec8e56b in rocksdb::FlushJob::WriteLevel0Table (this=0x7f1a8e962eb0, mems=..., edit=0x1ec0268, meta=0x7f1a8e962480) at ../../src/yb/rocksdb/db/flush_job.cc:290
#8  0x00007f1a9ec8d767 in rocksdb::FlushJob::Run (this=0x7f1a8e962eb0, file_meta=0x7f1a8e962c70) at ../../src/yb/rocksdb/db/flush_job.cc:191
#9  0x00007f1a9ec10c56 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=0x1ea0000, cfd=0x1bf7000, mutable_cf_options=..., made_progress=0x7f1a8e9640b7, job_context=0x7f1a8e963ee0, log_buffer=0x7f1a8e9635f0)
    at ../../src/yb/rocksdb/db/db_impl.cc:1873
#10 0x00007f1a9ec18bb6 in rocksdb::DBImpl::BackgroundFlush (this=0x1ea0000, made_progress=0x7f1a8e9640b7, job_context=0x7f1a8e963ee0, log_buffer=0x7f1a8e9635f0, cfd=0x1bf7000) at ../../src/yb/rocksdb/db/db_impl.cc:3202
#11 0x00007f1a9ec1914d in rocksdb::DBImpl::BackgroundCallFlush (this=0x1ea0000, cfd=0x0) at ../../src/yb/rocksdb/db/db_impl.cc:3276
#12 0x00007f1a9ec182fa in rocksdb::DBImpl::BGWorkFlush (db=0x1ea0000) at ../../src/yb/rocksdb/db/db_impl.cc:3132
#13 0x00007f1a9ee02747 in rocksdb::ThreadPool::BGThread (this=0x1c5ab60, thread_id=0) at ../../src/yb/rocksdb/util/thread_posix.cc:126
#14 0x00007f1a9ee028c6 in rocksdb::ThreadPool::<lambda()>::operator()(void) const (__closure=0x1e39c78) at ../../src/yb/rocksdb/util/thread_posix.cc:165
#15 0x00007f1a9ee03140 in std::_Function_handler<void(), rocksdb::ThreadPool::StartBGThreads()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:1871
#16 0x00007f1aa1efe732 in std::function<void ()>::operator()() const (this=0x1e39c78) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#17 0x00007f1a95e6bcf9 in yb::Thread::SuperviseThread (arg=0x1e39c20) at ../../src/yb/util/thread.cc:771
#18 0x00007f1a91093694 in start_thread (arg=0x7f1a8e96c700) at pthread_create.c:333
#19 0x00007f1a90dd541d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```

Test Plan:
```
#!/usr/bin/env bash

set -euo pipefail

for i in {1..50}
do
  echo "Iteration: $i"
  rm -rf build/debug-gcc-dynamic-ninja/share/initial_sys_catalog_snapshot
  ybd --sj packaged
  ./bin/yb-ctl destroy
  ./bin/yb-ctl start
  ./bin/ysqlsh -c "SELECT 1"
done
./bin/yb-ctl destroy
```

Reviewers: bogdan, sergei, dmitry, mbautin

Reviewed By: mbautin

Subscribers: eng

Differential Revision: https://phabricator.dev.yugabyte.com/D10121
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL)
Projects
None yet
Development

No branches or pull requests

3 participants