Master server UI breaks when using hostname. #460

Closed
bbaddepudi opened this issue Sep 4, 2018 · 1 comment
bbaddepudi commented Sep 4, 2018

Added 127.0.0.1 as node1, and the same for 2 & 3, in /etc/hosts.

Then started masters of a local RF=3 cluster (on Mac) using:

~/code/yugabyte/build/latest/bin/yb-master --webserver_interface 127.0.0.1 --rpc_bind_addresses=127.0.0.1 --server_broadcast_addresses=node1:7100 --use_private_ip=zone --master_addresses node1:7100,node2:7100,node3:7100 --fs_data_dirs "/tmp/yblocal1/" >& /tmp/yb-master_1.out &
~/code/yugabyte/build/latest/bin/yb-master --webserver_interface 127.0.0.2 --rpc_bind_addresses=127.0.0.2 --server_broadcast_addresses=node2:7100 --use_private_ip=zone --master_addresses node1:7100,node2:7100,node3:7100 --fs_data_dirs "/tmp/yblocal2/" >& /tmp/yb-master_2.out &
~/code/yugabyte/build/latest/bin/yb-master --webserver_interface 127.0.0.3 --rpc_bind_addresses=127.0.0.3 --server_broadcast_addresses=node3:7100 --use_private_ip=zone --master_addresses node1:7100,node2:7100,node3:7100 --fs_data_dirs "/tmp/yblocal3/" >& /tmp/yb-master_3.out &

The yb-master UI at port 7000 showed that node3 was the leader. I then did a kill -9 on this master process. The logs showed that master-1 became the new leader, but the master-1 process crashed when its UI was accessed.

  thread #34: tid = 0x0021, 0x00007fff8fd1af06 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8fd1af06 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8676b4ec libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff961dd787 libsystem_c.dylib`__abort + 145
    frame #3: 0x00007fff961dd6f6 libsystem_c.dylib`abort + 144
    frame #4: 0x00007fff91fc7f81 libc++abi.dylib`abort_message + 257
    frame #5: 0x00007fff91feda2f libc++abi.dylib`default_terminate_handler() + 243
    frame #6: 0x00007fff8bf596c3 libobjc.A.dylib`_objc_terminate() + 124
    frame #7: 0x00007fff91feb19e libc++abi.dylib`std::__terminate(void (*)()) + 8
    frame #8: 0x00007fff91feac12 libc++abi.dylib`__cxa_throw + 121
    frame #9: 0x0000000114fd1813 libprotobuf.15.dylib`google::protobuf::internal::LogMessage::Finish(this=<unavailable>) + 227 at common.cc:269
    frame #10: 0x000000011018bf49 libmaster.dylib`google::protobuf::RepeatedPtrField<yb::HostPortPB>::TypeHandler::Type const& google::protobuf::internal::RepeatedPtrFieldBase::Get<google::protobuf::RepeatedPtrField<yb::HostPortPB>::TypeHandler>(this=0x00000001164f29d0, index=0) const + 297 at repeated_field.h:1522
    frame #11: 0x000000011018be0b libmaster.dylib`google::protobuf::RepeatedPtrField<yb::HostPortPB>::Get(this=0x00000001164f29d0, index=0) const + 27 at repeated_field.h:1989
    frame #12: 0x000000011018942f libmaster.dylib`yb::ServerRegistrationPB::http_addresses(this=0x00000001164f29a0, index=0) const + 31 at wire_protocol.pb.h:1131
    frame #13: 0x000000011018168b libmaster.dylib`yb::master::MasterPathHandlers::HandleMasters(this=0x00000001164e80d0, req=0x0000700000f4f440, output=0x0000700000f4f2c0) + 2235 at master-path-handlers.cc:643
    frame #14: 0x0000000110180b54 libmaster.dylib`yb::master::MasterPathHandlers::RootHandler(this=0x00000001164e80d0, req=0x0000700000f4f440, output=0x0000700000f4f2c0) + 10692 at master-path-handlers.cc:604
    frame #15: 0x0000000110194b59 libmaster.dylib
...
    frame #22: 0x0000000112920664 libserver_process.dylib`yb::Webserver::RunPathHandler(this=0x0000000116546300, handler=0x000000011675a2d0, connection=0x00000001173bd000, request_info=0x00000001173bd000) + 3988 at webserver.cc:369
    frame #23: 0x000000011291f695 libserver_process.dylib`yb::Webserver::BeginRequestCallback(this=0x0000000116546300, connection=0x00000001173bd000, request_info=0x00000001173bd000) + 1925 at webserver.cc:312
    frame #24: 0x000000011291ec76 libserver_process.dylib`yb::Webserver::BeginRequestCallbackStatic(connection=0x00000001173bd000) + 54 at webserver.cc:287
    frame #25: 0x0000000112941574 libserver_process.dylib`handle_request + 3972
    frame #26: 0x0000000112940019 libserver_process.dylib`worker_thread + 1369
    frame #27: 0x00007fff8676899d libsystem_pthread.dylib`_pthread_body + 131
    frame #28: 0x00007fff8676891a libsystem_pthread.dylib`_pthread_start + 168
    frame #29: 0x00007fff86766351 libsystem_pthread.dylib`thread_start + 13
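
Frames #9 to #13 suggest that HandleMasters asked for http_addresses(0) on a registration whose repeated field was empty, and that the bounds check inside the protobuf getter raised the fatal error. The following is a minimal sketch of that failure mode, not the YugabyteDB code: `HostPort`, `Registration` and both helper functions are hypothetical stand-ins, and the checked getter throws `std::out_of_range` only to imitate a failed bounds check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for yb::HostPortPB.
struct HostPort {
  std::string host;
  int port;
};

// Hypothetical stand-in for yb::ServerRegistrationPB. The indexed getter is
// bounds-checked, loosely imitating the check that fires in frames #9-#12.
class Registration {
 public:
  void add_http_address(HostPort hp) { http_addresses_.push_back(std::move(hp)); }

  int http_addresses_size() const {
    return static_cast<int>(http_addresses_.size());
  }

  const HostPort& http_addresses(int index) const {
    if (index < 0 || index >= http_addresses_size()) {
      throw std::out_of_range("http_addresses index out of range");
    }
    return http_addresses_[static_cast<std::size_t>(index)];
  }

 private:
  std::vector<HostPort> http_addresses_;
};

// Unguarded access: throws when the registration has no HTTP addresses.
std::string FirstHttpAddressUnchecked(const Registration& reg) {
  const HostPort& hp = reg.http_addresses(0);
  return hp.host + ":" + std::to_string(hp.port);
}

// Guarded access: checks the size first and returns an empty string instead.
std::string FirstHttpAddressOrEmpty(const Registration& reg) {
  if (reg.http_addresses_size() == 0) return "";
  return FirstHttpAddressUnchecked(reg);
}
```

In the real process nothing catches the exception, which matches the `std::__terminate` and `abort` frames at the top of the trace.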
@bbaddepudi bbaddepudi self-assigned this Sep 5, 2018
@bbaddepudi bbaddepudi added the kind/bug This issue is a bug label Sep 5, 2018
bbaddepudi commented Sep 7, 2018

Current theory is that the error-status semantics of GetMasterEntryForHost regressed recently, so the following code path in master-path-handlers.cc is no longer taken on failures.

    if (master.has_error()) {
      Status error = StatusFromPB(master.error());
      *output << "  <tr>\n";
      *output << Substitute("    <td colspan=2><font color='red'><b>$0</b></font></td>\n",
                            EscapeForHtmlToString(error.ToString()));
      *output << "  </tr>\n";
      continue;
    }

@bbaddepudi bbaddepudi assigned spolitov and unassigned bbaddepudi Sep 7, 2018
yugabyte-ci pushed a commit that referenced this issue Sep 7, 2018
…rHosts

Summary:
We ignore the controller status in GetMasterEntryForHosts. In this case the response is empty, and so is its http_addresses.
Added handling of the controller status to address this issue.
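
As a rough illustration of the idea in this summary, the sketch below records a failed controller status on the per-master entry so that the UI handler can take its error branch before indexing into the addresses. It is a sketch under stated assumptions, not the actual change: `RpcResult`, `MasterEntry`, `BuildEntry` and `RenderRow` are hypothetical names, and HTML escaping is left out for brevity.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the outcome reported by the RPC controller.
struct RpcResult {
  bool ok;
  std::string error;  // meaningful only when ok is false
};

// Hypothetical, simplified stand-in for the per-master entry shown in the UI.
struct MasterEntry {
  bool has_error = false;
  std::string error;
  std::vector<std::string> http_addresses;
};

// Builds the entry for one master. If the controller reports a failure, the
// response payload is not used and the failure is recorded on the entry.
MasterEntry BuildEntry(const RpcResult& controller_status,
                       const std::vector<std::string>& response_http_addresses) {
  MasterEntry entry;
  if (!controller_status.ok) {
    entry.has_error = true;
    entry.error = controller_status.error;
    return entry;
  }
  entry.http_addresses = response_http_addresses;
  return entry;
}

// Renders one table row. The error branch runs before any indexed access,
// and an empty address list is handled explicitly as a second guard.
// Note: no HTML escaping is done here; real code would escape the strings.
std::string RenderRow(const MasterEntry& entry) {
  if (entry.has_error) {
    return "<tr><td colspan=2><b>" + entry.error + "</b></td></tr>";
  }
  if (entry.http_addresses.empty()) {
    return "<tr><td colspan=2><b>no HTTP address registered</b></td></tr>";
  }
  return "<tr><td>" + entry.http_addresses[0] + "</td></tr>";
}
```

With this shape, a master that was killed shows up as an error row rather than reaching the indexed access on an empty list.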

Test Plan:
Add 127.0.0.1 as node1, and same for 2 & 3, in /etc/hosts.

Start masters of a local RF=3 cluster (on Mac) using:

```
./yb-master --webserver_interface 127.0.0.1 --rpc_bind_addresses=127.0.0.1 --server_broadcast_addresses=node1:7100 --use_private_ip=zone --master_addresses node1:7100,node2:7100,node3:7100 --fs_data_dirs "/tmp/yblocal1/"  >& /tmp/yb-master_1.out &
./yb-master --webserver_interface 127.0.0.2 --rpc_bind_addresses=127.0.0.2 --server_broadcast_addresses=node2:7100 --use_private_ip=zone --master_addresses node1:7100,node2:7100,node3:7100 --fs_data_dirs "/tmp/yblocal2/"  >& /tmp/yb-master_2.out &
./yb-master --webserver_interface 127.0.0.3 --rpc_bind_addresses=127.0.0.3 --server_broadcast_addresses=node3:7100 --use_private_ip=zone --master_addresses node1:7100,node2:7100,node3:7100 --fs_data_dirs "/tmp/yblocal3/"  >& /tmp/yb-master_3.out &
```

The yb-master UI at port 7000 shows the leader. kill -9 this master process.
The new master leader should not crash when its UI is accessed.

Reviewers: robert, mikhail, bharat

Reviewed By: bharat

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D5441