Support host names in YugaByte DB #128

Closed
rkarthik007 opened this issue Mar 27, 2018 · 5 comments

rkarthik007 (Collaborator) commented Mar 27, 2018

Scenario

This issue occurs when shrinking the cluster, both when running with yb-docker-ctl and on Kubernetes.

Steps to repro

  • Download the latest yb-docker-ctl for a local docker setup of YugaByte.
  • Start a docker cluster: bin/yb-docker-ctl create
  • Start a workload (this creates a table):
    java -jar ./yb-sample-apps.jar --workload CassandraKeyValue \
                                   --nodes localhost:9042 \
                                   --num_threads_write 1 \
                                   --num_threads_read 4 \
                                   --value_size 4096
    
  • Add a node: bin/yb-docker-ctl add_node
  • Wait for the load to stabilize: http://localhost:7000/tablet-servers
  • Remove a node:
    bin/yb-docker-ctl remove_node 4
    

Details

Docker will stop that container but also remove the DNS entry for that node name (in this case, yb-tserver-n4). Subsequently, the Raft groups that still have that node as a peer and list it by hostname end up failing to resolve the address for this peer.

Here is a code snippet where the failure happens:
src/yb/consensus/consensus_peers.cc:457

Status CreateConsensusServiceProxyForHost(const shared_ptr<Messenger>& messenger,
                                          const HostPort& hostport,
                                          gscoped_ptr<ConsensusServiceProxy>* new_proxy) {
  std::vector<Endpoint> addrs;
  RETURN_NOT_OK(hostport.ResolveAddresses(&addrs));
  if (addrs.size() > 1) {
    LOG(WARNING) << "Peer address '" << hostport.ToString() << "' "
                 << "resolves to " << addrs.size() << " different addresses. Using "
                 << addrs[0];
  }
  new_proxy->reset(new ConsensusServiceProxy(messenger, addrs[0]));
  return Status::OK();
}

Another node wins the election upon the node removal. The winning leader then loops through its current RaftConfig set of peers (including the dead one, since this could be a temporary failure) and tries to set up a new proxy to it, which goes down this name-resolution path and fails.

Thanks to @bmatican for investigating/writing up a lot of this.

rkarthik007 (Collaborator, Author) commented Mar 27, 2018

Possible solutions:

  1. Resolve host names as early as possible and use IP addresses from that point onwards.
  2. Treat DNS resolution timeout as host being unreachable.

Preferred option

Option 1 is not preferable because a failure to resolve a single peer causes the whole Raft config update to fail. This means that even though we could have made progress, we would not do so until the peer is removed from the Raft quorum.

Option 2 details

Option 2 is preferable because it works well with Kubernetes, but we would need a way to safely re-resolve the DNS name, or to remove the node from the list after a while even if that node turns out to be alive. Basically, if the node that was removed is added back within a few seconds, we should re-resolve it at a later point.

Assuming we leave the peer connection object as a shell with just the DNS name and no IP address: if the node that failed DNS resolution comes back, we may never refresh the connection to this host. The master will also not remove the shell host from the tablet Raft groups, because the shell host will already appear as a valid part of various quorums.

@spolitov's proposal:

We could do the DNS resolution asynchronously, so it would not block the current thread and other peers could still be resolved. Currently the DNS resolution happens while holding the lock, which is not good practice. Even in a healthy setup, DNS resolution can add significant latency. That is acceptable when we are refreshing peers when becoming a new master, but not acceptable at steady state if we want to re-resolve on the fly.
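
For illustration, here is a minimal sketch of the async-resolution idea. It assumes Boost.Asio for the resolver; ResolvePeerAsync, the port number, and the callback are hypothetical and not YugaByte's actual API:

// Minimal sketch: resolve a peer's host name asynchronously so the calling
// thread never blocks on DNS while holding the Raft config lock.
#include <boost/asio.hpp>
#include <iostream>
#include <memory>
#include <string>

void ResolvePeerAsync(boost::asio::io_context& io,
                      const std::string& host, const std::string& port) {
  auto resolver = std::make_shared<boost::asio::ip::tcp::resolver>(io);
  resolver->async_resolve(
      host, port,
      [resolver, host](const boost::system::error_code& ec,
                       boost::asio::ip::tcp::resolver::results_type results) {
        if (ec) {
          // Treat a resolution failure like an unreachable host (option 2):
          // keep the peer entry and let a later retry or eviction handle it.
          std::cerr << "Failed to resolve " << host << ": " << ec.message() << "\n";
          return;
        }
        // Use the first resolved endpoint to build the consensus proxy,
        // mirroring what CreateConsensusServiceProxyForHost does synchronously.
        std::cout << host << " -> " << results.begin()->endpoint() << "\n";
      });
}

int main() {
  boost::asio::io_context io;
  ResolvePeerAsync(io, "yb-tserver-n4", "9100");  // 9100 is an assumed RPC port
  io.run();  // in a server, this would be the reactor's event loop
}

The key point is that a slow or failing lookup only delays the completion callback; it never stalls the thread that is iterating over the other peers in the Raft config.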

cc @mbautin @ttyusupov @kmuthukk

rkarthik007 added the kind/enhancement label Mar 27, 2018

rkarthik007 (Collaborator, Author) commented:

cc @robertpang who is looking at something similar.

bmatican (Contributor) commented:

This is still happening even after @spolitov fixed the DNS resolution and made it async. The problem now is that, when DNS resolution fails, PeerManager::UpdateRaftConfig ends up not running the Peer::NewRemotePeer code, which would be responsible for creating a new Peer and calling Peer::Init, which in turn would register it via PeerMessageQueue::TrackPeer. That is important because PeerMessageQueue::RequestForPeer is what kicks out dead peers after follower_unavailable_considered_failed_sec elapses...

So in short, no DNS resolution -> no Peer -> no book-keeping on communication -> no removal of dead peer from RaftConfig...
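
To make that chain concrete, here is a heavily simplified, hypothetical sketch; the real classes are PeerManager and PeerMessageQueue, but the types and methods below are illustrative only. The point is that a peer must stay tracked even when its name does not resolve, so the unavailability timer can still evict it:

#include <chrono>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified peer book-keeping (not YugaByte code).
struct TrackedPeer {
  std::string host;                                    // keep the name even if DNS fails
  std::optional<std::string> resolved_ip;              // empty while resolution keeps failing
  std::chrono::steady_clock::time_point last_contact;  // basis for eviction
};

class PeerTracker {
 public:
  // Track the peer whether or not DNS resolution succeeded; skipping this step
  // on resolution failure is exactly what prevents eviction in the bug above.
  void TrackPeer(const std::string& host, std::optional<std::string> ip) {
    peers_[host] = TrackedPeer{host, std::move(ip), std::chrono::steady_clock::now()};
  }

  // Evict peers that have been unreachable (including unresolvable) for too
  // long, mirroring the follower_unavailable_considered_failed_sec behavior.
  void EvictIfUnavailableTooLong(std::chrono::seconds threshold) {
    const auto now = std::chrono::steady_clock::now();
    for (auto it = peers_.begin(); it != peers_.end();) {
      if (now - it->second.last_contact > threshold) {
        it = peers_.erase(it);  // would trigger a Raft config change in the real system
      } else {
        ++it;
      }
    }
  }

 private:
  std::map<std::string, TrackedPeer> peers_;
};

In this model a peer whose DNS name never resolves still ages out after the threshold, instead of lingering in the RaftConfig forever.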

I've also figured out a way to repro this locally with yb-ctl, instead of needing to generate our tar.gz and set up a Docker image.

Setting up /etc/hosts:

127.0.0.1 yb-master-n1
127.0.0.2 yb-master-n2
127.0.0.3 yb-master-n3

127.0.0.1 yb-tserver-n1
127.0.0.2 yb-tserver-n2
# 127.0.0.3 yb-tserver-n3
127.0.0.4 yb-tserver-n4
127.0.0.5 yb-tserver-n5

Then, with a bit of tweaking to yb-ctl to use those host names instead of IPs:

diff --git a/bin/yb-ctl b/bin/yb-ctl
index d857a6c..09e41f1 100755
--- a/bin/yb-ctl
+++ b/bin/yb-ctl
@@ -85,8 +85,9 @@ class ExitWithError(Exception):
     pass


-def get_local_ip(index):
-    return "127.0.0.{}".format(index)
+def get_local_ip(daemon_type, index):
+    # return "127.0.0.{}".format(index)
+    return "yb-{}-n{}".format(daemon_type, index)


 class DaemonId:
@@ -107,7 +108,7 @@ class DaemonId:
         return self.daemon_type == DAEMON_TYPE_TSERVER

     def get_ip_address(self):
-        return get_local_ip(self.index)
+        return get_local_ip(self.daemon_type, self.index)


 class ClusterOptions:
@@ -450,7 +451,9 @@ class ClusterControl:
             "--redis_proxy_bind_address {}".format(daemon_id.get_ip_address()),
             "--cql_proxy_bind_address {}".format(daemon_id.get_ip_address()),
             "--pgsql_proxy_bind_address {}".format(daemon_id.get_ip_address()),
-            "--local_ip_for_outbound_sockets {}".format(daemon_id.get_ip_address()),
+            # "--local_ip_for_outbound_sockets {}".format(daemon_id.get_ip_address()),
+            "--local_ip_for_outbound_sockets 127.0.0.{}".format(daemon_id.index),
+            "--follower_unavailable_considered_failed_sec=15",
             # TODO ENG-2876: Enable this in master as well.
             "--use_cassandra_authentication={}".format(
                 str(self.options.use_cassandra_authentication).lower())
@@ -711,6 +714,7 @@ class ClusterControl:
             cmd_setup_redis_table = [yb_admin_binary_path,
                                      "--master_addresses",
                                      self.options.master_addresses,
+                                     "--yb_num_shards_per_tserver", str(self.options.num_shards_per_tserver),
                                      "setup_redis_table"]
             result = ""
             try:

It can be reproduced with:

./bin/yb-ctl create
./bin/yb-ctl setup_redis
./bin/yb-ctl add_node
# wait for tablets to balance
# comment out node 3 from /etc/hosts
./bin/yb-ctl remove_node 3
# wait the 15s from the follower_unavailable_considered_failed_sec flag

spolitov (Contributor) commented May 8, 2018

Fixed

spolitov closed this as completed May 8, 2018

spolitov (Contributor) commented May 8, 2018

Commit 9ba8436
