Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duration of 0 in withMultipleHostAndPort not rotating host. #389

Open
bt-wil opened this issue Dec 16, 2024 · 4 comments
Open

Duration of 0 in withMultipleHostAndPort not rotating host. #389

bt-wil opened this issue Dec 16, 2024 · 4 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@bt-wil
Copy link

bt-wil commented Dec 16, 2024

When Duration.ofMillis(0) is passed into withMultipleHostAndPort, only the first host is used. The expectation is that hosts will be rotated on subsequent uses but without blacklisting.

package org.kiwiproject.consul;

import com.google.common.net.HostAndPort;

import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;

public class CheckConsul {
    public static void main(String[] args) {
        var consul = Consul.builder().withPing(false).withMultipleHostAndPort(
                        Arrays.asList(
                                HostAndPort.fromString("a.foo.bar.baz:8500"),
                                HostAndPort.fromString("b.foo.bar.baz:8500")
                        ),
                        Duration.ofMillis(0)
                )
                .build();

        var client = consul.keyValueClient();

        for (int i = 0; i < 3; i++) {
            try {
                var res = client.getValueAsString("foo").orElse("");
                System.out.println(Instant.now() + " Result: " + res);
            } catch (Exception e) {
                System.out.println(Instant.now() + " Error: " + e);
            }
            System.out.println("Sleeping");
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ignored) {
            }
        }
    }
}

Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
2024-12-16T18:53:46.228475Z Error: org.kiwiproject.consul.util.failover.MaxFailoverAttemptsExceededException: Reached max failover attempts (10). Giving up.
Sleeping
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
2024-12-16T18:53:47.243308Z Error: org.kiwiproject.consul.util.failover.MaxFailoverAttemptsExceededException: Reached max failover attempts (10). Giving up.
Sleeping
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
Next request: http://a.foo.bar.baz:8500/v1/kv/foo
2024-12-16T18:53:48.250436Z Error: org.kiwiproject.consul.util.failover.MaxFailoverAttemptsExceededException: Reached max failover attempts (10). Giving up.
Sleeping
@sleberknight
Copy link
Member

This is discussed in #319 which I added about a year ago. We never did anything else with it because we made sure our service code always uses a positive timeout and we also put a maximum number of failover attempts in #304.

I think the correct answer is to change the code to require a positive timeout, because a zero timeout really isn't a timeout at all (again, see #319). But that's the way the code was when we started maintaining this project.

Anyway, the constraint in the constructor of BlacklistingConsulFailoverStrategy is probably the correct location to add the check, though maybe also in the ConsulFailoverInterceptor constructor that accepts a collection of targets and a timeout.

If someone wants to be cute and put a timeout of 1 nanosecond, it probably won't work but that's on them.

Last, I think we can classify this as a bug and it can be changed in a patch or minor version, not requiring a major version bump.

@sleberknight sleberknight self-assigned this Dec 17, 2024
@sleberknight sleberknight added the bug Something isn't working label Dec 17, 2024
@sleberknight sleberknight added this to the 1.4.3 milestone Dec 17, 2024
@bt-wil
Copy link
Author

bt-wil commented Dec 20, 2024

Thanks, I didn't realized there was a ticket already. My thought is that if zero duration is allowed, it would cycle through all the hosts without blacklisting.

If someone wants to be cute and put a timeout of 1 nanosecond, it probably won't work but that's on them.

The use case that I have is: use the local consul agent first and fallback to the cluster. If any issue occurred, it will proceed to the next available host. Seems like it can be simulated with some small positive timeout duration.

@sleberknight
Copy link
Member

It seems wrong to me to allow zero duration and never blacklist a target in a class named BlacklistingConsulFailoverStrategy, i.e., it's a class that was originally intended specifically for blacklisting targets. At least, that's what it seems the original authors intended. Plus, it would add more complexity to the class to not blacklist when the timeout is zero.

I can see why it's confusing though, since you need to read the documentation to know that withMultipleHostAndPort uses a BlacklistingConsulFailoverStrategy under the hood. I am pretty sure I added that to the Javadocs to make it more clear without needing to change the public API.

Maybe your use case might be better served using a simpler ConsulFailoverStrategy implementation that does a round robin between the specified target Consul servers with little or no delay between the attempts. That implementation could allow a positive or zero timeout, and would simply cycle through each of the targets (with the optional timeout). The ConsulFailoverInterceptor prevents infinite looping via maxFailoverAttempts which defaults to 10.

Then, when constructing the Consul instance you would use the Consul.Builder#withFailoverInterceptorUsingStrategy method instead of withMultipleHostAndPort.

For example, something like:

// assume targets and timeoutMillis are already defined

var roundRobinStrategy = new RoundRobinConsulFailoverStrategy(targets, timeoutMillis);

var consul = Consul.builder()
        .withPing(false)
        .withFailoverInterceptorUsingStrategy(roundRobinStrategy)
        .build()

As a first guess, the implementation of RoundRobinConsulFailoverStrategy needs to keep track of the last target used, and return the next one in computeNextStage each time it is called. I think the isRequestViable would always return true since it's always going to try the next target. And, the markRequestFailed is probably an empty implementation.

Going further, we could add RoundRobinConsulFailoverStrategy as a new failover strategy, and also modify withMultipleHostAndPort so that it uses BlacklistingConsulFailoverStrategy if the timeout is zero, or RoundRobinConsulFailoverStrategy if the timeout is zero. I would hesitate to overload withMultipleHostAndPort with a single-argument variant that accepts only a Collection<HostAndPort> hostAndPort to avoid adding more surface area to the Consul.Builder API, but would consider it. But a single-argument withMultipleHostAndPort would be the same as using a zero timeout and would use the round robin strategy internally.

@bt-wil
Copy link
Author

bt-wil commented Dec 31, 2024

A round robin failover strategy make sense. I tried to do a simple implementation from your suggestion, but it need an extra state to know when to reset.

diff --git a/src/main/java/org/kiwiproject/consul/util/failover/ConsulFailoverInterceptor.java b/src/main/java/org/kiwiproject/consul/util/failover/ConsulFailoverInterceptor.java
index 4f6e8e3..93ab07d 100644
--- a/src/main/java/org/kiwiproject/consul/util/failover/ConsulFailoverInterceptor.java
+++ b/src/main/java/org/kiwiproject/consul/util/failover/ConsulFailoverInterceptor.java
@@ -86,55 +86,59 @@ public class ConsulFailoverInterceptor implements Interceptor {
     @Override
     public Response intercept(Chain chain) {

-        // The original request
-        Request originalRequest = chain.request();
-
-        // If it is possible to do a failover on the first request (as in, one or more
-        // targets are viable)
-        if (strategy.isRequestViable(originalRequest)) {
-
-            // Initially, we have an inflight request
-            Request previousRequest = originalRequest;
-
-            // Note:
-            // The previousResponse is never used here and is always null when calling computeNextStage.
-            // See discussion in issue #195 ("previousResponse is always null in ConsulFailoverInterceptor#intercept")
-            // Link: https://github.com/kiwiproject/consul-client/issues/195
-
-            Optional<Request> maybeNextRequest;
-            var currentAttempt = 1;
-
-            // Get the next viable request
-            while ((maybeNextRequest = strategy.computeNextStage(previousRequest, null)).isPresent()) {
-                // Get the response from the last viable request
-                Exception exception;
-                Request nextRequest = maybeNextRequest.orElseThrow(IllegalStateException::new);
-                try {
-
-                    // Cache for the next cycle if needed
-                    previousRequest = nextRequest;
-
-                    // Anything other than an exception is valid here.
-                    // This is because a 400-series error is a valid code (Permission Denied/Key Not Found)
-                    return chain.proceed(nextRequest);
-                } catch (Exception ex) {
-                    logExceptionThrownOnRequest(LOG, ex, nextRequest);
-                    strategy.markRequestFailed(nextRequest);
-                    exception = ex;
+        try {
+            // The original request
+            Request originalRequest = chain.request();
+
+            // If it is possible to do a failover on the first request (as in, one or more
+            // targets are viable)
+            if (strategy.isRequestViable(originalRequest)) {
+
+                // Initially, we have an inflight request
+                Request previousRequest = originalRequest;
+
+                // Note:
+                // The previousResponse is never used here and is always null when calling computeNextStage.
+                // See discussion in issue #195 ("previousResponse is always null in ConsulFailoverInterceptor#intercept")
+                // Link: https://github.com/kiwiproject/consul-client/issues/195
+
+                Optional<Request> maybeNextRequest;
+                var currentAttempt = 1;
+
+                // Get the next viable request
+                while ((maybeNextRequest = strategy.computeNextStage(previousRequest, null)).isPresent()) {
+                    // Get the response from the last viable request
+                    Exception exception;
+                    Request nextRequest = maybeNextRequest.orElseThrow(IllegalStateException::new);
+                    try {
+
+                        // Cache for the next cycle if needed
+                        previousRequest = nextRequest;
+
+                        // Anything other than an exception is valid here.
+                        // This is because a 400-series error is a valid code (Permission Denied/Key Not Found)
+                        return chain.proceed(nextRequest);
+                    } catch (Exception ex) {
+                        logExceptionThrownOnRequest(LOG, ex, nextRequest);
+                        strategy.markRequestFailed(nextRequest);
+                        exception = ex;
+                    }
+
+                    if (++currentAttempt > maxFailoverAttempts) {
+                        throw new MaxFailoverAttemptsExceededException(
+                                String.format("Reached max failover attempts (%d). Giving up.", maxFailoverAttempts),
+                                exception);
+                    }
                 }

-                if (++currentAttempt > maxFailoverAttempts) {
-                    throw new MaxFailoverAttemptsExceededException(
-                            String.format("Reached max failover attempts (%d). Giving up.", maxFailoverAttempts),
-                            exception);
-                }
-            }
-
-            throw new ConsulException("Unable to successfully determine a viable host for communication.");
+                throw new ConsulException("Unable to successfully determine a viable host for communication.");

-        } else {
-            throw new ConsulException(
-                    "Consul failover strategy has determined that there are no viable hosts remaining.");
+            } else {
+                throw new ConsulException(
+                        "Consul failover strategy has determined that there are no viable hosts remaining.");
+            }
+        } finally {
+            strategy.reset();
         }
     }

diff --git a/src/main/java/org/kiwiproject/consul/util/failover/strategy/BlacklistingConsulFailoverStrategy.java b/src/main/java/org/kiwiproject/consul/util/failover/strategy/BlacklistingConsulFailoverStrategy.java
index 9928a77..8715091 100644
--- a/src/main/java/org/kiwiproject/consul/util/failover/strategy/BlacklistingConsulFailoverStrategy.java
+++ b/src/main/java/org/kiwiproject/consul/util/failover/strategy/BlacklistingConsulFailoverStrategy.java
@@ -126,6 +126,10 @@ public class BlacklistingConsulFailoverStrategy implements ConsulFailoverStrateg
         this.blacklist.put(fromRequest(current), Instant.now());
     }

+    @Override
+    public void reset() {
+    }
+
     private HostAndPort fromRequest(Request request) {
         return HostAndPort.fromParts(request.url().host(), request.url().port());
     }
diff --git a/src/main/java/org/kiwiproject/consul/util/failover/strategy/ConsulFailoverStrategy.java b/src/main/java/org/kiwiproject/consul/util/failover/strategy/ConsulFailoverStrategy.java
index 8b60aa2..ab31a80 100644
--- a/src/main/java/org/kiwiproject/consul/util/failover/strategy/ConsulFailoverStrategy.java
+++ b/src/main/java/org/kiwiproject/consul/util/failover/strategy/ConsulFailoverStrategy.java
@@ -40,4 +40,8 @@ public interface ConsulFailoverStrategy {
      */
     void markRequestFailed(@NonNull Request current);

+    /**
+     * Reset the state when all options are exhausted (if needed).
+     */
+    void reset();
 }
diff --git a/src/main/java/org/kiwiproject/consul/util/failover/strategy/RoundRobinFailoverStrategy.java b/src/main/java/org/kiwiproject/consul/util/failover/strategy/RoundRobinFailoverStrategy.java
new file mode 100644
index 0000000..9ecf321
--- /dev/null
+++ b/src/main/java/org/kiwiproject/consul/util/failover/strategy/RoundRobinFailoverStrategy.java
@@ -0,0 +1,52 @@
+package org.kiwiproject.consul.util.failover.strategy;
+
+import com.google.common.net.HostAndPort;
+import okhttp3.HttpUrl;
+import okhttp3.Request;
+import okhttp3.Response;
+import org.checkerframework.checker.nullness.qual.NonNull;
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+
+public class RoundRobinFailoverStrategy implements ConsulFailoverStrategy {
+    private static final int INITIAL_INDEX = -1;
+
+    private int currentIndex = INITIAL_INDEX;
+    private final List<HostAndPort> targets;
+
+    public RoundRobinFailoverStrategy(Collection<HostAndPort> targets) {
+        this.targets = new ArrayList<>(targets);
+    }
+
+    @Override
+    public @NonNull Optional<Request> computeNextStage(@NonNull Request previousRequest, @Nullable Response previousResponse) {
+        var nextIndex = currentIndex + 1;
+
+        if (nextIndex >= targets.size()) {
+            return Optional.empty();
+        }
+
+        final HostAndPort next = targets.get(nextIndex);
+        final HttpUrl nextURL = previousRequest.url().newBuilder().host(next.getHost()).port(next.getPort()).build();
+        return Optional.of(previousRequest.newBuilder().url(nextURL).build());
+    }
+
+    @Override
+    public boolean isRequestViable(@NonNull Request current) {
+        return true;
+    }
+
+    @Override
+    public void markRequestFailed(@NonNull Request current) {
+        this.currentIndex++;
+    }
+
+    @Override
+    public void reset() {
+        this.currentIndex = INITIAL_INDEX;
+    }
+}

From this thread, the client should be thread safe: rickfast/consul-client#69 , so likely will need thread-local to store the current index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants