A way to enforce leadership loss/switch. #4477

Closed
shreemaan-abhishek opened this issue Oct 5, 2022 · 5 comments · Fixed by #4977


shreemaan-abhishek commented Oct 5, 2022

Is your enhancement related to a problem? Please describe

In the scope of leader election in a distributed environment, one of the best practices is that a given instance should not remain the leader forever. After a certain interval, the leader instance should give up leadership so that a new election happens and a new leader is chosen.

Using the fabric8 Kubernetes client to implement leader election among the replicas of a pod, we can observe that a leader is elected, and that leadership is transferred to a different pod once the leader pod dies. The only missing piece of the puzzle is that the fabric8 Kubernetes client has no way to force the leader pod to give up its leadership status.

Such a capability would let the client enforce a leadership switch periodically, using standard scheduling facilities provided by the JDK.

Describe the solution you'd like

The solution can be as simple as this method being marked as public.
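If such a method were public (called stopLeading() below purely as a hypothetical name; the actual method's name may differ), the periodic switch could be driven by a standard JDK scheduler. A minimal sketch:

// hypothetical sketch: assumes a public stopLeading()-style method on LeaderElector
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(elector::stopLeading, 30, 30, TimeUnit.MINUTES);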

Describe alternatives you've considered

To enforce leadership loss we can:

  • call ExecutorService.shutdownNow() on the ExecutorService running the leader election (that is, interrupt/kill the leader-election thread); see the sketch after this list

  • update the lease lock with a LeaderElectionRecord having an empty holder identity.
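
A minimal sketch of the first alternative, assuming the blocking LeaderElector.run() is submitted to a dedicated executor (the elector variable here is illustrative):

// run the blocking election loop on an executor we control
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.submit(elector::run);
...
// later: force leadership loss by interrupting the election thread
executor.shutdownNow();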

Additional context

After taking a look at the leader-election example code shared in this repo, I think the fabric8 leader-election interface was never intended for a use case like this (leader election among replicas).


shawkins commented Oct 5, 2022

With #4318 you can do:

LeaderElector elector = kubernetesClient.leaderElector().withConfig(new LeaderElectionConfigBuilder().withReleaseOnCancel(true)...

Future<?> theFuture = elector.start();
...
// then when you call cancel on the future, it will release the leadership if currently held
theFuture.cancel(true);


shreemaan-abhishek commented Oct 6, 2022

@shawkins - Thank you so much for the response.

We tried this out; the idea was to switch the leadership periodically from one instance to another by making the leader instance lose its leadership. We observed the leadership status being passed on to other instances a couple of times, but after that, theFuture.cancel() did not yield leadership loss.

This is why I am looking for an appropriate way to do this using an API method.


shawkins commented Oct 6, 2022

> We observed the leadership status being passed on to other instances a couple of times, but after that, theFuture.cancel() did not yield leadership loss.

Yes, it will only release if the current instance is the leader. That means you can get the behavior you want by using the onStartLeading callback to create a timer task (or similar) that cancels the current leader after whatever interval you see fit. Something like:

  private void startLeaderElector() {
    AtomicReference<Future<?>> startFuture = new AtomicReference<>();
    LeaderElector elector = kubernetesClient.leaderElector()
        .withConfig(
            new LeaderElectionConfigBuilder()
                .withReleaseOnCancel(true)
                ...
                .withLeaderCallbacks(new LeaderCallbacks(
                    () -> {
                      // onStartLeading: schedule a cancel to give up leadership later.
                      // Fine to run over the default pool as the start work is non-blocking.
                      CompletableFuture.delayedExecutor(30, TimeUnit.MINUTES).execute(() -> {
                        startFuture.get().cancel(true);
                      });
                      // do other stuff
                    },
                    () -> {
                      // onStopLeading: re-enter the election so this instance can lead again
                      startLeaderElector();
                      // do other stuff
                    },
                    s -> { /* onNewLeader: do something with the new leader's identity */ }))
                .build())
        .build();
    startFuture.set(elector.start());
  }

Obviously that's not very elegant, but I don't see that the Go client implementation is designed for this either.

Enhancements that would make this easier:

  • another config value for the total number of allowed renews or the total renewal duration (a hypothetical sketch follows this list)
  • add a cancel method to the LeaderElector, and pass the elector to at least the onStartLeading callback - but the signature change is breaking.
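
A hypothetical sketch of what the first enhancement might look like (withMaximumLeadershipDuration does not exist in the current builder; the name and semantics are illustrative only):

LeaderElectionConfig config = new LeaderElectionConfigBuilder()
    .withReleaseOnCancel(true)
    // hypothetical option: voluntarily give up leadership after this total duration
    .withMaximumLeadershipDuration(Duration.ofMinutes(30))
    ...
    .build();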

It may also be good to clarify / enforce that a LeaderElector instance should only have 1 election running at a time - simultaneous calls to run / start should be disallowed.

@manusa any thoughts?


stale bot commented Jan 4, 2023

This issue has been automatically marked as stale because it has not had any activity for 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

stale bot added the status/stale label Jan 4, 2023
shreemaan-abhishek commented

ping!

stale bot removed the status/stale label Jan 5, 2023
shawkins added a commit to shawkins/kubernetes-client that referenced this issue Mar 18, 2023
shawkins added a commit to shawkins/kubernetes-client that referenced this issue Mar 18, 2023
shawkins self-assigned this Mar 19, 2023
manusa added this to the 6.6.0 milestone Mar 22, 2023