NIFI-10975 Add Kubernetes Leader Election and State Provider #6779
Conversation
        return new StandardLeaderElectionCommandProvider(kubernetesClientProvider, namespace);
    }

    private void registerLeaderElectionCommand(final String roleName, final LeaderElectionStateChangeListener listener, final String participantId) {
There's a lot of check-then-modify here. I think we need to ensure that this is thread-safe and synchronize the method or add locking.
This method is only called from `register`, which is marked as `synchronized`; do you think any additional synchronization is necessary?
Ah ok, good call. So technically it's correct/safe. But I would generally mark this as `synchronized` as well, because it has virtually zero cost (the synchronization lock has already been obtained) and it makes it clearer that the actions can only be taken while synchronized. Alternatively, document via JavaDoc that it should only be called while synchronized. But I'd prefer just synchronizing the method itself.
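As an illustration of the pattern being discussed (class and method names here are hypothetical, not the NiFi implementation), the private helper can be marked `synchronized` even though it is only reached through an already-synchronized caller, so the locking contract is visible on the method itself:

```java
import java.util.HashMap;
import java.util.Map;

public class LeaderElectionRegistry {

    private final Map<String, String> registeredRoles = new HashMap<>();

    // Public entry point: all registration goes through this synchronized method
    public synchronized void register(final String roleName, final String participantId) {
        registerLeaderElectionCommand(roleName, participantId);
    }

    // Re-entering the same monitor is effectively free here, and the modifier
    // documents that the check-then-modify below must run while holding the lock
    private synchronized void registerLeaderElectionCommand(final String roleName, final String participantId) {
        if (!registeredRoles.containsKey(roleName)) {
            registeredRoles.put(roleName, participantId);
        }
    }
}
```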
    final ObjectMeta metadata = configMap.getMetadata();
    final String resourceVersion = metadata.getResourceVersion();
    try {
        return resourceVersion == null ? UNKNOWN_VERSION : Long.parseLong(resourceVersion);
I don't believe this is valid. According to https://kubernetes.io/docs/reference/using-api/api-concepts/
You must not assume resource versions are numeric or collatable. API clients may only compare two resource versions for equality (this means that you must not compare resource versions for greater-than or less-than relationships).
I think I would propose that we update our `StateMap` as follows:
- Make `long getVersion()` deprecated in favor of a new `String getStateVersion()`
- Remove `long getVersion()` in version 2.0
I was wondering about this approach; it probably depends on the Kubernetes implementation. One option that comes to mind is using a hash of the Resource Version. Another option could be using a custom metadata field.
Everywhere in the code base that I see `long getVersion()` being used, it is actually just to check whether the value is `-1` (indicating that the state hasn't actually been stored anywhere). So it probably makes sense to use an `Optional<String>` as the return type.
Thanks for the suggestion. Adding a new `Optional<String> getStateVersion()` method to `StateMap` sounds like a good approach.
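As a rough sketch of the resulting API change (simplified; the actual `StateMap` interface in `nifi-api` also defines accessors such as `get` and `toMap`):

```java
import java.util.Optional;

public interface StateMap {

    /**
     * @return the version of the state, or -1 if the state has never been stored
     * @deprecated use {@link #getStateVersion()}, which does not assume numeric versions
     */
    @Deprecated
    long getVersion();

    /**
     * @return an opaque version string for the stored state, or Optional.empty()
     *         when the state has not been stored anywhere
     */
    Optional<String> getStateVersion();
}
```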
So I created a two-node NiFi cluster using GKE to test this. On startup, things work well. Both nodes join the cluster. I can see that state is getting stored/recovered properly using ListGCSBucket. If I then disconnect the node that is Primary/Coordinator, I see that the other node is elected. But if I then reconnect the disconnected node, it gets into a bad state.
So it looks like it is not properly relinquishing the ownership of the lease. I presume this is what causes both nodes to believe that they are the coordinator/primary.
Thanks for the testing and feedback @markap14! As it turns out, the Fabric8 Client version 6.3.0 includes a fix for relinquishing leadership described in Issue 4547. I will include an update to client version 6.3.0 as part of additional updates for State Map versioning.
Great, thanks @exceptionfactory!
And also in the
Thanks for the recommendation on the Docker script @markap14. I pushed updates with the following changes:
Force-pushed from 963fd18 to 89cd267
Thanks @exceptionfactory. Pulled the latest, rebuilt everything, rebuilt the Docker image. Interestingly, it no longer worked. I got errors on startup telling me that the URI was invalid when calling
Also, just to ensure that I was using the latest, I did a
Thanks for the feedback @markap14. On further evaluation of the disconnect and reconnect behavior, I realized the
In addition to those changes, I removed the
These changes resulted in consistent behavior with various disconnect and reconnect attempts.
Thanks for the latest updates @exceptionfactory. Ran into another issue when testing, unfortunately.
Even after I waited over an hour, the lease remains there. If I look at it:
We can see here that the date is well past the renewTime (10:11:34 PM = 22:11:34 vs 22:04:13 as the renew time).
As soon as I delete the lease (
Good find on the expired Lease handling @markap14, thanks for the additional testing. The implementation of
To make the implementation contract clearer, I also changed the return of
Following those changes, shutting down all nodes, then starting up a single node that was not previously the Cluster Coordinator or Primary Node worked as expected. The single node was able to become the leader for those roles. It is worth noting that the Lease objects continue to live in the Kubernetes namespace even if all nodes have been shut down for an extended period. The
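The commit history below notes that `LeaderElectionManager.getLeader()` now returns an `Optional` and that `findLeader` checks Lease expiration. A minimal sketch of that kind of expiration check, using hypothetical names and plain `java.time` types rather than the actual Fabric8 Lease model:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

public class LeaseLeaderResolver {

    /**
     * Returns the current holder identity, or Optional.empty() when the Lease
     * has not been renewed within its lease duration and should be treated as expired.
     */
    public Optional<String> findLeader(final String holderIdentity, final Instant renewTime, final long leaseDurationSeconds) {
        if (holderIdentity == null || renewTime == null) {
            return Optional.empty();
        }
        final Instant expiration = renewTime.plus(Duration.ofSeconds(leaseDurationSeconds));
        // An expired Lease means no node currently holds leadership for the role
        return Instant.now().isAfter(expiration) ? Optional.empty() : Optional.of(holderIdentity);
    }
}
```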
...ework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/StandardNiFiServiceFacade.java
It looks like a
The maximum size of a ConfigMap is 1 MiB. Do you anticipate any compatibility issues where a use case would work with ZooKeeper but would not work with ConfigMaps?
Thanks for the feedback @michael81877. The standard ZooKeeper node storage is also limited to 1 MB, so the standard size limits of a Kubernetes ConfigMap align well with current behavior. With that background, I do not expect any compatibility issues based on size limitations. It is possible that other behavioral nuances could surface as a result of these alternative State Management and Leader Election implementations.
Force-pushed from 02ce8ac to 1514148
@markap14 I rebased the branch to the current
Force-pushed from 1514148 to 8d024fc
- Added Kubernetes Leader Election Manager based on Kubernetes Leases
- Added Kubernetes State Provider based on Kubernetes ConfigMaps
- Added nifi-kubernetes-client for generalized access to Fabric8 Kubernetes Client
- Added nifi.cluster.leader.election.implementation Property defaulting to CuratorLeaderElectionManager
- Refactored LeaderElectionManager to nifi-framework-api for Extension Discovering Manager
- Refactored shared ZooKeeper configuration to nifi-framework-cluster-zookeeper
- Upgraded Kubernetes Client from 6.2.0 to 6.3.0
- Added getStateVersion to StateMap and deprecated getVersion
- Updated Docker start.sh with additional properties
- Changed LeaderElectionManager.getLeader() return to Optional String
Force-pushed from 8d024fc to c2df141
- Updated versions to match current main branch and avoid reverting
Ran this in k8s and tested all of the basic clustering scenarios like disconnecting/connecting nodes, killing nodes, storing cluster state, etc. Everything looks good, nice work! Going to merge this.
…6779)

* NIFI-10975 Added Kubernetes Leader Election and State Provider
  - Added Kubernetes Leader Election Manager based on Kubernetes Leases
  - Added Kubernetes State Provider based on Kubernetes ConfigMaps
  - Added nifi-kubernetes-client for generalized access to Fabric8 Kubernetes Client
  - Added nifi.cluster.leader.election.implementation Property defaulting to CuratorLeaderElectionManager
  - Refactored LeaderElectionManager to nifi-framework-api for Extension Discovering Manager
  - Refactored shared ZooKeeper configuration to nifi-framework-cluster-zookeeper
* NIFI-10975 Updated Kubernetes Client and StateMap
  - Upgraded Kubernetes Client from 6.2.0 to 6.3.0
  - Added getStateVersion to StateMap and deprecated getVersion
  - Updated Docker start.sh with additional properties
* NIFI-10975 Corrected MockStateManager.assertStateSet()
* NIFI-10975 Upgraded Kubernetes Client from 6.3.0 to 6.3.1
* NIFI-10975 Corrected unregister leader and disabled release on cancel
* NIFI-10975 Corrected findLeader handling of Lease expiration
  - Changed LeaderElectionManager.getLeader() return to Optional String
* NIFI-10975 Corrected StandardNiFiServiceFacade handling of Optional Leader
* NIFI-10975 Changed getLeader() to call findLeader() to avoid stale cached values
* NIFI-10975 Updated LeaderElectionCommand to run LeaderElector in loop
* NIFI-10975 Rebased on project version 2.0.0-SNAPSHOT
* NIFI-10975 Corrected Gson and AspectJ versions
  - Updated versions to match current main branch and avoid reverting
Summary
NIFI-10975 Adds an initial implementation of Kubernetes cluster leader election and state management, supporting deployments without the need for ZooKeeper.
State Management Implementation
The state management implementation uses Kubernetes ConfigMaps to persist cluster information for stateful components. The `KubernetesConfigMapStateProvider` uses the standard `data` property, but encodes property names using Base64 URL encoding without padding to meet ConfigMap object property naming requirements.

The new State Provider can be configured in `state-management.xml` using the following element definition, which is included in a commented section of the configuration:
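As a rough sketch of such a cluster-provider entry in `state-management.xml` (the `id` value is illustrative and the provider's package path is an assumption based on the class name referenced in this pull request):

```xml
<cluster-provider>
    <id>kubernetes-provider</id>
    <!-- Class name as referenced in this pull request; package path assumed -->
    <class>org.apache.nifi.kubernetes.state.provider.KubernetesConfigMapStateProvider</class>
</cluster-provider>
```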
Leader Election Implementation

The leader election implementation uses Kubernetes Leases for distributed tracking of the current cluster leader. Kubernetes Lease names must adhere to RFC 1123 subdomain naming requirements, requiring a mapping from NiFi application names to Lease names.
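For illustration only, such a mapping might lowercase the role name and replace invalid characters with hyphens so that a role such as Cluster Coordinator becomes a valid Lease name (this sketch is hypothetical, not the actual NiFi mapping):

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class LeaseNameMapper {

    private static final Pattern INVALID_CHARACTERS = Pattern.compile("[^a-z0-9-]+");

    /**
     * Maps a NiFi role name such as "Cluster Coordinator" to an RFC 1123
     * subdomain-compatible Lease name such as "cluster-coordinator"
     */
    public String toLeaseName(final String roleName) {
        final String lowerCased = roleName.toLowerCase(Locale.ROOT);
        final String sanitized = INVALID_CHARACTERS.matcher(lowerCased).replaceAll("-");
        // RFC 1123 subdomain names must start and end with an alphanumeric character
        return sanitized.replaceAll("^-+|-+$", "");
    }
}
```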
The new Leader Election implementation can be configured using a new property in `nifi.properties` as follows:
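As a rough sketch of the `nifi.properties` configuration (the `KubernetesLeaderElectionManager` value is an assumption based on the implementation names used in this pull request):

```properties
# Default implementation, providing ZooKeeper-based leader election
# nifi.cluster.leader.election.implementation=CuratorLeaderElectionManager

# Kubernetes Lease-based leader election (value assumed from this pull request)
nifi.cluster.leader.election.implementation=KubernetesLeaderElectionManager
```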
Framework Changes

The leader election implementation required promoting the `LeaderElectionManager` interface to `nifi-framework-api` to support NAR bundle extension loading. The `LeaderElectionManager` had two methods without runtime references, which were removed. Refactoring also involved creating a new `nifi-framework-leader-election-shared` module for abstracting tracking operations.

The `nifi.properties` configuration includes a new property with a default value of `CuratorLeaderElectionManager`, which provides current cluster coordination using ZooKeeper.
Kubernetes Client

The implementation includes a new `nifi-kubernetes-client` library which provides utility components for Kubernetes Client access and service namespace determination. The `nifi-kubernetes-client` library depends on the Fabric8 Kubernetes Client, which supports current versions of Kubernetes and provides separation of API and implementation classes.

Both the State Provider and Leader Election Manager implementations attempt to resolve the Kubernetes namespace based on the standard Service Account namespace secret. In the absence of a readable namespace secret, the provider returns `default` as the namespace for storing Leases and ConfigMaps.
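A minimal sketch of that namespace resolution approach, assuming the standard Service Account mount path (hypothetical helper class, not the actual nifi-kubernetes-client code):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ServiceNamespaceProvider {

    private static final Path NAMESPACE_SECRET_PATH =
            Paths.get("/var/run/secrets/kubernetes.io/serviceaccount/namespace");

    private static final String DEFAULT_NAMESPACE = "default";

    /**
     * Reads the namespace from the mounted Service Account secret,
     * falling back to the default namespace when the file is not readable
     */
    public String getNamespace() {
        if (Files.isReadable(NAMESPACE_SECRET_PATH)) {
            try {
                return Files.readString(NAMESPACE_SECRET_PATH, StandardCharsets.UTF_8).trim();
            } catch (final IOException e) {
                return DEFAULT_NAMESPACE;
            }
        }
        return DEFAULT_NAMESPACE;
    }
}
```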
Additional Changes

Additional changes include removing several integration test classes specific to ZooKeeper. These integration tests are less useful given the current system integration tests that run on a scheduled basis.
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation