-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Common Area and remove leader election #3964
Conversation
/test-multicluster-e2e |
Codecov Report
@@ Coverage Diff @@
## main #3964 +/- ##
==========================================
+ Coverage 64.34% 65.90% +1.56%
==========================================
Files 294 308 +14
Lines 43634 44090 +456
==========================================
+ Hits 28076 29058 +982
+ Misses 13282 12705 -577
- Partials 2276 2327 +51
*This pull request uses carry forward flags. Click here to find out more.
|
f2b291e
to
5336ee7
Compare
/test-multicluster-e2e |
/test-multicluster-integration |
multicluster/controllers/multicluster/commonarea/remote_common_area_manager.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Show resolved
Hide resolved
memberAnnounce.ClusterID) | ||
data.leaderStatus.reason = ReasonNotElectedLeader | ||
data.leaderStatus.reason = ReasonNotConnectedLeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing. Here, the cluster is a leader but not connected, or it is not a leader at all? We should revise either the message or the reason, depending on which one is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a temporary status indicates the cluster is not confirmed as the leader yet before validation on the leader is passed. It should be overwritten soon as long as the following validation codes passed. I renamed it as ReasonNotLeader
.
multicluster/controllers/multicluster/commonarea/remote_common_area_manager.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/commonarea/leader_checker.go
Outdated
Show resolved
Hide resolved
3abf519
to
a31e7f3
Compare
/test-multicluster-e2e |
multicluster/controllers/multicluster/commonarea/remote_common_area.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/commonarea/remote_common_area_manager.go
Outdated
Show resolved
Hide resolved
a31e7f3
to
9bbea2f
Compare
/test-multicluster-e2e |
rc.Stop() | ||
} | ||
delete(r.remoteCommonAreas, clusterID) | ||
} | ||
case <-ticker.C: | ||
r.RunLeaderElection() | ||
// Check the leader connection status periodically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should revisit the relation between remoteCommonAreaManager and RemoteCommonArea. If we just have a single leader cluster, then we should just have one RemoteCommonArea? Why we still need the complex logics of multiple RemoteCommonAreas, and async updates between remoteCommonAreaManager and RemoteCommonArea?
Probably we should think about the implementation from scratch, rather than trying to minimize the changes based on the existing code that wants to cover multiple leaders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will check how to refine this part.
9bbea2f
to
4e6bb04
Compare
4e6bb04
to
8837ba8
Compare
/test-multicluster-e2e |
1 similar comment
/test-multicluster-e2e |
multicluster/controllers/multicluster/commonarea/remote_common_area.go
Outdated
Show resolved
Hide resolved
// Client grants read/write to the Namespace of the cluster that is backing this CommonArea. | ||
client.Client | ||
|
||
// GetClusterID returns the clusterID of the cluster accessed by this CommonArea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "cluster ID of the leader cluster"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, comment is updated.
multicluster/controllers/multicluster/commonarea/remote_common_area.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
func (r *MemberClusterSetReconciler) updateActiveLeader(cluster commonarea.RemoteCommonArea) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should remove this func, and put the logics into checkLeaderConnection() directly, as the two cases (cluster == nil and cluster != nil) should share no common code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's removed now.
r.remoteCommonArea = cluster | ||
if cluster != nil { | ||
if err := cluster.StartWatching(); err != nil { | ||
klog.ErrorS(err, "Failed to start watching events") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen? Should we retry StartWatching()?Should we set r.installedLeader.connected = true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part has been refined and moved to remote common area now.
remoteCommonAreas := r.remoteCommonAreaManager.GetRemoteCommonAreas() | ||
if len(remoteCommonAreas) <= 0 { | ||
return nil, "", errors.New("ClusterSet has not been set up properly, no available Common Area") | ||
if r.remoteCommonArea.IsConnected() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is possible StartWatching() not called yet. Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's OK, StartWatching() is for importer part. This GetRemoteCommonAreaAndLocalID
is only used for exporter. Exporter can work as long as the client can talk to leader cluster.
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
8837ba8
to
a3c2844
Compare
/test-multicluster-e2e |
// Client grants read/write to the Namespace of the cluster that is backing this CommonArea. | ||
client.Client | ||
|
||
// GetClusterID returns the clusterID of the leader cluster accessed by this CommonArea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove "accessed by this CommonArea" (do not understand what it means).
string(r.remoteCommonAreaManager.GetLocalClusterID()), | ||
r.remoteCommonAreaManager.GetNamespace(), | ||
string(r.GetLocalClusterID()), | ||
r.localNamespace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, RemoteCommonArea
doesn't have local Cluster ID/Namespace information, we set it in RemoteCommonAreaManger
before. Since I deleted the RemoteCommonAreaManger
, the localNamespace is passed to RemoteCommonArea
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/commonarea/remote_common_area.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
a3c2844
to
1e65ddf
Compare
// MemberClusterSetReconciler reconciles a ClusterSet object in the member cluster deployment. | ||
type MemberClusterSetReconciler struct { | ||
client.Client | ||
Scheme *runtime.Scheme | ||
Namespace string | ||
mutex sync.RWMutex | ||
// commonAreaLock protects the access to RemoteCommonArea which presents the leader cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove "which presents the leader cluster"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -250,12 +240,20 @@ func (r *MemberClusterSetReconciler) updateStatus() { | |||
status := multiclusterv1alpha1.ClusterSetStatus{} | |||
status.TotalClusters = int32(len(r.clusterSetConfig.Spec.Members)) | |||
status.ObservedGeneration = r.clusterSetConfig.Generation | |||
status.ClusterStatuses = r.remoteCommonAreaManager.GetMemberClusterStatues() | |||
status.ClusterStatuses = []multiclusterv1alpha1.ClusterStatus{} | |||
if r.remoteCommonArea != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just need to lock from this line to line 251?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, changed the scope.
secretName := addedLeader.Secret | ||
klog.InfoS("ClusterSet update", "old", currentCommonArea.GetClusterID(), "new", newLeader) | ||
klog.InfoS("Stopping old RemoteCommonArea", "cluster", clusterID) | ||
r.remoteCommonArea.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check error? I saw you checked Stop() error at line 95. Could you check Stop() can really fail or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: consider currentCommonArea.Stop()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
multicluster/controllers/multicluster/member_clusterset_controller.go
Outdated
Show resolved
Hide resolved
klog.InfoS("Creating RemoteCommonArea", "cluster", clusterID) | ||
// Read secret to access the leader cluster. Assume secret is present in the same Namespace as the ClusterSet. | ||
secret, err := r.getSecretForLeader(secretName, clusterSet.GetNamespace()) | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you change the block to:
if err != nil {
klog.ErrorS(err, "Failed to get Secret to create RemoteCommonArea", "secret", secretName, "cluster", clusterID)
return err
}
...
return nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
1e65ddf
to
750a42f
Compare
/test-multicluster-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
1. Antrea Multi-cluster will support only one leader cluster. There is no need to set up common area manager and do leader election, so clean up remote common area manager related codes and refactor member clusterset reconciler to handle connection/disconnection cases between member and leader clusters 2. Remove unused clusterset webhook 3. Add schema to valid the number of leader cluster in ClusterSet CR. Signed-off-by: Lan Luo <[email protected]>
750a42f
to
9ac61cf
Compare
no need to set up CommonArea manager and do leader election, so clean up
remote CommonArea manager related codes and refactor member ClusterSet
reconciler to handle connection/disconnection cases between member and
leader clusters
Signed-off-by: Lan Luo [email protected]