[RayCluster][Fix] Add expectations of RayCluster #2150
Conversation
Force-pushed from 169770d to 10120e3.
Hi @Eikykun, thank you for the PR! I will review it next week. Are you on Ray Slack? We can iterate more quickly there since this is a large PR. My Slack handle is "Kai-Hsun Chen (ray team)". Thanks!
I will review this PR tomorrow.
cc @rueian Would you mind giving this PR a review? I think I don't have enough time to review it today. Thanks!
Just wondering if the client-go's related informer cache …
Apologies, I'm not quite clear about what "related informer cache" refers to.
According to #715, the root cause is the stale informer cache, so I am wondering if the issue can be solved by fixing the cache, for example by doing a manual …
I am reviewing this PR now. I will try to do a review iteration every 1 or 2 days.
I just reviewed a small part of this PR. I will try to do another iteration tomorrow.
Btw, @Eikykun would you mind rebasing with the master branch and resolving the conflict? Thanks!
Got it. From a problem-solving standpoint, if we don't rely on an informer in the controller and directly query the ApiServer for Pods, the cache consistency issue with etcd wouldn't occur. However, this approach would increase network traffic and affect reconciliation efficiency.
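To illustrate the trade-off being discussed (a minimal sketch, not code from this PR): controller-runtime already exposes an uncached reader that queries the API server directly, bypassing the informer cache entirely.

```go
// Minimal sketch: bypassing the informer cache with controller-runtime's
// uncached API reader. Reads always reflect the API server state, at the
// cost of an extra round trip on every reconcile.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func listPodsUncached(ctx context.Context, mgr ctrl.Manager, ns string) (*corev1.PodList, error) {
	pods := &corev1.PodList{}
	// GetAPIReader returns a client.Reader that talks to the API server
	// directly instead of the (possibly stale) informer cache.
	err := mgr.GetAPIReader().List(ctx, pods, client.InNamespace(ns))
	return pods, err
}
```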
Thanks for your review. I will address the review comments and resolve the conflicts later.
@Eikykun would you mind installing pre-commit https://github.com/ray-project/kuberay/blob/master/ray-operator/DEVELOPMENT.md and fixing the linter issues? Thanks!
At a quick glance, it seems that we create an ActiveExpectationItem for each Pod's creation, deletion, or update. I have some concerns about a scalability bottleneck caused by the memory usage. In ReplicaSet's source code, it seems to track only the number of Pods expected to be created or deleted per ReplicaSet.
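For reference, the counter-based approach looks roughly like this (a simplified sketch of the idea, not the actual upstream code from k8s.io/kubernetes/pkg/controller):

```go
// Counter-style expectations, loosely modeled on the ReplicaSet controller's
// ControllerExpectations: per controller key, only two integers are tracked,
// rather than one record per Pod event.
package expectations

import "sync"

type counterExpectations struct {
	mu   sync.Mutex
	adds map[string]int // key -> Pod creations still expected
	dels map[string]int // key -> Pod deletions still expected
}

func newCounterExpectations() *counterExpectations {
	return &counterExpectations{adds: map[string]int{}, dels: map[string]int{}}
}

func (e *counterExpectations) ExpectCreations(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.adds[key] += n
}

func (e *counterExpectations) ExpectDeletions(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.dels[key] += n
}

// CreationObserved is called from the Pod informer's Add handler.
func (e *counterExpectations) CreationObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.adds[key] > 0 {
		e.adds[key]--
	}
}

// DeletionObserved is called from the Pod informer's Delete handler.
func (e *counterExpectations) DeletionObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.dels[key] > 0 {
		e.dels[key]--
	}
}

// Satisfied reports whether all expected events have been observed; the
// controller skips scaling work for the key until this returns true.
func (e *counterExpectations) Satisfied(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.adds[key] <= 0 && e.dels[key] <= 0
}
```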
Following up on the comment above.
Sorry, I didn't have time to reply a few days ago. I started with …
Could you help approve the workflow? cc @kevin85421
@Eikykun, thank you for following up! Sorry for the late review. I had concerns about merging such a large change before Ray Summit. Now, I have enough time to verify the correctness and stability of this PR. This is also one of the most important stability improvements in the post-summit roadmap: https://docs.google.com/document/d/1YYuAQkHKz2UTFMnTDJLg4qnW2OAlYQqjvetP_nvt0nI/edit?tab=t.0 I will resume reviewing this PR this week.
cc @MortalHappiness can you also give this PR a review pass?
A few questions I'd like to ask: …
This might be an issue left over from the last simplification. Initially, I added many types like RayCluster, Service, etc., considering that resources other than scaled Pods might also require expectations. If we only consider the scaling logic for each group, we can significantly simplify the code. In fact, I recently streamlined the code and reduced the scale expectation code to around 100 lines. You can find it in the latest commit.
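For readers skimming the diff, the simplified API boils down to something like the following. This is a hedged reconstruction from the identifiers that appear in this PR (ExpectScalePod, Delete, expectations.Delete); IsSatisfied and the exact signatures are assumptions for illustration.

```go
// Hedged reconstruction of the per-group scale expectation API; the exact
// code in the PR may differ.
package expectations

type ScaleAction string

const (
	Create ScaleAction = "create"
	Delete ScaleAction = "delete"
)

type RayClusterScaleExpectation interface {
	// ExpectScalePod records that a Pod in the given group is expected to be
	// created or deleted before the next reconciliation observes it.
	ExpectScalePod(namespace, rayClusterName, group, podName string, action ScaleAction)
	// IsSatisfied reports whether every recorded expectation for the group
	// has been observed in the informer cache.
	IsSatisfied(namespace, rayClusterName, group string) bool
	// Delete clears all expectations recorded for a RayCluster.
	Delete(rayClusterName, namespace string)
}
```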
Apart from the comments, could you rebase with the master branch?
By the way, maybe you missed this because the comment is folded: #2150 (comment) Could you either: …
Thank you for your patient review. I have added some comments.
LGTM. Thanks for your hard work!
@kevin85421 Can you merge the PR?
Looks good! I am still reviewing it. I am looking forward to merging the PR.
// The first reconciliation created a Pod. If the Pod is quickly deleted from etcd by another component
// before the second reconciliation, the expected condition would never be satisfied.
// Avoid this by setting a timeout.
isPodSatisfied = errors.IsNotFound(err) && rp.recordTimestamp.Add(ExpectationsTimeout).Before(time.Now())
This looks like the error is not IsNotFound. We will still cache it, even if it times out. Is it possible to cause a memory leak in some corner cases?
> This looks like the error is not IsNotFound. We will still cache it, even if it times out. Is it possible to cause a memory leak in some corner cases?

If it is not an IsNotFound error, it indicates that there is an issue with the controllerManager's cache: the object could not be read from the cache correctly. This should not be seen as a corner case; rather, it should be considered a critical error. As we can observe from the controller-runtime's CacheReader.Get(), these errors typically only occur when the cache is being used improperly.
https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/cache/internal/cache_reader.go#L57-L105
If a Pod is successfully stored in the cache, it should ultimately be readable from the cache as well. As long as the Pod is successfully read, the expectation will be cleared, so there won't be any memory leak in this process.
Considering it a critical error that causes KubeRay to fail, or leaving it as a memory leak, are both too aggressive for me. Most users can tolerate creating additional Pods and then scaling them down, but they will complain if KubeRay crashes. In ReplicaSet's implementation, the function SatisfiedExpectations returns true if the expectation has expired.
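For context, the expiry behavior being referenced looks roughly like this (a simplified paraphrase; the 5-minute value matches upstream's ExpectationsTimeout, but the types here are illustrative):

```go
// Simplified paraphrase of the ReplicaSet controller's expiry rule: an
// expectation record older than the timeout is treated as satisfied, so the
// controller resyncs and makes progress instead of blocking (or leaking).
package expectations

import "time"

const ExpectationsTimeout = 5 * time.Minute

type expectationRecord struct {
	adds, dels int
	timestamp  time.Time
}

func (e *expectationRecord) fulfilled() bool { return e.adds <= 0 && e.dels <= 0 }

// satisfied mirrors SatisfiedExpectations: fulfilled OR expired both allow
// the controller to proceed with a fresh reconcile.
func satisfied(e *expectationRecord) bool {
	return e.fulfilled() || time.Since(e.timestamp) > ExpectationsTimeout
}
```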
Some other issues (1 and 2 don't need to be addressed in this PR):

1. We store rayPod in the indexer, whereas ReplicaSet only stores an integer, which is more memory-efficient. We can run some benchmarks after this PR is merged to check for any memory issues. If so, we can switch to ReplicaSet's implementation.
2. There might be some corner cases for suspend. Maybe we can only call deleteAllPods when there are no in-flight requests (i.e. the expectation is satisfied); see the sketch after this list.
3. Possible memory leak if KubeRay misses the resource event.
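A hypothetical sketch for item 2 above: only tear Pods down once every recorded expectation is satisfied, so suspend cannot race with in-flight Pod creations. reconcileSuspend, deleteAllPods, and the interface here are illustrative names, not this PR's actual code.

```go
package ray

import (
	"context"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

type scaleExpectation interface {
	IsSatisfied(namespace, rayClusterName, group string) bool
}

// reconcileSuspend defers the teardown until no create/delete requests are
// in flight for any worker group.
func reconcileSuspend(ctx context.Context, exp scaleExpectation, instance *rayv1.RayCluster,
	deleteAllPods func(context.Context, *rayv1.RayCluster) error) error {
	for _, group := range instance.Spec.WorkerGroupSpecs {
		if !exp.IsSatisfied(instance.Namespace, instance.Name, group.GroupName) {
			return nil // in-flight requests pending; requeue and retry later
		}
	}
	return deleteAllPods(ctx, instance)
}
```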
@@ -184,6 +187,8 @@ func (r *RayClusterReconciler) Reconcile(ctx context.Context, request ctrl.Request)
	// No match found
	if errors.IsNotFound(err) {
		// Clear all related expectations
		rayClusterScaleExpectation.Delete(instance.Name, instance.Namespace)
Is it possible to cause a memory leak if KubeRay doesn't receive the resource event after the RayCluster CR is deleted? If it is possible, we should consider a solution, such as adding a finalizer to the RayCluster, to ensure the cleanup of the cache in a follow-up PR.
> Is it possible to cause a memory leak if KubeRay doesn't receive the resource event after the RayCluster CR is deleted? If it is possible, we should consider a solution, such as adding a finalizer to the RayCluster, to ensure the cleanup of the cache in a follow-up PR.

Perhaps we can clean up when rayCluster.DeletionTimestamp.IsZero() == false? This way, even if we lose the events of the RayCluster, we can still rely on the events from the Pods to trigger reconciliation.
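A sketch of this suggestion, placed near the top of Reconcile (a fragment under stated assumptions; identifiers other than DeletionTimestamp are illustrative):

```go
// Clear the expectations as soon as the RayCluster carries a deletion
// timestamp, so Pod events can still drive the cleanup even if the
// RayCluster delete event itself is missed.
if !instance.DeletionTimestamp.IsZero() {
	rayClusterScaleExpectation.Delete(instance.Name, instance.Namespace)
}
```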
> Perhaps we can clean up when rayCluster.DeletionTimestamp.IsZero() == false?

This still does not completely prevent the memory leak. Did I miss anything?
> Perhaps we can clean up when rayCluster.DeletionTimestamp.IsZero() == false?
>
> This still does not completely prevent the memory leak. Did I miss anything?

There is indeed such a possibility. It's not too complex; I will address it in a follow-up PR.
@@ -804,6 +817,7 @@ func (r *RayClusterReconciler) reconcilePods(ctx context.Context, instance *rayv1.RayCluster) error {
		}
		logger.Info("reconcilePods", "The worker Pod has already been deleted", pod.Name)
	} else {
		rayClusterScaleExpectation.ExpectScalePod(pod.Namespace, instance.Name, worker.GroupName, pod.Name, expectations.Delete)
The deletion of Pods inside WorkersToDelete is idempotent. Expectations seem to be unnecessary in this case, but I am fine if we also use Expectations to track it.
> The deletion of Pods inside WorkersToDelete is idempotent. Expectations seem to be unnecessary in this case, but I am fine if we also use Expectations to track it.
Using Expectations here can help avoid repeatedly calling the APIServer to delete the same Pod.
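A sketch of that flow (an illustrative fragment in the spirit of reconcilePods, not the exact diff): recording a Delete expectation right after the API call means the next reconcile observes an unsatisfied expectation and skips re-issuing the same Pod deletion until the cache catches up.

```go
// Delete the Pod, tolerating a concurrent deletion by another component.
if err := r.Delete(ctx, &pod); err != nil && !errors.IsNotFound(err) {
	return err
}
// Record the expectation so subsequent reconciles don't repeat the call.
rayClusterScaleExpectation.ExpectScalePod(pod.Namespace, instance.Name, worker.GroupName, pod.Name, expectations.Delete)
```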
Why are these changes needed?
This PR attempts to address issues #715 and #1936 by adding expectation capabilities to ensure Pods are in the expected state during the next Reconcile following Pod creation or deletion.
Similar solutions can be referred to at:
Related issue number
Checks