The old revision pod is still in running state with 100% traffic routing to new revision #755
Comments
@jessiezcc: GitHub didn't allow me to assign the following users: user. Note that only elafros members and repo collaborators can be assigned.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I think 1->0 scaling should fix this. @akyyy, could you have a look?
When you change the traffic weights, the activator may not be involved if the revisions are active all the time, so the activator cannot fix this on its own.
Actually, if the expectation is that the old pod should be gone eventually (e.g. after 5 minutes by default), then yes, the new 1->0 code path can fix this.
I tested with scale-to-zero turned on with Joe, and the old revision pod did go away.
The question here is: should it work even when enable-scale-to-zero is turned off?
(Quoting the previous comment: "To enable the activator, you can set enable-scale-to-zero <https://github.com/elafros/elafros/blob/fb6f994c56b4a3cdbe7e518d511b1186857ca0c0/elaconfig.yaml#L63> to true.")
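For orientation, the setting referenced above lives in the controller's configuration file. The sketch below is an assumption based on the linked elaconfig.yaml; the ConfigMap name and surrounding shape are hypothetical, and only the enable-scale-to-zero key comes from this thread:

```yaml
# Hypothetical ConfigMap sketch; only the enable-scale-to-zero key is
# taken from the discussion, the rest is illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ela-config          # assumed name for illustration
  namespace: ela-system
data:
  # When "true", idle revisions are scaled down to zero pods and the
  # activator brings them back on demand.
  enable-scale-to-zero: "true"
```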
Yes, this should still work when enable-scale-to-zero is turned off. When the revision is no longer routable, it should be transitioned to the Retired state and torn down. It sounds like this is not happening.
@josephburnett Unless @markusthoemmes is actively working on this, I'd like to investigate the issue.
I've reproduced the issue (with enable-scale-to-zero turned off), but I'm wondering about the UX. After the fact, the user can split the traffic between the old and new revisions and both will continue to work. So automatically deleting a revision with 0% traffic routed to it presumes that the user will never want to switch traffic back to that revision later. For instance, they might deploy a new version of an application and, after 100% of traffic is routed to the new revision, discover a problem and want to route the majority of traffic back to the previous revision.

Another way of looking at the UX is in terms of how easily actions can be reversed. If the user splits traffic 99% to the new revision and 1% to the old revision, they can back out of this very easily. But if they end up with 100% of traffic going to the new revision and we automatically prune the old revision, it's harder for them to reverse what they just did (suppose they made a mistake).

I wonder if we should make scaling to zero more intelligent and have it apply to revisions which are no longer routable, even if enable-scale-to-zero is turned off. That way, old revision pods could be resurrected if needed. Admittedly, that adds complexity and might confuse some users.

Another option would be to leave the behaviour the way it is, especially now that enable-scale-to-zero is turned on by default. It may be reasonable to expect users who choose to turn off enable-scale-to-zero to manage their revisions more carefully and to cope with pruning unroutable revisions themselves. I don't like this option, as we are effectively leaking revision pods.

Thoughts?
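The easy-to-reverse 99/1 split described above would look roughly like this in a Route spec. This is a sketch: the traffic field shape is inferred from the route status shown later in this issue, and the apiVersion is an assumption:

```yaml
# Hypothetical Route manifest; field names mirror the traffic block in
# the route status at the bottom of this issue.
apiVersion: elafros.dev/v1alpha1   # assumed API group/version
kind: Route
metadata:
  name: route-example
spec:
  traffic:
  - revisionName: configuration-example-00002   # new revision
    percent: 99
  - revisionName: configuration-example-00001   # old revision, easy to roll back to
    percent: 1
```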
I don't plan to support turning off scale-to-zero. That flag is there just as a way to roll out the change; all revisions should scale to zero. But there has been some more discussion and design around serving states which outlines a better architecture: #645 (comment). So please disregard my comment about transitioning to Retired.
I see, thanks. So the simplest fix for this issue is to wait until scaling to zero by default has "bedded in" and then delete the enable-scale-to-zero flag. |
Scale to zero is enabled by default |
Created #1531 to track cleaning up Reserve Revisions that are no longer routable.
Expected Behavior
After routing 100% of traffic to the new revision and removing all references to the old revision, I expect the old revision pod to be torn down and disappear.
Actual Behavior
The old revision pod is still in the Running state.
Steps to Reproduce the Problem
jessiezhu@gobaby:~/go/src/github.com/elafros/elafros$ kubectl -n ela-system get pods
NAME READY STATUS RESTARTS AGE
configuration-example-00001-autoscaler-77696d95c6-84wml 1/1 Running 0 8m
configuration-example-00002-autoscaler-6b4dbf566b-7rknq 1/1 Running 0 7m
ela-activator-6f9d78ff7c-rstwc 1/1 Running 0 2d
ela-controller-54dcdfb6-4qdkw 1/1 Running 0 18m
ela-webhook-7c4d5c5547-mvb2p 1/1 Running 0 2d
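The reproduction amounts to updating the Route so that 100% of traffic targets the new revision and the old revision is no longer referenced. A sketch of what that Route might look like (field shape inferred from the status block below; apiVersion is an assumption):

```yaml
# Hypothetical Route manifest sending all traffic to the new revision;
# after applying it, the old revision pod is expected to be torn down.
apiVersion: elafros.dev/v1alpha1   # assumed API group/version
kind: Route
metadata:
  name: route-example
spec:
  traffic:
  - revisionName: configuration-example-00002
    percent: 100
```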
Additional Info
status:
conditions:
- state: Ready
status: "True"
domain: route-example.default.demo-domain.com
traffic:
- percent: 100
revisionName: configuration-example-00002