-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
External Metrics support for VPA recommender #5576
Conversation
Welcome @lallydd! |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
17 similar comments
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
Seems like there's some issues with easyCLA: communitybridge/easycla#3843 |
/close |
@MadhavJivrajani: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
6aeaef8
to
379b950
Compare
Tests pass after another rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current state, the local e2e tests fail with this in the recommender logs
Cannot update VPA vertical-pod-autoscaling-3092/hamster-vpa object. Reason: verticalpodautoscalers.autoscaling.k8s.io "hamster-vpa" is forbidden: User "system:serviceaccount:kube-system:vpa-recommender" cannot patch resource "verticalpodautoscalers/status" in API group "autoscaling.k8s.io" in the namespace "vertical-pod-autoscaling-3092"
Now that #5911 has been merged, the permissions in hack/e2e/vpa-rbac.yaml
need to be adapted as well.
This is a bit brittle, people will have to remember to adapt permissions in two places, because it is duplicated. Not sure if there's a good alternative, though.
On a more general note: I really like adding the ability to run the e2e tests locally in a kind cluster! If we get all tests to run in this scenario, this would be very helpful to everyone developing the VPA! I just really wish you would have PR'ed this separately. Right now this is a big chunk with different concerns and hard to review/approve (at least for me, it is).
That's a really good point. I'm testing a |
379b950
to
68c9db8
Compare
@voelzmo It worked! Now this is just the diff necessary: 68c9db8#diff-debb7e710c67f3afe14c9605ad0d7ab3b59f6272a15e21d55679c27b139d732d And that's all just to enable external metrics, so it's a useful artifact to maintain generally. |
Also added permissions note to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at parts of the PR other than the core code change. Mostly they look good but I'm asking for some changes (should be small) or examplations.
Thank you!
kubectl apply -f ${SCRIPT_ROOT}/hack/e2e/k8s-metrics-server.yaml | ||
|
||
for i in ${COMPONENTS}; do | ||
if [ $i == admission-controller ] ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly this should never happen. Check in line 43 ensures that COMPONENTS
is recommender
or recommender-externalmetrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
echo "<suite> should be one of:" | ||
echo " - recommender" | ||
echo " - recommender-externalmetrics" | ||
echo "If component is not specified all above will be started." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be true. If there is no component specified script prints help and exits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
export REGISTRY=localhost:5001 | ||
export TAG=latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice if the script allowed setting REGISTRY and TAG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Now if they're in the environment, they'll get used.
for i in ${COMPONENTS}; do | ||
if [ $i == admission-controller ] ; then | ||
(cd ${SCRIPT_ROOT}/pkg/${i} && bash ./gencerts.sh || true) | ||
elif [ $i == recommender-externalmetrics ] ; then | ||
i=recommender | ||
fi | ||
ALL_ARCHITECTURES=amd64 make --directory ${SCRIPT_ROOT}/pkg/${i} release REGISTRY=${REGISTRY} TAG=${TAG} | ||
kind load docker-image ${REGISTRY}/vpa-${i}-amd64:${TAG} | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i in ${COMPONENTS}; do | |
if [ $i == admission-controller ] ; then | |
(cd ${SCRIPT_ROOT}/pkg/${i} && bash ./gencerts.sh || true) | |
elif [ $i == recommender-externalmetrics ] ; then | |
i=recommender | |
fi | |
ALL_ARCHITECTURES=amd64 make --directory ${SCRIPT_ROOT}/pkg/${i} release REGISTRY=${REGISTRY} TAG=${TAG} | |
kind load docker-image ${REGISTRY}/vpa-${i}-amd64:${TAG} | |
done | |
kind load docker-image ${REGISTRY}/vpa-recommender-amd64:${TAG} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want to remove the make
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it should stay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've asked about this but can't find the comment right now: is this a copy of a file from another repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a processed version of all the templates from here: https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server/templates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain what this diff is for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put together a small README. In short, it's what else you need to turn on external metrics which we use for local end-to-end testing. It's automatically applied when running locally. This prevents us needing to put the permission in the normally deployed RBAC yaml -- people can put it in when they want to use this feature.
EDIT: added to existing hack/local-cluster.md
doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a section "RBAC Changes" under hack/local-cluster.md
to explain it.
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import argparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if [ ${SUITE} == recommender-externalmetrics ]; then | ||
${SCRIPT_ROOT}/hack/run-e2e-tests.sh recommender | ||
else | ||
${SCRIPT_ROOT}/hack/run-e2e-tests.sh ${SUITE} | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if [ ${SUITE} == recommender-externalmetrics ]; then | |
${SCRIPT_ROOT}/hack/run-e2e-tests.sh recommender | |
else | |
${SCRIPT_ROOT}/hack/run-e2e-tests.sh ${SUITE} | |
fi | |
${SCRIPT_ROOT}/hack/run-e2e-tests.sh recommender |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make it harder to add make other components testable locally. Are you sure you want this?
68c9db8
to
de708d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I got everything, except for replacing the script as we discusssed in this morning's zoom call.
echo "<suite> should be one of:" | ||
echo " - recommender" | ||
echo " - recommender-externalmetrics" | ||
echo "If component is not specified all above will be started." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
export REGISTRY=localhost:5001 | ||
export TAG=latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Now if they're in the environment, they'll get used.
kubectl apply -f ${SCRIPT_ROOT}/hack/e2e/k8s-metrics-server.yaml | ||
|
||
for i in ${COMPONENTS}; do | ||
if [ $i == admission-controller ] ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
for i in ${COMPONENTS}; do | ||
if [ $i == admission-controller ] ; then | ||
(cd ${SCRIPT_ROOT}/pkg/${i} && bash ./gencerts.sh || true) | ||
elif [ $i == recommender-externalmetrics ] ; then | ||
i=recommender | ||
fi | ||
ALL_ARCHITECTURES=amd64 make --directory ${SCRIPT_ROOT}/pkg/${i} release REGISTRY=${REGISTRY} TAG=${TAG} | ||
kind load docker-image ${REGISTRY}/vpa-${i}-amd64:${TAG} | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want to remove the make
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a processed version of all the templates from here: https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server/templates
if [ ${SUITE} == recommender-externalmetrics ]; then | ||
${SCRIPT_ROOT}/hack/run-e2e-tests.sh recommender | ||
else | ||
${SCRIPT_ROOT}/hack/run-e2e-tests.sh ${SUITE} | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make it harder to add make other components testable locally. Are you sure you want this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put together a small README. In short, it's what else you need to turn on external metrics which we use for local end-to-end testing. It's automatically applied when running locally. This prevents us needing to put the permission in the normally deployed RBAC yaml -- people can put it in when they want to use this feature.
EDIT: added to existing hack/local-cluster.md
doc.
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import argparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't be a drop-in replacement. This script generates reasonable resource-usage sizes for every pod, making the existing recommender
tests work by giving them resource-use values. As discussed in-meeting, we can work to remove it (by adjusting tests or other infra) in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a section "RBAC Changes" under hack/local-cluster.md
to explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good to me. A few comments to clean this up.
vertical-pod-autoscaler/e2e/go.sum
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove changes in vertical-pod-autoscaler/e2e/go.sum and vertical-pod-autoscaler/e2e/vendor/ from the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and done.
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import argparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open a follow up issue to make this more maintainable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There aren't any changes to go.mod / go.sum causing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The external_metrics
api was referenced in vendor/modules.txt. I think it's just pulled in as part of the existing k8s.io/metrics
reference in go.mod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
de708d2
to
a5148d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
As discussed at the last SIG meeting we should clearly document flags we add as alpha and communicate clearly that this isn't as mature as metrics server support yet.
https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-flag-or-cli
I just watched the SIG meeting. Should the flags be documented as alpha under the flag declaration the VPA Recommender README, or somewhere else? |
a5148d2
to
034632a
Compare
@jbartosik I added ALPHA designations to the command-line flags and added an ALPHA designation to the release notes. Also this sentence: "This is new functionality and is not as mature or as well-tested as the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jbartosik, lallydd The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This adds a mode to the VPA
recommender
to use External Metrics. It also adds (much more substantially in lines-of-changes) local KIND (Kubernetes IN Docker) support for therecommender
in bothmetrics-server
and External Metrics modes.Extending out the testing to support the rest of the VPA tests is certainly possible, and probably 95% done, but this PR was taking a long time already to get sorted.
Which issue(s) this PR fixes:
Fixes #5153
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: