-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track driver deploy time in e2e test pipeline #815
Track driver deploy time in e2e test pipeline #815
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AndyXiangLi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Pull Request Test Coverage Report for Build 1765
💛 - Coveralls |
4339329
to
ffe53a0
Compare
@@ -116,6 +118,15 @@ if [[ -r "${EBS_SNAPSHOT_CRD}" ]]; then | |||
kubectl apply -f "$EBS_SNAPSHOT_CRD" | |||
# TODO deploy snapshot controller too instead of including in helm chart | |||
fi | |||
endSec=$(date +'%s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have to wait for the pods to become ready. not so easy in bash to be honest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah agree, I did some research but no luck to find any tool to track container start up time.
But one thing is if we use helm --wait flag, helm will wait for containersReady condition before exit. IMO It would be good enough as for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And open for any suggestion to track this info :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh yeah let's use helm built-in functionality --wait flag, that is cool if they have it.
Otherwise I was thinking of how to call a python/go script from here and use the kube python/go client, which would be really ugly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use cloudwatch? The tests run in an AWS-internal account. And if running locally, i don't mind pushing metrics to my own account. We can fail-open in case for whatever reason the metric push fails (transient cloudwatch issue or whatever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good find. Does it wait for everything deployed by the chart to be ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally there is a way to expose the metric publicly though, cloudwatch wont be visible like https://testgrid.k8s.io/provider-aws-efs-csi-driver#e2e-test&width=20 is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will wait for all the containers to be ready
https://helm.sh/docs/intro/using_helm/
ffe53a0
to
2eec738
Compare
2eec738
to
a059b1c
Compare
@@ -116,6 +118,15 @@ if [[ -r "${EBS_SNAPSHOT_CRD}" ]]; then | |||
kubectl apply -f "$EBS_SNAPSHOT_CRD" | |||
# TODO deploy snapshot controller too instead of including in helm chart | |||
fi | |||
endSec=$(date +'%s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good find. Does it wait for everything deployed by the chart to be ready?
hack/e2e/run.sh
Outdated
@@ -25,6 +25,7 @@ source "${BASE_DIR}"/util.sh | |||
|
|||
DRIVER_NAME=${DRIVER_NAME:-aws-ebs-csi-driver} | |||
CONTAINER_NAME=${CONTAINER_NAME:-ebs-plugin} | |||
DRIVER_START_TIME_THRESHOLD=25 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did we come up with this number? I feel like it should be higher? We can adjust later as we go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed on my cluster that usually takes ~15s so I set this number. But makes sense to increase that a little bit in the initial commit, we can adjust this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looks like it failed on the CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the delta is the image pull time.. Cold start apparently takes longer than I expected lol
hack/e2e/run.sh
Outdated
secondUsed=$(( (endSec-startSec)/1 )) | ||
# Set timeout threshold as 20 seconds for now, usually it takes less than 10s to startup | ||
if [ $secondUsed -gt $DRIVER_START_TIME_THRESHOLD ]; then | ||
loudecho "Driver start timeout, test fail!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should log here how long it took and what the threshold is, so we can see the gap immediately without reading the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
I think the main bottleneck will be image pulling which we don't really control. I guess if we are trying to measure from ux perspecitve what 'cold start' would look like then it doesn't amtter the details of what is happening under the hood, just the overall #. anyway I'm ok with merging variation of this and seeing how it goes. |
a059b1c
to
6260b0a
Compare
hack/e2e/run.sh
Outdated
secondUsed=$(( (endSec-startSec)/1 )) | ||
# Set timeout threshold as 20 seconds for now, usually it takes less than 10s to startup | ||
if [ $secondUsed -gt $DRIVER_START_TIME_THRESHOLD ]; then | ||
loudecho "Driver start timeout, Cost $secondUsed but the threshold is $DRIVER_START_TIME_THRESHOLD Fail the test." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/Cost/Took
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it took ~30s to start up now, do you think it is ok I change the threshold to 45s? 60s looks a bit too much here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a problem with going to 60s honestly. For now we can use this to make sure we don't introduce a change that'd increase the startup time too much. So let's start with a high number and go from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, Thank you!
6260b0a
to
91d1e5c
Compare
hack/e2e/run.sh
Outdated
@@ -25,6 +25,7 @@ source "${BASE_DIR}"/util.sh | |||
|
|||
DRIVER_NAME=${DRIVER_NAME:-aws-ebs-csi-driver} | |||
CONTAINER_NAME=${CONTAINER_NAME:-ebs-plugin} | |||
DRIVER_START_TIME_THRESHOLD=60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last thing, can you add a comment like # seconds
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, added seconds in the name to make it clear.
91d1e5c
to
bdf5075
Compare
@@ -25,6 +25,7 @@ source "${BASE_DIR}"/util.sh | |||
|
|||
DRIVER_NAME=${DRIVER_NAME:-aws-ebs-csi-driver} | |||
CONTAINER_NAME=${CONTAINER_NAME:-ebs-plugin} | |||
DRIVER_START_TIME_THRESHOLD_SECONDS=60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice :)
/lgtm |
Is this a bug fix or adding new feature?
Fixes #804
What is this PR about? / Why do we need it?
Add driver start time info during e2e test. so we have better understanding on driver's behavior.
Set threshold as 20s for now, may adjust as we have more info.