-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add autonode script exporter config #1064
Conversation
@nkinkade this PR is still draft but what do you think of it? The primary motivation is to have some activity on the sandbox and staging autonode deployments for our internal monitoring and development quality assurance before we release a new version for others to use. The main shortcoming currently is that this is testing VM to VM in GCE and the rates are probably much faster than they need to be for testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you've set the cache_exit_code.sh to cache for an hour, which seems reasonable. And according to GCP VPC network pricing, intra-zone VPC networking is free, and inter-zone is $0.01/GiB. I see that currently the prometheus-federation cluster is deployed to us-central1-c, while the test autonode's are in us-central-a. Would it be possible to redeploy the test autonodes to us-central1-c? In that arrangement, it seems like the testing wouldn't incur any cost... as long as the traffic was using the VPC internal/private network addresses.
Reviewed 6 of 7 files at r1, all commit messages.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
k8s/prometheus-federation/deployments/script-exporter.yml
line 35 at r1 (raw file):
containers: - name: script-exporter image: measurementlab/script-exporter-support:sandbox-soltesz-add-ndt7
This seems like something that was meant for testing that didn't get changed back to the appropriate value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @stephen-soltesz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking such a careful look!
I've moved the autonode zone to us-central1-c in terraform - m-lab/terraform-support#106 this requires a manual step to delete the node before deploying terraform since the provider is not strict about the zone and will keep a node in us-central1-a if it already exists there.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @nkinkade)
k8s/prometheus-federation/deployments/script-exporter.yml
line 35 at r1 (raw file):
Previously, nkinkade wrote…
This seems like something that was meant for testing that didn't get changed back to the appropriate value?
Yes, it was only meant to be temporary while testing. Fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 1 approvals obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, I've made changes to the Locate API so that monitoring request parameters are preserved. So, the early_exit=250
parameter should now be passed to the ndt-server and additionally reduce the total of resources used.
I've updated the ops tactical notes with all repos related to these changes for production deployment.
Reviewable status: complete! 1 of 1 approvals obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Reviewable status: complete! 1 of 1 approvals obtained
This change adds script exporter support for the autonode deployments.
This change is