Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add autonode script exporter config #1064

Merged
merged 5 commits into from
Oct 22, 2024

Conversation

stephen-soltesz
Copy link
Contributor

@stephen-soltesz stephen-soltesz commented Oct 21, 2024

This change adds script exporter support for the autonode deployments.

This change is Reviewable

@stephen-soltesz
Copy link
Contributor Author

stephen-soltesz commented Oct 21, 2024

@nkinkade this PR is still draft but what do you think of it?

The primary motivation is to have some activity on the sandbox and staging autonode deployments for our internal monitoring and development quality assurance before we release a new version for others to use.

The main shortcoming currently is that this is testing VM to VM in GCE and the rates are probably much faster than they need to be for testing.

Copy link
Contributor

@nkinkade nkinkade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you've set the cache_exit_code.sh to cache for an hour, which seems reasonable. And according to GCP VPC network pricing, intra-zone VPC networking is free, and inter-zone is $0.01/GiB. I see that currently the prometheus-federation cluster is deployed to us-central1-c, while the test autonode's are in us-central-a. Would it be possible to redeploy the test autonodes to us-central1-c? In that arrangement, it seems like the testing wouldn't incur any cost... as long as the traffic was using the VPC internal/private network addresses.

Reviewed 6 of 7 files at r1, all commit messages.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)


k8s/prometheus-federation/deployments/script-exporter.yml line 35 at r1 (raw file):

      containers:
      - name: script-exporter
        image: measurementlab/script-exporter-support:sandbox-soltesz-add-ndt7

This seems like something that was meant for testing that didn't get changed back to the appropriate value?

Copy link
Contributor

@nkinkade nkinkade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @stephen-soltesz)

Copy link
Contributor Author

@stephen-soltesz stephen-soltesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking such a careful look!

I've moved the autonode zone to us-central1-c in terraform - m-lab/terraform-support#106 this requires a manual step to delete the node before deploying terraform since the provider is not strict about the zone and will keep a node in us-central1-a if it already exists there.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @nkinkade)


k8s/prometheus-federation/deployments/script-exporter.yml line 35 at r1 (raw file):

Previously, nkinkade wrote…

This seems like something that was meant for testing that didn't get changed back to the appropriate value?

Yes, it was only meant to be temporary while testing. Fixed now.

@stephen-soltesz stephen-soltesz marked this pull request as ready for review October 22, 2024 16:07
Copy link
Contributor

@nkinkade nkinkade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained

Copy link
Contributor Author

@stephen-soltesz stephen-soltesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I've made changes to the Locate API so that monitoring request parameters are preserved. So, the early_exit=250 parameter should now be passed to the ndt-server and additionally reduce the total of resources used.

I've updated the ops tactical notes with all repos related to these changes for production deployment.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained

Copy link
Contributor

@nkinkade nkinkade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Reviewable status: :shipit: complete! 1 of 1 approvals obtained

@stephen-soltesz stephen-soltesz merged commit 7e3a737 into main Oct 22, 2024
3 checks passed
@stephen-soltesz stephen-soltesz deleted the sandbox-soltesz-autonode-testing branch October 22, 2024 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants