roachprod: support for dynamic admin url port #123619

Merged

nameisbhaskar
Contributor

To support multiple tenants on the same host, a unique custom port can be assigned for DefaultAdminUIPort. This breaks Prometheus scraping because the port is hard-coded in the Prometheus config.

This change runs an HTTP server on the Prometheus server that dynamically updates the scrape configuration when a new cluster is brought up. The solution is explained in more detail at: https://cockroachlabs.atlassian.net/wiki/spaces/~7120207825326fb5e546c194029506f2c5335e/pages/3458531376/Dynamic+Scrape+Configs+on+Prometheus+for+Roachprod

Fixes: #117125
Epic: none

@nameisbhaskar nameisbhaskar requested a review from a team as a code owner May 4, 2024 09:17
@nameisbhaskar nameisbhaskar requested review from herkolategan and DarrylWong and removed request for a team May 4, 2024 09:17
@nameisbhaskar
Contributor Author

The prom-helper-service: https://github.com/cockroachlabs/prom-helper-service

@nameisbhaskar nameisbhaskar requested a review from srosenberg May 6, 2024 13:04
@nameisbhaskar nameisbhaskar force-pushed the user/bhaskar/dynamic_prom_port branch from dfa0df8 to e0c588a Compare May 6, 2024 13:11
@nameisbhaskar nameisbhaskar force-pushed the user/bhaskar/dynamic_prom_port branch from e0c588a to 038d539 Compare May 7, 2024 05:48
@nameisbhaskar nameisbhaskar requested a review from srosenberg May 7, 2024 05:50
@nameisbhaskar nameisbhaskar force-pushed the user/bhaskar/dynamic_prom_port branch 2 times, most recently from caa21e7 to 59123f6 Compare May 7, 2024 07:35
Member

@srosenberg srosenberg left a comment

Nice work! Left a few nits and comments; otherwise, LGTM.

@nameisbhaskar nameisbhaskar force-pushed the user/bhaskar/dynamic_prom_port branch 3 times, most recently from 688d4be to b30b6a6 Compare May 8, 2024 07:19
@nameisbhaskar
Contributor Author

Thanks @srosenberg for your review. I have made changes as per your comments; it would be good if you could take a look again.

@nameisbhaskar nameisbhaskar force-pushed the user/bhaskar/dynamic_prom_port branch 2 times, most recently from 5ecf56a to 5b66816 Compare May 8, 2024 13:50
@nameisbhaskar nameisbhaskar force-pushed the user/bhaskar/dynamic_prom_port branch from 5b66816 to f05ea18 Compare May 9, 2024 06:06
@nameisbhaskar nameisbhaskar force-pushed the user/bhaskar/dynamic_prom_port branch from f05ea18 to d290435 Compare May 9, 2024 06:26
@nameisbhaskar
Contributor Author

bors r=@srosenberg,@renatolabs,@herkolategan

@nameisbhaskar
Contributor Author

Thanks @srosenberg, @herkolategan, @renatolabs, @DarrylWong for reviewing this change!

@craig craig bot merged commit b8ba30a into cockroachdb:master May 9, 2024
21 of 22 checks passed
@nameisbhaskar nameisbhaskar deleted the user/bhaskar/dynamic_prom_port branch May 9, 2024 13:35
craig bot pushed a commit that referenced this pull request May 10, 2024
123943: roachprod: fix prom target panic r=srosenberg a=msbutler

As of #123619, roachtests on GCE that start a cockroach cluster on a subset of the roachtest nodes (i.e. all PCR roachtests) panic, because `updatePrometheusTargets` assumes the roachtest starts all nodes in the roachprod cluster. This patch relaxes that assumption.

Epic: none

Release note: none

Co-authored-by: Michael Butler <[email protected]>
Contributor

@renatolabs renatolabs left a comment

Sorry for the late review, I only had a chance to take a closer look at this PR now!

Most comments are just suggestions for future improvements that can be done separately or when we are making changes to the code in the future.

That said, I think a few changes should be done earlier rather than later (the command line help string, the check for `len(nodeIPPorts) > 0`, and maybe logging of errors in `Delete` to make it easier for us to understand failures when they happen in the future).

pkg/roachprod/promhelperclient/client.go
`

// createClusterConfigFile creates the cluster config file per node
func buildCreateRequest(nodes []string) (io.Reader, error) {
Contributor

This would likely be simpler if we used the yaml package to handle the serialization for us. Then we would be able to go from data structure -> YAML without relying on Go templates, which IMO would be a little easier to understand.

Successfully merging this pull request may close these issues.

roachtest: using custom AdminUIPort breaks automated prometheus scraping