Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scheduler and controller-manager ports for metrics #791

Merged
merged 2 commits into from
Dec 10, 2018

Conversation

brancz
Copy link
Contributor

@brancz brancz commented Dec 5, 2018

This pull request opens the scheduler and controller-manager ports for the openshift and aws platforms.

cc @smarterclayton

@abhinavdahiya
Copy link
Contributor

@smarterclayton already has opened 9000-9999 for metrics #683

Please use those.

@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

I was specifically instructed by @smarterclayton to do this patch 🙂 . The scheduler is in the same situation as the other ports around it, which is why those exist as well.

@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

/retest

@abhinavdahiya
Copy link
Contributor

I was specifically instructed by @smarterclayton to do this patch slightly_smiling_face . The scheduler is in the same situation as the other ports around it, which is why those exist as well.

the ports around it are kubelet ports that are used proabaly for apiserver to kubelet communication for exec/logs.

you are opening new ports for metrics.. that's different and we already have a range opened for metrics.

And if these can't be in 9000-9999 range, only then we don't have a choice. :P

@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

They can't be today, which is why I was instructed to do so. In the future (when 1.12 rebase lands) the master team may be able to fix this, but to get metrics into a working state now we're going for this.

@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

/retest

@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

/refresh

1 similar comment
@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

/refresh

@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

umm ... I open the link and everything passed ... any ideas?

/refresh doesn't seem to help

@brancz
Copy link
Contributor Author

brancz commented Dec 5, 2018

/refresh

@smarterclayton
Copy link
Contributor

/retest

@@ -148,6 +148,46 @@ resource "aws_security_group_rule" "master_ingress_kubelet_insecure" {
self = true
}

resource "aws_security_group_rule" "master_ingress_kube_scheduler_from_worker" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do workers need direct access to the scheduler? This is surprising to me, but I'm not familiar with the scheduler implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(cluster-monitoring) prometheus is running on workers, and prometheus needs to scrape the metrics

Copy link
Member

@wking wking Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and prometheus needs to scrape the metrics

Ugh. Ok. But it really feels like we should get Prometheus and other monitoring out into a different subnet or something then. Having these ports and the 9000-9999 range on all machines open to all workers seems like one more thing to worry about for security. Not something that's going to happen this week though.

Copy link
Contributor

@smarterclayton smarterclayton Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These services are already protected, we need to do a better job of scanning them to catch security regressions though.

We should not be depending on security groups for security. We need to find a way to ensure teams can own both exposing their metrics and securing their metrics. In the core control plane case we've already done that, but we need to keep improving.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be depending on security groups for security.

No, but the more barriers there are between a given bug an a useful exploit, the better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. My understanding is that once the 1.12 rebase lands the masters team will change the secure port to something in the open range (9000-10000).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the default secure ports are 10259 for the scheduler, and 10257 for the controller-manager. As we don't change this for the kubelet, I'd be hesitant to deviate from the defaults for other components.

@abhinavdahiya
Copy link
Contributor

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 5, 2018
@brancz
Copy link
Contributor Author

brancz commented Dec 6, 2018

/retest

1 similar comment
@brancz
Copy link
Contributor Author

brancz commented Dec 7, 2018

/retest

@wking
Copy link
Member

wking commented Dec 7, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2018
@brancz
Copy link
Contributor Author

brancz commented Dec 8, 2018

/retest

1 similar comment
@brancz
Copy link
Contributor Author

brancz commented Dec 8, 2018

/retest

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2018
@brancz
Copy link
Contributor Author

brancz commented Dec 8, 2018

/retest

@brancz
Copy link
Contributor Author

brancz commented Dec 8, 2018

/retest

@brancz
Copy link
Contributor Author

brancz commented Dec 9, 2018

Can someone spot any errors in the PR? It seems to me that the CI failures are flakes, but if that's the case it's very flaky.

/retest

@brancz
Copy link
Contributor Author

brancz commented Dec 9, 2018

/retest

1 similar comment
@brancz
Copy link
Contributor Author

brancz commented Dec 9, 2018

/retest

@brancz
Copy link
Contributor Author

brancz commented Dec 10, 2018

It seems this was a real failure, I believe it should be fixed now.

@wking
Copy link
Member

wking commented Dec 10, 2018

It seems this was a real failure, I believe it should be fixed now.

There were a number of CI issues over the weekend, which is what I'd thought was going on here. And it looks like you didn't actually change anything?

$ diff -u <(git show 69c4da3) <(git show 1d866fe)
--- /dev/fd/63	2018-12-10 11:43:46.879281433 -0800
+++ /dev/fd/62	2018-12-10 11:43:46.880281438 -0800
@@ -1,4 +1,4 @@
-commit 69c4da3017737f48f7fa5abb37d8b31782268853
+commit 1d866fe4b42d1a656b2c7f08ee39c9c45b3d5421
 Author: Frederic Branczyk <[email protected]>
 Date:   Wed Dec 5 17:30:02 2018 +0100
 
$ diff -u <(git show 69c4da3^) <(git show 1d866fe^)
--- /dev/fd/63	2018-12-10 11:44:36.581510484 -0800
+++ /dev/fd/62	2018-12-10 11:44:36.581510484 -0800
@@ -1,4 +1,4 @@
-commit 6d69febaac81921314e719073035496ca643f2d9
+commit cad1f25a176afe9593c8db6d6406a055c4e78d31
 Author: Frederic Branczyk <[email protected]>
 Date:   Wed Dec 5 17:29:22 2018 +0100
 

Did you expect to have fixed a bug in your PR that maybe you forgot to commit? Or are you just talking about the external CI issues? Anyway, still looks good to me:

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 10, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, brancz, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@brancz
Copy link
Contributor Author

brancz commented Dec 10, 2018

I was just talking about external issues. I wasn't changing anything, but wanted to make sure I didn't miss out on changes in case they are important for CI to pass so I rebased, sorry if that was unnecessary.

@wking
Copy link
Member

wking commented Dec 10, 2018

No worries, I was just making sure I was on the same page.

@openshift-merge-robot openshift-merge-robot merged commit b744c17 into openshift:master Dec 10, 2018
@brancz brancz deleted the sched-cm-ports branch December 10, 2018 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants