Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move networkpolicies out of /contrib into /common (#2385) #2457

Merged
merged 8 commits into from
Apr 16, 2024

Conversation

rawc0der
Copy link
Member

Which issue is resolved by this Pull Request:
Resolves #2385

Description of your changes:
Moved networkpolicies from /contrib to /common and added the base overlay to the example kustomization.

Checklist:

  • Unit tests pass:
    Make sure you have installed kustomize == ^5.0.0
    1. make generate-changed-only
    2. make test

@google-cla
Copy link

google-cla bot commented May 10, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@juliusvonkohout
Copy link
Member

can you add @TobiasGoerke as additional reviewer (not approver) in the owners file?

@juliusvonkohout
Copy link
Member

/LGTM

@@ -49,6 +50,8 @@ resources:
- ../common/istio-1-16/cluster-local-gateway/base
# Kubeflow namespace
- ../common/kubeflow-namespace/base
# NetworkPolicies
- ../common/networkpolicies/base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimwnasptd @juliusvonkohout Are there any concerns with enabling network policies by default? Is there any concern around how impacts any additional components users deploy either from /contrib or their own?

This also would mean that any changes made by WG that do not align with current network policies would need to be addressed during the manifest testing time frame of the release. Is that something that Manifest WG wants to take on for all future releases as part of the release cycle?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a lot of discussion in #2385. It is more about how we want to do project governance and enforce security. I can do testing for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should include automated tests with the PR if the netpol will be enabled by default, to test the current policies and future changes to the policies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly would you like to test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we can't automatically trigger e2e right now, we can at least trigger all the component tests to ensure that it passes with netpol enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you not think that it is a bit overkill to add the networkpolicies folder as path to 10 different GH actions? I would either use the e2e pipeline or make a simple test that just validates the networkpolicies with kustomize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that just capturing kustomize build output is not enough, as Kustomize allows you to build invalid resources not following the OpenAPI spec definition.
Example:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: minio
  namespace: kubeflow
spec:
  podSelector:
    matchExpressions:
        - sexy_key: app
          operator: In
          values:
            - minio # artifact storage

With kustomize build common/networkpolicies/base there is no error captured, but when doing static validation against OpenAPI-based validation tools kubeval or kubectl-validate, an error is raised.

Kubeval:

$ kustomize build common/networkpolicies/base | kubeval                              (⎈ |kind-kind-dex-auth:default)
PASS - stdin contains a valid NetworkPolicy (kubeflow.cache-server)
WARN - stdin contains an invalid NetworkPolicy (kubeflow.minio) - key: key is required
....

Kubectl Validate:

$ kustomize build common/networkpolicies/base > netpol.yaml                          
$ kubectl-validate netpol.yaml                                                   
netpol.yaml...ERROR
spec.podSelector.matchExpressions[0].sexy_key: Invalid value: value provided for unknown field
Error: validation failed

@annajung my proposal is to consider using OpenAPI validation tools for testing manifest generation, aside from any other e2e and runtime checks.

  • Kubectl Validate is really just emerging, but seems to be under the right umbrella (Kubernetes SIGs)
  • Kubeval seems to be discontinued
  • any other promissing tools?

I can try to do a bit of hacking around with the GH Action workflows to test using the kubectl-validate route... let me know what's your thoughts on this.

Copy link
Member Author

@rawc0der rawc0der Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking the entire kustomize build example/ there are more errors surfaced:

$ kubectl-validate example.yaml                                                  

example.yaml...ERROR
Role.rbac.authorization.k8s.io "cert-manager-webhook:dynamic-serving" is invalid: metadata.name: Invalid value: "cert-manager-webhook:dynamic-serving": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Role.rbac.authorization.k8s.io "cert-manager-cainjector:leaderelection" is invalid: metadata.name: Invalid value: "cert-manager-cainjector:leaderelection": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Role.rbac.authorization.k8s.io "cert-manager:leaderelection" is invalid: metadata.name: Invalid value: "cert-manager:leaderelection": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
ClusterRole.rbac.authorization.k8s.io "cert-manager-controller-approve:cert-manager-io" is invalid: metadata.name: Invalid value: "cert-manager-controller-approve:cert-manager-io": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
ClusterRole.rbac.authorization.k8s.io "cert-manager-webhook:subjectaccessreviews" is invalid: metadata.name: Invalid value: "cert-manager-webhook:subjectaccessreviews": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
RoleBinding.rbac.authorization.k8s.io "cert-manager-webhook:dynamic-serving" is invalid: metadata.name: Invalid value: "cert-manager-webhook:dynamic-serving": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
RoleBinding.rbac.authorization.k8s.io "cert-manager-cainjector:leaderelection" is invalid: metadata.name: Invalid value: "cert-manager-cainjector:leaderelection": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
RoleBinding.rbac.authorization.k8s.io "cert-manager:leaderelection" is invalid: metadata.name: Invalid value: "cert-manager:leaderelection": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
ClusterRoleBinding.rbac.authorization.k8s.io "cert-manager-controller-approve:cert-manager-io" is invalid: metadata.name: Invalid value: "cert-manager-controller-approve:cert-manager-io": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
ClusterRoleBinding.rbac.authorization.k8s.io "cert-manager-webhook:subjectaccessreviews" is invalid: metadata.name: Invalid value: "cert-manager-webhook:subjectaccessreviews": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

Maybe deserves separate topic to cover checking multiple components. Adding the name to the file/resource where the errors are found would be useful as well, but I think also just the name of the Kubernetes resources is good for starters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test could look something like the following:
Screenshot 2023-10-19 at 17 25 51

But the problem is that for custom resources, operators etc, the CRDs needs to be passed as local folders for openAPI schemas extraction, so more work is needed for this.

failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: cert-manager.io/v1
failed to retrieve validator: failed to locate OpenAPI spec for GV: kubeflow.org/v1beta1
failed to retrieve validator: failed to locate OpenAPI spec for GV: metacontroller.k8s.io/v1alpha1
failed to retrieve validator: failed to locate OpenAPI spec for GV: networking.istio.io/v1alpha3
failed to retrieve validator: failed to locate OpenAPI spec for GV: networking.istio.io/v1alpha3

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliusvonkohout Seldon isn't installed by default, so does it make sense to add this policy by default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends. We can also think about moving it to /contrib/seldon

@google-oss-prow google-oss-prow bot removed the lgtm label Oct 7, 2023
@juliusvonkohout
Copy link
Member

@rawc0der why did you disable seldon? Seldon will still be used in Kubeflow 1.9.

@rawc0der
Copy link
Member Author

rawc0der commented Oct 9, 2023

I was actually thinking that /contrib components network policies should maybe be moved into their corresponding /contrib application folder path and only enabled by default in /common/networkpolicies/ if the component is required. Otherwise we end up having some unnecessary netpols for optional components.

I can enable it back as maybe there is no problem by running this netpol by default.

What's you view on this?

@juliusvonkohout
Copy link
Member

Kserve is aslo in /contrib but installed by default. Seldon is also quite popular. So having them in one place for the time being is what i prefer. @kimwnasptd what do you think?

@rawc0der
Copy link
Member Author

TO conclude, one example of where Istio AuthPolicies lacks some support is with services using protocol specific implementation, i.e. the databases: mysql, rds.

AuthorizationPolicy can only cover HTTP/gRPC based protocols at the moment.

Advantages of securing with Network policies:

  • allows to be universal on all application that use IP
  • allows applying policies to DNS, SQL databases, real-time streaming, other types of services

Without netpols:

  • The Istio’s proxy is based on Envoy, which is implemented as a user space daemon
  • Istio policy enforcement happens inside pod sidecar in the same network namespace.
  • Due to some pods getting access to CAP_NET_ADMIN, services can be compromised and proxy could be bypassed

Attack vector in Layer 7:

Attacking unprotected pods
Attempting to deny service to protected pods by sending lots of traffic
Exfiltrating data collected in the pod
Attacking the cluster infrastructure (servers or Kubernetes services)
Attacking services outside the mesh, like databases, storage arrays, or legacy systems.

Last take:

 Istio and Network Policy have different strengths in applying policy. Istio is application-protocol aware and highly flexible, making it ideal for applying policy in support of operational goals, like service routing, retries, circuit-breaking, etc, and for security that operates at the application layer, such as token validation. Network Policy is universal, highly efficient, and isolated from the pods, making it ideal for applying policy in support of network security goals

From this blog post.

@juliusvonkohout
Copy link
Member

@rawc0der

"rawc0der
User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.
"
I think you have to remove yourself from the owners list or become approved by google.

I now have the power to merge this kubeflow/internal-acls#584 (comment) , but i need to discuss this with Kimonas first.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jan 11, 2024

@rawcoder please fix the merge conflict and i think you have to remove yourself from the owners list or become approved by google. Otherwise i have to fork this before merging.

@juliusvonkohout
Copy link
Member

/hold

@rimolive rimolive mentioned this pull request Jan 22, 2024
10 tasks
@juliusvonkohout
Copy link
Member

@andreyvelich can you take this up with the KSC ?

@juliusvonkohout
Copy link
Member

@terrytangyuan can you take this up with the KSC ?

@terrytangyuan
Copy link
Member

terrytangyuan commented Apr 4, 2024

I think this is in our agenda but we haven't got to it yet. cc @kubeflow/kubeflow-steering-committee

@rimolive
Copy link
Member

@juliusvonkohout Do we have KSC approval for merging this?

@juliusvonkohout
Copy link
Member

I think the KSC still has to decide, but i provided most arguments in the community call, so i hope it is just a formal approval.

@jbottum
Copy link

jbottum commented Apr 16, 2024

I am +1 considering @juliusvonkohout says this is not breaking change. It is in line with a security profile strategy goals and helps to unblock a CNCF requirement for graduation. By default it will be turned on with this change. It can be disabled by commenting out one line. Can we get feedback from distributions ? @kimwnasptd @james-jwu @johnugeorge others?

@juliusvonkohout
Copy link
Member

As discussed in the community call with three KSC members we will merge it now.

/lgtm
/approve

Thank you for moving enabling it by default @rawcoder.

@google-oss-prow google-oss-prow bot added the lgtm label Apr 16, 2024
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout, rawc0der

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout
Copy link
Member

/unhold

@google-oss-prow google-oss-prow bot merged commit b9dddd2 into kubeflow:master Apr 16, 2024
2 checks passed
doncorsean pushed a commit to doncorsean/kubeflow-manifests that referenced this pull request Jul 18, 2024
…beflow#2457)

* Move networkpolicies out of /contrib into /common (kubeflow#2385)

* extend OWNERS file

* disable seldon NP

* enable seldon NP

* add rawc0der as Reviewer

* Remove rawc0der reviewer

* remove the duplicate resource
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Promote networkpolicies out of /contrib
6 participants