fix(port): Update grpc executor target port #2131
Conversation
Hi @groszewn. Thanks for your PR. I'm waiting for a SeldonIO member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.
Services created by …
@groszewn thanks for creating the PR. However, the change to use the http port as the targetPort for the Executor was only introduced because the Executor now multiplexes all traffic on the http port. What is the issue you are seeing on the Service resource created for the custom named service that you believe is caused by this change?
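For context, a minimal sketch of the kind of Service being discussed, where the gRPC service port targets the executor's multiplexed HTTP port; the name, namespace, and port numbers below are illustrative, not taken from this PR:

```yaml
# Illustrative sketch (not the actual manifest from this PR): a Service
# whose gRPC port targets the executor's HTTP port, since the executor
# now multiplexes HTTP and gRPC traffic on that single port.
apiVersion: v1
kind: Service
metadata:
  name: toymodel-maas        # hypothetical custom name from seldon.io/svc-name
  namespace: testns
spec:
  selector:
    seldon-app: toymodel-maas
  ports:
  - name: http
    port: 8000
    targetPort: 8000
    protocol: TCP
  - name: grpc               # Istio infers "gRPC only" from this port name
    port: 5001               # assumed default gRPC service port
    targetPort: 8000         # gRPC traffic is sent to the multiplexed HTTP port
    protocol: TCP
```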
Gotcha, understood. Let me lay out what I've been testing, then. For all of the tests, I just have a "model" that returns the identity function, built with Seldon. The following code is used from within a pod in the same namespace as the Seldon deployment in our cluster.

```python
import grpc
from seldon_core.proto import prediction_pb2, prediction_pb2_grpc
channel = grpc.insecure_channel("<svc name>:<grpc port>")
stub = prediction_pb2_grpc.SeldonStub(channel)
proto = prediction_pb2.SeldonMessage()
proto.strData = "dfasdf"
stub.Predict(proto)
```

- Non-Istio enabled, custom service endpoint (no sidecars): successful response
- Non-Istio enabled, standard service endpoint (no sidecars): successful response
- Istio enabled, custom service endpoint (with sidecar): …
- Istio enabled, standard service endpoint (with sidecar): …
I'm a little confused as to what differs between the service call to the custom service endpoint (the one created from `seldon.io/svc-name`) and the standard service endpoint.
@groszewn Thanks for all the info! Sounds like a problem only when using sidecars. Are you able to provide the sidecar yaml?
@glindsell The sidecar is auto-injected by Istio, but I can share the pod resource that's created if that helps.

```yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: restricted
prometheus.io/path: /prometheus
prometheus.io/scrape: "true"
seccomp.security.alpha.kubernetes.io/pod: runtime/default
seldon.io/svc-name: toymodel-maas
sidecar.istio.io/status: '{"version":"023ae377d1a8981380141286422d04a98c39883f0647804d1ec0a7b5683da18d","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
toymodel-maas: ""
creationTimestamp: "2020-07-15T15:26:26Z"
generateName: toymodel-maas-toymodel-maas-0-toymodel-maas-9fd96774b-
labels:
app: toymodel-maas-toymodel-maas-0-toymodel-maas
app.kubernetes.io/managed-by: seldon-core
fluentd: "true"
pod-template-hash: 9fd96774b
security.istio.io/tlsMode: istio
seldon-app: toymodel-maas
seldon-app-svc: toymodel-maas-toymodel-maas-toymodel-maas
seldon-deployment-id: toymodel-maas
version: toymodel-maas
name: toymodel-maas-toymodel-maas-0-toymodel-maas-9fd96774b-2wl7w
namespace: testns
...
spec:
containers:
- env:
- name: PREDICTIVE_UNIT_SERVICE_PORT
value: "9000"
- name: PREDICTIVE_UNIT_ID
value: toymodel-maas
- name: PREDICTIVE_UNIT_IMAGE
value: <model image>
- name: PREDICTOR_ID
value: toymodel-maas
- name: PREDICTOR_LABELS
value: '{"version":"toymodel-maas"}'
- name: SELDON_DEPLOYMENT_ID
value: toymodel-maas
- name: PREDICTIVE_UNIT_METRICS_SERVICE_PORT
value: "6000"
- name: PREDICTIVE_UNIT_METRICS_ENDPOINT
value: /prometheus
image: <model image>
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- /bin/sleep 10
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 60
periodSeconds: 5
successThreshold: 1
tcpSocket:
port: grpc
timeoutSeconds: 1
name: toymodel-maas
ports:
- containerPort: 6000
name: metrics
protocol: TCP
- containerPort: 9000
name: grpc
protocol: TCP
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
tcpSocket:
port: grpc
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 1Gi
requests:
cpu: "1"
memory: 1Gi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/podinfo
name: seldon-podinfo
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-ppdlc
readOnly: true
- args:
- --sdep
- toymodel-maas
- --namespace
- testns
- --predictor
- toymodel-maas
- --port
- "8000"
- --protocol
- seldon
- --prometheus_path
- /prometheus
env:
- name: ENGINE_PREDICTOR
value: <Engine predictor value>
- name: REQUEST_LOGGER_DEFAULT_ENDPOINT
value: http://default-broker
- name: SELDON_LOG_MESSAGES_EXTERNALLY
value: "false"
image: seldonio/seldon-core-executor:1.2.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /live
port: 8000
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 60
name: seldon-container-engine
ports:
- containerPort: 8000
protocol: TCP
- containerPort: 8000
name: metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /ready
port: 8000
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 60
resources:
requests:
cpu: 100m
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
runAsUser: 8888
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/podinfo
name: seldon-podinfo
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-ppdlc
readOnly: true
- args:
- proxy
- sidecar
- --domain
- $(POD_NAMESPACE).svc.cluster.local
- --configPath
- /etc/istio/proxy
- --binaryPath
- /usr/local/bin/envoy
- --serviceCluster
- toymodel-maas-toymodel-maas-0-toymodel-maas.$(POD_NAMESPACE)
- --drainDuration
- 45s
- --parentShutdownDuration
- 1m0s
- --discoveryAddress
- istio-pilot.istio-system:15010
- --zipkinAddress
- zipkin.istio-system:9411
- --proxyLogLevel=info
- --dnsRefreshRate
- 300s
- --connectTimeout
- 10s
- --proxyAdminPort
- "15000"
- --controlPlaneAuthPolicy
- NONE
- --statusPort
- "15020"
- --applicationPorts
- 6000,9000,8000,8000
- --concurrency
- "2"
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ISTIO_META_POD_PORTS
value: |-
[
{"name":"metrics","containerPort":6000,"protocol":"TCP"}
,{"name":"grpc","containerPort":9000,"protocol":"TCP"}
,{"containerPort":8000,"protocol":"TCP"}
,{"name":"metrics","containerPort":8000,"protocol":"TCP"}
]
- name: ISTIO_META_CLUSTER_ID
value: Kubernetes
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: SERVICE_ACCOUNT
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.serviceAccountName
- name: ISTIO_META_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ISTIO_META_CONFIG_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: SDS_ENABLED
value: "false"
- name: ISTIO_META_INTERCEPTION_MODE
value: REDIRECT
- name: ISTIO_META_INCLUDE_INBOUND_PORTS
value: 6000,9000,8000,8000
- name: ISTIO_METAJSON_ANNOTATIONS
value: |
{"kubernetes.io/psp":"restricted","prometheus.io/path":"/prometheus","prometheus.io/scrape":"true","seccomp.security.alpha.kubernetes.io/pod":"runtime/default","seldon.io/svc-name":"toymodel-maas","toymodel-maas":""}
- name: ISTIO_METAJSON_LABELS
value: |
{"app":"toymodel-maas-toymodel-maas-0-toymodel-maas","app.kubernetes.io/managed-by":"seldon-core","fluentd":"true","pod-template-hash":"9fd96774b","seldon-app":"toymodel-maas","seldon-app-svc":"toymodel-maas-toymodel-maas-toymodel-maas","seldon-deployment-id":"toymodel-maas","version":"toymodel-maas"}
- name: ISTIO_META_WORKLOAD_NAME
value: toymodel-maas-toymodel-maas-0-toymodel-maas
- name: ISTIO_META_OWNER
value: <>
imagePullPolicy: IfNotPresent
name: istio-proxy
ports:
- containerPort: 15090
name: http-envoy-prom
protocol: TCP
readinessProbe:
failureThreshold: 30
httpGet:
path: /healthz/ready
port: 15020
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 2
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 500m
memory: 256Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
privileged: false
readOnlyRootFilesystem: true
runAsGroup: 1337
runAsNonRoot: true
runAsUser: 1337
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /etc/certs/
name: istio-certs
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-ppdlc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
initContainers:
- command:
- istio-iptables
- -p
- "15001"
- -z
- "15006"
- -u
- "1337"
- -m
- REDIRECT
- -i
- '*'
- -x
- ""
- -b
- '*'
- -d
- "15020"
image: istio/proxyv2:1.4.3
imagePullPolicy: IfNotPresent
name: istio-init
resources:
limits:
cpu: 100m
memory: 50Mi
requests:
cpu: 10m
memory: 10Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_ADMIN
- NET_RAW
drop:
- ALL
privileged: false
readOnlyRootFilesystem: false
runAsGroup: 0
runAsNonRoot: false
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-ppdlc
readOnly: true
priority: 1
priorityClassName: default-priority
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1
runAsUser: 8888
supplementalGroups:
- 1
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 20
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- downwardAPI:
defaultMode: 420
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.annotations
path: annotations
name: seldon-podinfo
- name: default-token-ppdlc
secret:
defaultMode: 420
secretName: default-token-ppdlc
- emptyDir:
medium: Memory
name: istio-envoy
- name: istio-certs
secret:
defaultMode: 420
optional: true
secretName: istio.default
...
```
Based off of our discussion [here](https://seldondev.slack.com/archives/C8Y9A8G0Y/p1594744362472100), the current port for gRPC traffic is named `grpc` (very reasonably). Due to the new pattern of multiplexing http/gRPC traffic on the http port, issues have popped up when trying to send gRPC traffic to a service defined by `seldon.io/svc-name`, because the Istio sidecar determines the protocol from the port name and only allows gRPC traffic through (more info [here](https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/)). This fix changes the name of the gRPC port from `grpc` to `http2` to allow multiplexing of traffic when Istio sidecars are used, while still maintaining the ability for Istio gRPC metrics to be generated (more info [here](https://istio.io/latest/docs/reference/config/policy-and-telemetry/metrics/#metrics)).

Signed-off-by: Nick Groszewski <[email protected]>
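A rough sketch of the renaming described above, again with illustrative names and port numbers rather than the actual generated manifest: the gRPC service port keeps its number and targetPort, and only its name changes so that Istio's port-name-based protocol selection treats it as HTTP/2.

```yaml
# Illustrative sketch of the proposed rename: the port number and
# targetPort stay the same; only the name changes from "grpc" to "http2"
# so an Istio sidecar allows multiplexed HTTP/gRPC traffic through it.
apiVersion: v1
kind: Service
metadata:
  name: toymodel-maas
spec:
  selector:
    seldon-app: toymodel-maas
  ports:
  - name: http
    port: 8000
    targetPort: 8000
    protocol: TCP
  - name: http2              # was "grpc"; per the discussion, Istio still reports gRPC metrics for http2 ports
    port: 5001
    targetPort: 8000
    protocol: TCP
```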
/ok-to-test
/test notebooks
/test integration
/test notebooks
@groszewn Would it be safer to add a new port to the SVC with name `http2` rather than change the existing `grpc` port?
@cliveseldon I think that sounds like a good idea. Are you thinking that change should propagate all the way out to the Helm chart values and CRD? And any preference on a default port for `http2`?
@groszewn I was thinking we can use the same port. So just duplicate the existing setting and add a new name?
/test notebooks
@cliveseldon It's currently not possible in K8s to have a single Service with multiple service ports using the same port. As an example, a manifest like the following would lead to a duplicate-port validation error.
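A sketch of the kind of manifest being referred to; the original example and exact error output were collapsed in this thread, and the names and port numbers below are illustrative. Duplicating an existing service port under a new name reuses the same port/protocol pair, which the Kubernetes API server rejects with a duplicate-value validation error on `spec.ports`.

```yaml
# Hypothetical example: adding a second service port named "http2" that
# reuses the grpc port's port/protocol pair. The API server rejects this
# with a "Duplicate value" validation error on spec.ports.
apiVersion: v1
kind: Service
metadata:
  name: toymodel-maas
spec:
  selector:
    seldon-app: toymodel-maas
  ports:
  - name: http
    port: 8000
    targetPort: 8000
    protocol: TCP
  - name: grpc
    port: 5001
    targetPort: 8000
    protocol: TCP
  - name: http2
    port: 5001               # same port/protocol as the grpc entry above -> rejected
    targetPort: 8000
    protocol: TCP
```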
However, flipping the …
@groszewn ok. But that would be a breaking change. Can we just clarify again what we are solving? This would apply to all Services exposed, so users will probably be assuming 5001 for existing applications, and we need to take into account Ambassador and other configurations. Also, are we sure there is no downside to using `http2` as the name in Istio for gRPC?
@cliveseldon Sure. What we're solving with this is the ability for Istio sidecars to multiplex traffic along with the executor. The Istio sidecar determines the protocol to use from the service port name, so multiplexing would have to happen over `http2`. As far as I can tell from the Istio documentation, there should be no adverse impacts to leveraging `http2` with Istio (traffic management, metrics, etc. all seem to be supported in the same way). If we were to take the approach of having 3 service ports (…
If sidecars break things at present, this would seem to be an issue for people using Istio or Ambassador. So I think we need to keep the 2 ports only?
I think that approach should work. As far as I can tell, only Istio uses the service port name to infer the protocol. Since metrics and traffic management don't seem to be impacted by just changing the name, the switch should simply allow multiplexing when there is an Istio sidecar and work as it currently does otherwise.
@cliveseldon @glindsell Just wanted to check back in on this before the weekend. Anything else needed from my end? Do the pipelines need to get kicked again?
/test integration
/test notebooks
/approve
Yes, all good @groszewn
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cliveseldon. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/test integration
/test notebooks
@cliveseldon Is there anything I need to do here, or are the CI checks just being a little flaky? It looks like the integration tests failed on different things in the different runs, and the notebook test failure didn't look particularly related.
/test integration
/test notebooks
@groszewn: The following test failed, say `/retest` to rerun it.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository. I understand the commands that are listed here.
One flaky notebook test failed, so will force merge.
Update to have gRPC traffic forwarded to the gRPC target port.
Signed-off-by: Nick Groszewski <[email protected]>
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: