Observed envoy memory while adding|removing ingress #499

Closed
mattalberts opened this issue Jul 2, 2018 · 50 comments
Labels: kind/bug

@mattalberts

mattalberts commented Jul 2, 2018

Envoy Memory Investigation

The odd memory growth does not come from the contour container: as the graphs show, contour's memory changes are dwarfed by envoy's. I wanted to start the discussion with the contour project to see whether anyone has observed something similar.

I found two possibly related issues in the envoy project:

  1. Envoy OOM problem with TLS envoyproxy/envoy#3592
  2. stats: when hot-restart is disabled, each stat still consumes maxNameLength() bytes for the name. envoyproxy/envoy#3508

The likelihood of a relation is low, but I've included links to the issues for the sake of completeness.

I have yet to find the root cause, but it feels related to config handling (either a leak or a deliberate checkpoint while merging changes).

@davecheney

  • Any ideas?
  • Have you seen anything similar?
  • Would you rather I reach out directly to envoy?

Summary

The goal is to observe container memory while adding and removing ingress definitions.

  • nothing other than the ingress definitions is changing.
  • there is also no traffic to the ingresses.

Spoilers

  • memory shows a large (linear) growth while adding TLS ingress definitions
  • the same growth is NOT seen for non-TLS ingress definitions

Launch Context

To help eliminate possible causes for memory growth, both hot restart and envoy metrics scraping have been disabled.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: contour-ingress
  name: contour-ingress
  namespace: ingress-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: contour-ingress
    component: debug
  name: contour-ingress-debug
  namespace: ingress-system
spec:
  ports:
    - port: 80
      name: http
      protocol: TCP
      targetPort: 8080
    - port: 443
      name: https
      protocol: TCP
      targetPort: 8443
  selector:
    app: contour-ingress
    component: debug
  type: LoadBalancer
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: contour-ingress
    component: debug
  name: contour-ingress-debug
  namespace: ingress-system
spec:
  selector:
    matchLabels:
      app: contour-ingress
      component: debug
  replicas: 1
  template:
    metadata:
      labels:
        app: contour-ingress
        component: debug
      annotations:
        prometheus.io/scrape: "false"
        prometheus.io/port: "9001"
        prometheus.io/path: "/stats"
        prometheus.io/format: "prometheus"
    spec:
      dnsPolicy: ClusterFirst
      serviceAccountName: contour-ingress
      terminationGracePeriodSeconds: 30
      initContainers:
        - name: envoy-init
          image: gcr.io/heptio-images/contour:v0.5.0
          imagePullPolicy: IfNotPresent
          command: ["contour"]
          args:
            - bootstrap
            - /config/contour.yaml
            - --admin-address=0.0.0.0
          volumeMounts:
            - name: contour-config
              mountPath: /config
      containers:
        - name: contour
          image: gcr.io/heptio-images/contour:v0.5.0
          imagePullPolicy: IfNotPresent
          command: ["contour"]
          args:
            - serve
            - --incluster
            - --ingress-class-name=contour-debug
        - name: envoy
          image: docker.io/envoyproxy/envoy-alpine-debug:v1.6.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 8443
              name: https
          command: ["envoy"]
          args:
            - -c /config/contour.yaml
            - --service-cluster cluster0
            - --service-node node0
            - --disable-hot-restart
            - --v2-config-only
            - --log-level info
          env:
            - name: HEAPCHECK
              value: "strict"
            - name: HEAPPROFILE
              value: "/tmp/envoy.hprof"
          volumeMounts:
            - name: contour-config
              mountPath: /config
      volumes:
        - name: contour-config
          emptyDir:
            medium: "Memory"
            sizeLimit: 8Mi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: contour-ingress
                  component: debug
              topologyKey: kubernetes.io/hostname
---

Observation ~ Adding Ingresses

Let's observe container memory while adding ingress definitions.

Although envoy-specific metrics are currently unavailable (the scrape is disabled), container memory closely tracked envoy heap allocation before the scrape was disabled.

>$ kubectl -n ingress-system top pods -l'app=contour-ingress,component=debug'
NAME                                          CPU(cores)   MEMORY(bytes)   
contour-ingress-debug-64d76ff7ff-sv7z9         121m         94Mi            
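The MEMORY column in the REPORT tables below appears to come from this same `kubectl top pods` output (an assumption; the script's source isn't included in the issue). A minimal POSIX sh sketch of extracting that value for the first matching pod, where `pod_mem_mi` is a hypothetical helper name:

```sh
# Hypothetical helper: read `kubectl top pods` output on stdin and print the
# MEMORY value of the first pod row, in Mi, with the unit suffix stripped.
# Assumes kubectl reports memory in Mi, as in the output above.
pod_mem_mi() {
  awk 'NR == 1 { next }            # skip the NAME / CPU(cores) / MEMORY(bytes) header
       $3 ~ /Mi$/ {
         sub(/Mi$/, "", $3)        # "94Mi" -> "94"
         print $3
         exit                      # only the first pod is reported
       }'
}

# Usage against the debug deployment:
#   kubectl -n ingress-system top pods -l 'app=contour-ingress,component=debug' | pod_mem_mi
```

Sampling once per block and pairing the value with the block number would produce rows shaped like the REPORT output.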

[Graph: envoy_mem_state_a_pod_start_0_ing]

The tool clone-http-ingress.sh is a bash script that adds ingress definitions in blocks:

  • n - total number of ingress definitions
  • b - block size
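The script itself isn't included in the issue. As a hypothetical sketch only, this is one way `-n` and `-b` could map onto the numbered rows of the REPORT output: `plan_blocks` (a made-up name) prints one row per block with the range of ingress indices that block covers. A real script would create those ingresses and sample memory at that point.

```sh
# Hypothetical sketch: split n ingresses into blocks of size b and print one
# row per block: zero-padded block number, then the ingress index range.
# Variables are global because plain POSIX sh has no `local`.
plan_blocks() {
  n="$1"; b="$2"; block=0; start=0
  while [ "$start" -lt "$n" ]; do
    end=$((start + b - 1))
    [ "$end" -ge "$n" ] && end=$((n - 1))   # last block may be short
    # A real script would create ingresses start..end here, then sample memory.
    printf '%08d %d-%d\n' "$block" "$start" "$end"
    block=$((block + 1))
    start=$((start + b))
  done
}
```

`plan_blocks 5000 100` prints 50 rows, from `00000000 0-99` through `00000049 4900-4999`, which lines up with the 50 blocks (00000000 to 00000049) in the report below.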

The ingress definition is generated from this template.

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: $META_NAME
  labels:
    ingress: $INGRESS_CLASS
    role: test
  annotations:
    kubernetes.io/ingress.class: $INGRESS_CLASS
spec:
  tls:
  - hosts:
    - $SPEC_HOST
    secretName: httpbin
  rules:
  - host: $SPEC_HOST
    http:
      paths:
      - path: /
        backend:
          serviceName: httpbin
          servicePort: 8000
---
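Filling in the template is a plain variable substitution. A minimal POSIX sh sketch using a heredoc; `render_ingress` is a hypothetical name, the `httpbin-contour-<i>` naming matches the deletion output later in the issue, the `httpbin-<i>.example.com` host pattern is an assumption, and `INGRESS_CLASS` defaults to the `--ingress-class-name=contour-debug` value passed to contour in the deployment above.

```sh
# Hypothetical sketch: print the Ingress template above, filled in for ingress
# number $1. Only the httpbin-contour-<i> name pattern comes from this issue;
# the host pattern is an assumption.
render_ingress() {
  META_NAME="httpbin-contour-$1"
  SPEC_HOST="httpbin-$1.example.com"               # assumed host naming scheme
  INGRESS_CLASS="${INGRESS_CLASS:-contour-debug}"  # presumably must match contour's --ingress-class-name
  cat <<EOF
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: $META_NAME
  labels:
    ingress: $INGRESS_CLASS
    role: test
  annotations:
    kubernetes.io/ingress.class: $INGRESS_CLASS
spec:
  tls:
  - hosts:
    - $SPEC_HOST
    secretName: httpbin
  rules:
  - host: $SPEC_HOST
    http:
      paths:
      - path: /
        backend:
          serviceName: httpbin
          servicePort: 8000
EOF
}

# Usage: create a single TLS ingress (needs kubectl and a cluster).
#   render_ingress 0 | kubectl apply -f -
```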

The command below will add a total of 5000 definitions in blocks of 100.

>$ ./clone-http-ingress.sh -b 100 -n 5000
REPORT
total=5000
block_size=100
delay enabled=0 delay=0.000000

BLOCK    MEMORY   DURATION
00000000 91Mi     17.20000000
00000001 91Mi     15.18000000
00000002 91Mi     19.02000000
00000003 1628Mi   21.68000000
00000004 1628Mi   19.76000000
00000005 1628Mi   20.33000000
00000006 3550Mi   19.08000000
00000007 3550Mi   23.22000000
00000008 3550Mi   24.30000000
00000009 6123Mi   25.92000000
00000010 6123Mi   24.22000000
00000011 8075Mi   28.61000000
00000012 8075Mi   24.51000000
00000013 10248Mi  24.16000000
00000014 10248Mi  30.01000000
00000015 12601Mi  27.30000000
00000016 12601Mi  27.13000000
00000017 12601Mi  29.85000000
00000018 14741Mi  31.88000000
00000019 16793Mi  32.22000000
00000020 16793Mi  34.20000000
00000021 18388Mi  34.93000000
00000022 18388Mi  38.84000000
00000023 20918Mi  40.79000000
00000024 22845Mi  41.11000000
00000025 22845Mi  40.71000000
00000026 24758Mi  43.63000000
00000027 27056Mi  45.06000000
00000028 28994Mi  50.16000000
00000029 30965Mi  46.90000000
00000030 30965Mi  47.86000000
00000031 33168Mi  47.39000000
00000032 35386Mi  48.52000000
00000033 37886Mi  51.76000000
00000034 39765Mi  58.24000000
00000035 42228Mi  56.12000000
00000036 44600Mi  51.11000000
00000037 46879Mi  54.84000000
00000038 46879Mi  53.24000000
00000039 49202Mi  51.93000000
00000040 51236Mi  55.18000000
00000041 53300Mi  53.77000000
00000042 54895Mi  55.32000000
00000043 56731Mi  57.56000000
00000044 58954Mi  58.81000000
00000045 60353Mi  61.28000000
00000046 62116Mi  57.53000000
00000047 63968Mi  61.20000000
00000048 65922Mi  61.22000000
00000049 67487Mi  61.11000000
  • envoy admin page is unresponsive ("/" or any "/command")
  • when enabled, scraping fails (admin /stats page is also unresponsive)
  • memory climbs linearly during addition
  • memory never decreases, stuck at ~64Gi
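One way to put a number on "memory climbs linearly" is a least-squares slope of MEMORY against BLOCK over the REPORT rows. A minimal awk sketch, assuming the BLOCK/MEMORY/DURATION layout shown above with memory always in Mi (`mem_slope` is a hypothetical name). As a rough cross-check from the endpoints alone, (67487 - 91) / 49 is about 1375Mi per block, or roughly 14Mi per TLS ingress at 100 ingresses per block. Runs of identical values (1628Mi for blocks 3 to 5, for instance) suggest the memory figure refreshes less often than a block completes, which a fit smooths over.

```sh
# Hypothetical helper: least-squares slope (Mi per block) of MEMORY against
# BLOCK, read from REPORT rows such as "00000003 1628Mi   21.68000000".
mem_slope() {
  awk '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+Mi$/ {
      x = $1; sub(/^0+/, "", x); x += 0   # "00000010" -> 10 (avoid octal parsing in some awks)
      y = $2 + 0                          # "1628Mi" -> 1628 (numeric prefix only)
      n++; sx += x; sy += y; sxx += x * x; sxy += x * y
    }
    END {
      d = n * sxx - sx * sx
      if (n < 2 || d == 0) { print "mem_slope: need rows from at least two blocks" > "/dev/stderr"; exit 1 }
      printf "%.1f\n", (n * sxy - sx * sy) / d
    }'
}

# Usage, where report.txt holds the REPORT output:
#   mem_slope < report.txt
```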

[Graph: envoy_mem_state_b_fill_with_n_ing]

>$ POD_NAME=$(kubectl get pods -n ingress-system -l "app=contour-ingress,component=debug" -o jsonpath="{.items[0].metadata.name}"); echo $POD_NAME
>$ kubectl -n ingress-system delete pod $POD_NAME
>$ kubectl -n ingress-system top pods -l'app=contour-ingress,component=debug'
NAME                                    CPU(cores)   MEMORY(bytes)   
contour-ingress-debug-64d76ff7ff-kt5hk   284m         1391Mi 
  • memory after the pod restart is <2Gi
  • all 5000 ingress definitions still exist

[Graph: envoy_mem_state_d_restart_no_change]

Observation ~ Deleting Ingresses

Continuing from the conditions above, let's observe memory while cleaning up the cluster.

>$ kubectl -n ingress-system top pods -l'app=contour-ingress,component=debug'
NAME                                    CPU(cores)   MEMORY(bytes)   
contour-ingress-debug-64d76ff7ff-kt5hk   259m         1228Mi  
  • all 5000 ingress definitions still exist
>$ kubectl delete ing --all
ingress.extensions "httpbin-contour-0" deleted
ingress.extensions "httpbin-contour-1" deleted
...
  • the command takes a while, but eventually completes
  • memory climbs linearly during deletion (~1Gi to ~8Gi)
>$ kubectl -n ingress-system top pods -l'app=contour-ingress,component=debug'
NAME                                    CPU(cores)   MEMORY(bytes)   
contour-ingress-debug-64d76ff7ff-kt5hk   272m         7853Mi   
  • memory never decreases, stuck at ~8Gi

[Graph: envoy_mem_state_e_delete_ing]

Heap Trace

A heap trace taken midway through insertion does appear to show a fairly large leak. I've included only the top set of leaks (not all) to avoid a huge copy-paste.

Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 5321.2 MB
Leak of 366498538 bytes in 436607 objects allocated from:
	@ 0090f485 unknown
	@ 00000000008df612 BUF_memdup ?
	@ 000000000091b217 CRYPTO_BUFFER_new ?
	@ 00000000008ba473 _ZN4bssl8internal11DeleterImplI7x509_stvE4FreeEPS2_ ?
	@ 00000000008baa39 _ZN4bssl8internal11DeleterImplI7x509_stvE4FreeEPS2_ ?
	@ 0000000000688f0c _ZN5Envoy3Ssl11ContextImplC2ERNS0_18ContextManagerImplERNS_5Stats5ScopeERKNS0_13ContextConfigE ?
	@ 0000000000689e06 _ZN5Envoy3Ssl17ServerContextImplC1ERNS0_18ContextManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS9_SaIS9_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEbRNS_7Runtime6LoaderE ?
	@ 000000000068c6af _ZN5Envoy3Ssl18ContextManagerImpl22createSslServerContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEb ?
	@ 000000000056e9bc _ZN5Envoy3Ssl22ServerSslSocketFactoryC2ERKNS0_19ServerContextConfigERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorISA_SaISA_EEbRNS0_14ContextManagerERNS_5Stats5ScopeE ?
	@ 000000000051f83c _ZN5Envoy6Server13Configuration26DownstreamSslSocketFactory28createTransportSocketFactoryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS8_SaIS8_EEbRKN6google8protobuf7MessageERNS1_29TransportSocketFactoryContextE ?
	@ 000000000055a02c _ZN5Envoy6Server12ListenerImplC2ERKN5envoy3api2v28ListenerERNS0_19ListenerManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbm ?
	@ 000000000055b575 _ZN5Envoy6Server19ListenerManagerImpl19addOrUpdateListenerERKN5envoy3api2v28ListenerEb ?
	@ 000000000076b0dd _ZN5Envoy6Server6LdsApi14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldIN5envoy3api2v28ListenerEEE ?
	@ 000000000076cce0 _ZN5Envoy6Config23GrpcMuxSubscriptionImplIN5envoy3api2v28ListenerEE14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldINS8_3AnyEEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ?
	@ 00000000007722b3 _ZN5Envoy6Config11GrpcMuxImpl16onReceiveMessageEOSt10unique_ptrIN5envoy3api2v217DiscoveryResponseESt14default_deleteIS6_EE ?
	@ 000000000076f622 _ZN5Envoy4Grpc25TypedAsyncStreamCallbacksIN5envoy3api2v217DiscoveryResponseEE23onReceiveMessageUntypedEOSt10unique_ptrIN6google8protobuf7MessageESt14default_deleteISA_EE ?
	@ 0000000000788c65 _ZN5Envoy4Grpc15AsyncStreamImpl6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000078d93b _ZN5Envoy4Http15AsyncStreamImpl10encodeDataERNS_6Buffer8InstanceEb ?
	@ 00000000006b2a93 _ZN5Envoy4Http5Http214ConnectionImpl15onFrameReceivedEPK13nghttp2_frame ?
	@ 00000000006b58b6 nghttp2_session_del ?
	@ 00000000006b94e1 nghttp2_session_mem_recv ?
	@ 00000000006b1aee _ZN5Envoy4Http5Http214ConnectionImpl8dispatchERNS_6Buffer8InstanceE ?
	@ 000000000066634e _ZN5Envoy4Http11CodecClient6onDataERNS_6Buffer8InstanceE ?
	@ 00000000006664cc _ZN5Envoy4Http11CodecClient15CodecReadFilter6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000056dc66 _ZN5Envoy7Network17FilterManagerImpl17onContinueReadingEPNS1_16ActiveReadFilterE ?
	@ 000000000056c5ce _ZN5Envoy7Network14ConnectionImpl11onReadReadyEv ?
	@ 000000000056cded _ZN5Envoy7Network14ConnectionImpl11onFileEventEj ?
	@ 0000000000566307 _ZN5Envoy5Event13FileEventImpl8activateEj ?
	@ 00000000008a1d11 event_add_nolock_ ?
	@ 00000000008a246e event_base_loop ?
	@ 000000000054dcdd _ZN5Envoy6Server12InstanceImpl3runEv ?
	@ 0000000000464850 _ZN5Envoy14MainCommonBase3runEv ?
Leak of 261964200 bytes in 436607 objects allocated from:
	@ 0090f485 unknown
	@ 00000000008ae16c SSL_CTX_new ?
	@ 00000000006888ae _ZN5Envoy3Ssl11ContextImplC2ERNS0_18ContextManagerImplERNS_5Stats5ScopeERKNS0_13ContextConfigE ?
	@ 0000000000689e06 _ZN5Envoy3Ssl17ServerContextImplC1ERNS0_18ContextManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS9_SaIS9_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEbRNS_7Runtime6LoaderE ?
	@ 000000000068c6af _ZN5Envoy3Ssl18ContextManagerImpl22createSslServerContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEb ?
	@ 000000000056e9bc _ZN5Envoy3Ssl22ServerSslSocketFactoryC2ERKNS0_19ServerContextConfigERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorISA_SaISA_EEbRNS0_14ContextManagerERNS_5Stats5ScopeE ?
	@ 000000000051f83c _ZN5Envoy6Server13Configuration26DownstreamSslSocketFactory28createTransportSocketFactoryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS8_SaIS8_EEbRKN6google8protobuf7MessageERNS1_29TransportSocketFactoryContextE ?
	@ 000000000055a02c _ZN5Envoy6Server12ListenerImplC2ERKN5envoy3api2v28ListenerERNS0_19ListenerManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbm ?
	@ 000000000055b575 _ZN5Envoy6Server19ListenerManagerImpl19addOrUpdateListenerERKN5envoy3api2v28ListenerEb ?
	@ 000000000076b0dd _ZN5Envoy6Server6LdsApi14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldIN5envoy3api2v28ListenerEEE ?
	@ 000000000076cce0 _ZN5Envoy6Config23GrpcMuxSubscriptionImplIN5envoy3api2v28ListenerEE14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldINS8_3AnyEEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ?
	@ 00000000007722b3 _ZN5Envoy6Config11GrpcMuxImpl16onReceiveMessageEOSt10unique_ptrIN5envoy3api2v217DiscoveryResponseESt14default_deleteIS6_EE ?
	@ 000000000076f622 _ZN5Envoy4Grpc25TypedAsyncStreamCallbacksIN5envoy3api2v217DiscoveryResponseEE23onReceiveMessageUntypedEOSt10unique_ptrIN6google8protobuf7MessageESt14default_deleteISA_EE ?
	@ 0000000000788c65 _ZN5Envoy4Grpc15AsyncStreamImpl6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000078d93b _ZN5Envoy4Http15AsyncStreamImpl10encodeDataERNS_6Buffer8InstanceEb ?
	@ 00000000006b2a93 _ZN5Envoy4Http5Http214ConnectionImpl15onFrameReceivedEPK13nghttp2_frame ?
	@ 00000000006b58b6 nghttp2_session_del ?
	@ 00000000006b94e1 nghttp2_session_mem_recv ?
	@ 00000000006b1aee _ZN5Envoy4Http5Http214ConnectionImpl8dispatchERNS_6Buffer8InstanceE ?
	@ 000000000066634e _ZN5Envoy4Http11CodecClient6onDataERNS_6Buffer8InstanceE ?
	@ 00000000006664cc _ZN5Envoy4Http11CodecClient15CodecReadFilter6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000056dc66 _ZN5Envoy7Network17FilterManagerImpl17onContinueReadingEPNS1_16ActiveReadFilterE ?
	@ 000000000056c5ce _ZN5Envoy7Network14ConnectionImpl11onReadReadyEv ?
	@ 000000000056cded _ZN5Envoy7Network14ConnectionImpl11onFileEventEj ?
	@ 0000000000566307 _ZN5Envoy5Event13FileEventImpl8activateEj ?
	@ 00000000008a1d11 event_add_nolock_ ?
	@ 00000000008a246e event_base_loop ?
	@ 000000000054dcdd _ZN5Envoy6Server12InstanceImpl3runEv ?
	@ 0000000000464850 _ZN5Envoy14MainCommonBase3runEv ?
	@ 00000000004156c8 main ?
	@ 00007f609ebcdb8d unknown
Leak of 244248578 bytes in 436607 objects allocated from:
	@ 0090f485 unknown
	@ 00000000008d7047 asn1_enc_save ?
	@ 00000000008d4ebd ASN1_item_ex_d2i ?
	@ 00000000008d519c ASN1_item_d2i ?
	@ 00000000008d53db ASN1_item_d2i ?
	@ 00000000008d4c3b ASN1_item_ex_d2i ?
	@ 00000000008d4f6a ASN1_item_d2i ?
	@ 000000000090f3c8 d2i_X509_AUX ?
	@ 0000000000913487 PEM_ASN1_read_bio ?
	@ 0000000000688edc _ZN5Envoy3Ssl11ContextImplC2ERNS0_18ContextManagerImplERNS_5Stats5ScopeERKNS0_13ContextConfigE ?
	@ 0000000000689e06 _ZN5Envoy3Ssl17ServerContextImplC1ERNS0_18ContextManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS9_SaIS9_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEbRNS_7Runtime6LoaderE ?
	@ 000000000068c6af _ZN5Envoy3Ssl18ContextManagerImpl22createSslServerContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEb ?
	@ 000000000056e9bc _ZN5Envoy3Ssl22ServerSslSocketFactoryC2ERKNS0_19ServerContextConfigERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorISA_SaISA_EEbRNS0_14ContextManagerERNS_5Stats5ScopeE ?
	@ 000000000051f83c _ZN5Envoy6Server13Configuration26DownstreamSslSocketFactory28createTransportSocketFactoryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS8_SaIS8_EEbRKN6google8protobuf7MessageERNS1_29TransportSocketFactoryContextE ?
	@ 000000000055a02c _ZN5Envoy6Server12ListenerImplC2ERKN5envoy3api2v28ListenerERNS0_19ListenerManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbm ?
	@ 000000000055b575 _ZN5Envoy6Server19ListenerManagerImpl19addOrUpdateListenerERKN5envoy3api2v28ListenerEb ?
	@ 000000000076b0dd _ZN5Envoy6Server6LdsApi14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldIN5envoy3api2v28ListenerEEE ?
	@ 000000000076cce0 _ZN5Envoy6Config23GrpcMuxSubscriptionImplIN5envoy3api2v28ListenerEE14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldINS8_3AnyEEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ?
	@ 00000000007722b3 _ZN5Envoy6Config11GrpcMuxImpl16onReceiveMessageEOSt10unique_ptrIN5envoy3api2v217DiscoveryResponseESt14default_deleteIS6_EE ?
	@ 000000000076f622 _ZN5Envoy4Grpc25TypedAsyncStreamCallbacksIN5envoy3api2v217DiscoveryResponseEE23onReceiveMessageUntypedEOSt10unique_ptrIN6google8protobuf7MessageESt14default_deleteISA_EE ?
	@ 0000000000788c65 _ZN5Envoy4Grpc15AsyncStreamImpl6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000078d93b _ZN5Envoy4Http15AsyncStreamImpl10encodeDataERNS_6Buffer8InstanceEb ?
	@ 00000000006b2a93 _ZN5Envoy4Http5Http214ConnectionImpl15onFrameReceivedEPK13nghttp2_frame ?
	@ 00000000006b58b6 nghttp2_session_del ?
	@ 00000000006b94e1 nghttp2_session_mem_recv ?
	@ 00000000006b1aee _ZN5Envoy4Http5Http214ConnectionImpl8dispatchERNS_6Buffer8InstanceE ?
	@ 000000000066634e _ZN5Envoy4Http11CodecClient6onDataERNS_6Buffer8InstanceE ?
	@ 00000000006664cc _ZN5Envoy4Http11CodecClient15CodecReadFilter6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000056dc66 _ZN5Envoy7Network17FilterManagerImpl17onContinueReadingEPNS1_16ActiveReadFilterE ?
	@ 000000000056c5ce _ZN5Envoy7Network14ConnectionImpl11onReadReadyEv ?
	@ 000000000056cded _ZN5Envoy7Network14ConnectionImpl11onFileEventEj ?
	@ 0000000000566307 _ZN5Envoy5Event13FileEventImpl8activateEj ?
Leak of 185121368 bytes in 436607 objects allocated from:
	@ 0068c684 unknown
	@ 000000000056e9bc _ZN5Envoy3Ssl22ServerSslSocketFactoryC2ERKNS0_19ServerContextConfigERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorISA_SaISA_EEbRNS0_14ContextManagerERNS_5Stats5ScopeE ?
	@ 000000000051f83c _ZN5Envoy6Server13Configuration26DownstreamSslSocketFactory28createTransportSocketFactoryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS8_SaIS8_EEbRKN6google8protobuf7MessageERNS1_29TransportSocketFactoryContextE ?
	@ 000000000055a02c _ZN5Envoy6Server12ListenerImplC2ERKN5envoy3api2v28ListenerERNS0_19ListenerManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbm ?
	@ 000000000055b575 _ZN5Envoy6Server19ListenerManagerImpl19addOrUpdateListenerERKN5envoy3api2v28ListenerEb ?
	@ 000000000076b0dd _ZN5Envoy6Server6LdsApi14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldIN5envoy3api2v28ListenerEEE ?
	@ 000000000076cce0 _ZN5Envoy6Config23GrpcMuxSubscriptionImplIN5envoy3api2v28ListenerEE14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldINS8_3AnyEEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ?
	@ 00000000007722b3 _ZN5Envoy6Config11GrpcMuxImpl16onReceiveMessageEOSt10unique_ptrIN5envoy3api2v217DiscoveryResponseESt14default_deleteIS6_EE ?
	@ 000000000076f622 _ZN5Envoy4Grpc25TypedAsyncStreamCallbacksIN5envoy3api2v217DiscoveryResponseEE23onReceiveMessageUntypedEOSt10unique_ptrIN6google8protobuf7MessageESt14default_deleteISA_EE ?
	@ 0000000000788c65 _ZN5Envoy4Grpc15AsyncStreamImpl6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000078d93b _ZN5Envoy4Http15AsyncStreamImpl10encodeDataERNS_6Buffer8InstanceEb ?
	@ 00000000006b2a93 _ZN5Envoy4Http5Http214ConnectionImpl15onFrameReceivedEPK13nghttp2_frame ?
	@ 00000000006b58b6 nghttp2_session_del ?
	@ 00000000006b94e1 nghttp2_session_mem_recv ?
	@ 00000000006b1aee _ZN5Envoy4Http5Http214ConnectionImpl8dispatchERNS_6Buffer8InstanceE ?
	@ 000000000066634e _ZN5Envoy4Http11CodecClient6onDataERNS_6Buffer8InstanceE ?
	@ 00000000006664cc _ZN5Envoy4Http11CodecClient15CodecReadFilter6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000056dc66 _ZN5Envoy7Network17FilterManagerImpl17onContinueReadingEPNS1_16ActiveReadFilterE ?
	@ 000000000056c5ce _ZN5Envoy7Network14ConnectionImpl11onReadReadyEv ?
	@ 000000000056cded _ZN5Envoy7Network14ConnectionImpl11onFileEventEj ?
	@ 0000000000566307 _ZN5Envoy5Event13FileEventImpl8activateEj ?
	@ 00000000008a1d11 event_add_nolock_ ?
	@ 00000000008a246e event_base_loop ?
	@ 000000000054dcdd _ZN5Envoy6Server12InstanceImpl3runEv ?
	@ 0000000000464850 _ZN5Envoy14MainCommonBase3runEv ?
	@ 00000000004156c8 main ?
	@ 00007f609ebcdb8d unknown
Leak of 121306021 bytes in 436607 objects allocated from:
	@ 0090f485 unknown
	@ 000000000091d8cd c2i_ASN1_BIT_STRING ?
	@ 00000000008d3cf6 asn1_ex_c2i ?
	@ 00000000008d414b asn1_ex_c2i ?
	@ 00000000008d4e0a ASN1_item_ex_d2i ?
	@ 00000000008d519c ASN1_item_d2i ?
	@ 00000000008d53db ASN1_item_d2i ?
	@ 00000000008d4c3b ASN1_item_ex_d2i ?
	@ 00000000008d519c ASN1_item_d2i ?
	@ 00000000008d53db ASN1_item_d2i ?
	@ 00000000008d4c3b ASN1_item_ex_d2i ?
	@ 00000000008d519c ASN1_item_d2i ?
	@ 00000000008d53db ASN1_item_d2i ?
	@ 00000000008d4c3b ASN1_item_ex_d2i ?
	@ 00000000008d4f6a ASN1_item_d2i ?
	@ 000000000090f3c8 d2i_X509_AUX ?
	@ 0000000000913487 PEM_ASN1_read_bio ?
	@ 0000000000688edc _ZN5Envoy3Ssl11ContextImplC2ERNS0_18ContextManagerImplERNS_5Stats5ScopeERKNS0_13ContextConfigE ?
	@ 0000000000689e06 _ZN5Envoy3Ssl17ServerContextImplC1ERNS0_18ContextManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS9_SaIS9_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEbRNS_7Runtime6LoaderE ?
	@ 000000000068c6af _ZN5Envoy3Ssl18ContextManagerImpl22createSslServerContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EERNS_5Stats5ScopeERKNS0_19ServerContextConfigEb ?
	@ 000000000056e9bc _ZN5Envoy3Ssl22ServerSslSocketFactoryC2ERKNS0_19ServerContextConfigERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorISA_SaISA_EEbRNS0_14ContextManagerERNS_5Stats5ScopeE ?
	@ 000000000051f83c _ZN5Envoy6Server13Configuration26DownstreamSslSocketFactory28createTransportSocketFactoryERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS8_SaIS8_EEbRKN6google8protobuf7MessageERNS1_29TransportSocketFactoryContextE ?
	@ 000000000055a02c _ZN5Envoy6Server12ListenerImplC2ERKN5envoy3api2v28ListenerERNS0_19ListenerManagerImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbm ?
	@ 000000000055b575 _ZN5Envoy6Server19ListenerManagerImpl19addOrUpdateListenerERKN5envoy3api2v28ListenerEb ?
	@ 000000000076b0dd _ZN5Envoy6Server6LdsApi14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldIN5envoy3api2v28ListenerEEE ?
	@ 000000000076cce0 _ZN5Envoy6Config23GrpcMuxSubscriptionImplIN5envoy3api2v28ListenerEE14onConfigUpdateERKN6google8protobuf16RepeatedPtrFieldINS8_3AnyEEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ?
	@ 00000000007722b3 _ZN5Envoy6Config11GrpcMuxImpl16onReceiveMessageEOSt10unique_ptrIN5envoy3api2v217DiscoveryResponseESt14default_deleteIS6_EE ?
	@ 000000000076f622 _ZN5Envoy4Grpc25TypedAsyncStreamCallbacksIN5envoy3api2v217DiscoveryResponseEE23onReceiveMessageUntypedEOSt10unique_ptrIN6google8protobuf7MessageESt14default_deleteISA_EE ?
	@ 0000000000788c65 _ZN5Envoy4Grpc15AsyncStreamImpl6onDataERNS_6Buffer8InstanceEb ?
	@ 000000000078d93b _ZN5Envoy4Http15AsyncStreamImpl10encodeDataERNS_6Buffer8InstanceEb ?
	@ 00000000006b2a93 _ZN5Envoy4Http5Http214ConnectionImpl15onFrameReceivedEPK13nghttp2_frame ?
	@ 00000000006b58b6 nghttp2_session_del ?
@mattalberts
Author

envoy.hprof.0180.heap.txt

  • heap trace output for envoy
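For a file like the attached envoy.hprof.0180.heap.txt, the "Leak of N bytes in M objects" records can be tallied without reading the full stacks. A minimal awk sketch (`leak_summary` is a hypothetical name) that prints bytes, object count, and average bytes per object, largest leak first. Applied to the top record quoted above, 366498538 bytes over 436607 objects works out to about 839.4 bytes per object.

```sh
# Hypothetical helper: summarise records of the form
# "Leak of <bytes> bytes in <objects> objects allocated from:".
# Prints "<bytes> <objects> <avg bytes per object>", largest leak first.
# Reads the files named as arguments, or stdin when there are none.
leak_summary() {
  awk '/^Leak of [0-9]+ bytes in [0-9]+ objects/ {
         printf "%.0f %.0f %.1f\n", $3, $6, $3 / $6
       }' "$@" | sort -k1,1nr
}

# Usage:
#   leak_summary envoy.hprof.0180.heap.txt
```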

@mattalberts mattalberts changed the title Observed container memory while adding|removing ingress Observed envoy memory while adding|removing ingress Jul 3, 2018
@davecheney davecheney self-assigned this Jul 5, 2018
@davecheney davecheney added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jul 5, 2018
@davecheney davecheney added this to the 0.6.0 milestone Jul 5, 2018
@davecheney
Contributor

@mattalberts thank you for your detailed bug report. I'm sorry that Contour is behaving like that; there is no reason for Contour to consume hundreds of megabytes of memory, let alone gigabytes.

Internally at Heptio I've also had a report that matches your symptoms, but from memory it never made it into an issue (/cc @alexbrand, please correct me if I'm wrong).

I see that you're using Contour 0.5.0. The tricky bit about investigating this issue is that over the last two weeks almost all of contour's business logic has been rewritten to support the new CRD we're adding in 0.6. The downside is that any effort spent investigating what is going on with 0.5 is probably wasted, as the code that may be causing the issue has likely gone to data heaven.

I just landed #504, which marks the rewrite complete with respect to the features of Contour 0.5: everything that worked in 0.5 is confirmed to work in master now.

How would you feel about trying to reproduce the problem with gcr.io/heptio-images/contour:master? If you don't feel comfortable chasing :master, we'll be doing a 0.6 beta.1 release very soon, probably early next week.

I'm going to continue to investigate this issue and have tagged it as a blocker for 0.6 final.

@davecheney davecheney added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jul 5, 2018
@alexbrand
Contributor

We saw a memory consumption issue in the contour process, which I was not able to reproduce in 0.6-alpha.1 (#424 (comment)).

It seems to me like this is a leak in Envoy though?

@mattalberts
Author

@alexbrand @davecheney Hi! Thanks for writing back. Contour actually appears to be the well-behaved container (I also really like the project :P); it's envoy that appears to be leaky (though I wasn't certain where to file the issue).

I'm happy to start evaluating from master! It's pretty easy to stand up a test environment and re-run my scripts :).

@davecheney
Contributor

Thanks for confirming. Even though contour's container is well behaved, the issue causing Envoy to grow huge might be contour's fault, perhaps something in the way we're communicating with Envoy. I'm going to keep this issue open as a high priority, but also reference #443, which we'll do whenever we start on the 0.7 cycle (probably August).

@mattalberts
Author

mattalberts commented Jul 6, 2018

Update ~ contour:master

As suggested, I've updated to contour:master and re-run the tests.

The results are largely the same, though the slope of the memory-growth line is lower (we still grow into GBs of memory, but top out at ~32GB rather than ~64GB).

@davecheney
@alexbrand

Observation ~ Adding Ingresses

malbook:ingress-system malberts$ ./tools/clone-http-ingress.sh -n 5000 -b 100
REPORT
total=5000
block_size=100
delay enabled=0 duration=0.000000

BLOCK    MEMORY   DURATION
00000000 38Mi     12.73000000
00000001 38Mi     13.51000000
00000002 38Mi     15.69000000
00000003 727Mi    16.91000000
00000004 727Mi    19.13000000
00000005 727Mi    18.27000000
00000006 1397Mi   19.66000000
00000007 1397Mi   17.98000000
00000008 1397Mi   20.06000000
00000009 2322Mi   20.20000000
00000010 2322Mi   21.42000000
00000011 2322Mi   23.06000000
00000012 3201Mi   26.34000000
00000013 3201Mi   26.48000000
00000014 3962Mi   26.93000000
00000015 3962Mi   31.62000000
00000016 4750Mi   26.29000000
00000017 4750Mi   29.35000000
00000018 5308Mi   31.72000000
00000019 5308Mi   29.80000000
00000020 6104Mi   33.49000000
00000021 6104Mi   33.40000000
00000022 7012Mi   36.17000000
00000023 7745Mi   44.15000000
00000024 8615Mi   48.12000000
00000025 8615Mi   46.05000000
00000026 9419Mi   42.14000000
00000027 10281Mi  41.55000000
00000028 10281Mi  43.36000000
00000029 11109Mi  45.34000000
00000030 11907Mi  47.71000000
00000031 12713Mi  70.68000000
00000032 13480Mi  70.05000000
00000033 15110Mi  76.67000000
00000034 15853Mi  66.57000000
00000035 16718Mi  75.32000000
00000036 17501Mi  77.54000000
00000037 19168Mi  72.29000000
00000038 20075Mi  74.16000000
00000039 20848Mi  82.56000000
00000040 22403Mi  95.71000000
00000041 23176Mi  75.30000000
00000042 23908Mi  76.13000000
00000043 25675Mi  85.38000000
00000044 26451Mi  81.63000000
00000045 27945Mi  83.64000000
00000046 28724Mi  85.74000000
00000047 30249Mi  87.59000000
00000048 31112Mi  90.99000000
00000049 32721Mi  100.62000000

[Graph: contour_master_envoy_mem_fill_with_n_ing]

Heap Trace

Again, I've included only a portion of the output to keep the paste small. At least this time I was able to get a more legible stack trace (the raw trace is attached).

(pprof) top
Total: 9982.5 MB
Leak of 697222424 bytes in 831016 objects allocated from:
	@ 0090f485 unknown
	@ 00000000008df612 BUF_memdup ??:0
	@ 000000000091b217 CRYPTO_BUFFER_new ??:0
	@ 00000000008ba473 bssl::x509_to_buffer ssl_x509.cc:0
	@ 00000000008baa39 ssl_use_certificate ssl_x509.cc:0
	@ 0000000000688f0c Envoy::Ssl::ContextImpl::ContextImpl /proc/self/cwd/source/common/ssl/context_impl.cc:144
	@ 0000000000689e06 Envoy::Ssl::ServerContextImpl::ServerContextImpl /proc/self/cwd/source/common/ssl/context_impl.cc:439
	@ 000000000068c6af Envoy::Ssl::ContextManagerImpl::createSslServerContext /proc/self/cwd/source/common/ssl/context_manager_impl.cc:75
	@ 000000000056e9bc Envoy::Ssl::ServerSslSocketFactory::ServerSslSocketFactory /proc/self/cwd/source/common/ssl/ssl_socket.cc:375
	@ 000000000051f83c Envoy::Server::Configuration::DownstreamSslSocketFactory::createTransportSocketFactory /proc/self/cwd/source/server/config/network/ssl_socket.cc:36
	@ 000000000055a02c Envoy::Server::ListenerImpl::ListenerImpl /proc/self/cwd/source/server/listener_manager_impl.cc:203
	@ 000000000055b575 Envoy::Server::ListenerManagerImpl::addOrUpdateListener /proc/self/cwd/source/server/listener_manager_impl.cc:330
	@ 000000000076b0dd Envoy::Server::LdsApi::onConfigUpdate /proc/self/cwd/source/server/lds_api.cc:59
	@ 000000000076cce0 Envoy::Config::GrpcMuxSubscriptionImpl::onConfigUpdate /proc/self/cwd/bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:53
	@ 00000000007722b3 Envoy::Config::GrpcMuxImpl::onReceiveMessage /proc/self/cwd/source/common/config/grpc_mux_impl.cc:174
	@ 000000000076f622 Envoy::Grpc::TypedAsyncStreamCallbacks::onReceiveMessageUntyped /proc/self/cwd/bazel-out/k8-opt/bin/include/envoy/grpc/_virtual_includes/async_client_interface/envoy/grpc/async_client.h:172
	@ 0000000000788c65 Envoy::Grpc::AsyncStreamImpl::onData /proc/self/cwd/source/common/grpc/async_client_impl.cc:131
	@ 000000000078d93b Envoy::Http::AsyncStreamImpl::encodeData /proc/self/cwd/source/common/http/async_client_impl.cc:108
	@ 00000000006b2a93 Envoy::Http::Http2::ConnectionImpl::onFrameReceived /proc/self/cwd/source/common/http/http2/codec_impl.cc:445
	@ 00000000006b58b6 nghttp2_session_on_data_received /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:4881
	@ 00000000006b94e1 nghttp2_session_mem_recv /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:6443
	@ 00000000006b1aee Envoy::Http::Http2::ConnectionImpl::dispatch /proc/self/cwd/source/common/http/http2/codec_impl.cc:302
	@ 000000000066634e Envoy::Http::CodecClient::onData /proc/self/cwd/source/common/http/codec_client.cc:115
	@ 00000000006664cc Envoy::Http::CodecClient::CodecReadFilter::onData /proc/self/cwd/bazel-out/k8-opt/bin/source/common/http/_virtual_includes/codec_client_lib/common/http/codec_client.h:159
	@ 000000000056dc66 Envoy::Network::FilterManagerImpl::onContinueReading /proc/self/cwd/source/common/network/filter_manager_impl.cc:56
	@ 000000000056c5ce Envoy::Network::ConnectionImpl::onReadReady /proc/self/cwd/source/common/network/connection_impl.cc:443
	@ 000000000056cded Envoy::Network::ConnectionImpl::onFileEvent /proc/self/cwd/source/common/network/connection_impl.cc:419
	@ 0000000000566307 _FUN /proc/self/cwd/source/common/event/file_event_impl.cc:61
	@ 00000000008a1d11 event_process_active_single_queue.isra.29 /tmp/libevent.dep.build/libevent-2.1.8-stable/event.c:1639
	@ 00000000008a246e event_base_loop /tmp/libevent.dep.build/libevent-2.1.8-stable/event.c:1961
	@ 000000000054dcdd Envoy::Server::InstanceImpl::run /proc/self/cwd/source/server/server.cc:356
	@ 0000000000464850 Envoy::MainCommonBase::run /proc/self/cwd/source/exe/main_common.cc:83
Leak of 498609600 bytes in 831016 objects allocated from:
	@ 0090f485 unknown
	@ 00000000008ae16c SSL_CTX_new ??:0
	@ 00000000006888ae Envoy::Ssl::ContextImpl::ContextImpl /proc/self/cwd/source/common/ssl/context_impl.cc:34
	@ 0000000000689e06 Envoy::Ssl::ServerContextImpl::ServerContextImpl /proc/self/cwd/source/common/ssl/context_impl.cc:439
	@ 000000000068c6af Envoy::Ssl::ContextManagerImpl::createSslServerContext /proc/self/cwd/source/common/ssl/context_manager_impl.cc:75
	@ 000000000056e9bc Envoy::Ssl::ServerSslSocketFactory::ServerSslSocketFactory /proc/self/cwd/source/common/ssl/ssl_socket.cc:375
	@ 000000000051f83c Envoy::Server::Configuration::DownstreamSslSocketFactory::createTransportSocketFactory /proc/self/cwd/source/server/config/network/ssl_socket.cc:36
	@ 000000000055a02c Envoy::Server::ListenerImpl::ListenerImpl /proc/self/cwd/source/server/listener_manager_impl.cc:203
	@ 000000000055b575 Envoy::Server::ListenerManagerImpl::addOrUpdateListener /proc/self/cwd/source/server/listener_manager_impl.cc:330
	@ 000000000076b0dd Envoy::Server::LdsApi::onConfigUpdate /proc/self/cwd/source/server/lds_api.cc:59
	@ 000000000076cce0 Envoy::Config::GrpcMuxSubscriptionImpl::onConfigUpdate /proc/self/cwd/bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:53
	@ 00000000007722b3 Envoy::Config::GrpcMuxImpl::onReceiveMessage /proc/self/cwd/source/common/config/grpc_mux_impl.cc:174
	@ 000000000076f622 Envoy::Grpc::TypedAsyncStreamCallbacks::onReceiveMessageUntyped /proc/self/cwd/bazel-out/k8-opt/bin/include/envoy/grpc/_virtual_includes/async_client_interface/envoy/grpc/async_client.h:172
	@ 0000000000788c65 Envoy::Grpc::AsyncStreamImpl::onData /proc/self/cwd/source/common/grpc/async_client_impl.cc:131
	@ 000000000078d93b Envoy::Http::AsyncStreamImpl::encodeData /proc/self/cwd/source/common/http/async_client_impl.cc:108
	@ 00000000006b2a93 Envoy::Http::Http2::ConnectionImpl::onFrameReceived /proc/self/cwd/source/common/http/http2/codec_impl.cc:445
	@ 00000000006b58b6 nghttp2_session_on_data_received /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:4881
	@ 00000000006b94e1 nghttp2_session_mem_recv /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:6443
	@ 00000000006b1aee Envoy::Http::Http2::ConnectionImpl::dispatch /proc/self/cwd/source/common/http/http2/codec_impl.cc:302
	@ 000000000066634e Envoy::Http::CodecClient::onData /proc/self/cwd/source/common/http/codec_client.cc:115
	@ 00000000006664cc Envoy::Http::CodecClient::CodecReadFilter::onData /proc/self/cwd/bazel-out/k8-opt/bin/source/common/http/_virtual_includes/codec_client_lib/common/http/codec_client.h:159
	@ 000000000056dc66 Envoy::Network::FilterManagerImpl::onContinueReading /proc/self/cwd/source/common/network/filter_manager_impl.cc:56
	@ 000000000056c5ce Envoy::Network::ConnectionImpl::onReadReady /proc/self/cwd/source/common/network/connection_impl.cc:443
	@ 000000000056cded Envoy::Network::ConnectionImpl::onFileEvent /proc/self/cwd/source/common/network/connection_impl.cc:419
	@ 0000000000566307 _FUN /proc/self/cwd/source/common/event/file_event_impl.cc:61
	@ 00000000008a1d11 event_process_active_single_queue.isra.29 /tmp/libevent.dep.build/libevent-2.1.8-stable/event.c:1639
	@ 00000000008a246e event_base_loop /tmp/libevent.dep.build/libevent-2.1.8-stable/event.c:1961
	@ 000000000054dcdd Envoy::Server::InstanceImpl::run /proc/self/cwd/source/server/server.cc:356
	@ 0000000000464850 Envoy::MainCommonBase::run /proc/self/cwd/source/exe/main_common.cc:83
	@ 00000000004156c8 main /proc/self/cwd/source/exe/main.cc:30
	@ 00007f114a497b8d unknown
Leak of 464537944 bytes in 831016 objects allocated from:
	@ 0090f485 unknown
	@ 00000000008d7047 asn1_enc_save ??:0
	@ 00000000008d4ebd ASN1_item_ex_d2i ??:0
	@ 00000000008d519c asn1_template_noexp_d2i tasn_dec.c:0
	@ 00000000008d53db asn1_template_ex_d2i tasn_dec.c:0
	@ 00000000008d4c3b ASN1_item_ex_d2i ??:0
	@ 00000000008d4f6a ASN1_item_d2i ??:0
	@ 000000000090f3c8 d2i_X509_AUX ??:0
	@ 0000000000913487 PEM_ASN1_read_bio ??:0
	@ 0000000000688edc Envoy::Ssl::ContextImpl::ContextImpl /proc/self/cwd/source/common/ssl/context_impl.cc:143
	@ 0000000000689e06 Envoy::Ssl::ServerContextImpl::ServerContextImpl /proc/self/cwd/source/common/ssl/context_impl.cc:439
	@ 000000000068c6af Envoy::Ssl::ContextManagerImpl::createSslServerContext /proc/self/cwd/source/common/ssl/context_manager_impl.cc:75
	@ 000000000056e9bc Envoy::Ssl::ServerSslSocketFactory::ServerSslSocketFactory /proc/self/cwd/source/common/ssl/ssl_socket.cc:375
	@ 000000000051f83c Envoy::Server::Configuration::DownstreamSslSocketFactory::createTransportSocketFactory /proc/self/cwd/source/server/config/network/ssl_socket.cc:36
	@ 000000000055a02c Envoy::Server::ListenerImpl::ListenerImpl /proc/self/cwd/source/server/listener_manager_impl.cc:203
	@ 000000000055b575 Envoy::Server::ListenerManagerImpl::addOrUpdateListener /proc/self/cwd/source/server/listener_manager_impl.cc:330
	@ 000000000076b0dd Envoy::Server::LdsApi::onConfigUpdate /proc/self/cwd/source/server/lds_api.cc:59
	@ 000000000076cce0 Envoy::Config::GrpcMuxSubscriptionImpl::onConfigUpdate /proc/self/cwd/bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:53
	@ 00000000007722b3 Envoy::Config::GrpcMuxImpl::onReceiveMessage /proc/self/cwd/source/common/config/grpc_mux_impl.cc:174
	@ 000000000076f622 Envoy::Grpc::TypedAsyncStreamCallbacks::onReceiveMessageUntyped /proc/self/cwd/bazel-out/k8-opt/bin/include/envoy/grpc/_virtual_includes/async_client_interface/envoy/grpc/async_client.h:172
	@ 0000000000788c65 Envoy::Grpc::AsyncStreamImpl::onData /proc/self/cwd/source/common/grpc/async_client_impl.cc:131
	@ 000000000078d93b Envoy::Http::AsyncStreamImpl::encodeData /proc/self/cwd/source/common/http/async_client_impl.cc:108
	@ 00000000006b2a93 Envoy::Http::Http2::ConnectionImpl::onFrameReceived /proc/self/cwd/source/common/http/http2/codec_impl.cc:445
	@ 00000000006b58b6 nghttp2_session_on_data_received /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:4881
	@ 00000000006b94e1 nghttp2_session_mem_recv /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:6443
	@ 00000000006b1aee Envoy::Http::Http2::ConnectionImpl::dispatch /proc/self/cwd/source/common/http/http2/codec_impl.cc:302
	@ 000000000066634e Envoy::Http::CodecClient::onData /proc/self/cwd/source/common/http/codec_client.cc:115
	@ 00000000006664cc Envoy::Http::CodecClient::CodecReadFilter::onData /proc/self/cwd/bazel-out/k8-opt/bin/source/common/http/_virtual_includes/codec_client_lib/common/http/codec_client.h:159
	@ 000000000056dc66 Envoy::Network::FilterManagerImpl::onContinueReading /proc/self/cwd/source/common/network/filter_manager_impl.cc:56
	@ 000000000056c5ce Envoy::Network::ConnectionImpl::onReadReady /proc/self/cwd/source/common/network/connection_impl.cc:443
	@ 000000000056cded Envoy::Network::ConnectionImpl::onFileEvent /proc/self/cwd/source/common/network/connection_impl.cc:419
	@ 0000000000566307 _FUN /proc/self/cwd/source/common/event/file_event_impl.cc:61
Leak of 352350784 bytes in 831016 objects allocated from:
	@ 0068c684 unknown
	@ 000000000056e9bc Envoy::Ssl::ServerSslSocketFactory::ServerSslSocketFactory /proc/self/cwd/source/common/ssl/ssl_socket.cc:375
	@ 000000000051f83c Envoy::Server::Configuration::DownstreamSslSocketFactory::createTransportSocketFactory /proc/self/cwd/source/server/config/network/ssl_socket.cc:36
	@ 000000000055a02c Envoy::Server::ListenerImpl::ListenerImpl /proc/self/cwd/source/server/listener_manager_impl.cc:203
	@ 000000000055b575 Envoy::Server::ListenerManagerImpl::addOrUpdateListener /proc/self/cwd/source/server/listener_manager_impl.cc:330
	@ 000000000076b0dd Envoy::Server::LdsApi::onConfigUpdate /proc/self/cwd/source/server/lds_api.cc:59
	@ 000000000076cce0 Envoy::Config::GrpcMuxSubscriptionImpl::onConfigUpdate /proc/self/cwd/bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:53
	@ 00000000007722b3 Envoy::Config::GrpcMuxImpl::onReceiveMessage /proc/self/cwd/source/common/config/grpc_mux_impl.cc:174
	@ 000000000076f622 Envoy::Grpc::TypedAsyncStreamCallbacks::onReceiveMessageUntyped /proc/self/cwd/bazel-out/k8-opt/bin/include/envoy/grpc/_virtual_includes/async_client_interface/envoy/grpc/async_client.h:172
	@ 0000000000788c65 Envoy::Grpc::AsyncStreamImpl::onData /proc/self/cwd/source/common/grpc/async_client_impl.cc:131
	@ 000000000078d93b Envoy::Http::AsyncStreamImpl::encodeData /proc/self/cwd/source/common/http/async_client_impl.cc:108
	@ 00000000006b2a93 Envoy::Http::Http2::ConnectionImpl::onFrameReceived /proc/self/cwd/source/common/http/http2/codec_impl.cc:445
	@ 00000000006b58b6 nghttp2_session_on_data_received /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:4881
	@ 00000000006b94e1 nghttp2_session_mem_recv /tmp/nghttp2.dep.build/nghttp2-1.29.0/lib/nghttp2_session.c:6443
	@ 00000000006b1aee Envoy::Http::Http2::ConnectionImpl::dispatch /proc/self/cwd/source/common/http/http2/codec_impl.cc:302
	@ 000000000066634e Envoy::Http::CodecClient::onData /proc/self/cwd/source/common/http/codec_client.cc:115
	@ 00000000006664cc Envoy::Http::CodecClient::CodecReadFilter::onData /proc/self/cwd/bazel-out/k8-opt/bin/source/common/http/_virtual_includes/codec_client_lib/common/http/codec_client.h:159
	@ 000000000056dc66 Envoy::Network::FilterManagerImpl::onContinueReading /proc/self/cwd/source/common/network/filter_manager_impl.cc:56
	@ 000000000056c5ce Envoy::Network::ConnectionImpl::onReadReady /proc/self/cwd/source/common/network/connection_impl.cc:443
	@ 000000000056cded Envoy::Network::ConnectionImpl::onFileEvent /proc/self/cwd/source/common/network/connection_impl.cc:419
	@ 0000000000566307 _FUN /proc/self/cwd/source/common/event/file_event_impl.cc:61
	@ 00000000008a1d11 event_process_active_single_queue.isra.29 /tmp/libevent.dep.build/libevent-2.1.8-stable/event.c:1639
	@ 00000000008a246e event_base_loop /tmp/libevent.dep.build/libevent-2.1.8-stable/event.c:1961
	@ 000000000054dcdd Envoy::Server::InstanceImpl::run /proc/self/cwd/source/server/server.cc:356
	@ 0000000000464850 Envoy::MainCommonBase::run /proc/self/cwd/source/exe/main_common.cc:83
	@ 00000000004156c8 main /proc/self/cwd/source/exe/main.cc:30
	@ 00007f114a497b8d unknown

  8605.2  86.2%  86.2%   8605.2  86.2% OPENSSL_malloc
   821.0   8.2%  94.4%    821.0   8.2% OPENSSL_realloc
   336.0   3.4%  97.8%   9843.3  98.6% Envoy::Ssl::ContextManagerImpl::createSslServerContext
    53.5   0.5%  98.3%     53.5   0.5% __gnu_cxx::new_allocator::allocate (inline)
    46.3   0.5%  98.8%     46.3   0.5% std::__cxx11::basic_string::_M_construct
    19.0   0.2%  99.0%     20.0   0.2% std::_List_node::_List_node (inline)
    13.2   0.1%  99.1%     13.2   0.1% std::__cxx11::basic_string::reserve
    12.8   0.1%  99.2%     12.8   0.1% std::__cxx11::basic_string::_M_mutate
    12.7   0.1%  99.4%   9856.9  98.7% std::make_unique (inline)
     9.5   0.1%  99.5%      9.5   0.1% std::__fill_a (inline)

envoy.hprof.0220.heap.txt
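All four leak traces report the same object count, 831016, and share the ServerSslSocketFactory -> ListenerManagerImpl::addOrUpdateListener -> LdsApi::onConfigUpdate path. That is consistent with (though it does not prove) one allocation per site for every TLS server context created on a listener update and never released. The quick Go calculation below uses only the figures printed above to work out the average size of each leaked object:

```go
package main

import "fmt"

// leakSite holds one "Leak of X bytes in Y objects" entry from the heap
// checker output, labelled by the allocating frame nearest the top of its stack.
type leakSite struct {
	name    string
	bytes   uint64
	objects uint64
}

// bytesPerObject returns the average size of a leaked object at one site.
func bytesPerObject(bytes, objects uint64) uint64 {
	return bytes / objects
}

func main() {
	// Figures copied from the four leak traces above.
	sites := []leakSite{
		{"CRYPTO_BUFFER_new (ssl_use_certificate)", 697222424, 831016},
		{"SSL_CTX_new", 498609600, 831016},
		{"d2i_X509_AUX (PEM_ASN1_read_bio)", 464537944, 831016},
		{"ContextManagerImpl::createSslServerContext", 352350784, 831016},
	}
	var sum uint64
	for _, s := range sites {
		n := bytesPerObject(s.bytes, s.objects)
		sum += n
		fmt.Printf("%-45s %4d bytes/object\n", s.name, n)
	}
	fmt.Printf("sum of the four sizes: %d bytes\n", sum)
}
```

It prints 839, 600, 559 and 424 bytes per object (2422 bytes in total). Each figure divides exactly, and the last one (352350784 bytes, about 336 MB) lines up with the 336.0 MB attributed to createSslServerContext in the `top` listing.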

@davecheney
Contributor

davecheney commented Jul 7, 2018 via email

@mattalberts
Author

mattalberts commented Jul 10, 2018

@davecheney
That is an interesting theory! The stack trace does suggest a leak related to certificate creation. I've been attempting to audit the code/classes responsible for the memory alloc/free related to certificates and keys.

Correct. The memory-hungry container is Envoy; Contour stays within a reasonable range throughout ingress insertion. I've grabbed a screenshot of each container (in isolation) over the same time window and annotated the images to separate the different stages:

  • start - fresh start, no ingresses
  • insert - begin insertion (5000 total, in blocks of 100)
  • stable - insertion has finished, containers are idle
  • restart - delete the pod and let it be recreated, to compare memory against the stable stage

Contour Memory

contour_mem_start_add_stable_start

Envoy Memory

envoy_mem_start_add_stable_start

@rosskukulinski
Contributor

@mattalberts can you share clone-http-ingress.sh? We've been doing similar perf testing of Envoy and Contour memory usage, but with IngressRoute objects rather than Ingress (and without TLS).

Do all your Ingress objects share the same TLS key?

FWIW, I also hacked something similar together: https://github.com/rosskukulinski/contour-envoy-memory-batch.

Leaving this in the 0.6.0 milestone for now, but we may also want to see how things change with Envoy 1.7.

@mattalberts
Author

@rosskukulinski

I sure can! It's a little rough, especially right here.

MEM=$(kubectl -n ingress-system top pods -l'app=contour-ingress,component=debug' | awk 'NR==2{print $3}')

clone-http-ingress.txt

  • replace the .txt suffix with .sh
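The awk step reads column 3 of the first data row. With plain `kubectl top pods` that is the memory of the whole pod (Contour and Envoy together, if they share the pod), and only for the first matching pod. If the goal is Envoy alone, `kubectl top pods --containers` prints one row per container. A minimal Go sketch of picking one container's figure out of that output; `containerMemory` is a hypothetical helper, and the column order is assumed from current kubectl output rather than taken from the script:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// containerMemory scans text in the (assumed) `kubectl top pods --containers`
// layout, POD NAME CPU(cores) MEMORY(bytes), and returns the MEMORY column of
// the first row whose container NAME equals want.
func containerMemory(output, want string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(output))
	first := true
	for sc.Scan() {
		if first { // skip the column header line
			first = false
			continue
		}
		fields := strings.Fields(sc.Text())
		if len(fields) >= 4 && fields[1] == want {
			return fields[3], true
		}
	}
	return "", false
}

func main() {
	// Illustrative input only; the numbers are placeholders, not measurements.
	sample := `POD                   NAME      CPU(cores)   MEMORY(bytes)
contour-ingress-xyz   contour   1m           20Mi
contour-ingress-xyz   envoy     2m           100Mi
`
	if mem, ok := containerMemory(sample, "envoy"); ok {
		fmt.Println("envoy:", mem) // envoy: 100Mi
	}
}
```

In the shell script itself, the equivalent would be adding `--containers` and filtering with something like `awk '$2=="envoy"{print $4}'`.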

@mattalberts
Author

@rosskukulinski I also have this script (derived from the first one) that I used for log rate testing
(see #556). It generates a Service, Deployment and Ingress, which might be more helpful.

clone-echo.txt

  • replace the .txt suffix with .sh

@davecheney davecheney modified the milestones: 0.6.0, 0.7.0 Sep 17, 2018
@davecheney davecheney added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Oct 23, 2018
@davecheney davecheney removed their assignment Oct 23, 2018
@davecheney davecheney modified the milestones: 0.7.0, 0.8.0 Oct 23, 2018
@davecheney
Contributor

Bumping to 0.8.

This issue is important, but can't be p0 this late in the cycle.

@davecheney davecheney added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Oct 23, 2018
@davecheney davecheney removed this from the 0.8.0 milestone Nov 13, 2018
davecheney added a commit to davecheney/contour that referenced this issue Jun 19, 2019
Updates projectcontour#499
Updates projectcontour#273
Updates projectcontour#1176

The XDS spec says that Envoy will always initiate a stream with a
discovery request, and expects the management server to respond with
only one discovery response. After that, Envoy will initiate another
discovery request containing an ACK or a NACK from the previous
response.

Currently Contour ignores the ACK/NACK (projectcontour#1176); however,
inspection of the current code shows that we're also not waiting
for Envoy to send the next discovery request.

This PR removes the inner `for {}` loop that would continue to reuse the
initial discovery request until the client disconnected. The previous
code was written in a time when we'd just implemented filtering and it
was possible for the filter to return no results, hence the inner loop
was--incorrectly--trying to loop until there was a result to return.

Huge thanks to @lrouquette who pointed this out.

Signed-off-by: Dave Cheney <[email protected]>
@davecheney davecheney modified the milestones: 0.14.0, 0.15.0 Jul 19, 2019
@davecheney
Contributor

We've continued to work on #499 throughout 0.14 by reducing the number of spurious updates sent to Envoy. Moving to 0.15 as work continues.

@davecheney
Contributor

Moving to 1.0 beta 1 as #1039 is not yet available. We'll continue to work on this ticket throughout the next few milestones, including #1159.

@davecheney davecheney modified the milestones: 0.15.0, 1.0.0-beta.1 Aug 21, 2019
@davecheney
Copy link
Contributor

In 0.15 we added filtering for unrelated secrets and services. Moving to the next milestone as there is no more work scheduled for this release.
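The comment doesn't describe how the filtering is done. As an illustration only, assuming "unrelated" means a Secret that no Ingress references by namespace and name, here is a sketch of deciding whether a Secret change needs to be processed at all. The types and function names are hypothetical, not Contour's:

```go
package main

import "fmt"

// Hypothetical minimal shapes; real code would use the Kubernetes API types.
type secretKey struct{ Namespace, Name string }

type ingress struct {
	Namespace  string
	TLSSecrets []string // secret names from spec.tls[].secretName
}

// referenced returns the set of secrets that at least one Ingress uses.
func referenced(ings []ingress) map[secretKey]bool {
	set := make(map[secretKey]bool)
	for _, ing := range ings {
		for _, name := range ing.TLSSecrets {
			set[secretKey{ing.Namespace, name}] = true
		}
	}
	return set
}

// relevant reports whether a change to secret k could affect the generated
// config; changes to unreferenced secrets can be dropped without a rebuild.
func relevant(ings []ingress, k secretKey) bool {
	return referenced(ings)[k]
}

func main() {
	ings := []ingress{{Namespace: "default", TLSSecrets: []string{"site-tls"}}}
	fmt.Println(relevant(ings, secretKey{"default", "site-tls"}))    // true
	fmt.Println(relevant(ings, secretKey{"default", "db-password"})) // false
	fmt.Println(relevant(ings, secretKey{"other", "site-tls"}))      // false
}
```

Secrets are matched by namespace as well as name, because an Ingress can only reference Secrets in its own namespace.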

@davecheney
Contributor

Moving to the 1.0 release milestone as there is work scheduled for the release candidates.

@davecheney davecheney modified the milestones: 1.0.0-rc.1, 1.0.0 Sep 29, 2019
davecheney added a commit to davecheney/contour that referenced this issue Oct 21, 2019
Fixes projectcontour#1425
Fixes projectcontour#1385
Updates projectcontour#499

This PR threads the leader elected signal through to
contour.EventHandler allowing it to skip writing status back to the API
unless it is currently the leader.

This should fix projectcontour#1425 by removing the condition where several Contours
would fight to update status. This updates projectcontour#499 by continuing to reduce
the number of updates that Contour generates, and therefore processes.

This PR does create a condition where during startup no Contour may be
the leader and the xDS tables reach steady state before anyone is
elected. This would mean the status of an object would be stale until
the next update from the API server after leadership was established.
To address this, a mechanism to force a rebuild of the DAG is added to
the EventHandler and wired to election success.

Signed-off-by: Dave Cheney <[email protected]>
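The behaviour this commit describes (every instance rebuilds, only the leader writes status, and winning the election forces one rebuild) can be modelled in a few lines. This is a hypothetical, much-simplified sketch for illustration; `eventHandler`, `onUpdate` and `onElected` are invented names, not contour.EventHandler's API:

```go
package main

import "fmt"

// eventHandler is a hypothetical, much-simplified model of the behaviour the
// commit describes; it is not contour.EventHandler.
type eventHandler struct {
	isLeader     bool
	rebuilds     int // DAG rebuilds (every instance does these)
	statusWrites int // status updates written back to the API server
}

// rebuild stands in for recomputing the DAG; status is written back only
// when this instance is the elected leader.
func (e *eventHandler) rebuild() {
	e.rebuilds++
	if e.isLeader {
		e.statusWrites++
	}
}

// onUpdate handles a change notification from the API server.
func (e *eventHandler) onUpdate() { e.rebuild() }

// onElected is wired to leader-election success. Forcing a rebuild here means
// status is refreshed immediately instead of waiting for the next API update.
func (e *eventHandler) onElected() {
	e.isLeader = true
	e.rebuild()
}

func main() {
	var e eventHandler
	e.onUpdate() // follower: DAG rebuilt, no status written
	e.onUpdate()
	e.onElected() // forced rebuild: status written once
	e.onUpdate()  // leader: status written on every update
	fmt.Println("rebuilds:", e.rebuilds, "status writes:", e.statusWrites) // rebuilds: 4 status writes: 2
}
```

Without the forced rebuild in `onElected`, the third step would write nothing, and status would stay stale until the next update arrived from the API server, which is the startup gap the commit message calls out.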
davecheney added a commit that referenced this issue Oct 21, 2019
@davecheney davecheney modified the milestones: 1.0.0, Backlog Oct 27, 2019
@davecheney
Contributor

Hello,

TL;DR upgrade to Contour 1.2.0 or later and follow the recommendation to use Envoy 1.13.0 or later.

After investigating an internally reported issue, I am pleased to say this issue can be brought to a close. The root cause was envoyproxy/envoy#7923, which caused Envoy to keep N squared copies of the RDS database in memory for each LDS update. This meant that as the number of vhosts defined across Ingress/IngressRoute/HTTPProxy documents that used TLS grew, each configuration update would consume N*N memory on the Envoy side. Said another way, the memory consumed by Envoy for each configuration was quadratic, not linear. This issue was resolved upstream in envoyproxy/envoy#9209 and shipped as part of Envoy 1.13.0.
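To make the linear-versus-quadratic point concrete, write $N$ for the number of TLS virtual hosts and $c$ for a hypothetical constant cost per retained copy. With memory per configuration update growing as $N^2$ rather than $N$:

```latex
M_{\mathrm{lin}}(N) = c\,N, \qquad M_{\mathrm{quad}}(N) = c\,N^{2}

\frac{M_{\mathrm{quad}}(2N)}{M_{\mathrm{quad}}(N)} = \frac{c\,(2N)^{2}}{c\,N^{2}} = 4,
\qquad
\frac{M_{\mathrm{quad}}(5000)}{M_{\mathrm{quad}}(100)} = \left(\frac{5000}{100}\right)^{2} = 2500
```

Assuming one TLS vhost per Ingress, the 5000-ingress test earlier in this issue would cost 2500 times as much per update as its first block of 100 under the quadratic behaviour, against 50 times under linear growth.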

I am marking this issue as complete against the 1.2.0 milestone. The remaining work to reduce the cost of LDS updates is tracked on #1039 which at the time of writing remains blocked on upstream support for FDS.

/cc @michmike @pickledrick

@davecheney davecheney modified the milestones: Backlog, 1.2.0 Feb 27, 2020
@davecheney davecheney removed blocked Blocked waiting on a dependency priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 27, 2020
@michmike
Contributor

Great work, team, bringing this to a close!

@mattalberts
Author

mattalberts commented Feb 27, 2020 via email

sunjayBhatia pushed a commit that referenced this issue Jan 30, 2023

8 participants