Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU usage #193

Open
smaznet opened this issue May 12, 2024 · 11 comments
Open

High CPU usage #193

smaznet opened this issue May 12, 2024 · 11 comments
Labels
need info The issue is missing critical information

Comments

@smaznet
Copy link

smaznet commented May 12, 2024

i used caddy-l4 for forwarding data to unix sockets but i have cpu issue with caddy

handler reverse_proxy with proxy_protocol enabled

@mohammed90
Copy link
Collaborator

It's hard to resolve such claim without specifics. You can start by reading this page to extract CPU profiles from Caddy. Share the profiles with us.

I've attached a template below that will help make this easier and faster! This will require some effort on your part -- please understand that we will be dedicating time to fix the bug you are reporting if you can just help us understand it and reproduce it easily.

This template will ask for some information you've already provided; that's OK, just fill it out the best you can. 👍 I've also included some helpful tips below the template. Feel free to let me know if you have any questions!

Thank you again for your report, we look forward to resolving it!

Template

## 1. Environment

### 1a. Operating system and version

```
paste here
```


### 1b. Caddy version (run `caddy version` or paste commit SHA)

```
paste here
```


### 1c. Go version (if building Caddy from source; run `go version`)

```
paste here
```


## 2. Description

### 2a. What happens (briefly explain what is wrong)




### 2b. Why it's a bug (if it's not obvious)




### 2c. Log output

```
paste terminal output or logs here
```



### 2d. Workaround(s)




### 2e. Relevant links




## 3. Tutorial (minimal steps to reproduce the bug)




Helpful tips

  1. Environment: Please fill out your OS and Caddy versions, even if you don't think they are relevant. (They are always relevant.) If you built Caddy from source, provide the commit SHA and specify your exact Go version.

  2. Description: Describe at a high level what the bug is. What happens? Why is it a bug? Not all bugs are obvious, so convince readers that it's actually a bug.

    • 2c) Log output: Paste terminal output and/or complete logs in a code block. DO NOT REDACT INFORMATION except for credentials.
    • 2d) Workaround: What are you doing to work around the problem in the meantime? This can help others who encounter the same problem, until we implement a fix.
    • 2e) Relevant links: Please link to any related issues, pull requests, docs, and/or discussion. This can add crucial context to your report.
  3. Tutorial: What are the minimum required specific steps someone needs to take in order to experience the same bug? Your goal here is to make sure that anyone else can have the same experience with the bug as you do. You are writing a tutorial, so make sure to carry it out yourself before posting it. Please:

    • Start with an empty config. Add only the lines/parameters that are absolutely required to reproduce the bug.
    • Do not run Caddy inside containers.
    • Run Caddy manually in your terminal; do not use systemd or other init systems.
    • If making HTTP requests, avoid web browsers. Use a simpler HTTP client instead, like curl.
    • Do not redact any information from your config (except credentials). Domain names are public knowledge and often necessary for quick resolution of an issue!
    • Note that ignoring this advice may result in delays, or even in your issue being closed. 😞 Only actionable issues are kept open, and if there is not enough information or clarity to reproduce the bug, then the report is not actionable.

Example of a tutorial:

Create a config file:
{ ... }

Open terminal and run Caddy:

$ caddy ...

Make an HTTP request:

$ curl ...

Notice that the result is ___ but it should be ___.

@mohammed90 mohammed90 added the need info The issue is missing critical information label May 12, 2024
@smaznet
Copy link
Author

smaznet commented May 13, 2024

1. Environment

Ubuntu 22.04.2 LTS

Caddy version:

v2.7.6 h1:w0NymbG2m9PcvKWsrXO6EEkY9Ru4FJK8uQbYcev1p3A=

config

......
    {
    "listen": [
            ":5000",
            ":2087"
          ],
          "routes": [
            {
              "match": [
                {
                  "tls": {}
                }
              ],
              "handle": [
                {
                  "handler": "tls",
                  "connection_policies": [
                    {
                      "client_authentication": {
                        "mode": "request"
                      }
                    }
                  ]
                },
                {
                  "handler": "proxy",
                  "proxy_protocol": "v2",
                  "upstreams": [
                    {
                      "dial": [
                        "unix//sockets/r-{l4.tls.server_name}.socket"
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "match": [
                {
                  "http": []
                }
              ],
              "handle": [
                {
                  "handler": "proxy",
                  "proxy_protocol": "v2",
                  "upstreams": [
                    {
                      "dial": [
                        "unix//sockets/r-{l4.http.host}.socket"
                      ]
                    }
                  ]
                }
              ]
            }
          ]}
  
    ....

profile.svg

profile

@mohammed90
Copy link
Collaborator

We need the full file, not just a cropped image. Also share the full config.

@smaznet
Copy link
Author

smaznet commented May 13, 2024

Okay my test enviroment
needed tools

OS

ubuntu 22.04

add this line to /etc/hosts

127.0.0.1 sub.dom.com

Haproxy config:

frontend http_front
    bind /tmp/r-sub.dom.com:2087.socket accept-proxy
    default_backend static_backend

backend static_backend
    mode http
    http-request return status 200 content-type text/plain string "Hello World"

Caddy full Config

{
  "logging": {
    "logs": {
      "default": {
        "level": "DEBUG"
      }
    }
  },
  "apps": {
    "tls": {
      "cache": {
        "capacity": 2000
      },
      "certificates": {
        "automate": [
          "sub.dom.com"
        ]
      },
      "automation": {
        "policies": [
          {
            "issuers": [
              {
                "module": "internal",
                "lifetime": "600d"
              }
            ]
          }
        ]
      }
    },
    "layer4": {
      "servers": {
        "ws": {
          "listen": [
            ":5000",
            ":2087"
          ],
          "routes": [
            {
              "match": [
                {
                  "tls": {}
                }
              ],
              "handle": [
                {
                  "handler": "tls",
                  "connection_policies": [
                    {
                      "client_authentication": {
                        "mode": "request"
                      }
                    }
                  ]
                },
                {
                  "handler": "proxy",
                  "proxy_protocol": "v2",
                  "upstreams": [
                    {
                      "dial": [
                        "unix//tmp/r-{l4.tls.server_name}:2087.socket"
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "match": [
                {
                  "http": []
                }
              ],
              "handle": [
                {
                  "handler": "proxy",
                  "proxy_protocol": "v2",
                  "upstreams": [
                    {
                      "dial": [
                        "unix//tmp/r-{l4.http.host}.socket"
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  }
}

Test steps

haproxy -f hp.conf &
caddy run --config test.json

with ssl:

echo "GET https://sub.dom.com:2087" | ./vegeta attack -output /dev/null -rate=0 -max-workers 1000 -insecure

without ssl:

echo "GET http://sub.dom.com:2087" | ./vegeta attack -output /dev/null -rate=0 -max-workers 1000

profiles

NOTE: .gz is fake for upload to github remove this from file name

profile with ssl

profile (7).gz

profile without ssl

profile (6).gz

@smaznet
Copy link
Author

smaznet commented May 13, 2024

We need the full file, not just a cropped image

its not cropped image its svg file open in other tab

@mholt
Copy link
Owner

mholt commented May 13, 2024

I noticed the GitHub app has a hard time with SVGs.

looks like a LOT of time is being spent in syscalls... reading and writing file descriptors (sockets) in the proxy module. I assume these are TCP packets from the client, based on the listen address?

@smaznet If you remove the PROXY protocol config does the CPU usage improve?

@smaznet
Copy link
Author

smaznet commented May 13, 2024

i'll try

@ydylla
Copy link
Collaborator

ydylla commented May 13, 2024

@smaznet What exactly do you mean with high CPU usage? What do you expect as normal usage?
Sending requests as fast as possible from 1000 workers with vegeta will obviously max out your CPU.

Since I was afraid this had something to do with the recent matching rewrite (see #192) I did some own benchmarking. With my favorite testing config 😅 :

prefetch.json
{
  "admin": {
    "disabled": true
  },
  "logging": {
    "logs": {
      "default": {"level":"DEBUG", "encoder": {"format":"console"}}
    }
  },
  "apps": {
    "tls": {
      "certificates": {
        "automate": ["localhost"]
      },
      "automation": {
        "policies": [{
          "subjects": ["localhost"],
          "issuers": [{
            "module": "internal"
          }]
        }]
      }
    },
    "layer4": {
      "servers": {
        "https": {
          "listen": ["0.0.0.0:10443"],
          "routes": [
            {
              "match": [
                {"proxy_protocol": {}}
              ],
              "handle": [
                {
                  "handler": "proxy_protocol",
                  "timeout": "2s",
                  "allow": ["127.0.0.1/32"]
                }
              ]
            },
            {
              "match": [
                {"tls": {"sni": ["localhost"]}}
              ],
              "handle": [
                {"handler": "tls"}
              ]
            },
            {
              "match": [
                {"http": [{"host": ["localhost"]}]}
              ],
              "handle": [
                {
                  "handler": "proxy",
                  "upstreams": [{"dial": ["127.0.0.1:10080"]}]
                }
              ]
            }
          ]
        }
      }
    },
    "http": {
      "servers": {
        "backend": {
          "protocols": ["h1","h2","h2c"],
          "listen": ["127.0.0.1:10080"],
          "routes": [
            {
              "handle": [{
                "handler": "static_response",
                "status_code": "200",
                "body": "Hello World\n",
                "headers": {
                  "Content-Type": ["text/plain"]
                }
              }]
            }
          ]
        }
      }
    }
  }
}

These are the numbers I get on my machine for the commit 145ec36 (before the matching rewrite) and f049165 is the current master:

PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output 145ec3-http-1.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output 145ec3-http-2.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output 145ec3-http-3.bin
PS G:\Temp> .\vegeta.exe report .\145ec3-http-1.bin
Requests      [total, rate, throughput]         9411901, 156865.33, 156855.21
Duration      [total, attack, wait]             1m0s, 1m0s, 3.87ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.102ms, 3.036ms, 1.614ms, 7.674ms, 10.589ms, 17.863ms, 1.149s
Bytes In      [total, mean]                     112942812, 12.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:9411901
Error Set:
PS G:\Temp> .\vegeta.exe report .\145ec3-http-2.bin
Requests      [total, rate, throughput]         9527806, 158794.31, 158792.91
Duration      [total, attack, wait]             1m0s, 1m0s, 527µs
Latencies     [min, mean, 50, 90, 95, 99, max]  534.7µs, 1.688ms, 595.513µs, 4.06ms, 6.213ms, 12.558ms, 1.119s
Bytes In      [total, mean]                     114333672, 12.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:9527806
Error Set:
PS G:\Temp> .\vegeta.exe report .\145ec3-http-3.bin
Requests      [total, rate, throughput]         9404961, 156749.46, 156749.24
Duration      [total, attack, wait]             1m0s, 1m0s, 84.3µs
Latencies     [min, mean, 50, 90, 95, 99, max]  84.3µs, 3.381ms, 1.777ms, 8.625ms, 11.849ms, 19.253ms, 972.624ms
Bytes In      [total, mean]                     112859532, 12.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:9404961
Error Set:
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output f04916-http-1.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output f04916-http-2.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output f04916-http-3.bin
PS G:\Temp> .\vegeta.exe report .\f04916-http-1.bin
Requests      [total, rate, throughput]         9356041, 155928.36, 155920.48
Duration      [total, attack, wait]             1m0s, 1m0s, 3.035ms
Latencies     [min, mean, 50, 90, 95, 99, max]  22.8µs, 1.815ms, 684.44µs, 4.331ms, 6.731ms, 12.387ms, 1.376s
Bytes In      [total, mean]                     112272492, 12.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:9356041
Error Set:
PS G:\Temp> .\vegeta.exe report .\f04916-http-2.bin
Requests      [total, rate, throughput]         9228292, 153804.43, 153799.90
Duration      [total, attack, wait]             1m0s, 1m0s, 1.767ms
Latencies     [min, mean, 50, 90, 95, 99, max]  110.4µs, 3.416ms, 1.819ms, 8.547ms, 11.795ms, 19.691ms, 1.148s
Bytes In      [total, mean]                     110739504, 12.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:9228292
Error Set:
PS G:\Temp> .\vegeta.exe report .\f04916-http-3.bin
Requests      [total, rate, throughput]         9500202, 158335.41, 158329.55
Duration      [total, attack, wait]             1m0s, 1m0s, 2.22ms
Latencies     [min, mean, 50, 90, 95, 99, max]  606.6µs, 1.762ms, 649.258µs, 4.178ms, 6.483ms, 12.463ms, 1.213s
Bytes In      [total, mean]                     114002424, 12.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:9500202
Error Set:
PS G:\Temp>

So just a <1% difference in average throughput. But running caddy and vegeta on the same machine is not very sensible anyways since they fight over cpu time.

Adding sending and receiving of proxy_protocol to the config did not change the numbers significantly. Running with TLS was like expected a bit slower with an average throughput of 130590.14

@mholt
Copy link
Owner

mholt commented May 13, 2024

FWIW I'm also guessing normal operation. Lots of traffic = lots of CPU. 🤷‍♂️

By far the VAST majority of time is in syscalls anyway. Syscalls are out of Caddy's realm.

Repository owner deleted a comment from hndjdjdj Jun 3, 2024
@T4cC0re
Copy link

T4cC0re commented Jun 10, 2024

Hey there!

We have also seen this issue, and when grabbing a profile of the process, it seems a lot of time is being wasted in userspace in the layer4 module.

Screenshot 2024-06-10 at 11 05 25

CPU utilization went from ~1% to ~600%. We have isolated this to be a change bewteen commit c5d815d (good, pre 19:00) and ce9789f (awful, post 19:00)

Screenshot 2024-06-10 at 11 08 05

Please find attached cpu profiles captured (no GC, binary, 60s duration each) during regular traffic periods with Caddy at version 2.8.4 and either commit of the module.

profiles.zip

The following configuration was used: (redacted)

We need to use unix sockets as an intermediate because of a use-case we have.

{
  "logging": {
    "logs": {
      "default": {
        "level": "info",
        "writer": {
          "output": "stdout"
        },
        "encoder": {
          "format": "json"
        }
      }
    }
  },
  "apps": {
    "layer4": {
      "servers": {
        "http": {
          "listen": [
            ":80"
          ],
          "routes": [
            {
              "match": [
                {
                  "http": [
                    {
                      "host": [
                        "gitlab.REDACTED"
                      ]
                    }
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "proxy",
                  "proxy_protocol": "v2",
                  "upstreams": [
                    {
                      "dial": [
                        "unix//var/opt/caddy/caddy.gitlab"
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "match": [
                {
                  "http": [
                    {
                      "host": [
                        "registry.REDACTED"
                      ]
                    }
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "proxy",
                  "proxy_protocol": "v2",
                  "upstreams": [
                    {
                      "dial": [
                        "unix//var/opt/caddy/caddy.registry"
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        },
        "https": {
          "listen": [
            ":443"
          ],
          "routes": [
            {
              "match": [
                {
                  "tls": {}
                }
              ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "match": [
                        {
                          "tls": {
                            "sni": [
                              "gitlab.REDACTED"
                            ]
                          }
                        }
                      ],
                      "handle": [
                        {
                          "handler": "proxy",
                          "proxy_protocol": "v2",
                          "upstreams": [
                            {
                              "dial": [
                                "unix//var/opt/caddy/caddy.gitlab"
                              ]
                            }
                          ]
                        }
                      ]
                    },
                    {
                      "match": [
                        {
                          "tls": {
                            "sni": [
                              "registry.REDACTED"
                            ]
                          }
                        }
                      ],
                      "handle": [
                        {
                          "handler": "proxy",
                          "proxy_protocol": "v2",
                          "upstreams": [
                            {
                              "dial": [
                                "unix//var/opt/caddy/caddy.registry"
                              ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        },
        "ssh": {
          "listen": [
            ":22"
          ],
          "routes": [
            {
              "handle": [
                {
                  "handler": "proxy",
                  "proxy_protocol": "v2",
                  "upstreams": [
                    {
                      "dial": [
                        "10.0.1.1:2022"
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    },
    "http": {
      "http_port": 80,
      "https_port": 443,
      "servers": {
        "gitlab": {
          "strict_sni_host": true,
          "listen": [
            "unix//var/opt/caddy/caddy.gitlab"
          ],
          "listener_wrappers": [
            {
              "wrapper": "proxy_protocol",
              "timeout": 0
            },
            {
              "wrapper": "http_redirect"
            },
            {
              "wrapper": "tls"
            }
          ],
          "metrics": {},
          "routes": [
            {
              "match": [
                {
                  "host": [
                    "gitlab.REDACTED"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "headers",
                  "response": {
                    "set": {
                      "Server": [
                        "REDACTED"
                      ]
                    }
                  }
                },
                {
                  "encodings": {
                    "gzip": {},
                    "zstd": {}
                  },
                  "handler": "encode",
                  "prefer": [
                    "zstd",
                    "gzip"
                  ]
                },
                {
                  "handler": "reverse_proxy",
                  "load_balancing": {
                    "selection_policy": {
                      "policy": "round_robin"
                    }
                  },
                  "upstreams": [
                    {
                      "dial": "10.0.1.1:8081"
                    }
                  ]
                }
              ],
              "terminal": true
            }
          ]
        },
        "registry": {
          "strict_sni_host": true,
          "listen": [
            "unix//var/opt/caddy/caddy.registry"
          ],
          "listener_wrappers": [
            {
              "wrapper": "proxy_protocol",
              "timeout": 0
            },
            {
              "wrapper": "http_redirect"
            },
            {
              "wrapper": "tls"
            }
          ],
          "metrics": {},
          "routes": [
            {
              "match": [
                {
                  "host": [
                    "registry.REDACTED"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "headers",
                  "response": {
                    "set": {
                      "Server": [
                        "REDACTED"
                      ]
                    }
                  }
                },
                {
                  "encodings": {
                    "gzip": {},
                    "zstd": {}
                  },
                  "handler": "encode",
                  "prefer": [
                    "zstd",
                    "gzip"
                  ]
                },
                {
                  "handler": "reverse_proxy",
                  "load_balancing": {
                    "selection_policy": {
                      "policy": "round_robin"
                    }
                  },
                  "upstreams": [
                    {
                      "dial": "10.0.1.1:18081"
                    }
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    },
    "tls": {
      "certificates": {
        "load_folders": [
          "/var/opt/caddy/certificates"
        ]
      },
      "automation": {
        "policies": [
          {
            "on_demand": true,
            "issuers": [
              {
                "email": "REDACTED",
                "module": "acme",
                "challenges": {
                  "http": {
                    "disabled": true
                  },
                  "tls-alpn": {
                    "disabled": false
                  }
                }
              }
            ]
          }
        ],
        "on_demand": {
          "permission": {
            "endpoint": "http://127.0.0.1:2223",
            "module": "http"
          }
        }
      }
    }
  }
}

@mholt
Copy link
Owner

mholt commented Jun 10, 2024

Thanks for the details. So lots of time spent matching TLS in your case.

Does the following PR help? Can you try this branch? #199

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need info The issue is missing critical information
Projects
None yet
Development

No branches or pull requests

5 participants