http server hangs after ~16,400 connections #66

Closed
gopherbot opened this issue Nov 11, 2009 · 18 comments

@gopherbot
Contributor

by andy.gayton:

What steps will reproduce the problem?

1. Start the example hello world web server:

package main

import (
    "http";
    "io";
)

// hello world, the web server
func HelloServer(c *http.Conn, req *http.Request) {
    io.WriteString(c, "hello, world!\n");
}

func main() {
    http.Handle("/hello", http.HandlerFunc(HelloServer));
    err := http.ListenAndServe(":12345", nil);
    if err != nil {
        panic("ListenAndServe: ", err.String())
    }
}

2. Run apache bench against the server:

ab -n 100000 -c 1 http://localhost:12345/hello

What is the expected output? What do you see instead?

On my laptop, the server hangs at around 16,400 requests:

andy@romana:~$ ab -n 100000 -c 1 http://localhost:12345/hello
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
apr_poll: The timeout specified has expired (70007)
Total of 16387 requests completed

The server doesn't need to be restarted to repeat the attempt; another run will also
hang at around 16,400 requests.

What is your $GOOS?  $GOARCH?

andy@romana:~$ echo $GOOS $GOARCH
darwin 386

Which revision are you sync'ed to?  (hg log -l 1)

andy@romana:~/go$ hg log -l 1
changeset:   3952:64e703cb307d
tag:         tip
user:        Russ Cox <[email protected]>
date:        Tue Nov 10 14:09:01 2009 -0800
summary:     update video links

@rsc
Contributor

rsc commented Nov 11, 2009

Comment 1:

Status changed to Accepted.

@gopherbot
Contributor Author

Comment 2 by roblesjm:

I made a simple web server benchmark ( http://bit.ly/2pzAFC ).
I have no problems with the Go version using 100,000 requests with the Apache HTTP server
benchmarking tool (ab); however, the Python version returned "Timeout"
at around the 47,000th request.

@gopherbot
Contributor Author

Comment 3 by gill.naina:

I want to try web development with Google's Go language.
Can I get some help on how to start?

@gopherbot
Contributor Author

Comment 4 by andy.gayton:

Interesting, roblesjm. I retried the benchmark on my Linux laptop (Ubuntu Hardy), and there
were no problems.
I also fired up Go on an EC2 instance to be able to profile with reproducible
results; again, no problems with the server hanging.
For interest, running two instances of the Go example hello world HTTP server on an
EC2 High-CPU Medium Instance, and running 2 instances of ApacheBench from a second
High-CPU Medium Instance, put through around 10-11k requests per second.
In comparison, a hello world WSGI server behind Spawning/eventlet with 2
processes and 0 threads (greenlets) put through 3,500 requests per second.
So the hang may be an issue only on Darwin, or only on my
MacBook :)

@gopherbot
Contributor Author

Comment 5 by mike.kinney:

I've been playing with different values of the "n" arg for ab. It has sometimes worked
for 20,000, but never for values above that. When it fails, it reports slightly
different totals on different runs:
apr_poll: The timeout specified has expired (70007)
Total of 16385 requests completed
apr_poll: The timeout specified has expired (70007)
Total of 16384 requests completed
apr_poll: The timeout specified has expired (70007)
Total of 16386 requests completed
I found this page:
http://serverfault.com/questions/10852/what-limits-the-maximum-number-of-connections-on-a-linux-server
but I don't know how to tune either variable on OS X:
mike-kinneys-macbook-pro:runtime mikekinney$ sysctl -A 2>&1 | grep tcp_max_orphans
mike-kinneys-macbook-pro:runtime mikekinney$ sysctl -A 2>&1 | grep tcp_tw_reuse
I've tried running ab with the -k option:
ab -k -n 20000 -c 1 http://localhost:12345/hello
but it still shows:
apr_poll: The timeout specified has expired (70007)
Total of 16386 requests completed
It might be a matter of dealing with "keep-alive"... I don't know.
See: http://www.mail-archive.com/dev@couchdb.apache.org/msg05082.html 
Hope that helps. Let me know if there's anything I can do to help. (dbg, log, etc)
echo $GOOS $GOARCH
darwin 386
mike-kinneys-macbook-pro:mike mikekinney$ hg log -l 1
changeset:   4037:cd0140653802
tag:         tip
user:        David Titarenco <[email protected]>
date:        Fri Nov 13 18:06:47 2009 -0800
summary:     Created new Conn.Flush() public method so the fd pipeline can be drained
arbitrarily by the user.
osx 10.6.2
mike-kinneys-macbook-pro:http mikekinney$ gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1)

@wcn3

wcn3 commented Nov 14, 2009

Comment 6:

When I modify the HelloServer function to specify a header using
c.SetHeader("Content-Type", "text/plain; charset=utf-8");
the problem goes away on my MacBook Pro. If I remove this line, it fails as described
in the initial bug report.

@rsc
Contributor

rsc commented Nov 15, 2009

Comment 7:

Status changed to LongTerm.

@rsc
Contributor

rsc commented Nov 15, 2009

Comment 8:

Owner changed to [email protected].

@rsc
Contributor

rsc commented Nov 15, 2009

Comment 9:

Status changed to Accepted.

@wcn3

wcn3 commented Nov 16, 2009

Comment 10:

This issue is caused by the OS running out of sockets. ab and Go are cycling through
socket pairs for communication faster than the OS can reallocate them for reuse.
While Go supports keeping the connection alive using HTTP/1.1 techniques, the ab
binary uses a different technique. As a result, the -k flag for ab has no effect.
The ab man page recommends running the binary on a different machine than the
server being tested. This would reduce the rate of socket consumption by 50% and
allow the OS to return sockets to the server.

@rsc
Contributor

rsc commented Nov 17, 2009

Comment 11:

Thanks for diagnosing this.  Is there something Go should be doing differently?

@wcn3

wcn3 commented Nov 17, 2009

Comment 12:

I'm not sure. Go does support keepalive using HTTP/1.1, but ab sends HTTP/1.0 in
its GET requests. I'm not familiar enough with HTTP to understand how the two differ.
As a quick experiment, I hacked http/server.go to try to use the HTTP/1.1 keepalive
code anyway, but that didn't work. If you want me to work on this, I'm happy to do so.
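The difference asked about above comes down to defaults: an HTTP/1.1 connection stays open unless a side sends Connection: close, while an HTTP/1.0 connection closes after each response unless the client sends Connection: keep-alive (the header ab's -k option adds) and the server agrees, which in turn requires a response with a known Content-Length. A simplified sketch of that decision; persistent and hasToken are hypothetical helper names, not net/http API:

```go
package main

import (
	"fmt"
	"strings"
)

// hasToken reports whether a comma-separated header value such as
// "Keep-Alive, Upgrade" contains tok, compared case-insensitively.
func hasToken(header, tok string) bool {
	for _, t := range strings.Split(header, ",") {
		if strings.EqualFold(strings.TrimSpace(t), tok) {
			return true
		}
	}
	return false
}

// persistent reports whether the connection may stay open after the
// exchange (simplified HTTP/1.x rules): an explicit "close" always wins,
// HTTP/1.1 and later persist by default, and HTTP/1.0 persists only when
// the client asks for "keep-alive".
func persistent(major, minor int, connection string) bool {
	if hasToken(connection, "close") {
		return false
	}
	if major > 1 || (major == 1 && minor >= 1) {
		return true
	}
	return hasToken(connection, "keep-alive")
}

func main() {
	fmt.Println(persistent(1, 0, ""))           // false: plain ab request
	fmt.Println(persistent(1, 0, "Keep-Alive")) // true: ab -k
	fmt.Println(persistent(1, 1, ""))           // true
	fmt.Println(persistent(1, 1, "close"))      // false
}
```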

@rsc
Contributor

rsc commented Nov 17, 2009

Comment 13:

We're always happy to accept help.  I wondered if it was
something as simple as the HTTP server not closing its
file descriptors, but it seems like you'd run out much
earlier.  It could be that we need to set the SO_REUSEPORT
or some such option on some such socket, or set the
linger time to 0, or something like that.

@wcn3

wcn3 commented Nov 18, 2009

Comment 14:

The only change I was able to make that had any effect was to compile the 'ab' binary
so that it set SO_LINGER with a linger time of 0 seconds. At that point, I got these results:
Server Software:        
Server Hostname:        127.0.0.1
Server Port:            12345
Document Path:          /hello
Document Length:        16 bytes
Concurrency Level:      1
Time taken for tests:   27.913 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      7600000 bytes
HTML transferred:       1600000 bytes
Requests per second:    3582.62 [#/sec] (mean)
Time per request:       0.279 [ms] (mean)
Time per request:       0.279 [ms] (mean, across all concurrent requests)
Transfer rate:          265.90 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0      14
Processing:     0    0   0.6      0      27
Waiting:        0    0   0.6      0      20
Total:          0    0   0.6      0      41
Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      1
 100%     41 (longest request)
Even a socket linger time of 1 second was enough to trigger the socket
exhaustion problem.
No changes were necessary on the Go side. Modifying the Go network code to make explicit
socket shutdown() calls and to set SO_LINGER and SO_REUSEADDR had no effect on performance.

@rsc
Contributor

rsc commented Nov 18, 2009

Comment 15:

Sounds like this is an ab problem and not Go's fault, then.
Marking as WontFix.
Thanks for investigating, Bill.

Status changed to WontFix.

@gopherbot
Contributor Author

Comment 16 by [email protected]:

I'm not sure whether what I'm seeing is related to this, but here's what I'm finding...
Let me know if I should open a new issue.
I'm using a simple web server app:
package main

import (
	"encoding/json"
	"net/http"
)

// structs
type Reading struct {
	Id   string `json:"id"`
	Name string `json:"name"`
}

func main() {
	http.HandleFunc("/machines/", func(w http.ResponseWriter, r *http.Request) {
		// Setup readings
		readings := prepareReadings()
		// return readings
		w.Write([]byte(readingsToString(readings)))
	})
	http.ListenAndServe(":3000", nil)
}

func readingsToString(readings []Reading) string {
	data, err := json.Marshal(readings)
	if err != nil {
		panic(err)
	}
	return string(data)
}

func prepareReadings() []Reading {
	var readings []Reading
	for i := 1; i <= 1; i++ {
		readings = append(readings, Reading{Name: "Thing"})
	}
	return readings
}
As you can see, there's not much to it. I've set up multiple load generation servers that are
separate from the web server itself, so in total I have 17 machines: 1 web server and
16 load generation servers. On the load generation servers I am using siege, not ab.
Running this command on all servers:
siege -v "http://192.168.122.31:3000/machines/ POST" -c 500 -r 100 -b
causes me to start getting connection timed out messages.
My file descriptor limits for the web server are pretty high...
[api #3312 -- limits]
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             59479                59479                processes
Max open files            4999999              4999999              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       59479                59479                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
When I use the command 'lsof | wc -l', I don't get above 1000; it is generally in the
~800-850 range.
When I run watch --interval=2 'netstat -tuna | grep "SYN_RECV" | wc -l', I am
generally in the ~130-250 range.
I'm not sure if this is related, or possibly a problem with siege at this point.
Any advice?

@gopherbot
Contributor Author

Comment 17 by UQadri:

This is an old thread, but the following are my findings, for whatever they are worth.
From what I found, this seems to be a Mac OS X problem. See the following thread for
more information:
http://stackoverflow.com/questions/1216267/ab-program-freezes-after-lots-of-requests-why
This is from the above thread:
On Mac OS X the default ephemeral port range is 49152 to 65535, for a total of 16384
ports. You can check this with the sysctl command:
$ sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last
net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.last: 65535
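The arithmetic behind the ~16,384 figure reported throughout this thread: the range is inclusive, so it holds last - first + 1 ports. A trivial Go check; ephemeralPorts is a hypothetical helper:

```go
package main

import "fmt"

// ephemeralPorts returns how many ports lie in the inclusive range
// [first, last], e.g. net.inet.ip.portrange.first and .last.
func ephemeralPorts(first, last int) int {
	if last < first {
		return 0
	}
	return last - first + 1
}

func main() {
	fmt.Println(ephemeralPorts(49152, 65535)) // 16384: default Mac OS X range
	fmt.Println(ephemeralPorts(32768, 65535)) // 32768: range starting at 32768
}
```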
Changing the range to start at 32768 helped:
$ sudo sysctl -w net.inet.ip.portrange.first=32768
net.inet.ip.portrange.first: 49152 -> 32768
Another setting I changed was net.inet.tcp.msl, the TCP maximum segment lifetime in
milliseconds (closed connections stay in TIME_WAIT for twice this value). I set it to 1000 ms like so:
$ sudo sysctl -w net.inet.tcp.msl=1000
net.inet.tcp.msl: 15000 -> 1000
After changing the above configuration, these are the results:
Server Software:        
Server Hostname:        localhost
Server Port:            4000
Document Path:          /
Document Length:        12 bytes
Concurrency Level:      1
Time taken for tests:   17.005 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      14800000 bytes
HTML transferred:       1200000 bytes
Requests per second:    5880.58 [#/sec] (mean)
Time per request:       0.170 [ms] (mean)
Time per request:       0.170 [ms] (mean, across all concurrent requests)
Transfer rate:          849.93 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0      12
Processing:     0    0   0.0      0       1
Waiting:        0    0   0.0      0       1
Total:          0    0   0.1      0      12
Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%     12 (longest request)

@davecheney
Contributor

Comment 18:

Thank you for your investigation, please do not continue to comment on this issue, it
has been closed for nearly four years.
I suggest starting a new thread on the golang-nuts mailing list as more people read that
than are subscribed to the issue comments.

Labels changed: added restrict-addissuecomment-commit.

@golang golang locked and limited conversation to collaborators Dec 8, 2014
minux added a commit to minux/goios that referenced this issue Mar 2, 2015
…(SB), Rx

This is the first step in fixing golang#66 and make it possible to build PIE
without text relocations (Darwin/ARM64 forbids text relocations.)

Internal linking is fully working (with verifyAsm=false), but external
linking is not.
minux added a commit to minux/goios that referenced this issue Mar 2, 2015
1. all.bash passed with internal linking
2. misc/cgo/test passed with external linking
3. GOOBJ=2 go install std passed.

Fixes golang#66.
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.