Error writing file: Broken pipe (IO::Error) #9065

jwoertink · 2020-04-14T15:55:59Z

I'm seeing this error in my production Lucky app pretty often. Prior to Crystal 0.34 I would see it maybe a handful of times a day, but after upgrading to Crystal 0.34, we noticed we were seeing it on almost every request coming in.

As per @waj comment you can see the easiest way to get a similar error is from:

$ crystal eval 'puts "hi"' | echo

Unhandled exception: Error writing file: Broken pipe (IO::Error)
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/crystal/system/unix/file_descriptor.cr:82:13 in 'unbuffered_write'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/io/buffered.cr:218:5 in 'flush'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/io/buffered.cr:179:7 in 'write_byte'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/char.cr:775:9 in 'to_s'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/io.cr:175:5 in '<<'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/io.cr:189:5 in 'print'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/io.cr:244:5 in 'puts'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/io.cr:219:5 in 'puts'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/kernel.cr:377:3 in 'puts'
  from /eval:1:1 in '__crystal_main'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/crystal/main.cr:105:5 in 'main_user_code'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/crystal/main.cr:91:7 in 'main'
  from /Users/jeremywoertink/.asdf/installs/crystal/0.34.0/src/crystal/main.cr:114:3 in 'main'

Though, there's several other ways to get a similar error like shown in this thread

crystal eval "1.upto(1000) { |i| puts i }" | head

Or just putting those lines in to a separate crystal file and running it like normal.

The text was updated successfully, but these errors were encountered:

ysbaddaden · 2020-04-14T15:57:10Z

Duplicate of #2713.

waj · 2020-04-14T15:59:15Z

Well... apparently I keep running in circles 😆
#7810 (comment)

waj · 2020-04-14T16:19:47Z

@jwoertink note that this also happens in Ruby. It's just that you need to write more items to trigger:

$ ruby -e "1.upto(100000) { |i| puts i }" | head -n 1
1
-e:1:in `write': Broken pipe @ io_write - <STDOUT> (Errno::EPIPE)
	from -e:1:in `puts'
	from -e:1:in `puts'
	from -e:1:in `block in <main>'
	from -e:1:in `upto'
	from -e:1:in `<main>'

jwoertink · 2020-04-14T16:27:10Z

Ah interesting. So crystal tanks out earlier where Ruby can handle a little bit more it seems. I wonder what node does differently...

This takes a while to run, but it finishes.

node -e "for(let x = 1; x < 1000000; x++) { console.log(x) }" | head

Now, I'm not actually doing any command line piping in Lucky. I am currently using a slightly older version of Dexter which just wraps the old logger. There is a chance that the new logger might fix the issue, but we saw several hours of downtime on our app, and there's a worry that pushing this up could take the app down again.

waj · 2020-04-14T16:30:17Z

You're talking about this error, rigth? https://forum.crystal-lang.org/t/unhandled-exception-error-writing-file-broken-pipe-errno/1396/4

In that case, it's unrelated because it's the TCP socket what is closed. Maybe the client moved to another page already? Any idea what trigger those? Client disconnection should be handled (and ignored) by the HTTP::Server.

jwoertink · 2020-04-14T16:33:16Z

Yes, that error. And actually, in that case, that was me booting my app locally and hitting the home page. I wish I could just say "here's my app!" 😂 Is there a good way I can extract some of this out? I guess I'll try and make a mini http server and logger and see what I can get to happen.

jwoertink · 2020-04-14T16:46:27Z

Ok, so I just checked bugsnag, here's the last error we got which was before dropping back down to 0.33

Errno Error writing to socket: Broken pipe 
    /usr/share/crystal/src/io/evented.cr:156:7 in 'unbuffered_write' <file>
    /usr/share/crystal/src/http/server/response.cr:209:11 in 'unbuffered_write' <file>
    src/handlers/route_handler.cr:19:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next' <file>
    src/handlers/cdn_validator_handler.cr:5:5 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/request_processor.cr:48:11 in 'process' <file>
    /usr/share/crystal/src/fiber.cr:255:3 in 'run' <file>
    ??? <file>

In case that helps any. I think I'm going to give the new logger a try and see if that makes a difference. But as for the original post, this issue is still valid as far as saying there's a crystal compiler issue even though the error is technically valid, right?

asterite · 2020-04-14T16:55:44Z

Could it be that the server is writing the response but the client already closed the connection? I'll try to reproduce that scenario...

waj · 2020-04-14T16:57:21Z

Yes, the error from the compiler should be handled somehow to avoid that nasty output. But it's totally unrelated despite the message similarity.

Regarding your error with the HTTP server, I can see in that backtrace that the exception is an Errno. That doesn't exist anymore in 0.34.

I just tried with this simple server and doing curl and immediately cancel the request. You can see that broken pipe errors are raising but they are handled by the http server.

require "http/server"

server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"

  10000000.times do |i|
    context.response.print "Hello world! #{i}\n"
  end
rescue e
  puts e.inspect
  raise e
end

address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen

jwoertink · 2020-04-14T16:59:39Z

Oh yes, you're right. I have several different errors in here with the same error message. This is the right one:

IO::Error Error writing to socket: Broken pipe 
    /usr/share/crystal/src/io/evented.cr:156:7 in 'unbuffered_write' <file>
    /usr/share/crystal/src/http/server/response.cr:216:11 in 'unbuffered_write' <file>
    src/handlers/route_handler.cr:19:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next' <file>
    /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next' <file>
    lib/lucky/src/lucky/error_handler.cr:15:5 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call' <file>
    /usr/share/crystal/src/http/server/request_processor.cr:48:11 in 'process' <file>
    /usr/share/crystal/src/fiber.cr:255:3 in 'run' <file>
    ??? <file>

asterite · 2020-04-14T17:03:37Z

@waj are you able to see the errors? See #9067

waj · 2020-04-14T17:07:11Z

@asterite you're right, I thought HTTP::Server was already silently ignoring client disconnections but it wasn't logging anything at all. Now with the new logger we could test that more effectively.

jwoertink · 2020-04-14T17:09:14Z

I appreciate you guys helping me through all this ❤️

ysbaddaden · 2020-04-14T17:25:51Z

See https://pmhahn.github.io/SIGPIPE/

We want to handle EPIPE where write(2) failed and don't want to deal with a global SIGPIPE, that will interrupt the running thread, then detect which file descriptor failed, ... We also must ignore/handle SIGPIPE, otherwise the program will abort with Program received and didn't handle signal PIPE for example. See:

Signal::PIPE.reset
loop { print "foo\n" }

Nobody wants a HTTP server to exit because a client closed a connection. But we're so used to simple command line tools that merely trap SIGPIPE to exit the program ASAP, that we assume that a failed read/write on STDIN and STDOUT isn't important, but... it's still an error and it must be handled.

Now, the exit on SIGPIPE behavior can be achieved in Crystal with the following:

LibC.signal(LibC::SIGPIPE, ->(signal : Int32) { LibC._exit(0) })
loop { print "foo\n" }

We can't use Signal::PIPE.trap { exit } because we delay signal handlers so the Crystal handler will run on safe Crystal stack, but we still install a handler for SIGPIPE which means the signal would be handled, and write(2) will return an EPIPE before the signal handler is given a chance to run and exit the program. We thus rely on raw C calls to exit immediately (calling exit would be unsafe).

waj · 2020-04-14T18:08:08Z

@ysbaddaden would a Signal::PIPE.trap_and_exit work in that case? For simple CLI tools it might be useful. Of course that would be ungracefully exiting the process so no chance to clean up resources, so I'm not sure it's a good practice anyway. Also, it would be desirable check which fd actually was the one that failed and react only to STDOUT or any other specified.

ysbaddaden · 2020-04-15T09:24:42Z

Maybe something on IO would fit better? Like IO.exit_on_broken_pipe. It's IO related after all, Signal is just a mean to convey the broken pipe. Or maybe STDIN#read, STDOUT#write and STDERR#write could check for EPIPE and exit or raise.

jhass · 2020-04-15T09:50:38Z

Maybe we should have a special exception for EPIPE from STDIN/STDOUT/STDERR which is catched in main? User code could still catch it earlier then to override the default behavior of just exiting. Or maybe main could catch this generally and try to identify it coming from STDIN/STDOUT/STDERR, I guess the exception would need to carry the FD for that.

rdp · 2020-04-21T05:23:35Z

Is there an easy way to repro this failing? (and not failing in 0.33)?

jwoertink · 2020-04-21T15:13:37Z

Not really because it happened in 0.33 too (intermittently). It's just something changed in 0.34 that made it more apparent. I do see it occasionally locally when running Lucky through overmind, but it's very rare.

On a side note, there was an issue in Lucky with the old logger that seems to be fixed with the new logger. I'm going to deploy today to see if, magically, this issue is also fixed 🤷‍♂️

Dan-Do · 2020-09-07T14:48:10Z

Ok, so I just checked bugsnag, here's the last error we got which was before dropping back down to 0.33

Errno Error writing to socket: Broken pipe 
    /usr/share/crystal/src/io/evented.cr:156:7 in 'unbuffered_write' <file>
    /usr/share/crystal/src/http/server/response.cr:209:11 in 'unbuffered_write' <file>
    src/handlers/route_handler.cr:19:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next' <file>
    src/handlers/cdn_validator_handler.cr:5:5 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/handler.cr:26:7 in 'call' <file>
    /usr/share/crystal/src/http/server/request_processor.cr:48:11 in 'process' <file>
    /usr/share/crystal/src/fiber.cr:255:3 in 'run' <file>
    ??? <file>

In case that helps any. I think I'm going to give the new logger a try and see if that makes a difference. But as for the original post, this issue is still valid as far as saying there's a crystal compiler issue even though the error is technically valid, right?

I had the same issue which rarely occurred. Don't know how to re-produce it, but I am pretty sure it happens from version 0.34 and later. Maybe it related to the changes on src/io.cr and src/io/*.cr.

Exception: Error while writing data to the client (HTTP::Server::ClientError)
  from ../../../../../usr/share/crystal/src/http/server/response.cr:215:9 in 'unbuffered_write'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:217:5 in 'flush'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:156:7 in 'write'
  from ../../../../../usr/share/crystal/src/http/server/response.cr:86:7 in 'write'
  from ../../../../../usr/share/crystal/src/io.cr:1120:7 in 'copy'
  from lib/kemal/src/kemal/helpers/helpers.cr:155:7 in 'send_file'
  from lib/kemal/src/kemal/static_file_handler.cr:66:9 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/exception_handler.cr:8:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/log_handler.cr:8:37 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/request_processor.cr:50:11 in 'process'
  from ../../../../../usr/share/crystal/src/http/server.cr:513:5 in 'handle_client'
  from ../../../../../usr/share/crystal/src/http/server.cr:468:13 in '->'
  from ../../../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
  from ../../../../../usr/share/crystal/src/fiber.cr:92:34 in '->'
  from ???
Caused by: Error writing to socket: Broken pipe (IO::Error)
  from ../../../../../usr/share/crystal/src/io/evented.cr:82:13 in 'unbuffered_write'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:152:14 in 'write'
  from ../../../../../usr/share/crystal/src/http/server/response.cr:211:11 in 'unbuffered_write'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:217:5 in 'flush'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:156:7 in 'write'
  from ../../../../../usr/share/crystal/src/http/server/response.cr:86:7 in 'write'
  from ../../../../../usr/share/crystal/src/io.cr:1120:7 in 'copy'
  from lib/kemal/src/kemal/helpers/helpers.cr:155:7 in 'send_file'
  from lib/kemal/src/kemal/static_file_handler.cr:66:9 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/exception_handler.cr:8:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/log_handler.cr:8:37 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/request_processor.cr:50:11 in 'process'
  from ../../../../../usr/share/crystal/src/http/server.cr:513:5 in 'handle_client'
  from ../../../../../usr/share/crystal/src/http/server.cr:468:13 in '->'
  from ../../../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
  from ../../../../../usr/share/crystal/src/fiber.cr:92:34 in '->'
  from ???

jwoertink · 2020-09-07T15:51:23Z

Yeah, I still see the issue in 0.35.1. It's just never consistent enough to really understand what part is causing it. My guess is maybe something with a connection pool. I see you're using kemal @Dan-Do, are you doing any database stuff? Or just straight web requests?

Dan-Do · 2020-09-08T09:09:44Z

Or just straight web requests?

I am using ArangoDB but I don't think that's the cause. This issue only occured to static resource.

Dan-Do · 2020-09-08T11:44:57Z

I wonder if other functions not adapt to the following pull:
#8707

martinetd · 2020-10-04T08:25:29Z

Not sure if that'll be any help, but I also see this often on close's flush on our invidious instance:

Oct  2 08:07:33 invidious invidious[11745]: Unhandled exception in spawn: Error while flushing data to the client (HTTP::Server::ClientError)
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server/response.cr:258:9 in 'unbuffered_flush'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/buffered.cr:219:5 in 'close'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server/request_processor.cr:62:11 in 'process'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server.cr:498:5 in '->'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
Oct  2 08:07:33 invidious invidious[11745]:   from ???
Oct  2 08:07:33 invidious invidious[11745]: Caused by: Error writing to socket: Broken pipe (IO::Error)
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/evented.cr:156:7 in 'unbuffered_write'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/buffered.cr:219:5 in 'unbuffered_flush'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/buffered.cr:219:5 in 'close'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server/request_processor.cr:62:11 in 'process'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server.cr:498:5 in '->'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
Oct  2 08:07:33 invidious invidious[11745]:   from ???

then if it happens enough we somehow run out of fd and also get

Oct  2 17:00:01 invidious invidious[28308]: 2020-10-02T15:00:01.766671Z  ERROR - http.server: Error while connecting a new socket
Oct  2 17:00:01 invidious invidious[28308]: accept: Too many open files (Socket::Error)
Oct  2 17:00:01 invidious invidious[28308]:   from ../../../usr/share/crystal/src/socket.cr:276:17 in 'accept?'
Oct  2 17:00:01 invidious invidious[28308]:   from ../../../usr/share/crystal/src/http/server.cr:459:13 in '->'
Oct  2 17:00:01 invidious invidious[28308]:   from ../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
Oct  2 17:00:01 invidious invidious[28308]:   from ???

As a workaround we restart invidious in a crontab and it's a bit weird, it normally doesn't happen but when it happens we get a lot e.g. with a restart every 10 minutes we get:

root@invidious:/var/log# zcat syslog.1.gz | grep -E '^Oct  2 17' | sed -ne 's/.*\(Broken\|Too many\)/\1/p' | uniq -c
   1609 Too many open files (Socket::Error)
      1 Oct  2 17:00:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1715 Broken pipe (IO::Error)
  36404 Too many open files (Socket::Error)
      1 Oct  2 17:10:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1698 Broken pipe (IO::Error)
 136532 Too many open files (Socket::Error)
      1 Oct  2 17:20:02 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
    893 Broken pipe (IO::Error)
 327483 Too many open files (Socket::Error)
      1 Oct  2 17:30:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   2010 Broken pipe (IO::Error)
      1 Oct  2 17:40:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1488 Broken pipe (IO::Error)
 140047 Too many open files (Socket::Error)
      1 Oct  2 17:50:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1688 Broken pipe (IO::Error)

The numbers make me think it's an attack of sort (I, err, don't think we'd normally get 300k pages opened in 10 minutes for our small instance that keeps breaking down; but it got worse after invidio.us stopped -- I've just enabled access logs so will be able to confirm that soon... It's just too weird we get almost none for a while then a burst of them and restarting doesn't make it stop)

I've tried reproducing and I can semi-reliably get the Broken pipe to happen if I load a page and press escape before it's done loading, so nginx will close the socket while invidious still tries to flush data and we pretty much get what I'd expect to get, I'd say these messages could just be silenced as they likely used to be and that part is an artefact of the log changes.

OTOH the too many open files error makes me think we're leaking a number of fd when this happens, and I'd like to understand why -- in HTTP::Server the 'close' method is just setting a bool so I assume this is supposed to be closed by garbage collection once the server is no longer referenced?
I'm not familiar with the internals but I don't see anything done much different in case of error or normal close?
Could be somewhere else though I'll try to reproduce more reliably once I get access pattern from access logs (so in a few days..)

Thanks!

Dan-Do · 2021-01-03T14:06:01Z

Not sure if that'll be any help, but I also see this often on close's flush on our invidious instance:
Oct  2 08:07:33 invidious invidious[11745]: Unhandled exception in spawn: Error while flushing data to the client (HTTP::Server::ClientError)
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server/response.cr:258:9 in 'unbuffered_flush'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/buffered.cr:219:5 in 'close'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server/request_processor.cr:62:11 in 'process'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server.cr:498:5 in '->'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
Oct  2 08:07:33 invidious invidious[11745]:   from ???
Oct  2 08:07:33 invidious invidious[11745]: Caused by: Error writing to socket: Broken pipe (IO::Error)
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/evented.cr:156:7 in 'unbuffered_write'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/buffered.cr:219:5 in 'unbuffered_flush'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/io/buffered.cr:219:5 in 'close'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server/request_processor.cr:62:11 in 'process'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/http/server.cr:498:5 in '->'
Oct  2 08:07:33 invidious invidious[11745]:   from ../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
Oct  2 08:07:33 invidious invidious[11745]:   from ???
then if it happens enough we somehow run out of fd and also get
Oct  2 17:00:01 invidious invidious[28308]: 2020-10-02T15:00:01.766671Z  ERROR - http.server: Error while connecting a new socket
Oct  2 17:00:01 invidious invidious[28308]: accept: Too many open files (Socket::Error)
Oct  2 17:00:01 invidious invidious[28308]:   from ../../../usr/share/crystal/src/socket.cr:276:17 in 'accept?'
Oct  2 17:00:01 invidious invidious[28308]:   from ../../../usr/share/crystal/src/http/server.cr:459:13 in '->'
Oct  2 17:00:01 invidious invidious[28308]:   from ../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
Oct  2 17:00:01 invidious invidious[28308]:   from ???
As a workaround we restart invidious in a crontab and it's a bit weird, it normally doesn't happen but when it happens we get a lot e.g. with a restart every 10 minutes we get:
root@invidious:/var/log# zcat syslog.1.gz | grep -E '^Oct  2 17' | sed -ne 's/.*\(Broken\|Too many\)/\1/p' | uniq -c
   1609 Too many open files (Socket::Error)
      1 Oct  2 17:00:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1715 Broken pipe (IO::Error)
  36404 Too many open files (Socket::Error)
      1 Oct  2 17:10:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1698 Broken pipe (IO::Error)
 136532 Too many open files (Socket::Error)
      1 Oct  2 17:20:02 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
    893 Broken pipe (IO::Error)
 327483 Too many open files (Socket::Error)
      1 Oct  2 17:30:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   2010 Broken pipe (IO::Error)
      1 Oct  2 17:40:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1488 Broken pipe (IO::Error)
 140047 Too many open files (Socket::Error)
      1 Oct  2 17:50:01 invidious systemd[1]: Started Invidious (An alternative YouTube front-end).
   1688 Broken pipe (IO::Error)
The numbers make me think it's an attack of sort (I, err, don't think we'd normally get 300k pages opened in 10 minutes for our small instance that keeps breaking down; but it got worse after invidio.us stopped -- I've just enabled access logs so will be able to confirm that soon... It's just too weird we get almost none for a while then a burst of them and restarting doesn't make it stop)

I've tried reproducing and I can semi-reliably get the Broken pipe to happen if I load a page and press escape before it's done loading, so nginx will close the socket while invidious still tries to flush data and we pretty much get what I'd expect to get, I'd say these messages could just be silenced as they likely used to be and that part is an artefact of the log changes.

OTOH the too many open files error makes me think we're leaking a number of fd when this happens, and I'd like to understand why -- in HTTP::Server the 'close' method is just setting a bool so I assume this is supposed to be closed by garbage collection once the server is no longer referenced?
I'm not familiar with the internals but I don't see anything done much different in case of error or normal close?
Could be somewhere else though I'll try to reproduce more reliably once I get access pattern from access logs (so in a few days..)

Thanks!

I have the same issue using kemal. How to reproduce (don't know if it works on your environment):

setup local server (build using kemal), then start the process on terminal
open Firefox then go to the home url of your site, for example mine is http://0.0.0.0:8080/
click any link to go to other page, for example http://0.0.0.0:8080/page1
open terminal, press CTLR+C to shutdown your server process
start a new server instance
go to the browser, click the back button to go 1 level of history (ex http://0.0.0.0:8080/)
check to kemal error log

Exception: Error while writing data to the client (HTTP::Server::ClientError)
  from ../../../../../usr/share/crystal/src/http/server/response.cr:216:9 in 'unbuffered_write'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:152:14 in 'write'
  from ../../../../../usr/share/crystal/src/http/server/response.cr:86:7 in 'write'
  from ../../../../../usr/share/crystal/src/io.cr:470:7 in 'write_utf8'
  from ../../../../../usr/share/crystal/src/string.cr:4961:5 in 'to_s'
  from ../../../../../usr/share/crystal/src/io.cr:174:5 in '<<'
  from ../../../../../usr/share/crystal/src/io.cr:188:5 in 'print'
  from lib/kemal/src/kemal/route_handler.cr:55:7 in 'process_request'
  from lib/kemal/src/kemal/route_handler.cr:17:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/websocket_handler.cr:13:14 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/filter_handler.cr:21:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/exception_handler.cr:8:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from ../../../../../usr/share/crystal/src/http/server/handlers/static_file_handler.cr:103:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/null_log_handler.cr:5:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/request_processor.cr:50:11 in 'process'
  from ../../../../../usr/share/crystal/src/http/server.cr:513:5 in 'handle_client'
  from ../../../../../usr/share/crystal/src/http/server.cr:468:13 in '->'
  from ../../../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
  from ../../../../../usr/share/crystal/src/fiber.cr:92:34 in '->'
  from ???
Caused by: Error writing to socket: Broken pipe (IO::Error)
  from ../../../../../usr/share/crystal/src/io/evented.cr:82:13 in 'unbuffered_write'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:152:14 in 'write'
  from ../../../../../usr/share/crystal/src/http/server/response.cr:209:11 in 'unbuffered_write'
  from ../../../../../usr/share/crystal/src/io/buffered.cr:152:14 in 'write'
  from ../../../../../usr/share/crystal/src/http/server/response.cr:86:7 in 'write'
  from ../../../../../usr/share/crystal/src/io.cr:470:7 in 'write_utf8'
  from ../../../../../usr/share/crystal/src/string.cr:4961:5 in 'to_s'
  from ../../../../../usr/share/crystal/src/io.cr:174:5 in '<<'
  from ../../../../../usr/share/crystal/src/io.cr:188:5 in 'print'
  from lib/kemal/src/kemal/route_handler.cr:55:7 in 'process_request'
  from lib/kemal/src/kemal/route_handler.cr:17:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/websocket_handler.cr:13:14 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/filter_handler.cr:21:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/exception_handler.cr:8:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from ../../../../../usr/share/crystal/src/http/server/handlers/static_file_handler.cr:103:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/null_log_handler.cr:5:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from ../../../../../usr/share/crystal/src/http/server/request_processor.cr:50:11 in 'process'
  from ../../../../../usr/share/crystal/src/http/server.cr:513:5 in 'handle_client'
  from ../../../../../usr/share/crystal/src/http/server.cr:468:13 in '->'
  from ../../../../../usr/share/crystal/src/primitives.cr:255:3 in 'run'
  from ../../../../../usr/share/crystal/src/fiber.cr:92:34 in '->'
  from ???

jwoertink · 2021-01-04T16:21:16Z

Good call @Dan-Do it looks like that same basic flow is happening on Lucky too. Which really means it's something in HTTP::Server (or further down) since those are the two commonalities between Kemal and Lucky.

sdogruyol · 2021-04-05T15:30:05Z

Is there any update on this?

jwoertink · 2021-04-05T15:31:56Z

I haven't updated to Crystal 1.0 yet, so it may or may not be fixed, but with 0.36.1 we still see the issue.

SamantazFox · 2021-09-02T13:06:28Z

Bump! On invidious, we simply removed the exception here:
https://github.com/crystal-lang/crystal/blob/6d9a1d583/src/http/server/response.cr#L274-L279

jwoertink · 2021-09-02T15:11:29Z

Using Crystal 1.1.1, and I still see this from time to time. It seems to only happen when a connection has gone stale. Like if I'm sitting on a page on my site, then my session times out, then on my next interaction I'm logged out, redirected and the broken pipe is thrown. It doesn't seem to really affect anything other than noise in the logs though.

jwoertink · 2021-11-04T22:20:05Z

Looks like @BrucePerens may have found the issue in Lucky luckyframework/lucky#1608 ;however, I wonder if just rescuing IO::Error is the right way to go here, or if there's a larger underlying issue within HTTP itself? 🤔

BrucePerens · 2021-11-04T22:46:15Z

If you want to test this, just delay your HTTP response until the browser times out.

straight-shoota · 2022-03-09T12:32:13Z

I've tried to reproduce these conditions with this simple program:

require "http"

channel = Channel(Nil).new

server = HTTP::Server.new do |context|
  channel.receive
  context.response << "foo"
  context.response.flush
  puts "wrote response"
end

address = server.bind_unused_port
puts "Listening on #{address}"

spawn do
  client = HTTP::Client.new(address.address, address.port)
  client.read_timeout = 1.seconds
  response = client.get("/")
  puts response.body
rescue exc
  exc.inspect_with_backtrace(STDERR)
  channel.send(nil)
  sleep 1.seconds
  exit
end

server.listen

It raises a TimeoutError for the client, but there's no server error, even when writing after the client has already disconnected.

jwoertink · 2022-03-09T15:54:01Z

I wonder if it has anything to do with the static file handler 🤔 I can't remember if I've seen it with an API only Lucky app, but I definitely see it all the time on every Lucky app (including the website) that I've worked with. It's for sure a tricky issue.

BrucePerens · 2022-03-09T21:36:00Z

I tried some variations of the @straight-shoota example and could not break it. Let's close this one and investigate more on the Lucky side.

jwoertink · 2022-03-09T21:38:29Z

Yeah, we can investigate more on the Lucky side, though, some of the people in this thread got the error using Kemal, so it doesn't seem to be Lucky specific.

BrucePerens · 2022-03-09T21:44:10Z

The stack trace says it's happening in the Crystal stdlib. We just have to learn the right conditions before we ask any more of the Crystal folks.

martinetd · 2022-03-09T23:35:51Z

The original error doesn't seem to happen anymore.

Your program doesn't try to write to the socket while it's closed (I have no idea how async works in crystal), so I just removed the client and ran curl manually with this, hitting ^c before 2s:

require "http"

channel = Channel(Nil).new

server = HTTP::Server.new do |context|
# channel.receive
  sleep 2.seconds
  context.response.content_type = "text/plain"
  context.response.print "Okay"
  context.response.flush
  puts "wrote response"
rescue exc
  exc.inspect_with_backtrace(STDERR)
  exit
end

address = server.bind_unused_port
puts "Listening on #{address}"

server.listen

At which point the server does get sigpipes writing its response:

[pid 464811] accept(7, NULL, NULL)      = 11
[pid 464811] fcntl(11, F_GETFL)         = 0x2 (flags O_RDWR)
[pid 464811] fcntl(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 464811] accept(7, NULL, NULL)      = -1 EAGAIN (Resource temporarily unavailable)
[pid 464811] epoll_ctl(8, EPOLL_CTL_ADD, 7, {events=EPOLLIN, data={u32=7, u64=7}}) = 0
[pid 464811] recvfrom(11, "GET / HTTP/1.1\r\nHost: 127.0.0.1:"..., 8192, 0, NULL, NULL) = 79
[pid 464811] getpeername(11, {sa_family=AF_INET, sin_port=htons(36706), sin_addr=inet_addr("127.0.0.1")}, [28 => 16]) = 0
[pid 464811] getsockname(11, {sa_family=AF_INET, sin_port=htons(37851), sin_addr=inet_addr("127.0.0.1")}, [28 => 16]) = 0
[pid 464811] epoll_wait(8, [], 32, 380) = 0
[pid 464811] epoll_wait(8, [], 32, 1620) = 0
[pid 464811] sendto(11, "HTTP/1.1 200 OK\r\nConnection: kee"..., 106, 0, NULL, 0) = 106
[pid 464811] write(3, "wrote response", 14) = 14
[pid 464811] write(3, "\n", 1)          = 1
[pid 464811] sendto(11, "0\r\n\r\n", 5, 0, NULL, 0) = -1 EPIPE (Broken pipe)
[pid 464811] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=464811, si_uid=1000} ---
[pid 464811] sendto(11, "0\r\n\r\n", 5, 0, NULL, 0) = -1 EPIPE (Broken pipe)
[pid 464811] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=464811, si_uid=1000} ---
[pid 464811] close(11)                  = 0

But it ignores it (no stack trace printed despite the rescue block)
I might have gotten the syntax error, but from strace in my example a SIGPIPE is produced and there is code somewhere in crystal to handle it.

sdogruyol · 2024-06-03T15:28:54Z

I think this deserves more love, high load Crystal web apps still see this error, here's a comment from Kemal issue list kemalcr/kemal#591 (comment)

straight-shoota · 2024-06-03T16:13:44Z

I believe we can easily ignore errors from broken pipes in some places such as HTTP::Server::Response::Output#unbuffered_write. An HTTP server usually shouldn't care when the client hung up.

That said, it seems we still don't quite understand the exact error conditions. There's no reproduction with plain stdlib, right?

jwoertink · 2024-06-03T16:58:25Z

I've still never been able to reproduce it outside of my app, but on Crystal 1.12 and I still get it. Though, for me I don't need high load to reproduce it locally.

Sija · 2024-06-03T19:58:30Z

I believe it should be enough to close the connection from the client-side before server is able to fulfill the request. You could simulate it with some long-running requests (sleep should be enough), I think.

Dan-Do · 2024-06-04T14:57:27Z

I've still never been able to reproduce it outside of my app, but on Crystal 1.12 and I still get it. Though, for me I don't need high load to reproduce it locally.

It happens 2 times on around 300 requests (static files handler of kemalcr) on my localhost PC. Don't know how to reproduce because it's random.

curioustolearn · 2024-09-04T13:53:32Z

I posted this on Kemal repo. But wanted to post it here too so that it gets more visibility. I am using Crystal 1.13.2.

I ran into this issue today when about 60 people tried to access my server at the same time. Here is the error I saw (multiple times) and the server became unresponsive. I just wanted to share this here. Is there a solution to this? It seems like it is not only a problem with logs getting cluttered with these messages, it also affects the responsiveness of the server.

Thank you.

Sep 03 14:26:02 webapps-server crystalquiz[62307]: Unhandled exception in spawn: Error while flushing data to the client (HTTP::Server::ClientError)
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/http/server/response.cr:328:9 in 'unbuffered_flush'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/io/buffered.cr:251:5 in 'flush'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/io/buffered.cr:258:5 in 'close'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/http/server/response.cr:298:9 in 'close'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/http/server/request_processor.cr:63:11 in 'process'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/http/server.cr:521:5 in 'handle_client'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/http/server.cr:451:5 in '->'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/fiber.cr:143:11 in 'run'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/fiber.cr:95:34 in '->'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from ???
Sep 03 14:26:02 webapps-server crystalquiz[62307]: Caused by: Error writing to socket (#<TCPSocket:0x76ed8e064690>): Broken pipe (IO::Error)
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/io/evented.cr:45:11 in 'write'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/crystal/system/socket.cr:89:5 in 'system_write'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/socket.cr:442:16 in 'unbuffered_write'
Sep 03 14:26:02 webapps-server crystalquiz[62307]:   from /usr/share/crystal/src/io/buffered.cr:250:5 in 'flush'

jwoertink · 2024-09-04T14:38:05Z

Is there a solution to this?

I still get this all the time. I haven't been able to reliably recreate the issue in simple Crystal though. No clue what causes it or how to fix it.

jhass added kind:feature topic:stdlib topic:stdlib:files topic:stdlib:networking labels Apr 19, 2020

jhass mentioned this issue Apr 19, 2020

Broken pipe on WebSocketHandler #9135

Closed

y2k2mt mentioned this issue Apr 20, 2020

Easy way to handle disconnections? y2k2mt/sse.cr#14

Closed

Dan-Do mentioned this issue Jan 7, 2021

send_file HTTP::Server::ClientError kemalcr/kemal#591

Open

unixfox mentioned this issue Jun 30, 2021

"HTTP::Server::ClientError" and "Broken pipe (IO::Error)" spam the log iv-org/invidious#1416

Open

alexAubin mentioned this issue Oct 1, 2021

It stops quite often with Error 502 nginx YunoHost-Apps/invidious_ynh#53

Closed

jwoertink mentioned this issue Nov 4, 2021

Lucky::TextResponse#print raises exception if the client has gone away. luckyframework/lucky#1608

Closed

jwoertink mentioned this issue Nov 12, 2021

Updates to TextReponse#print luckyframework/lucky#1609

Merged

5 tasks

Sija mentioned this issue Jan 26, 2022

"Broken pipe" error on crystal tool hierarchy #11772

Closed

yanecc mentioned this issue May 16, 2023

NotImplementedError of Socket in Win32 #13328

Closed

dentarg mentioned this issue Oct 18, 2023

Error when consuming (Error writing to socket: Broken pipe (IO::Error)) cloudamqp/amqproxy#137

Closed

Sija mentioned this issue Dec 2, 2023

Heroku build fails but continues and succeeds anyways. #14035

Closed

Error writing file: Broken pipe (IO::Error) #9065

Error writing file: Broken pipe (IO::Error) #9065

Comments

jwoertink commented Apr 14, 2020

ysbaddaden commented Apr 14, 2020

waj commented Apr 14, 2020

waj commented Apr 14, 2020

jwoertink commented Apr 14, 2020

waj commented Apr 14, 2020

jwoertink commented Apr 14, 2020

jwoertink commented Apr 14, 2020

asterite commented Apr 14, 2020

waj commented Apr 14, 2020

jwoertink commented Apr 14, 2020

asterite commented Apr 14, 2020

waj commented Apr 14, 2020

jwoertink commented Apr 14, 2020

ysbaddaden commented Apr 14, 2020

waj commented Apr 14, 2020

ysbaddaden commented Apr 15, 2020

jhass commented Apr 15, 2020

rdp commented Apr 21, 2020 • edited Loading

jwoertink commented Apr 21, 2020

Dan-Do commented Sep 7, 2020

jwoertink commented Sep 7, 2020

Dan-Do commented Sep 8, 2020

Dan-Do commented Sep 8, 2020

martinetd commented Oct 4, 2020 • edited Loading

Dan-Do commented Jan 3, 2021

jwoertink commented Jan 4, 2021

sdogruyol commented Apr 5, 2021

jwoertink commented Apr 5, 2021

SamantazFox commented Sep 2, 2021

jwoertink commented Sep 2, 2021

jwoertink commented Nov 4, 2021 • edited Loading

BrucePerens commented Nov 4, 2021

straight-shoota commented Mar 9, 2022

jwoertink commented Mar 9, 2022

BrucePerens commented Mar 9, 2022

jwoertink commented Mar 9, 2022

BrucePerens commented Mar 9, 2022

martinetd commented Mar 9, 2022

sdogruyol commented Jun 3, 2024

straight-shoota commented Jun 3, 2024

jwoertink commented Jun 3, 2024

Sija commented Jun 3, 2024

Dan-Do commented Jun 4, 2024

curioustolearn commented Sep 4, 2024

jwoertink commented Sep 4, 2024

rdp commented Apr 21, 2020 •

edited

Loading

martinetd commented Oct 4, 2020 •

edited

Loading

jwoertink commented Nov 4, 2021 •

edited

Loading