Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken pipes #382

Closed
rofinn opened this issue Jan 25, 2019 · 18 comments
Closed

Broken pipes #382

rofinn opened this issue Jan 25, 2019 · 18 comments

Comments

@rofinn
Copy link

rofinn commented Jan 25, 2019

Julia 1.0.3
HTTP.jl 0.8.0
MbedTLS.jl 0.6.6

We seem to be be randomly getting a broken pipe error when trying to fetch objects from S3. From what I'm gathering from the traceback the error is occuring when trying to close the SSL connection, so maybe this is issue should be filed against MbedTLS.jl instead?

ERROR: LoadError: IOError: write: broken pipe (EPIPE)
Stacktrace:
 [1] try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./event.jl:196
 [2] wait() at ./event.jl:255
 [3] uv_write(::Sockets.TCPSocket, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:782
 [4] unsafe_write(::Sockets.TCPSocket, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:830
 [5] f_send(::Ptr{Nothing}, ::Ptr{UInt8}, ::UInt64) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:255
 [6] ssl_close_notify(::MbedTLS.SSLContext) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:728
 [7] close(::MbedTLS.SSLContext) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:190
 [8] close at /root/.julia/packages/HTTP/GN0Te/src/ConnectionPool.jl:336 [inlined]
 [9] close(::HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}) at /root/.julia/packages/HTTP/GN0Te/src/ConnectionPool.jl:325
 [10] #request#1(::Nothing, ::Type, ::Int64, ::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:iofunction, :verbose, :require_ssl_verification),Tuple{Nothing,Int64,Bool}}}, ::Function, ::Type{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer}}, ::HTTP.URIs.URI, ::HTTP.Messages.Request, ::String) at /root/.julia/packages/HTTP/GN0Te/src/ConnectionRequest.jl:60
 [11] #request at ./none:0 [inlined]
 [12] #request#1 at /root/.julia/packages/HTTP/GN0Te/src/ExceptionRequest.jl:19 [inlined]
 [13] #request at ./none:0 [inlined]
 [14] #request#1(::VersionNumber, ::String, ::Nothing, ::Nothing, ::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol},NamedTuple{(:verbose, :require_ssl_verification),Tuple{Int64,Bool}}}, ::Function, ::Type{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::String) at /root/.julia/packages/HTTP/GN0Te/src/MessageRequest.jl:47
 [15] (::getfield(HTTP, Symbol("#kw##request")))(::NamedTuple{(:verbose, :require_ssl_verification),Tuple{Int64,Bool}}, ::typeof(HTTP.request), ::Type{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::String) at ./none:0
 [16] http_request(::Dict{Symbol,Any}) at /root/.julia/packages/AWSCore/dlGL3/src/http.jl:36
 [17] macro expansion at /root/.julia/packages/AWSCore/dlGL3/src/AWSCore.jl:415 [inlined]
 [18] macro expansion at /root/.julia/packages/Retry/0jMye/src/repeat_try.jl:192 [inlined]
 [19] do_request(::Dict{Symbol,Any}) at /root/.julia/packages/AWSCore/dlGL3/src/AWSCore.jl:393
 [20] #s3#1(::Dict{String,String}, ::SubString{String}, ::Dict{String,String}, ::String, ::String, ::Bool, ::Bool, ::Function, ::Dict{Symbol,Any}, ::String, ::SubString{String}) at /root/.julia/packages/AWSS3/eYu6G/src/AWSS3.jl:108
 [21] #s3 at ./none:0 [inlined]
 [22] macro expansion at /root/.julia/packages/AWSS3/eYu6G/src/AWSS3.jl:165 [inlined]
 [23] macro expansion at /root/.julia/packages/Retry/0jMye/src/repeat_try.jl:192 [inlined]
 [24] #s3_get#2(::String, ::Bool, ::Bool, ::Function, ::Dict{Symbol,Any}, ::SubString{String}, ::SubString{String}) at /root/.julia/packages/AWSS3/eYu6G/src/AWSS3.jl:163
 [25] s3_get(::Dict{Symbol,Any}, ::SubString{String}, ::SubString{String}) at /root/.julia/packages/AWSS3/eYu6G/src/AWSS3.jl:163
@rofinn
Copy link
Author

rofinn commented Jan 25, 2019

We're also seeing similar errors when calling writing objects.

[warn | CloudWatchLogs]: IOError: write: broken pipe (EPIPE)
try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./event.jl:196
wait() at ./event.jl:255
uv_write(::TCPSocket, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:782
unsafe_write(::TCPSocket, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:830
f_send(::Ptr{Nothing}, ::Ptr{UInt8}, ::UInt64) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:255
ssl_close_notify(::MbedTLS.SSLContext) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:728
close(::MbedTLS.SSLContext) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:190
close at /root/.julia/packages/HTTP/YjRCz/src/ConnectionPool.jl:344 [inlined]
close(::HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}) at /root/.julia/packages/HTTP/YjRCz/src/ConnectionPool.jl:333
#request#1(::Nothing, ::Type, ::Int64, ::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:iofunction, :verbose, :require_ssl_verification),Tuple{Nothing,Int64,Bool}}}, ::Function, ::Type{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer}}, ::HTTP.URIs.URI, ::HTTP.Messages.Request, ::String) at /root/.julia/packages/HTTP/YjRCz/src/ConnectionRequest.jl:56
#request#1(::VersionNumber, ::String, ::Nothing, ::Nothing, ::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol},NamedTuple{(:verbose, :require_ssl_verification),Tuple{Int64,Bool}}}, ::Function, ::Type{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::String) at ./none:0
(::getfield(HTTP, Symbol("#kw##request")))(::NamedTuple{(:verbose, :require_ssl_verification),Tuple{Int64,Bool}}, ::typeof(HTTP.request), ::Type{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::String) at ./none:0
http_request(::Dict{Symbol,Any}) at /root/.julia/packages/AWSCore/dlGL3/src/http.jl:36
macro expansion at /root/.julia/packages/AWSCore/dlGL3/src/AWSCore.jl:415 [inlined]
macro expansion at /root/.julia/packages/Retry/0jMye/src/repeat_try.jl:192 [inlined]
do_request(::Dict{Symbol,Any}) at /root/.julia/packages/AWSCore/dlGL3/src/AWSCore.jl:393
#service_json#9(::Base.Iterators.Pairs{Symbol,Any,NTuple{6,Symbol},NamedTuple{(:service, :version, :json_version, :target, :operation, :args),Tuple{String,String,String,String,String,Base.Iterators.Pairs{Symbol,Any,NTuple{4,Symbol},NamedTuple{(:logEvents, :logGroupName, :logStreamName, :sequenceToken),Tuple{Array{LogEvent,1},String,String,String}}}}}}, ::Function, ::Dict{Symbol,Any}) at /root/.julia/packages/AWSCore/dlGL3/src/AWSCore.jl:262
#service_json at ./none:0 [inlined]
│ IOError: write: broken pipe (EPIPE)
│ Stacktrace:
│ [1] try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./event.jl:196
│ [2] wait() at ./event.jl:255
│ [3] uv_write(::TCPSocket, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:782
│ [4] unsafe_write(::TCPSocket, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:830
│ [5] f_send(::Ptr{Nothing}, ::Ptr{UInt8}, ::UInt64) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:255
│ [6] ssl_write(::MbedTLS.SSLContext, ::Ptr{UInt8}, ::UInt64) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:706
│ [7] ssl_unsafe_write(::MbedTLS.SSLContext, ::Ptr{UInt8}, ::UInt64) at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:225
│ [8] unsafe_write at /root/.julia/packages/MbedTLS/r1Ufc/src/ssl.jl:397 [inlined]
│ [9] unsafe_write(::HTTP.Streams.Stream{HTTP.Messages.Response,HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}}, ::Ptr{UInt8}, ::UInt64) at /root/.julia/packages/HTTP/YjRCz/src/Streams.jl:95
│ [10] unsafe_write at ./io.jl:509 [inlined]
│ [11] macro expansion at ./gcutils.jl:87 [inlined]
│ [12] write at ./io.jl:532 [inlined]
│ [13] writebody(::HTTP.Streams.Stream{HTTP.Messages.Response,HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}}, ::HTTP.Messages.Request, ::Array{UInt8,1}) at /root/.julia/packages/HTTP/YjRCz/src/StreamRequest.jl:100
│ [14] (::getfield(HTTP.StreamRequest, Symbol("##2#3")){HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext},HTTP.Messages.Request,Array{UInt8,1}})() at ./task.jl:259
└ @ HTTP.StreamRequest ~/.julia/packages/HTTP/YjRCz/src/StreamRequest.jl:60

@rofinn rofinn changed the title Broken pipe when closing SSL connections Broken pipes Jan 25, 2019
@rofinn
Copy link
Author

rofinn commented Jan 25, 2019

I feel like HTTP.jl should retry on IOErrors, but I'm not sure at what level the retry should happen. Maybe the HTTP.jl/src/ConnectionRequest.jl and HTTP.jl/src/StreamRequest.jl level?

@iamed2
Copy link
Contributor

iamed2 commented Jan 25, 2019

https://github.com/JuliaWeb/HTTP.jl/blob/master/src/RetryRequest.jl#L34

I believe there are some conditions where it already retries on IOError

@rofinn
Copy link
Author

rofinn commented Jan 25, 2019

I'm a little confused about when RetryLayer is being used though cause it doesn't show up in the stacktraces and I don't see it getting constructed in the HTTP.jl code. Is that something callers are expected to use (e.g., AWSCore.jl)?

@rofinn
Copy link
Author

rofinn commented Jan 25, 2019

Oh, this might be an issue with how AWSCore.jl is using HTTP.jl?

https://github.com/JuliaCloud/AWSCore.jl/blob/master/src/http.jl#L31

@iamed2
Copy link
Contributor

iamed2 commented Jan 25, 2019

The default stack includes a RetryLayer but it can be overridden by a request. If retry=true is passed, it will retry if isrecoverable. AWSCore disables it but also retries on IOError itself: https://github.com/JuliaCloud/AWSCore.jl/blob/ae00b8e7c9b6eefc8f36f1a9285dd49e1d5ce6e8/src/http.jl#L54

@iamed2
Copy link
Contributor

iamed2 commented Jan 25, 2019

It could be that it's not cleaning up bad connections properly and all retries attempt to use the same bad connection from the pool?

@rofinn
Copy link
Author

rofinn commented Jan 30, 2019

Oh, at least 2 of the errors posted are throwing a broken pipe error when closing the connection in a catch. This explains why AWCore isn't retrying the request... it's only retrying on HTTP.IOError and not the Base.IOError that's being thrown inside this block.

...
catch e
        @debug 1 "❗️  ConnectionLayer $e. Closing: $io"
        close(io)
        rethrow(isioerror(e) ? IOError(e, "during request($url)") : e)
...

I feel like close(::MbedTLS.SSLContext) should be able to handle that case with a try/catch and checking for a broken pipe error.

@iamed2
Copy link
Contributor

iamed2 commented Jan 30, 2019

Can you link the location of that block?

@rofinn
Copy link
Author

rofinn commented Jan 30, 2019

@rofinn
Copy link
Author

rofinn commented Jan 30, 2019

Seems like close(::MbedTLS.SSLContext) should just ignore broken pipe errors cause that just means the connection was already closed by the peer.

@rofinn
Copy link
Author

rofinn commented Jan 30, 2019

I'll also note that the StreamRequest code also doesn't wrap the error in an HTTP.IOError, so maybe the easiest solution for now is to simply add Base.IOError to the retry behaviour in AWSCore.jl?

https://github.com/JuliaWeb/HTTP.jl/blob/master/src/StreamRequest.jl#L54

@rofinn
Copy link
Author

rofinn commented Jan 31, 2019

On a related note, it's kinda hard to differentiate Base.IOErrors and HTTP.IOErrors in the logs. Maybe HTTP could include the HTTP. in its show method?

rofinn added a commit to invenia/AWSCore.jl that referenced this issue Jan 31, 2019
HTTP.jl can often get `Base.IOError`s (e.g., broken pipe when closing a connection) which
aren't caught and wrapped in an `HTTP.IOError`.

JuliaWeb/HTTP.jl#382
rofinn added a commit to invenia/AWSCore.jl that referenced this issue Jan 31, 2019
HTTP.jl can often get `Base.IOError`s (e.g., broken pipe when closing a connection) which
aren't caught and wrapped in an `HTTP.IOError`.

JuliaWeb/HTTP.jl#382
rofinn added a commit to invenia/AWSCore.jl that referenced this issue Jan 31, 2019
HTTP.jl can often get `Base.IOError`s (e.g., broken pipe when closing a connection) which
aren't caught and wrapped in an `HTTP.IOError`.

JuliaWeb/HTTP.jl#382
rofinn added a commit to invenia/AWSCore.jl that referenced this issue Jan 31, 2019
HTTP.jl can often get `Base.IOError`s (e.g., broken pipe when closing a connection) which
aren't caught and wrapped in an `HTTP.IOError`.

JuliaWeb/HTTP.jl#382
bors bot added a commit to JuliaCloud/AWSCore.jl that referenced this issue Jan 31, 2019
57: Retry on `Base.IOError`s r=iamed2 a=rofinn

HTTP.jl can often get `Base.IOError`s (e.g., broken pipe when closing a connection) which
aren't caught and wrapped in an `HTTP.IOError`.

JuliaWeb/HTTP.jl#382

Co-authored-by: rofinn <[email protected]>
@quinnj
Copy link
Member

quinnj commented May 31, 2019

So I've discovered a scenario in my own application that leads to these broken pipe errors: if a bunch (purposely a little vague w/ the amount here) of requests are all being attempted simultaneously (i.e. using @async), then we can get into the unfortunate scenario where a request is started, it calls a getconnection method to actually connect a socket to the remote server, doing this IO causes a task switch to other tasks also trying to make requests, then when the task scheduler finally switches back to our request task, it attempts to actually send the request on the socket (i.e. write). Now, this is where things get unfortunate, because depending on how bogged down things are, that task switch away after getconnection and before write can be long enough that our remote server thinks the connection is dead and will close it server side.

I've also seen this in the context of AWS S3 uploading when the getconnection and initial write calls succeed, but then task switching causes too much delay and AWS returns a specific error like "received initial PUT request, but closed due to inactivity".

@quinnj
Copy link
Member

quinnj commented May 31, 2019

But I also appreciate the discussion and investigative work that's been done in this issue so far; I do believe there are some cleanups we can do in MbedTLS/HTTP w/ retry/close behavior here; they tend to be a bit of a red-herring to the actual error going on.

@rofinn
Copy link
Author

rofinn commented May 31, 2019

Would it be possible to catch that case, warn and reschedule the request to run synchronously? Maybe using a lock to avoid context switching between opening the connection and writing the data?

@quinnj
Copy link
Member

quinnj commented Oct 9, 2020

Maybe using a lock to avoid context switching between opening the connection and writing the data?

Part of the really core issue is that there doesn't exist such a "lock" to prevent context switching between @asynced Tasks. But we could potentially try to track the time elapsed between connect and write and include that in automatic retry logic.

I'm starting to mull over some refactoring ideas to try and simplify and clean up all the lower-level code that manages sockets/writing. A lot of that code has grown to be a bit messy over the years and is due for some updating.

@quinnj
Copy link
Member

quinnj commented May 25, 2022

I'm going to close this as I believe the significant refactorings on master have probably resolved this, which touched directly on cleaning up our close/retry/error propagation and better handling of errors in various places. We can always revisit/open new issues once 1.0 is out and if people see related/new issues.

This was referenced Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants