-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added prop_timeouts_enforced #3532
Conversation
944f930
to
f22bc05
Compare
ouroboros-network-framework/test/Test/Ouroboros/Network/Server2.hs
Outdated
Show resolved
Hide resolved
ouroboros-network-framework/test/Test/Ouroboros/Network/Server2.hs
Outdated
Show resolved
Hide resolved
ouroboros-network-framework/test/Test/Ouroboros/Network/Server2.hs
Outdated
Show resolved
Hide resolved
ouroboros-network-framework/test/Test/Ouroboros/Network/Server2.hs
Outdated
Show resolved
Hide resolved
f22bc05
to
19823a8
Compare
19823a8
to
1c6c416
Compare
1c6c416
to
c959f27
Compare
c959f27
to
76b97be
Compare
be7358c
to
e0f47cd
Compare
e0f47cd
to
b128aba
Compare
e0a6d71
to
3172691
Compare
@coot I refactored the code as in our call. It turns out I ended up needing to be extra explicit on all the states but I think the code turned out better than the one I had! |
@@ -92,6 +89,8 @@ writeQueueChannel QueueChannel { qcWrite } msg = | |||
Nothing -> return False | |||
Just q -> writeTQueue q msg | |||
>> case msg of | |||
-- Match SO_LINGER set with 0 interval | |||
-- and close both ends without any handshake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a todo, to simulate a lost MsgClose
, this is easy to do: we need to close the write channel, without sending the MsgClose
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say in a way we are already simulating this case by writing the message and not waiting on receiving some sort of Ack. Of course since this is a simulation environment we do not actually need to do much because the message doesn't go through the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say in a way we are already simulating this case by writing the message and not waiting on receiving some sort of Ack.
That's not right; resting a connection does not make it wait on an ack, the kernel will just forget the state of the connection when it sends the packet with RST
option set. If the receiver does not get the packet it will get an error when next time it will try to send a packet (a reset will be send as a reply).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will experiment with simulating a lost message like you suggested and see if that still passes the test. The problem before was that we were waiting for an Ack and that could take very long. Hopefully just closing and losing the MsgClose message will not delay it much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried it and the tests are passing 😄 I think now it simulates the LINGER option more faithfully.
ouroboros-network-framework/src/Ouroboros/Network/ConnectionManager/Core.hs
Outdated
Show resolved
Hide resolved
ouroboros-network-framework/src/Ouroboros/Network/ConnectionManager/Core.hs
Outdated
Show resolved
Hide resolved
@@ -682,8 +687,7 @@ withConnectionManager ConnectionManagerArguments { | |||
-- using 'cancel' here, since we want to block until connection | |||
-- handler thread terminates. | |||
traverse_ cancel (getConnThread connState) | |||
) | |||
state | |||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it work if we throwTo
and then wait for all the threads to terminate; this would avoid calling cancel
from a separate thread. The drawback is that if a connection is executing the cleanup
handler, it will block for the TIME_WAIT
interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throwTo
is also possibly blocking right? waiting on the thread to actually receive the exception. So I do not think we would gain much from using throwTo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cancel
blocks until thread terminatesthrowTo
blocks until an exception is raised
so cancel
can block for much longer than throwTo
, since during TIME_WAIT
interval, we will accept asynchronous exceptions (though they will be ignored). This will unblock throwTo
, but not cancel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
3172691
to
14a5304
Compare
11a04f0
to
03c9390
Compare
We are not ignoring state' we are using it to build the trace. That's all
that matters here because the inbound governor is going to die. What we had
previously was a hack and I was building the trace using the empty state,
with this patch it is a cleaner way to do it.
…On Wed, 26 Jan 2022, 08:47 Marcin Szamotulski, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In ouroboros-network-framework/src/Ouroboros/Network/InboundGovernor.hs
<#3532 (comment)>
:
> @@ -112,9 +112,10 @@ inboundGovernor trTracer tracer serverControlChannel inboundIdleTimeout
(\(e :: SomeAsyncException) -> do
state <- atomically $ readTVar st
_ <- Map.traverseWithKey
- (\connId _ ->
+ (\connId _ -> do
+ let state' = unregisterConnection connId state
If we want to do that properly, we should thread the state and use
Map.foldrWithKey instead, otherwise ignoring state' is just confusing.
—
Reply to this email directly, view it on GitHub
<#3532 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AE2SULLLOHOEBXNHWBPDTELUX6YLDANCNFSM5JRGQCOQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
ouroboros-network-framework/src/Ouroboros/Network/ConnectionManager/Core.hs
Show resolved
Hide resolved
Hmmm right, is not very faithful.
…On Wed, 26 Jan 2022, 08:53 Marcin Szamotulski, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In network-mux/src/Network/Mux/Bearer/AttenuatedChannel.hs
<#3532 (comment)>
:
> @@ -92,6 +89,8 @@ writeQueueChannel QueueChannel { qcWrite } msg =
Nothing -> return False
Just q -> writeTQueue q msg
>> case msg of
+ -- Match SO_LINGER set with 0 interval
+ -- and close both ends without any handshake
I'd say in a way we are already simulating this case by writing the
message and not waiting on receiving some sort of Ack.
That's not right; resting a connection does not make it wait on an ack,
the kernel will just forget the state of the connection when it sends the
packet with RST option set. If the receiver does not get the packet it
will get an error when next time it will try to send a packet (a reset will
be send as a reply).
—
Reply to this email directly, view it on GitHub
<#3532 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AE2SULO56E6NRHYQ53XDGLDUX6ZAHANCNFSM5JRGQCOQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
03c9390
to
d44c87f
Compare
getTraceEvents :: forall a b. Typeable b | ||
=> SimTrace a | ||
-> [(Time, b)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we write this function to return a Trace
not a list of events, we will not need a separate splitConns
(which is unfortunate, because this means we duplicate critical code for some of the tests).
There's traceSelectEventsDynamic
which filters Dynamic
values checking their type.
All together transitionSignal
(a few lines above) is probably slightly more complex than:
transitionSignal :: [[(Time, AbstractTransitionTrace SimAddr)]]
transitionSignal = Trace.toList
. splitConns
. traceSelectTraceEventsDynamic
$ trace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The splitConns
function that works on Trace
does not get us the Time of the event which is needed for this particular test. That's why I duplicated the code. I gave your suggestion a go and it doesn't seem to have that type (it has [[AbstractTransition]]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we lose the Time
info when we go from SimTrace
to Trace
via traceSelectTraceEventsDynamic
, that's why I duplicated the code to write a version that would give me the Time
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a fix for that by using contra-variance of the tracer. One needs to define
data WithTime a = WithTime Time a
And then use
tracerWithTime :: Tracer m (WithTime a) -> Tracer m a
tracerWithTime (Tracer f) = Tracer $ \a -> WithTime <$> getMonotonicTime <*> f a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, but I am failing to see how that is going to help, multinodeExperiment
requires Tracer m (WithName (Name peerAddr) (AbstractTransitionTrace peerAddr))
. And we need to extract the time from SimEvent
, at least it seems cleaner to extract it from there than from getMonotonicTime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding
traceSelectTraceEventsTime
:: (SimEventType -> Maybe b)
-> SimTrace a
-> Trace (SimResult a) (Time, b)
traceSelectTraceEventsTime fn = bifoldr ( \ v _acc -> Nil v )
( \ eventCtx acc
-> case fn (seType eventCtx) of
Nothing -> acc
Just b -> Cons (seTime eventCtx, b) acc
)
undefined -- it is ignored
to IOSim.hs
?
This would avoid making a convoluted change to this test's code and avoid duplication in this case, and I think it is a nice, justified addition to IOSim module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that here something similar is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep io-sim
api as small as possible, the other place should also use contravariant tracer to get the time; another thing is that this also works in IO
, not just io-sim
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid your suggestion does not typecheck :/ Tracer
performs an m ()
action, so all we really can do is to perform getMonotonicTime
in order to print it, not add it to WithTime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
λ data WithTime a = WithTime Time a
λ :t contramapM $ \a -> flip WithTime a <$> getMonotonicTime
contramapM $ \a -> WithTime <$> getMonotonicTime <*> pure a
:: MonadMonotonicTime m => Tracer m (WithTime a) -> Tracer m a
d44c87f
to
d5b7352
Compare
3c58980
to
bf01804
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @bolt12!
Just q -> | ||
case msg of | ||
MsgClose -> writeTVar qcWrite Nothing | ||
_ -> writeTQueue q msg | ||
>> return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do
notation is probably a better choice. >>
nicely inlines with ->
, I like $>
too though some don't (e.g. Duncan).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed!
This includes changes to the way a node behaves when its InboundGovernor dies: changed link to linkOnly (const True) to correctly propagate AsyncCancelled exceptions to the ConnectionManager and thus terminate it as well. The ConnectionManager needs to take care of closing the connection handlers of each connection in a way that does not make us go beyond the enforced timeouts. Since closing a connection is blocking we need to mask exceptions during the exception handler and since doing it sequentially can lead to waits on atomically blocks, we close each connection in parallel.
bf01804
to
10a5400
Compare
bors merge |
3532: Added prop_timeouts_enforced r=bolt12 a=bolt12 Co-authored-by: Armando Santos <[email protected]>
This makes it so splitConns can be reused with any With type wrapper.
10a5400
to
826f2db
Compare
Canceled. |
bors merge |
Build succeeded: |
3532: Added prop_timeouts_enforced r=bolt12 a=bolt12 Co-authored-by: Armando Santos <[email protected]>
This PR adds tests for Connection Manager and Inbound Governor to check if their timeout values are being enforced.
This is important because timeouts are part of the specification and one needs to guarantee we are respecting them within certain margin of error.