Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added prop_timeouts_enforced #3532

Merged
merged 5 commits into from
Feb 3, 2022
Merged

Conversation

bolt12
Copy link
Contributor

@bolt12 bolt12 commented Dec 7, 2021

This PR adds tests for Connection Manager and Inbound Governor to check if their timeout values are being enforced.

This is important because timeouts are part of the specification and one needs to guarantee we are respecting them within certain margin of error.

@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from f22bc05 to 19823a8 Compare December 16, 2021 11:16
@bolt12 bolt12 requested a review from coot December 16, 2021 11:16
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from 19823a8 to 1c6c416 Compare December 17, 2021 16:21
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from 1c6c416 to c959f27 Compare December 20, 2021 17:25
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from c959f27 to 76b97be Compare January 10, 2022 17:46
@bolt12 bolt12 linked an issue Jan 11, 2022 that may be closed by this pull request
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch 4 times, most recently from be7358c to e0f47cd Compare January 12, 2022 14:21
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from e0f47cd to b128aba Compare January 14, 2022 09:45
@bolt12 bolt12 changed the title Added prop_connection_manager_timeouts_enforced Added prop_timeouts_enforced Jan 14, 2022
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from e0a6d71 to 3172691 Compare January 18, 2022 13:05
@bolt12
Copy link
Contributor Author

bolt12 commented Jan 18, 2022

@coot I refactored the code as in our call. It turns out I ended up needing to be extra explicit on all the states but I think the code turned out better than the one I had!

@@ -92,6 +89,8 @@ writeQueueChannel QueueChannel { qcWrite } msg =
Nothing -> return False
Just q -> writeTQueue q msg
>> case msg of
-- Match SO_LINGER set with 0 interval
-- and close both ends without any handshake
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a todo, to simulate a lost MsgClose, this is easy to do: we need to close the write channel, without sending the MsgClose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say in a way we are already simulating this case by writing the message and not waiting on receiving some sort of Ack. Of course since this is a simulation environment we do not actually need to do much because the message doesn't go through the wire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say in a way we are already simulating this case by writing the message and not waiting on receiving some sort of Ack.

That's not right; resting a connection does not make it wait on an ack, the kernel will just forget the state of the connection when it sends the packet with RST option set. If the receiver does not get the packet it will get an error when next time it will try to send a packet (a reset will be send as a reply).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will experiment with simulating a lost message like you suggested and see if that still passes the test. The problem before was that we were waiting for an Ack and that could take very long. Hopefully just closing and losing the MsgClose message will not delay it much

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it and the tests are passing 😄 I think now it simulates the LINGER option more faithfully.

@@ -682,8 +687,7 @@ withConnectionManager ConnectionManagerArguments {
-- using 'cancel' here, since we want to block until connection
-- handler thread terminates.
traverse_ cancel (getConnThread connState)
)
state
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it work if we throwTo and then wait for all the threads to terminate; this would avoid calling cancel from a separate thread. The drawback is that if a connection is executing the cleanup handler, it will block for the TIME_WAIT interval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throwTo is also possibly blocking right? waiting on the thread to actually receive the exception. So I do not think we would gain much from using throwTo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • cancel blocks until thread terminates
  • throwTo blocks until an exception is raised

so cancel can block for much longer than throwTo, since during TIME_WAIT interval, we will accept asynchronous exceptions (though they will be ignored). This will unblock throwTo, but not cancel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from 3172691 to 14a5304 Compare January 21, 2022 10:57
@bolt12 bolt12 requested a review from coot January 21, 2022 10:57
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch 2 times, most recently from 11a04f0 to 03c9390 Compare January 24, 2022 13:16
@bolt12
Copy link
Contributor Author

bolt12 commented Jan 26, 2022 via email

@bolt12
Copy link
Contributor Author

bolt12 commented Jan 26, 2022 via email

@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from 03c9390 to d44c87f Compare January 26, 2022 10:03
@bolt12 bolt12 requested a review from coot January 26, 2022 10:11
Comment on lines 2700 to 2702
getTraceEvents :: forall a b. Typeable b
=> SimTrace a
-> [(Time, b)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we write this function to return a Trace not a list of events, we will not need a separate splitConns (which is unfortunate, because this means we duplicate critical code for some of the tests).

There's traceSelectEventsDynamic which filters Dynamic values checking their type.
All together transitionSignal (a few lines above) is probably slightly more complex than:

      transitionSignal :: [[(Time, AbstractTransitionTrace SimAddr)]]
      transitionSignal = Trace.toList                      
                       . splitConns
                       . traceSelectTraceEventsDynamic
                       $ trace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The splitConns function that works on Trace does not get us the Time of the event which is needed for this particular test. That's why I duplicated the code. I gave your suggestion a go and it doesn't seem to have that type (it has [[AbstractTransition]] )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we lose the Time info when we go from SimTrace to Trace via traceSelectTraceEventsDynamic, that's why I duplicated the code to write a version that would give me the Time as well.

Copy link
Contributor

@coot coot Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a fix for that by using contra-variance of the tracer. One needs to define

data WithTime a = WithTime Time a

And then use

tracerWithTime :: Tracer m (WithTime a) -> Tracer m a
tracerWithTime (Tracer f) = Tracer $ \a -> WithTime <$> getMonotonicTime <*> f a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, but I am failing to see how that is going to help, multinodeExperiment requires Tracer m (WithName (Name peerAddr) (AbstractTransitionTrace peerAddr)). And we need to extract the time from SimEvent, at least it seems cleaner to extract it from there than from getMonotonicTime

Copy link
Contributor Author

@bolt12 bolt12 Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding

traceSelectTraceEventsTime
    :: (SimEventType -> Maybe b)
    -> SimTrace a
    -> Trace (SimResult a) (Time, b)
traceSelectTraceEventsTime fn = bifoldr ( \ v _acc -> Nil v )
                                        ( \ eventCtx acc
                                         -> case fn (seType eventCtx) of
                                              Nothing -> acc
                                              Just b  -> Cons (seTime eventCtx, b) acc
                                        )
                                        undefined -- it is ignored

to IOSim.hs ?

This would avoid making a convoluted change to this test's code and avoid duplication in this case, and I think it is a nice, justified addition to IOSim module

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here something similar is done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep io-sim api as small as possible, the other place should also use contravariant tracer to get the time; another thing is that this also works in IO, not just io-sim.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid your suggestion does not typecheck :/ Tracer performs an m () action, so all we really can do is to perform getMonotonicTime in order to print it, not add it to WithTime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

λ data WithTime a = WithTime Time a
λ :t contramapM $ \a -> flip WithTime a <$> getMonotonicTime
contramapM $ \a -> WithTime <$> getMonotonicTime <*> pure a
  :: MonadMonotonicTime m => Tracer m (WithTime a) -> Tracer m a

@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from d44c87f to d5b7352 Compare January 28, 2022 11:00
@bolt12 bolt12 requested a review from coot February 1, 2022 17:42
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from 3c58980 to bf01804 Compare February 2, 2022 14:14
Copy link
Contributor

@coot coot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @bolt12!

Comment on lines 96 to 100
Just q ->
case msg of
MsgClose -> writeTVar qcWrite Nothing
_ -> writeTQueue q msg
>> return True
Copy link
Contributor

@coot coot Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do notation is probably a better choice. >> nicely inlines with ->, I like $> too though some don't (e.g. Duncan).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed!

This includes changes to the way a node behaves when its InboundGovernor
dies: changed link to linkOnly (const True) to correctly propagate
AsyncCancelled exceptions to the ConnectionManager and thus terminate it
as well.

The ConnectionManager needs to take care of closing the connection
handlers of each connection in a way that does not make us go beyond the
enforced timeouts. Since closing a connection is blocking we need to
mask exceptions during the exception handler and since doing it
sequentially can lead to waits on atomically blocks, we close each
connection in parallel.
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from bf01804 to 10a5400 Compare February 3, 2022 10:30
@bolt12
Copy link
Contributor Author

bolt12 commented Feb 3, 2022

bors merge

iohk-bors bot added a commit that referenced this pull request Feb 3, 2022
3532: Added prop_timeouts_enforced r=bolt12 a=bolt12



Co-authored-by: Armando Santos <[email protected]>
This makes it so splitConns can be reused with any With type wrapper.
@bolt12 bolt12 force-pushed the bolt12/p2p-master-timeouts-enforced branch from 10a5400 to 826f2db Compare February 3, 2022 11:08
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Feb 3, 2022

Canceled.

@bolt12
Copy link
Contributor Author

bolt12 commented Feb 3, 2022

bors merge

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Feb 3, 2022

Build succeeded:

@iohk-bors iohk-bors bot merged commit a3eafb5 into master Feb 3, 2022
@iohk-bors iohk-bors bot deleted the bolt12/p2p-master-timeouts-enforced branch February 3, 2022 11:22
coot pushed a commit that referenced this pull request May 16, 2022
3532: Added prop_timeouts_enforced r=bolt12 a=bolt12



Co-authored-by: Armando Santos <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment