Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement request: LP self-suspend functionality #15

Closed
JohnPJenkins opened this issue Oct 29, 2014 · 2 comments
Closed

Enhancement request: LP self-suspend functionality #15

JohnPJenkins opened this issue Oct 29, 2014 · 2 comments
Assignees

Comments

@JohnPJenkins
Copy link

Many times, when developing optimistic models, we are able to determine < LP state, event > pairs which represent infeasible model behavior. These types of simulation states typically arise when time warp causes us to receive and potentially process messages in an order we don't expect.

For example, consider a client/server protocol in which a server sends an ACK to a client upon completion of some event. In optimistic mode, the client can see what amounts to duplicate ACKs from the server due to the server LP rolling back and re-sending an ACK.

While some models can gracefully cope with such issues, more complex models can have troubles (the client in the example could for instance destroy the request metadata after receiving an ACK).

A solution, as noted in the "Dark Side of Risk" paper, is to introduce LP "self-suspend" functionality. If an LP is able to detect a < state, message > pair which is incorrect / unexpected in a well-behaved simulation, the LP should be able to put itself into suspend mode, refusing to process messages until rolled back to a pre < state, message > state. There are two benefits: 1) it greatly reduces the difficulty in tracking down and distinguishing proper model bugs from bugs arising from time-warp related issues such as out-of-order event receipt and 2) it improves simulation performance by pruning the number of processed events that we know are invalid and will be rolled back anyways.

I suggest the function signature tw_suspend(tw_lp *lp, int do_suspend_event_rc, const char * format, ...), with the following semantics:

  • After a call to tw_suspend, all subsequent events (both forward and reverse) that arrive at the suspended LP shall be processed as if they were no-ops. The reverse event handler of the event that caused the suspend will be run if do_orig_event_rc is nonzero; otherwise, the reverse event handler shall additionally be a no-op. Typically, do_orig_event_rc == 0 is desired, as good coding practices for moderate-or-greater complexity simulations dictate state/event validation prior to modifying LP state (partial rollbacks are very undesirable), but there may be messy logic in the user code for which a partial rollback is warranted (operations that free memory as a side effect of operations, for example).
  • An LP exits suspend state upon rolling back the event that caused the suspend (whether or not that event is processed as a no-op).
  • Upon GVT, if an LP is in self-suspend mode and the event that caused the suspend has a timestamp less than that of GVT, then the simulator shall report the format string of suspended LP(s) and exit.
  • A NULL format string is acceptable for performance purposes, e.g. when doing "production" simulation runs.
@MaiaBlanco
Copy link

Just chiming in; I'm currently looking at manually implementing this in the CODES dragonfly model to get around some pesky asserts that are crashing the AspenNet model (built on CODES for the network side of the simulation). This would certainly be helpful and easier to add to models if it were a part of ROSS.

@gonsie
Copy link
Member

gonsie commented Nov 6, 2015

closed by pull request #75

@gonsie gonsie closed this as completed Nov 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants