Optimizing Centralized Federated Execution #264

Open
5 tasks
byeonggiljun opened this issue Aug 22, 2023 · 1 comment
Assignees
Labels
enhancement Enhancement of existing feature federated


byeonggiljun commented Aug 22, 2023

This issue is derived from this discussion, mostly from @edwardalee's description. I'm writing it up as a separate, more readable document because that discussion is very long.

Motivation

image

Suppose that the Sender, which triggers at 100 ms intervals, only occasionally sends an output message, say, once every few seconds on average. Currently, the Sender sends LTC, ABS, and NET messages every 100 ms, even when they carry no useful information. In the visualization below (from @ChadliaJerad), the RTI is on the left, the Sender in the middle, and the Receiver on the right. Every message shown is redundant because the Receiver has nothing to do until its next event tag, which is 2 sec (the timeout value). Eliminating these messages would reduce network overhead.

diagram

Solution

A new message type, Next Downstream Tag (NDT), is proposed to resolve this inefficiency. When the RTI receives a NET from a downstream federate, it should notify upstream federates with an NDT message. Each federate should maintain an ndt_queue (sorted by tag) that keeps track of the NDT messages received from the RTI. Whenever an upstream federate completes a tag g, it checks the ndt_queue and, if no output is being produced, sends LTC(g) (and NET) iff g >= peek(ndt_queue).
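The completion check described above could be sketched roughly as follows. This is only an illustration, not the reactor-c implementation: the function name `should_report_completion`, the representation of a tag as a `(time, microstep)` tuple, and the choice to stay silent when the queue is empty are all assumptions made for the example.

```python
import heapq
from typing import List, Tuple

# Assumption: a tag is (time in ns, microstep); tuples compare lexicographically.
Tag = Tuple[int, int]


def should_report_completion(g: Tag, ndt_queue: List[Tag], has_output: bool) -> bool:
    """Decide whether to send LTC(g) (and NET) after completing tag g.

    ndt_queue is a min-heap, so ndt_queue[0] plays the role of peek(ndt_queue).
    """
    if has_output:
        # Real output must always be reported.
        return True
    if not ndt_queue:
        # Assumption: with no known downstream tag, nothing needs to be reported.
        return False
    return g >= ndt_queue[0]


MSEC = 1_000_000
queue: List[Tag] = []
heapq.heappush(queue, (2000 * MSEC, 0))  # downstream federate waits for 2 sec

# Completing the 100 ms tag without output stays silent.
quiet = should_report_completion((100 * MSEC, 0), queue, has_output=False)
# Reaching the downstream tag triggers LTC/NET.
needed = should_report_completion((2000 * MSEC, 0), queue, has_output=False)
```

With the Sender and Receiver from the Motivation section, the check stays false for every 100 ms tag before the 2 sec tag, which is where the savings would come from.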

NET Handling Mechanism

RTI Side

When the RTI receives a NET(g_d), it sends NDT(g_d) to upstream federates that have not yet completed g_d. As a further performance optimization, the RTI may decide to only send NDT to federates that produce a lot of LTC and NET messages without producing output.
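A minimal sketch of the RTI-side selection, under stated assumptions: `FederateInfo`, `ndt_recipients`, and the `(-1, 0)` "nothing completed yet" sentinel are hypothetical names and values for this example, and only direct upstream federates are considered (whether transitive upstream federates should also be notified is not settled here).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: a tag is (time in ns, microstep); tuples compare lexicographically.
Tag = Tuple[int, int]


@dataclass
class FederateInfo:
    # IDs of federates directly upstream of this one (hypothetical bookkeeping).
    upstream: List[int] = field(default_factory=list)
    # Tag of the last LTC received from this federate; (-1, 0) means none yet.
    last_completed: Tag = (-1, 0)


def ndt_recipients(federates: Dict[int, FederateInfo], sender_id: int, g_d: Tag) -> List[int]:
    """Return the upstream federates that should receive NDT(g_d).

    Called when NET(g_d) arrives from federate sender_id. A federate that has
    already completed g_d (last_completed >= g_d) is skipped.
    """
    return [
        up_id
        for up_id in federates[sender_id].upstream
        if federates[up_id].last_completed < g_d
    ]


MSEC = 1_000_000
federates = {
    0: FederateInfo(last_completed=(100 * MSEC, 0)),  # Sender
    1: FederateInfo(upstream=[0]),                     # Receiver
}
# Receiver reports NET at 2 sec; the Sender has only completed 100 ms.
recipients = ndt_recipients(federates, sender_id=1, g_d=(2000 * MSEC, 0))
```

The linear scan over `upstream` is the simplest possible lookup; it does not address the efficiency question raised under "Things to discuss".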

Federate Side

When an upstream federate whose most recently completed tag is g receives an NDT(g_d), it should:

  1. Push the tag g_d onto the ndt_queue.
  2. If output is being produced or g_d <= g, send LTC(g_d) and the appropriate NET so that the RTI can issue a grant to downstream federates.
  3. Pop the ndt_queue until peek(ndt_queue) > g.

A federate doesn’t have to send ABS, NET, or LTC at the tag g if g < peek(ndt_queue). Of course, NET and LTC should be sent if there is any actual output.
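The three steps above can be sketched as a small handler. This is an illustration under assumptions rather than the actual federate code: the class `UpstreamFederate`, its attributes, and the `sent` list (a stand-in for the network) are hypothetical, and step 2 sends LTC(g_d) exactly as the description states.

```python
import heapq
from typing import List, Tuple

# Assumption: a tag is (time in ns, microstep); tuples compare lexicographically.
Tag = Tuple[int, int]


class UpstreamFederate:
    def __init__(self, completed: Tag, next_event: Tag):
        self.ndt_queue: List[Tag] = []  # min-heap of NDT tags
        self.completed = completed      # g: most recently completed tag
        self.next_event = next_event    # tag carried by the next NET
        self.sent: List[Tuple[str, Tag]] = []  # stand-in for messages to the RTI

    def handle_ndt(self, g_d: Tag, output_pending: bool = False) -> None:
        # Step 1: remember the downstream tag.
        heapq.heappush(self.ndt_queue, g_d)
        # Step 2: report immediately if downstream is already waiting on us.
        if output_pending or g_d <= self.completed:
            self.sent.append(("LTC", g_d))
            self.sent.append(("NET", self.next_event))
        # Step 3: drop entries at or before g, so that peek(ndt_queue) > g.
        while self.ndt_queue and self.ndt_queue[0] <= self.completed:
            heapq.heappop(self.ndt_queue)


fed = UpstreamFederate(completed=(300, 0), next_event=(400, 0))
fed.handle_ndt((2000, 0))  # future tag: queued, nothing sent
fed.handle_ndt((200, 0))   # already completed: LTC and NET sent, entry popped
```

After both calls, only the future tag remains in the queue, so later completions at tags before it can stay silent.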

Things to discuss

  • How do we efficiently look up which federates to send an NDT to?
  • How can we handle a federation with a cyclic dependency between federates? Do we just break the cycle at the sender of the NET in response to which an NDT should be sent?

TODOs

RTI

  • Add a command line argument for turning on the NDT messages.
  • When receiving a NET at g, send NDT to upstream federates that have not completed the tag g.
  • (For further optimization) Discuss how to not send unnecessary NDTs and implement the solution

Federate

  • Create ndt_queue to manage NDTs
  • Eliminate unnecessary NET, LTC, and ABS messages based on information from ndt_queue

hokeun commented Aug 22, 2023

Very nice summary, thanks @byeong-gil ! I just have a minor suggestion. How about you add the remaining tasks with checkboxes at the end of the issue description above to keep track of this work for everyone?
