The barrier release time for the current Mothership/Software barrier is too short. As a result, some parts of the application can start running and sending packets before other parts have even started. When this happens, packets that arrive at parts still waiting in the barrier are thrown away. For an asynchronous application this is not necessarily a problem, as more packets can be emitted. For GALS applications, the loss of a single packet causes local synchronisation to be lost and the entire application to freeze.
An application that illustrates this problem: A 19x19 GALS arithmetic grid works on Ayres while a 20x20 does not. Modifying the Softswitch so that "AA" is printed whenever a packet is thrown away in the barrier shows that the latter throws away packets, causing the application to hang. Running in debug mode exacerbates the problem and makes smaller applications fail.
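The instrumentation described above can be sketched roughly as follows. This is a minimal off-target illustration, not the actual Softswitch code: `packet_t`, `barrier_wait()` and the pre-recorded arrival array are hypothetical stand-ins for the real packet layout and network receive path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical packet type; the real Softswitch packet layout is not shown here. */
typedef struct {
    bool is_barrier_release;  /* true for the packet that releases the barrier */
    int  payload;             /* application data, ignored while in the barrier */
} packet_t;

/* Sit in the barrier over a pre-recorded arrival sequence (standing in for
   the network). Any application packet that arrives before the release
   packet is discarded, and "AA" is printed for each one.
   Returns the number of packets discarded before release. */
int barrier_wait(const packet_t *arrivals, size_t n)
{
    int dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (arrivals[i].is_barrier_release) {
            break;            /* released: normal packet handling takes over */
        }
        puts("AA");           /* a sender ran ahead of this part's release */
        dropped++;
    }
    return dropped;
}
```

Any non-zero return value corresponds to the failing case: for a GALS application, one discarded packet is enough to stall the neighbours that are waiting on it.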
Possible solutions:
Lengthen softswitch_delay(). This is not sustainable in the long run.
Buffer packets received during the barrier (rather than discarding them) and then play them back. This is not sustainable either, and there are too many open questions about an implementation that works for every application.
Make the softswitches sit at a tinselIdle() call once released so that they only progress as one. This requires "Softswitch: Implement Hardware Idle" (#242) and makes our softswitch/barrier release logic inherently single-application (or at least requires all applications to be launched at the same time).
Change the barrier release logic to use the debug UART network rather than the actual network. While it sounds bad, this removes instantiation from the normal network and means that we are not throwing away packets in the barrier. It has the added benefit of falling back to network pushback to stop parts that have already started from running too far ahead.
OK, there is an initial working version that appears to fix this and makes GALS applications work as expected.
It is on the BUGFIX-0285-HardwareIdleBarrier branch but needs FEATURE-0242-HardwareIdle-Mothership to be merged in locally to actually work. This will be consolidated in the next day or so.
The initial version makes the feature configurable by preprocessor macro. If the Softswitch is built with SOFTSWITCH_HWIDLE_BARRIER defined (which can be defined by calling make with SOFTSWITCH_HWIDLE_BARRIER=1 as an argument), the feature is enabled and a call to tinselIdle() is made in softswitch_delay() rather than using an unoptimised spinner loop.
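A minimal sketch of what that compile-time switch might look like, assuming the shape described above. It is not the code on the branch: `stub_tinselIdle()` is a hypothetical stand-in for the real tinselIdle() call so that the example builds off-target, and the counters and the loop bound of 500 are illustrative only.

```c
#include <stdint.h>

/* Counters so the behaviour of each build variant can be observed off-target. */
int idle_calls = 0;
volatile uint32_t spin_count = 0;

/* Hypothetical stand-in for tinselIdle(); on hardware the real call
   comes from the Tinsel API and its signature may differ. */
void stub_tinselIdle(void)
{
    idle_calls++;
}

void softswitch_delay(void)
{
#ifdef SOFTSWITCH_HWIDLE_BARRIER
    /* Feature enabled: wait in hardware idle instead of spinning. */
    stub_tinselIdle();
#else
    /* Feature disabled: unoptimised spinner loop (500 is an arbitrary count). */
    for (uint32_t i = 0; i < 500; i++) {
        spin_count++;
    }
#endif
}
```

Building without the macro takes the spinner path. Passing SOFTSWITCH_HWIDLE_BARRIER=1 to make, as described above, is assumed to end up as a definition of the macro on the compiler command line, which selects the idle path instead.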
Currently, Composer force-enables this feature. I will add Composer commands so that it can be configured from the command line.
This is somewhat related to #247.