diff --git a/source/mothership_design.md b/source/mothership.md
similarity index 94%
rename from source/mothership_design.md
rename to source/mothership.md
index c3f7eb7..455faea 100644
--- a/source/mothership_design.md
+++ b/source/mothership.md
@@ -66,7 +66,7 @@ NB: Terminology in this document:
    to take people from the past to the present after all. I'll try to be
    explicit wherever I use this term.

-Figure 1 shows the class structure diagram for the proposed Mothership design.
+Figure 1 shows the class structure diagram for the Mothership.

 ![Mothership class structure diagram](images/mothership_data_structure.png)

@@ -268,11 +268,11 @@ no other information about the application yet.
 |                 |    `soPath`           | Also loads the supervisor, and    |
 |                 |                       | provisions its API.               |
 +-----------------+-----------------------+-----------------------------------+
-| `CMND`, `RECL`  | 0. `std::string`      | Removes information for an        |
-|                 |    `appName`          | application, by name, from the    |
-|                 |                       | Mothership. Does nothing on a     |
-|                 |                       | running application (it must be   |
-|                 |                       | stopped first).                   |
+| `CMND`, `BRKN`  | 0. `std::string`      | Marks an application as broken,   |
+|                 |    `appName`          | due to an error happening on      |
+|                 |                       | another Mothership. The           |
+|                 |                       | must be recalled and redeployed   |
+|                 |                       | to be used.                       |
 +-----------------+-----------------------+-----------------------------------+
 | `CMND`, `INIT`  | 0. `std::string`      | Takes a fully-defined             |
 |                 |    `appName`          | application (with state           |
@@ -287,6 +287,14 @@ no other information about the application yet.
 |                 |                       | this message is acted on when it  |
 |                 |                       | reaches that state.               |
 +-----------------+-----------------------+-----------------------------------+
+| `CMND`, `RECL`  | 0. `std::string`      | Removes information for an        |
+|                 |    `appName`          | application, by name, from the    |
+|                 |                       | Mothership. Does nothing on a     |
+|                 |                       | running application (it must be   |
+|                 |                       | stopped first) or an initialised  |
+|                 |                       | application (it must be started,  |
+|                 |                       | then stopped).                    |
++-----------------+-----------------------+-----------------------------------+
 | `CMND`, `RUN`   | 0. `std::string`      | Takes an application held at the  |
 |                 |    `appName`          | softswitch barrier (with state    |
 |                 |                       | `READY`, and "starts" it by       |
@@ -334,7 +342,9 @@ and what the Mothership does with those messages.
 The Mothership process occasionally also sends messages to the Root process.
 Table 2 denotes subkeys of messages that Mothership processes send to Root,
 along with their intended use. They're mostly acknowledgements of work
-done, and are useful for debugging.
+done, and are useful for debugging and logging. Root maintains a data structure
+to understand when an application has transitioned from one state to another on
+all Motherships.

 +-----------------+-----------------------+-----------------------------------+
 | Key Permutation | Arguments             | Reason                            |
@@ -360,12 +370,22 @@ done, and are useful for debugging.
 | `STOP`          |    `appName`          | the application has been fully    |
 |                 |                       | stopped.                          |
 +-----------------+-----------------------+-----------------------------------+
+| `MSHP`, `ACK`,  | 0. `std::string`      | Notifies the Root process that    |
+| `RECL`          |    `appName`          | the application has been          |
+|                 |                       | recalled.                         |
++-----------------+-----------------------+-----------------------------------+
 | `MSHP`, `REQ`,  | 0. `std::string`      | Requests the Root process to send |
 | `STOP`          |    `appName`          | a stop message to all Motherships |
 |                 |                       | running the application. Used by  |
 |                 |                       | `stop_application` Supervisor API |
 |                 |                       | call.                             |
 +-----------------+-----------------------+-----------------------------------+
+| `MSHP`, `REQ`,  | 0. `std::string`      | Requests the Root process to send |
+| `BRKN`          |    `appName`          | an error message to all           |
+|                 |                       | Motherships hosting the           |
+|                 |                       | application. Is sent when an      |
+|                 |                       | application breaks or errors.     |
++-----------------+-----------------------+-----------------------------------+

 Table: Output message key permutations that the Mothership process sends to the
 Root process, and why.
@@ -621,21 +641,20 @@ Mothership, as well as external devices elsewhere. They are:

     - `SupervisorApi* (*getApi)()`: Used to provision the Supervisor API (see
       below).
-
+
     - `uint64_t (*getAddr)(uint32_t)`: Used to get the full symbolic address of
       a device from its Supervisor-unique index, which is sent in the `pinAddr`
       field of each log packet.
-
-    - `const SupervisorDeviceInstance_t* (*getInstance)(uint32_t)`: Used to 
+
+    - `const SupervisorDeviceInstance_t* (*getInstance)(uint32_t)`: Used to
       get a pointer to the `SupervisorDeviceInstance_t` struct for the device
-      identified by the specified index. A `SupervisorDeviceInstance_t` 
+      identified by the specified index. A `SupervisorDeviceInstance_t`
       contains the address components (and temporarily the name) of a device.
-
-    - `void (*getAddrVector)(std::vector&)`: 
+
+    - `void (*getAddrVector)(std::vector&)`:
       Used to populate a vector with a copy of the Supervisor's `DeviceVector`.
       This method must be used with care as the `DeviceVector` can be very big.
-
  - Stored in the `SuperDB` object (`Mothership.superdb`) within
    `std::map SuperDB.supervisors`, keyed by
@@ -827,59 +846,3 @@ To follow along, use Figure 2 and the Command and Control section.

  4. The `BackendOutputBroker` reads from the `BackendOutputQueue`, and pushes
     the packet into the compute backend.
-
-# Appendix B: A Rough Implementation Plan
-This Mothership design differs from the existing Mothership in the following
-ways:
-
- - Threads are all started at the beginning of the Mothership process
-   (i.e. before any messages are received), as opposed to the previous
-   Mothership, which starts Twig (the backend-receiving thread) in
-   advance. Furthermore, the thread that processes backend packets
-   (`BackendInputBroker` vs `Twig`) no longer "resolves" the packet, but simply
-   forwards it on
-
- - MPI messages have different forms and arguments (though backend messages are
-   the same), and acknowledgement messages are sent back from the Mothership
-   process to the Root process.
-
- - The data structures for holding application and supervisor information do
-   not use the hardware model, and are constructed from different pieces of
-   information.
-
- - Quitting and cleanup operates independently of backend traffic. The proposed
-   design does not fail to exit or to terminate an application if it is
-   spamming the Mothership.
-
- - The proposed design introduces intermediate application states for certain
-   stages, respects deploy and command messages arriving out-of-order (i.e. due
-   to traffic), and has a simpler state transition mechanism.
-
- - The backend is now loaded independently of Mothership object construction,
-   and does not prevent the Orchestrator from starting if unable to load.
-
- - Multiple supervisors-per-Mothership (not per application) are now supported,
-   and communication between supervisors is defined (albeit tenuously for now).
-
-Given these differences, this design will be implemented in the following way,
-where each stage represents a reviewable unit (probably by GMB):
-
- 1. An implementation of the core constructs. This implementation will be the
-    minimum possible to replicate the feature set of the existing Mothership as
-    closely as possible.
-
- 2. A performance comparison between the new and old Motherships to check for
-    regression. One key difference expected is that, since Twig calls
-    supervisor logic directly, there will be a slightly increased supervisor
-    message "processing latency", but the backend queue in the new Mothership
-    will be drained more quickly. Any regression issues will be resolved here.
-
- 3. Given that it has been developed, SBase would be integrated next for
-    external device support. This will require a small refactor of the
-    application deployment messages (`APP`, `*`) to identify the most efficient
-    solution.
-
- 4. Implementation of the supervisor API.
-
- 5. True multibox support, including an explicit Box->Mothership map in Root,
-    and HostLink considerations.
diff --git a/source/user_guide.md b/source/user_guide.md
index bca05b7..9bf2f01 100644
--- a/source/user_guide.md
+++ b/source/user_guide.md
@@ -103,7 +103,8 @@ Other components include:
    functionality of this clock is to support a rudimentary "delay" command,
    which can be used as part of a command batch. This allows the user to stage a
    series of packets to be added to the Engine at a given time, to support
-   controlled "bursts" of activity.
+   controlled "bursts" of activity. Currently disabled to alleviate compute
+   load.

  - "Injector": The Root component allows the Orchestrator to be controlled by a
    batch of commands. The Injector component is a developer tool that supports
@@ -418,11 +419,10 @@ which will print something like:
 Orchestrator processes
 Rank 00, Root:OrchBase:CommonBase, created 10:28:19 Apr 16 2020
 Rank 01, LogServer:CommonBase, created 10:28:19 Apr 16 2020
-Rank 02, RTCL:CommonBase, created 10:28:19 Apr 16 2020
-Rank 03, Mothership:CommonBase, created 10:28:19 Apr 16 2020
+Rank 02, Mothership:CommonBase, created 10:28:19 Apr 16 2020
 ~~~

-In this case, the Root, RTCL, LogServer, and Mothership components of the
+In this case, the Root, LogServer, and Mothership components of the
 Orchestrator have been started. Note that all components of the Orchestrator
 exist on the same MPI communicator. More information about these processes is
 written to the command's microlog.
@@ -468,6 +468,11 @@ where:
    instance name (defined by the `id` attribute in the `GraphInstance`
    element).

+~~~ {.bash}
+POETS> 14:30:36.21: 234(I) Typelinking graph instance 'ring_test_instance'...
+POETS> 14:30:36.21: 249(I) Successfully typelinked graph instance 'ring_test_instance'.
+~~~
+
 Any typelinking errors are written to the microlog generated by the command.

 ##### Aside: Tildes in Paths on Unix-likes
@@ -563,15 +568,18 @@ microlog of the command.

 #### Loading binaries into devices for execution, and running the application
 With a set of binaries to be loaded onto each core of the POETS engine, the
-application can be run. Firstly, stage each binary onto its appropriate core by
-commanding:
+application can be run. Note that each of the commands in this section stages
+the operation with the Mothership process, which will perform the action (and
+report back) when the Mothership is not busy. Firstly, stage each binary onto
+its appropriate core by commanding:

 ~~~ {.bash}
 deploy /app = *
 ~~~

-Once executed, this command provisions the cores with the binaries. To execute
-the binaries on the cores, and to start the supervisor, command:
+Once executed, this command stages a deployment of the binaries to the
+Mothership, which is handled when the Mothership is ready. To execute the
+binaries on the cores, and to start the supervisor, command:

 ~~~ {.bash}
 initialise /app = *
@@ -584,7 +592,7 @@ commanding:
 run /app = *
 ~~~

-will start the application once the cores have been initialised; the
+which will start the application once the cores have been initialised; the
 application will not start before all cores have been initialised. While they
 are running, jobs can be stopped by commanding:

@@ -687,8 +695,8 @@ configured staging directory (`Output/Composer` in the default configuration).

  - `compose /bypass`: Bypasses most of the compose process provided that the
    compiled binaries for the application already exist, allowing the operator
-   to reuse binaries from a previous run or to use binaries compiled elsewhere. 
-   The loaded application must be identical in terms of definition and placement 
+   to reuse binaries from a previous run or to use binaries compiled elsewhere.
+   The loaded application must be identical in terms of definition and placement
    for this to work - there are no checks beyond binary existance.

  - `compose /args`: Allows the operator to pass additional arguments to the
@@ -843,7 +851,8 @@ documentation for a more detailed description of the commands that follow.

  - `recall /app`: Given a placed application graph instance (or multiple),
    informs all Motherships that host the application to recall it (forget about
-   it completely), unless it is running (it will need to be stopped first).
+   it completely), unless it is running (it will need to be stopped first), or
+   unless it has been initialised (it will need to be started, then stopped).

 ## Return (`return`)

@@ -895,8 +904,6 @@ Lower-level system commands.

 ## Test (`test`)

-For information.
-
  - `test /echo`: Logs and micrologs a message passed as one or more parameters.

 ## Typelink (`tlink`)