Talkative Mothership Interface #18

Merged
merged 6 commits into from
Jun 23, 2021
103 changes: 33 additions & 70 deletions source/mothership_design.md → source/mothership.md
@@ -66,7 +66,7 @@ NB: Terminology in this document:
to take people from the past to the present after all. I'll try to be
explicit wherever I use this term.

Figure 1 shows the class structure diagram for the proposed Mothership design.
Figure 1 shows the class structure diagram for the Mothership.

![Mothership class structure diagram](images/mothership_data_structure.png)

@@ -268,11 +268,11 @@ no other information about the application yet.
| | `soPath` | Also loads the supervisor, and |
| | | provisions its API. |
+-----------------+-----------------------+-----------------------------------+
| `CMND`, `RECL` | 0. `std::string` | Removes information for an |
| | `appName` | application, by name, from the |
| | | Mothership. Does nothing on a |
| | | running application (it must be |
| | | stopped first). |
| `CMND`, `BRKN` | 0. `std::string` | Marks an application as broken, |
| | `appName` | due to an error happening on |
| | | another Mothership. The |
| | | application must be recalled and |
| | | redeployed to be used. |
+-----------------+-----------------------+-----------------------------------+
| `CMND`, `INIT` | 0. `std::string` | Takes a fully-defined |
| | `appName` | application (with state |
@@ -287,6 +287,14 @@ no other information about the application yet.
| | | this message is acted on when it |
| | | reaches that state. |
+-----------------+-----------------------+-----------------------------------+
| `CMND`, `RECL` | 0. `std::string` | Removes information for an |
| | `appName` | application, by name, from the |
| | | Mothership. Does nothing on a |
| | | running application (it must be |
| | | stopped first) or an initialised |
| | | application (it must be started, |
| | | then stopped). |
+-----------------+-----------------------+-----------------------------------+
| `CMND`, `RUN` | 0. `std::string` | Takes an application held at the |
| | `appName` | softswitch barrier (with state |
| | | `READY`), and "starts" it by |
@@ -334,7 +342,9 @@ and what the Mothership does with those messages.
The Mothership process occasionally also sends messages to the Root
process. Table 2 denotes subkeys of messages that Mothership processes send to
Root, along with their intended use. They're mostly acknowledgements of work
done, and are useful for debugging.
done, and are useful for debugging and logging. Root maintains a data structure
to understand when an application has transitioned from one state to another on
all Motherships.

+-----------------+-----------------------+-----------------------------------+
| Key Permutation | Arguments | Reason |
@@ -360,12 +370,22 @@ done, and are useful for debugging.
| `STOP` | `appName` | the application has been fully |
| | | stopped. |
+-----------------+-----------------------+-----------------------------------+
| `MSHP`, `ACK`, | 0. `std::string` | Notifies the Root process that |
| `RECL` | `appName` | the application has been |
| | | recalled. |
+-----------------+-----------------------+-----------------------------------+
| `MSHP`, `REQ`, | 0. `std::string` | Requests the Root process to send |
| `STOP` | `appName` | a stop message to all Motherships |
| | | running the application. Used by |
| | | `stop_application` Supervisor API |
| | | call. |
+-----------------+-----------------------+-----------------------------------+
| `MSHP`, `REQ`, | 0. `std::string` | Requests the Root process to send |
| `BRKN` | `appName` | an error message to all |
| | | Motherships hosting the |
| | | application. Is sent when an |
| | | application breaks or errors. |
+-----------------+-----------------------+-----------------------------------+

Table: Output message key permutations that the Mothership process sends to the
Root process, and why.
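
As a rough illustration of how Root might use these acknowledgements to decide when an application has changed state on all Motherships, here is a minimal sketch. The `AckTracker` class, its methods, and the use of MPI ranks as Mothership identifiers are all assumptions made for illustration; they are not taken from the Orchestrator source. The sketch tracks one pending transition per application at a time.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch: none of these names come from the Orchestrator source.
class AckTracker
{
public:
    // Registers the Motherships (identified here by rank) that host an
    // application, and clears any acknowledgements recorded so far.
    void expect(const std::string& appName, const std::set<int>& ranks)
    {
        hosts[appName] = ranks;
        acked[appName].clear();
    }

    // Records one acknowledgement (for example `MSHP`, `ACK`, `STOP`).
    // Returns true once every hosting Mothership has acknowledged.
    // Acknowledgements for unknown applications or ranks are ignored.
    bool acknowledge(const std::string& appName, int rank)
    {
        std::map<std::string, std::set<int> >::const_iterator host =
            hosts.find(appName);
        if (host == hosts.end() || host->second.count(rank) == 0) return false;
        acked[appName].insert(rank);
        return acked[appName] == host->second;
    }

private:
    std::map<std::string, std::set<int> > hosts;  // Expected ranks per app.
    std::map<std::string, std::set<int> > acked;  // Ranks that have replied.
};
```

Comparing the two sets means that duplicate acknowledgements from one Mothership cannot complete the transition early.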
@@ -621,21 +641,20 @@ Mothership, as well as external devices elsewhere. They are:

- `SupervisorApi* (*getApi)()`: Used to provision the Supervisor API (see
below).

- `uint64_t (*getAddr)(uint32_t)`: Used to get the full symbolic address
of a device from its Supervisor-unique index, which is sent in the
`pinAddr` field of each log packet.
- `const SupervisorDeviceInstance_t* (*getInstance)(uint32_t)`: Used to

- `const SupervisorDeviceInstance_t* (*getInstance)(uint32_t)`: Used to
get a pointer to the `SupervisorDeviceInstance_t` struct for the device
identified by the specified index. A `SupervisorDeviceInstance_t`
identified by the specified index. A `SupervisorDeviceInstance_t`
contains the address components (and temporarily the name) of a
device.
- `void (*getAddrVector)(std::vector<SupervisorDeviceInstance_t>&)`:

- `void (*getAddrVector)(std::vector<SupervisorDeviceInstance_t>&)`:
Used to populate a vector with a copy of the Supervisor's `DeviceVector`.
This method must be used with care as the `DeviceVector` can be very big.


- Stored in the `SuperDB` object (`Mothership.superdb`) within
`std::map<std::string, SuperHolder> SuperDB.supervisors`, keyed by
@@ -827,59 +846,3 @@ To follow along, use Figure 2 and the Command and Control section.

4. The `BackendOutputBroker` reads from the `BackendOutputQueue`, and pushes
the packet into the compute backend.
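
The final step above can be sketched as follows. This is a minimal illustration, not the Mothership's implementation: the `Packet` struct, the `tryPop` method, the `drainOnce` function, and the use of a `std::vector` to stand in for the compute backend are all assumptions. Only the `BackendOutputQueue` name comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical packet type; the real packet layout is not described here.
struct Packet { unsigned payload; };

// A mutex-guarded FIFO standing in for the BackendOutputQueue.
class BackendOutputQueue
{
public:
    void push(const Packet& packet)
    {
        std::lock_guard<std::mutex> guard(lock);
        packets.push(packet);
    }

    // Pops the oldest packet into `out`. Returns false if the queue is empty.
    bool tryPop(Packet& out)
    {
        std::lock_guard<std::mutex> guard(lock);
        if (packets.empty()) return false;
        out = packets.front();
        packets.pop();
        return true;
    }

private:
    std::mutex lock;
    std::queue<Packet> packets;
};

// One pass of a hypothetical broker loop: moves every queued packet into
// `backend` (a stand-in for the compute backend) in arrival order, and
// returns the number of packets sent.
inline std::size_t drainOnce(BackendOutputQueue& queue,
                             std::vector<Packet>& backend)
{
    std::size_t sent = 0;
    Packet packet;
    while (queue.tryPop(packet))
    {
        backend.push_back(packet);
        ++sent;
    }
    return sent;
}
```

Holding the lock only inside `push` and `tryPop` keeps producers from being blocked while a packet is handed to the backend.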

# Appendix B: A Rough Implementation Plan
This Mothership design differs from the existing Mothership in the following
ways:

- Threads are all started at the beginning of the Mothership process
(i.e. before any messages are received), as opposed to the previous
Mothership, which starts Twig (the backend-receiving thread) in
advance. Furthermore, the thread that processes backend packets
(`BackendInputBroker` vs `Twig`) no longer "resolves" the packet, but simply
forwards it on.

- MPI messages have different forms and arguments (though backend messages are
the same), and acknowledgement messages are sent back from the Mothership
process to the Root process.

- The data structures for holding application and supervisor information do
not use the hardware model, and are constructed from different pieces of
information.

- Quitting and cleanup operates independently of backend traffic. The proposed
design does not fail to exit or to terminate an application if it is
spamming the Mothership.

- The proposed design introduces intermediate application states for certain
stages, respects deploy and command messages arriving out-of-order (i.e. due
to traffic), and has a simpler state transition mechanism.

- The backend is now loaded independently of Mothership object construction,
and does not prevent the Orchestrator from starting if unable to load.

- Multiple supervisors-per-Mothership (not per application) are now supported,
and communication between supervisors is defined (albeit tenuously for now).

Given these differences, this design will be implemented in the following way,
where each stage represents a reviewable unit (probably by GMB):

1. An implementation of the core constructs. This implementation will be the
minimum possible to replicate the feature set of the existing Mothership as
closely as possible.

2. A performance comparison between the new and old Motherships to check for
regression. One key difference expected is that, since Twig calls
supervisor logic directly, there will be a slightly increased supervisor
message "processing latency", but the backend queue in the new Mothership
will be drained more quickly. Any regression issues will be resolved here.

3. Given that it has been developed, SBase would be integrated next for
external device support. This will require a small refactor of the
application deployment messages (`APP`, `*`) to identify the most efficient
solution.

4. Implementation of the supervisor API.

5. True multibox support, including an explicit Box->Mothership map in Root,
and HostLink considerations.
35 changes: 21 additions & 14 deletions source/user_guide.md
@@ -103,7 +103,8 @@ Other components include:
functionality of this clock is to support a rudimentary "delay" command,
which can be used as part of a command batch. This allows the user to stage
a series of packets to be added to the Engine at a given time, to support
controlled "bursts" of activity.
controlled "bursts" of activity. The clock is currently disabled to reduce
compute load.

- "Injector": The Root component allows the Orchestrator to be controlled by a
batch of commands. The Injector component is a developer tool that supports
@@ -418,11 +419,10 @@ which will print something like:
Orchestrator processes
Rank 00, Root:OrchBase:CommonBase, created 10:28:19 Apr 16 2020
Rank 01, LogServer:CommonBase, created 10:28:19 Apr 16 2020
Rank 02, RTCL:CommonBase, created 10:28:19 Apr 16 2020
Rank 03, Mothership:CommonBase, created 10:28:19 Apr 16 2020
Rank 02, Mothership:CommonBase, created 10:28:19 Apr 16 2020
~~~

In this case, the Root, RTCL, LogServer, and Mothership components of the
In this case, the Root, LogServer, and Mothership components of the
Orchestrator have been started. Note that all components of the Orchestrator
exist on the same MPI communicator. More information about these processes is
written to the command's microlog.
@@ -468,6 +468,11 @@ where:
instance name (defined by the `id` attribute in the `GraphInstance`
element).

~~~ {.bash}
POETS> 14:30:36.21: 234(I) Typelinking graph instance 'ring_test_instance'...
POETS> 14:30:36.21: 249(I) Successfully typelinked graph instance 'ring_test_instance'.
~~~

Any typelinking errors are written to the microlog generated by the command.

##### Aside: Tildes in Paths on Unix-likes
@@ -563,15 +568,18 @@ microlog of the command.
#### Loading binaries into devices for execution, and running the application

With a set of binaries to be loaded onto each core of the POETS engine, the
application can be run. Firstly, stage each binary onto its appropriate core by
commanding:
application can be run. Note that each of the commands in this section stages
the operation with the Mothership process, which will perform the action (and
report back) when the Mothership is not busy. Firstly, stage each binary onto
its appropriate core by commanding:

~~~ {.bash}
deploy /app = *
~~~

Once executed, this command provisions the cores with the binaries. To execute
the binaries on the cores, and to start the supervisor, command:
Once executed, this command stages a deployment of the binaries to the
Mothership, which is handled when the Mothership is ready. To execute the
binaries on the cores, and to start the supervisor, command:

~~~ {.bash}
initialise /app = *
@@ -584,7 +592,7 @@ commanding:
run /app = *
~~~

will start the application once the cores have been initialised; the
which will start the application once the cores have been initialised; the
application will not start before all cores have been initialised. While they
are running, jobs can be stopped by commanding:

@@ -687,8 +695,8 @@ configured staging directory (`Output/Composer` in the default configuration).

- `compose /bypass`: Bypasses most of the compose process provided that the
compiled binaries for the application already exist, allowing the operator
to reuse binaries from a previous run or to use binaries compiled elsewhere.
The loaded application must be identical in terms of definition and placement
to reuse binaries from a previous run or to use binaries compiled elsewhere.
The loaded application must be identical in terms of definition and placement
for this to work - there are no checks beyond binary existence.

- `compose /args`: Allows the operator to pass additional arguments to the
@@ -843,7 +851,8 @@ documentation for a more detailed description of the commands that follow.

- `recall /app`: Given a placed application graph instance (or multiple),
informs all Motherships that host the application to recall it (forget about
it completely), unless it is running (it will need to be stopped first).
it completely), unless it is running (it will need to be stopped first), or
unless it has been initialised (it will need to be started, then stopped).
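
The recall rule above can be sketched as a single predicate over the application state. Only the `READY` state name appears in the design document; the other state names, the `canRecall` function, and the assumption that an initialised application is the one held at the barrier in state `READY` are hypothetical.

```cpp
#include <cassert>

// Hypothetical application states: only `READY` is named in the design
// document, and the real Mothership may use additional intermediate states.
enum AppState
{
    DEPLOYED,  // Deployed, not yet initialised.
    READY,     // Initialised, held at the softswitch barrier (assumed).
    RUNNING,   // Started.
    STOPPED,   // Stopped after running.
    BROKEN     // Marked as broken; must be recalled and redeployed.
};

// Returns true if a recall would remove the application in this state.
// Running applications must be stopped first; initialised applications
// must be started, then stopped.
inline bool canRecall(AppState state)
{
    return state != READY && state != RUNNING;
}
```
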
Contributor

We could change this behaviour to allow things at the barrier to respond to KILL packets?

Contributor Author

I reckon so

Contributor Author

Created #262 so we can discuss further.


## Return (`return`)

@@ -895,8 +904,6 @@ Lower-level system commands.

## Test (`test`)

For information.

- `test /echo`: Logs and micrologs a message passed as one or more parameters.

## Typelink (`tlink`)