-
Notifications
You must be signed in to change notification settings - Fork 9
Epoch Allocation And Scoping
The current epoch allocation/bit layout follows this pattern:
Baseline epoch Layout:
w-1 .............. w-h-1 ...............w-h-c-1 ....................0
| <EpochHeader> ... | <EpochCategory> ... | <SeqEpochID> |
where h = epoch_header_num_bits (~2 bits)
c = epoch_category_num_bits (1 bit)
w = sizeof(EpochType) * 8 (i.e., 64-bit/int64_t field)
Rooted epoch extension breakdown:
w-h-c-1 w-h-c-n-1 0
^ ^ ^
| .... n .... | ..........|
<NodeType> <SeqEpochID>
where n = sizeof(NodeType) (configurable, default 16 bits)
Thus, with default sizes:
- a collective epoch has 61 bits for the sequential epoch ID;
- a rooted epoch has 16 less bits or 45 bits for the sequential epoch ID (assuming number of nodes fits in 16-bits).
Each EpochWindow
manages allocation/deallocation for epochs of a particular type. The type is all the high bits minus the SeqEpochID
(set to zero). EpochWindow
uses IntegralSet
to compactly store terminated epochs. As epochs complete out of order, the data structure grows in size, but it's generally very efficient.
Epoch Scoping Motivation
Correct collective epoch allocation relies on ordered execution across nodes/ranks. Specifically, within a particular type of epoch, the sequential epoch is repeatedly incremented by 1 each time a new epoch is allocated until a free epoch is found. This presents two problems:
- Work units that create collective epochs must be ordered (even across "modules")
- There is a potential race in terms of discovery of epoch termination and re-use. For this to be a problem, an application would have to process an insane number of epochs to wrap around. The conscious design choice here was to not use the first free epoch returned from the
IntegralSet
---instead, use the next one in the sequence until a free one is found. This means as long as the program is generally reasonable the race is highly improbable.
Epoch scoping has the potential to solve both these problems (and make races even more unlikely). We could also attempt to completely eradicate the possibility of a race---but that might not worth the effort or communication cost.
Initially, I proposed a design with a low number of bits (~5-8 bits) dedicated to epoch scopes. However, even a simple program that uses epoch scopes