#2201: implement memory aware temperedlb in vt rebased (new version) #2278

Merged on Sep 17, 2024 (126 commits)
Changes from 125 commits
195528d
#2201: added enums to specify transfer strategy
ppebay Oct 18, 2023
2cbd255
#2201: created framework to integrate transfer strategy ivar
ppebay Oct 18, 2023
e24b454
#2201: added transfer type key to getInputKeysWithHelp()
ppebay Oct 19, 2023
1bfe7b4
#2201: addressed PR2203 NS review comments
ppebay Nov 18, 2023
ee5c333
#2201: Update src/vt/vrt/collection/balance/temperedlb/temperedlb.cc
ppebay Nov 22, 2023
8ad4056
#2201: checkpoint of non-breaking changes (documentation and style)
ppebay Nov 27, 2023
844db21
#2201: fixed the incorrect transfer type causing build error
ppebay Nov 28, 2023
ef8ef76
#2201: added enums to specify transfer strategy
ppebay Oct 18, 2023
2f67dc4
#2201: created framework to integrate transfer strategy ivar
ppebay Oct 18, 2023
d7fdc3b
#2201: added transfer type key to getInputKeysWithHelp()
ppebay Oct 19, 2023
03290cb
#2201: addressed PR2203 NS review comments
ppebay Nov 18, 2023
b3d428c
#2201: args: add arg to force LB to run on the first phase (mainly fo…
lifflander Nov 29, 2023
55dd1c0
#2201: temperedlb: implement basic memory information consumption, th…
lifflander Nov 29, 2023
855ef88
#2201: temperedlb: add computation for cluster/memory summary
lifflander Nov 29, 2023
8ccb65f
#2201: temperedlb: clear cur_block_ before recomputing the summary
lifflander Nov 29, 2023
81a48f6
#2201: tools: NOT to merge: for now, add the user-defined problem to …
lifflander Nov 29, 2023
d3ff67c
#2201: temperedlb: add missing check for zero cluster size
lifflander Nov 29, 2023
0b0c81c
#2201: temperedlb: add cluster summary to messages
lifflander Nov 29, 2023
66c778d
#2201: temperedlb: add data structures to track other rank's clusters
lifflander Nov 29, 2023
25aa87e
#2201: fixed EOL CI error
ppebay Nov 29, 2023
2077781
#2201: fixed print errors; added pseudocode; and epoch boilerplate
ppebay Nov 29, 2023
e96a09f
#2201: temperedlb: add swap clusters call, fix git history
lifflander Nov 29, 2023
94d04af
#2201: temperedlb: fix whitespace
lifflander Nov 29, 2023
d4b2e0f
#2201: temperedlb: rename cur_blocks_ to cur_clusters_
lifflander Nov 29, 2023
30ba257
#2201: temperedlb: sketch of some code written in the meeting
lifflander Nov 30, 2023
bf3ff66
#2201: temperedlb: use new method of getting user data, fix error mes…
lifflander Dec 4, 2023
58514dc
#2201: temperedlb: fix indentation
lifflander Dec 4, 2023
3d9fdab
#2201: temperedlb: implement locking and swapping protocol--may deadlock
lifflander Dec 5, 2023
78e9249
#2201: temperedlb: fix a couple of bugs
lifflander Dec 5, 2023
b0360cb
#2201: temperedlb: a hack for now to work around the deadlock problem
lifflander Dec 5, 2023
a67faf1
#2201: temperedlb: fix bug in the code due to reentrancy causing prob…
lifflander Dec 5, 2023
66bca79
#2201: temperedlb: fix bug in code sending memory usage
lifflander Dec 5, 2023
91c0103
#2201: temperedlb: fix some other minor bugs, add empty cluster "swap"
lifflander Dec 5, 2023
735e834
#2201: annotated code for side-by-side comparison with LBAF
ppebay Dec 5, 2023
c583530
#2201: annotated following discussion
ppebay Dec 5, 2023
a1a1bf9
#2201: temperedlb: switch a ton of prints to debug prints
lifflander Dec 5, 2023
2cc9bbb
#2201: temperedlb: add proper approximation for memory usage for empt…
lifflander Dec 5, 2023
4c42993
#2201: temperedlb: fix bug where some ranks don't participate if they…
lifflander Dec 5, 2023
294bb58
#2201: clarify annotation
ppebay Dec 5, 2023
df61995
#2201: include infinite value for memory overflow
ppebay Dec 5, 2023
be1a30c
#2201: temperedlb: fix whitespace
lifflander Dec 5, 2023
cf34d5b
#2201: temperedlb: switch other criterion to use negative inf
lifflander Dec 5, 2023
c10f80a
#2201: temperedlb: adding working bytes transfer for correctness
lifflander Dec 5, 2023
c1a9b7d
#2201: temperedlb: add header file comments for the new methods added
lifflander Dec 5, 2023
90bffc7
#2201: temperedlb: start implementing sub-clustering
lifflander Dec 6, 2023
68b4d75
#2201: temperedlb: sub-clustering implemented, disabled by default fo…
lifflander Dec 7, 2023
d8f75d8
#2201: reviewed algorithm and annotated for better legibility
ppebay Dec 9, 2023
c3d3369
#2201: added tempered criterion when ONLY load transfer is considered
ppebay Dec 9, 2023
794b9c5
#2201: factored out computations of load-based tempered criterion
ppebay Dec 9, 2023
e5e7b47
#2201: trailing whitespace cleanup
ppebay Dec 9, 2023
4bd78e0
#2201: temperedlb: dedicated method for memory component of criterion
ppebay Dec 11, 2023
25a5c0c
#2201: temperedlb: fixed load criterion implementation and usage
ppebay Dec 12, 2023
40f0d36
#2201: temperedlb: add ordered locking protocol
lifflander Dec 12, 2023
6e8e8cf
#2201: temperedlb: fixed index of phase -2
ppebay Dec 19, 2023
8c343f9
#2201: temperedlb: fix tracing by disabling it during temperedlb
lifflander Jan 16, 2024
33719c7
#2201: temperedlb: read in comm edges and make graph symmetric
lifflander Jan 16, 2024
d72582b
#2201: temperedlb: add work model and computation for it
lifflander Jan 16, 2024
e4ee44c
#2201: temperedlb: add shared IDs to element communication
lifflander Jan 17, 2024
befcef5
#2201: temperedlb: add getter for rank-based LB data and elm ID for c…
lifflander Jan 17, 2024
66f2e32
#2201: temperedlb: add computation for work model for given distribution
lifflander Jan 17, 2024
479ec2a
#2201: temperedlb: add rank working bytes to the inform to improve ap…
lifflander Jan 17, 2024
4a07f8c
#2201: temperedlb: use exact working bytes for approximation of memor…
lifflander Jan 17, 2024
986d08e
#2201: temperedlb: make getRankLBData public
lifflander Jan 17, 2024
77d7380
#2201: temperedlb: fix copy-paste error in json type
lifflander Jan 17, 2024
89fc458
#2201: temperedlb: fully implement the new work model for cluster swaps
lifflander Jan 17, 2024
0cf6087
#2201: temperedlb: compute working bytes correctly in criterion
lifflander Jan 18, 2024
f320910
#2201: lb_manager: fix offset when run on phase 0
lifflander Jan 18, 2024
8f2b153
#2201: temperedlb: add the rest of the memory model
lifflander Jan 18, 2024
bb97080
#2201: temperedlb: switch work breakdown print to debug print
lifflander Jan 18, 2024
25bb18d
#2201: temperedlb: add abort if we go over the threshold
lifflander Jan 22, 2024
94fcac3
#2201: temperedlb: add a bunch of prints for debugging
lifflander Jan 22, 2024
65bb487
#2201: temperedlb: set locked while it has a lock to avoid giving a l…
lifflander Jan 22, 2024
870abb9
#2201: temperedlb: make greek symbols line up with paper
lifflander Mar 21, 2024
734e005
#2201: temperedlb: fix some typos
lifflander Apr 9, 2024
f6b484c
#2201: temperedlb: read shared block home ranks from json file
nlslatt Apr 9, 2024
09fff92
#2201: temperedlb: fix compile error and warning
nlslatt Apr 9, 2024
af7c357
#2201: temperedlb: make symmedges prints debug verbose
nlslatt Apr 9, 2024
e719cb7
#2201: temperedlb: print final unhomed blocks without debug
nlslatt Apr 9, 2024
2ba5ec0
#2201: tools: NOT to merge: add paper reproducer input and script
nlslatt Apr 9, 2024
9e56bcf
#2201: temperedlb: fix typos in comments and strings
nlslatt Apr 15, 2024
76b4792
#2201: tools: NOT to merge: update user-defined toy problem readme
nlslatt Apr 15, 2024
0c07732
#2201: tools: NOT to merge: add alternative in paper reproducer script
nlslatt Apr 15, 2024
1824c8e
#2201: temperedlb: remove subclustering for now
lifflander May 1, 2024
a26af8a
#2201: temperedlb: stop using greek letters to avoid making some comp…
lifflander May 1, 2024
54873d2
#2201: temperedlb: do not capture structured bindings
cz4rs May 7, 2024
81076b2
#2201: tools: fix shellcheck complaints
cz4rs May 9, 2024
cd51729
#2201: temperedlb: add FIXME
cz4rs May 9, 2024
a668eb9
#2201: temperedlb: reduce duplication
cz4rs May 9, 2024
002c532
#2201: temperedlb: include fmt correctly
cz4rs May 10, 2024
b13b625
#2201: temperedlb: filter by `isMigratable`
cz4rs May 20, 2024
66a7294
#2201: revert "temperedlb: filter by `isMigratable`"
cz4rs May 22, 2024
098aed9
#2201: baselb: filter by `isMigratable`
cz4rs May 24, 2024
a0a6688
#2201: temperedlb: check if the obj is migratable during transfer stage
cz4rs May 28, 2024
1b1d058
#2201: temperedlb: remove redundant code
cz4rs May 28, 2024
1cfe2e9
#2201: fix unused variable warning
cz4rs May 29, 2024
abdf1ef
#2201: temperedlb: keep `eval` and `is_migratable` separate
cz4rs Jun 4, 2024
cbdf71a
#2201: baselb: abort when not migratable
cz4rs Jun 4, 2024
cbf6f1b
#2201: temperedlb: use gamma as coefficient
cz4rs Jun 4, 2024
19df7f8
#2201: temperedlb: remove NormBySelf
cz4rs Jun 4, 2024
7a16983
#2201: temperedlb: add comment for inter-node comm
cz4rs Jun 11, 2024
5735cc3
#2201: test running LB on first phase
cz4rs Jun 12, 2024
ad35330
#2201: reduce code duplication
cz4rs Jun 12, 2024
5e5eb85
#2201: provide basic implementation for `getComm`
cz4rs Jun 13, 2024
ef5c9b8
#2201: Revert "tools: NOT to merge: add alternative in paper reproduc…
cz4rs Jun 27, 2024
1e3e8cf
#2201: Revert "tools: NOT to merge: update user-defined toy problem r…
cz4rs Jun 27, 2024
42a81ac
#2201: Revert "tools: NOT to merge: add paper reproducer input and sc…
cz4rs Jun 27, 2024
9805d9d
#2201: Revert "tools: NOT to merge: for now, add the user-defined pro…
cz4rs Jun 27, 2024
2d07a6e
#2201: remove obsolete comment
cz4rs Jun 27, 2024
d99cdb4
#2201: lb: reduce code duplication
cz4rs Jun 27, 2024
d243bbe
#2201: lb: use named constant for uninitialized
cz4rs Jun 27, 2024
a3b5233
#2201: tests: add tests for temperedLB with load, load+memory, and lo…
cwschilly Aug 13, 2024
8bb402b
#2201: tests: remove trailing whitespace
cwschilly Aug 13, 2024
7aa6f6e
#2201: tests: avoid calculating imbalance manually
cwschilly Aug 14, 2024
6840a75
#2201: tests: only run tests on four nodes; renamed shared_id to shar…
cwschilly Aug 14, 2024
84acbb2
#2201: update test cases; restore shared_id key to json data files; a…
cwschilly Aug 19, 2024
29ecfd5
#2201: wip: fix review comments; add collection_id to synthetic data
cwschilly Sep 5, 2024
31472d4
#2201: loosen strict inequalities for criterion; remove epsilon from …
cwschilly Sep 11, 2024
c5a4a8f
#2201: add test for delta=0.3
cwschilly Sep 12, 2024
562ccbb
#2201: remove comms test for now
cwschilly Sep 12, 2024
26ba0e6
#2201: remove commented out epsilon
cwschilly Sep 13, 2024
e5a8e11
#2201: fix bug in schema; require collection_id for migratable objects
cwschilly Sep 13, 2024
d216c4b
#2201: add collection_id and index to initialization test
cwschilly Sep 13, 2024
894ea64
#2201: tests: reformat to follow style guidelines and using theContext
lifflander Sep 13, 2024
8504c33
#2201: fix remaining review comments; loosen collection_id requiremen…
cwschilly Sep 16, 2024
3a7d707
#2201: pass memory_threshold to config generator
cwschilly Sep 16, 2024
a83c66a
#2201: run tests with three trials
cwschilly Sep 17, 2024
9 changes: 5 additions & 4 deletions scripts/JSON_data_files_validator.py
@@ -434,13 +434,14 @@ def validate_comm_links(all_jsons):

for data in all_jsons:
tasks = data["phases"][n]["tasks"]
- id_key = "id" if "id" in tasks[0]["entity"] else "seq_id"
- task_ids.update({int(task["entity"][id_key]) for task in tasks})
+ task_ids.update(
+ {int(task["entity"].get("id", task["entity"].get("seq_id"))) for task in tasks}
+ )

if data["phases"][n].get("communications") is not None:
comms = data["phases"][n]["communications"]
- comm_ids.update({int(comm["from"][id_key]) for comm in comms})
- comm_ids.update({int(comm["to"][id_key]) for comm in comms})
+ comm_ids.update({int(comm["from"].get("id", comm["from"].get("seq_id"))) for comm in comms})
+ comm_ids.update({int(comm["to"].get("id", comm["to"].get("seq_id"))) for comm in comms})

if not comm_ids.issubset(task_ids):
logging.error(
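The per-entity lookup used in this hunk can be illustrated in isolation. The following is a minimal sketch, not the validator itself: `entity_id` and `unknown_comm_ids` are hypothetical helper names and the sample data is invented for illustration. It shows how falling back from `id` to `seq_id` on each entity copes with files that mix the two keys, which a single key chosen from the first task would not.

```python
def entity_id(entity):
    """Return the entity's identifier, preferring "id" and falling back to "seq_id"."""
    return int(entity.get("id", entity.get("seq_id")))


def unknown_comm_ids(tasks, comms):
    """Return the communication endpoint IDs that match no task ID."""
    task_ids = {entity_id(task["entity"]) for task in tasks}
    comm_ids = set()
    for comm in comms:
        comm_ids.add(entity_id(comm["from"]))
        comm_ids.add(entity_id(comm["to"]))
    return comm_ids - task_ids


# Illustrative data: the first task uses "id", the second uses "seq_id".
tasks = [{"entity": {"id": 7}}, {"entity": {"seq_id": 3}}]
comms = [{"from": {"seq_id": 3}, "to": {"id": 7}}]
```

Note that an entity carrying neither key makes `int(None)` raise a `TypeError` in this sketch; the schema check in `LBDatafile_schema.py` is what is expected to reject such entities first.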
17 changes: 12 additions & 5 deletions scripts/LBDatafile_schema.py
@@ -1,9 +1,16 @@
from schema import And, Optional, Schema

- def validate_id_and_seq_id(field):
- """Ensure that either seq_id or id is provided."""
+ def validate_ids(field):
+ """
+ Ensure that 1) either seq_id or id is provided,
+ and 2) if an object is migratable, collection_id has been set.
+ """
if 'seq_id' not in field and 'id' not in field:
raise ValueError('Either id (bit-encoded) or seq_id must be provided.')

+ if field['migratable'] and 'seq_id' in field and 'collection_id' not in field:
+ raise ValueError('If an entity is migratable, it must have a collection_id')

return field

LBDatafile_schema = Schema(
@@ -45,7 +52,7 @@ def validate_id_and_seq_id(field):
'type': str,
'migratable': bool,
Optional('objgroup_id'): int
- }, validate_id_and_seq_id),
+ }, validate_ids),
'node': int,
'resource': str,
Optional('subphases'): [
@@ -71,7 +78,7 @@ def validate_id_and_seq_id(field):
Optional('migratable'): bool,
Optional('index'): [int],
Optional('objgroup_id'): int,
- }, validate_id_and_seq_id),
+ }, validate_ids),
'messages': int,
'from': And({
'type': str,
@@ -82,7 +89,7 @@ def validate_id_and_seq_id(field):
Optional('migratable'): bool,
Optional('index'): [int],
Optional('objgroup_id'): int,
- }, validate_id_and_seq_id),
+ }, validate_ids),
'bytes': float
}
],
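The two rules stated in the `validate_ids` docstring can be exercised without the `schema` package. The following is a sketch under stated assumptions, not the code from the diff: `validate_ids_sketch` and `is_valid` are hypothetical names, and reading `migratable` with `.get(..., False)` is an assumption made here because the schema marks that key as `Optional` for the `to` and `from` entities, whereas the diff indexes `field['migratable']` directly.

```python
def validate_ids_sketch(field):
    """Check the two ID rules described in the validate_ids docstring."""
    if "seq_id" not in field and "id" not in field:
        raise ValueError("Either id (bit-encoded) or seq_id must be provided.")
    # Assumption: a missing "migratable" key is treated as False so that
    # entities where the key is optional do not raise KeyError.
    migratable = field.get("migratable", False)
    if migratable and "seq_id" in field and "collection_id" not in field:
        raise ValueError("If an entity is migratable, it must have a collection_id")
    return field


def is_valid(field):
    """Return True when the field passes both checks, False otherwise."""
    try:
        validate_ids_sketch(field)
    except ValueError:
        return False
    return True
```

For example, `is_valid({"seq_id": 1, "migratable": True})` is rejected for lacking `collection_id`, while the same entity identified by a bit-encoded `id` is accepted, since the second rule only applies when `seq_id` is present.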
1 change: 1 addition & 0 deletions src/vt/configs/arguments/app_config.h
@@ -160,6 +160,7 @@ struct AppConfig {
bool vt_lb_self_migration = false;
bool vt_lb_spec = false;
std::string vt_lb_spec_file = "";
bool vt_lb_run_lb_first_phase = false;


bool vt_no_detect_hang = false;
3 changes: 3 additions & 0 deletions src/vt/configs/arguments/args.cc
@@ -913,6 +913,7 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
auto lb_self_migration = "Allow load balancer to migrate objects to the same node";
auto lb_spec = "Enable LB spec file (defines which phases output LB data)";
auto lb_spec_file = "File containing LB spec; --vt_lb_spec to enable";
auto lb_first_phase_info = "Force LB to run on the first phase (phase 0)";
auto s = app.add_flag("--vt_lb", appConfig.vt_lb, lb);
auto t1 = app.add_flag("--vt_lb_quiet", appConfig.vt_lb_quiet, lb_quiet);
auto u = app.add_option("--vt_lb_file_name", appConfig.vt_lb_file_name, lb_file_name)->capture_default_str()->check(CLI::ExistingFile);
@@ -935,6 +936,7 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
auto lbasm = app.add_flag("--vt_lb_self_migration", appConfig.vt_lb_self_migration, lb_self_migration);
auto lbspec = app.add_flag("--vt_lb_spec", appConfig.vt_lb_spec, lb_spec);
auto lbspecfile = app.add_option("--vt_lb_spec_file", appConfig.vt_lb_spec_file, lb_spec_file)->capture_default_str()->check(CLI::ExistingFile);
auto lb_first_phase = app.add_flag("--vt_lb_run_lb_first_phase", appConfig.vt_lb_run_lb_first_phase, lb_first_phase_info);

// --vt_lb_name excludes --vt_lb_file_name, and vice versa
v->excludes(u);
@@ -963,6 +965,7 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
lbasm->group(debugLB);
lbspec->group(debugLB);
lbspecfile->group(debugLB);
lb_first_phase->group(debugLB);

// help options deliberately omitted from the debugLB group above so that
// they appear grouped with --vt_help when --vt_help is used
1 change: 1 addition & 0 deletions src/vt/configs/types/types_sentinels.h
@@ -89,6 +89,7 @@ static constexpr SequentialIDType const first_seq_id = 1;
static constexpr PriorityType const no_priority = 0;
static constexpr PriorityLevelType const no_priority_level = 0;
static constexpr ThreadIDType const no_thread_id = 0;
static constexpr SharedIDType const no_shared_id = -1;

} // end namespace vt

2 changes: 2 additions & 0 deletions src/vt/configs/types/types_type.h
@@ -117,6 +117,8 @@ using PriorityLevelType = uint8_t;
using ComponentIDType = uint32_t;
/// Used to hold a unique ID for a user-level thread on a particular node
using ThreadIDType = uint64_t;
/// Used to hold a shared ID
using SharedIDType = int;

// Action types for attaching a closure to a runtime function
/// Used for generically store an action to perform
28 changes: 24 additions & 4 deletions src/vt/elm/elm_comm.h
@@ -44,6 +44,7 @@
#if !defined INCLUDED_VT_ELM_ELM_COMM_H
#define INCLUDED_VT_ELM_ELM_COMM_H

#include "vt/configs/types/types_type.h"
#include "vt/elm/elm_id.h"

#include <unordered_map>
@@ -58,7 +59,9 @@ enum struct CommCategory : int8_t {
CollectionToNodeBcast = 5,
NodeToCollectionBcast = 6,
CollectiveToCollectionBcast = 7,
- LocalInvoke = 8
+ LocalInvoke = 8,
+ WriteShared = 9,
+ ReadOnlyShared = 10
};

inline NodeType objGetNode(ElementIDStruct const id) {
@@ -71,6 +74,8 @@ struct CommKey {
struct CollectionTag { };
struct CollectionToNodeTag { };
struct NodeToCollectionTag { };
struct WriteSharedTag { };
struct ReadOnlySharedTag { };

CommKey() = default;
CommKey(CommKey const&) = default;
@@ -107,12 +112,25 @@ struct CommKey {
cat_(bcast ? CommCategory::NodeToCollectionBcast : CommCategory::NodeToCollection)
{ }

CommKey(
WriteSharedTag,
NodeType in_home, int in_shared_id
) : nto_(in_home), shared_id_(in_shared_id), cat_(CommCategory::WriteShared)
{ }

CommKey(
ReadOnlySharedTag,
NodeType in_home, int in_shared_id
) : nto_(in_home), shared_id_(in_shared_id), cat_(CommCategory::ReadOnlyShared)
{ }

ElementIDStruct from_ = {};
ElementIDStruct to_ = {};

ElementIDStruct edge_id_ = {};
NodeType nfrom_ = uninitialized_destination;
NodeType nto_ = uninitialized_destination;
SharedIDType shared_id_ = no_shared_id;
CommCategory cat_ = CommCategory::SendRecv;

ElementIDStruct fromObj() const { return from_; }
@@ -121,6 +139,7 @@ struct CommKey {
ElementIDType toNode() const { return nto_; }
ElementIDStruct edgeID() const { return edge_id_; }
CommCategory commCategory() const { return cat_; }
int sharedID() const { return shared_id_; }

bool selfEdge() const { return cat_ == CommCategory::SendRecv and from_ == to_; }
bool offNode() const {
@@ -140,12 +159,12 @@ struct CommKey {
return
k.from_ == from_ and k.to_ == to_ and
k.nfrom_ == nfrom_ and k.nto_ == nto_ and
- k.cat_ == cat_;
+ k.cat_ == cat_ and k.shared_id_ == shared_id_;
}

template <typename SerializerT>
void serialize(SerializerT& s) {
- s | from_ | to_ | nfrom_ | nto_ | cat_ | edge_id_;
+ s | from_ | to_ | nfrom_ | nto_ | cat_ | edge_id_ | shared_id_;
}
};

@@ -189,7 +208,8 @@ struct hash<vt::elm::CommKey> {
size_t operator()(vt::elm::CommKey const& in) const {
return std::hash<uint64_t>()(
std::hash<vt::elm::ElementIDStruct>()(in.from_) ^
- std::hash<vt::elm::ElementIDStruct>()(in.to_) ^ in.nfrom_ ^ in.nto_
+ std::hash<vt::elm::ElementIDStruct>()(in.to_) ^ in.nfrom_ ^ in.nto_ ^
+ in.shared_id_
);
}
};
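Adding `shared_id_` to equality, serialization, and the hash matters because shared-block keys leave the `from_`/`to_` element fields at their defaults: without it, two shared blocks homed on the same node would compare equal and collapse into one map entry. The following Python sketch is an analogy only, not the C++ `CommKey`: `CommKeySketch` and `add_shared_bytes` are hypothetical names, and the "add bytes, count one message" behaviour is an assumption about what recording a communication does, since that code is not part of this diff.

```python
from dataclasses import dataclass

NO_SHARED_ID = -1  # mirrors the no_shared_id sentinel added in this PR


@dataclass(frozen=True)
class CommKeySketch:
    """Simplified stand-in for CommKey holding only the fields relevant here."""
    nto: int
    category: str
    shared_id: int = NO_SHARED_ID


def add_shared_bytes(comm, key, num_bytes):
    """Accumulate bytes and a message count under the given key (assumed semantics)."""
    total_bytes, messages = comm.get(key, (0.0, 0))
    comm[key] = (total_bytes + num_bytes, messages + 1)


comm = {}
block_1 = CommKeySketch(nto=0, category="WriteShared", shared_id=1)
block_2 = CommKeySketch(nto=0, category="WriteShared", shared_id=2)
add_shared_bytes(comm, block_1, 10.0)
add_shared_bytes(comm, block_2, 5.0)
add_shared_bytes(comm, CommKeySketch(nto=0, category="WriteShared", shared_id=1), 2.5)
```

Because the frozen dataclass derives equality and hashing from all fields, the two blocks stay separate entries while the repeated key for block 1 accumulates into the existing one.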
5 changes: 5 additions & 0 deletions src/vt/elm/elm_id.cc
@@ -41,6 +41,7 @@
//@HEADER
*/

#include "vt/context/context.h"
#include "vt/elm/elm_id.h"
#include "vt/elm/elm_id_bits.h"

@@ -58,4 +59,8 @@ NodeType ElementIDStruct::getCurrNode() const {
return curr_node;
}

bool ElementIDStruct::isLocatedOnThisNode() const {
return theContext()->getNode() == curr_node and not isMigratable();
}

}} /* end namespace vt::elm */
1 change: 1 addition & 0 deletions src/vt/elm/elm_id.h
@@ -78,6 +78,7 @@ struct ElementIDStruct {
bool isMigratable() const;
NodeType getHomeNode() const;
NodeType getCurrNode() const;
bool isLocatedOnThisNode() const;
};


16 changes: 16 additions & 0 deletions src/vt/elm/elm_lb_data.cc
@@ -86,6 +86,22 @@ void ElementLBData::sendToEntity(
sendComm(key, bytes);
}

void ElementLBData::addWritableSharedID(
NodeType home, int shared_id, double bytes
) {
sendComm(
elm::CommKey{elm::CommKey::WriteSharedTag{}, home, shared_id}, bytes
);
}

void ElementLBData::addReadOnlySharedID(
NodeType home, int shared_id, double bytes
) {
sendComm(
elm::CommKey{elm::CommKey::ReadOnlySharedTag{}, home, shared_id}, bytes
);
}

void ElementLBData::sendComm(elm::CommKey key, double bytes) {
phase_comm_[cur_phase_][key].sendMsg(bytes);
subphase_comm_[cur_phase_].resize(cur_subphase_ + 1);
3 changes: 3 additions & 0 deletions src/vt/elm/elm_lb_data.h
@@ -72,6 +72,9 @@ struct ElementLBData {
void sendToEntity(ElementIDStruct to, ElementIDStruct from, double bytes);
void sendComm(elm::CommKey key, double bytes);

void addWritableSharedID(NodeType home, int shared_id, double bytes);
void addReadOnlySharedID(NodeType home, int shared_id, double bytes);

void recvComm(elm::CommKey key, double bytes);
void recvObjData(
ElementIDStruct to_perm,
13 changes: 13 additions & 0 deletions src/vt/messaging/active.h
@@ -1722,6 +1722,19 @@ struct ActiveMessenger : runtime::component::PollableComponent<ActiveMessenger>
MsgSizeType const msg_size
);

public:
/**
* \brief Get the rank-based LB data along with element ID for rank-based work
*
* \return tuple with pointers to each one
*/
auto getRankLBData() {
return std::make_tuple(
&bare_handler_dummy_elm_id_for_lb_data_,
&bare_handler_lb_data_
);
}

private:
# if vt_check_enabled(trace_enabled)
trace::UserEventIDType trace_irecv = trace::no_user_event_id;
3 changes: 3 additions & 0 deletions src/vt/vrt/collection/balance/baselb/baselb.cc
@@ -143,6 +143,9 @@ std::shared_ptr<const balance::Reassignment> BaseLB::normalizeReassignments() {
auto const new_node = std::get<1>(transfer);
auto const current_node = obj_id.curr_node;

vtAbortIf(
not obj_id.isMigratable(), "Transfering object that is not migratable"
);
if (current_node == new_node) {
vt_debug_print(
verbose, lb, "BaseLB::normalizeReassignments(): self migration\n"
43 changes: 40 additions & 3 deletions src/vt/vrt/collection/balance/lb_data_holder.cc
@@ -252,9 +252,7 @@ std::unique_ptr<nlohmann::json> LBDataHolder::toJson(PhaseType phase) const {

i = 0;
if (node_comm_.find(phase) != node_comm_.end()) {
- for (auto&& elm : node_comm_.at(phase)) {
- auto volume = elm.second;
- auto const& key = elm.first;
+ for (auto const& [key, volume] : node_comm_.at(phase)) {
j["communications"][i]["bytes"] = volume.bytes;
j["communications"][i]["messages"] = volume.messages;

@@ -296,6 +294,17 @@ std::unique_ptr<nlohmann::json> LBDataHolder::toJson(PhaseType phase) const {
outputEntity(j["communications"][i]["from"], key.fromObj());
break;
}
case elm::CommCategory::ReadOnlyShared:
case elm::CommCategory::WriteShared: {
j["communications"][i]["type"] =
(key.cat_ == elm::CommCategory::ReadOnlyShared) ?
"ReadOnlyShared" : "WriteShared";
j["communications"][i]["to"]["type"] = "node";
j["communications"][i]["to"]["id"] = key.toNode();
j["communications"][i]["from"]["type"] = "shared_id";
j["communications"][i]["from"]["id"] = key.sharedID();
break;
}
case elm::CommCategory::LocalInvoke:
case elm::CommCategory::CollectiveToCollectionBcast:
// not currently supported
@@ -476,6 +485,34 @@ LBDataHolder::LBDataHolder(nlohmann::json const& j)
);
CommVolume vol{bytes, messages};
this->node_comm_[id][key] = vol;
} else if (
type == "ReadOnlyShared" or type == "WriteShared"
) {
vtAssertExpr(comm["from"]["type"] == "shared_id");
vtAssertExpr(comm["to"]["type"] == "node");

CommVolume vol{bytes, messages};
auto to_node = comm["to"]["id"];
vtAssertExpr(to_node.is_number());

auto from_shared_id = comm["from"]["id"];
vtAssertExpr(from_shared_id.is_number());

if (type == "ReadOnlyShared") {
CommKey key(
CommKey::ReadOnlySharedTag{},
static_cast<NodeType>(to_node),
static_cast<int>(from_shared_id)
);
this->node_comm_[id][key] = vol;
} else {
CommKey key(
CommKey::WriteSharedTag{},
static_cast<NodeType>(to_node),
static_cast<int>(from_shared_id)
);
this->node_comm_[id][key] = vol;
}
}
}
}
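The JSON layout that these two hunks write and read for shared-block edges can be summarised with a small round trip. This is a sketch in Python rather than the C++ from the diff: the function names are hypothetical, and only the field names (`type`, `to`, `from`, `id`, `bytes`, `messages`) and the two type strings are taken from the code above.

```python
import json

SHARED_TYPES = ("ReadOnlyShared", "WriteShared")


def shared_comm_to_json(comm_type, to_node, shared_id, num_bytes, messages):
    """Build one "communications" entry for a shared-block edge."""
    if comm_type not in SHARED_TYPES:
        raise ValueError(f"unexpected type: {comm_type}")
    return {
        "type": comm_type,
        "to": {"type": "node", "id": to_node},
        "from": {"type": "shared_id", "id": shared_id},
        "bytes": num_bytes,
        "messages": messages,
    }


def shared_comm_from_json(entry):
    """Parse an entry back into (type, to_node, shared_id, bytes, messages)."""
    assert entry["from"]["type"] == "shared_id"
    assert entry["to"]["type"] == "node"
    return (
        entry["type"],
        int(entry["to"]["id"]),
        int(entry["from"]["id"]),
        entry["bytes"],
        entry["messages"],
    )


entry = shared_comm_to_json(
    "WriteShared", to_node=2, shared_id=17, num_bytes=1024.0, messages=1
)
round_tripped = shared_comm_from_json(json.loads(json.dumps(entry)))
```

The direction of the edge follows the diff: the shared block is the `from` endpoint and its home node is the `to` endpoint.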
3 changes: 0 additions & 3 deletions src/vt/vrt/collection/balance/lb_data_holder.h
@@ -46,12 +46,9 @@

#include "vt/config.h"
#include "vt/vrt/collection/balance/lb_common.h"
#include "vt/elm/elm_comm.h"

#include <unordered_map>
#include <memory>
#include <variant>
#include <string>

#include <nlohmann/json_fwd.hpp>

6 changes: 5 additions & 1 deletion src/vt/vrt/collection/balance/lb_invoke/lb_manager.cc
@@ -130,7 +130,11 @@ LBType LBManager::decideLBToRun(PhaseType phase, bool try_file) {
} else {
auto interval = theConfig()->vt_lb_interval;
vtAssert(interval != 0, "LB Interval must not be 0");
- if (phase % interval == 1 || (interval == 1 && phase != 0)) {
+ vt::PhaseType offset = theConfig()->vt_lb_run_lb_first_phase ? 0 : 1;
+ if (
+ phase % interval == offset ||
+ (interval == 1 && phase != 0)
+ ) {
bool name_match = false;
for (auto&& elm : get_lb_names()) {
if (elm.second == theConfig()->vt_lb_name) {
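The effect of the new `offset` on which phases pass the interval test can be checked with a small sketch. It mirrors only the boolean condition shown in this hunk; `passes_interval_check` is a hypothetical name, and the LB-name matching that follows in `decideLBToRun` is not modelled.

```python
def passes_interval_check(phase, interval, run_first_phase=False):
    """Return True when the phase passes the interval test from this hunk."""
    if interval == 0:
        raise ValueError("LB interval must not be 0")
    # With --vt_lb_run_lb_first_phase the offset moves from 1 to 0.
    offset = 0 if run_first_phase else 1
    return phase % interval == offset or (interval == 1 and phase != 0)


default_phases = [p for p in range(8) if passes_interval_check(p, 4)]
first_phase_phases = [p for p in range(8) if passes_interval_check(p, 4, True)]
```

With an interval of 4, the default selects phases 1 and 5, while the flag shifts the selection to phases 0 and 4. With an interval of 1, `phase % 1` is always 0, so without the flag only the second clause fires (every phase except 0), and with the flag phase 0 passes as well.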
4 changes: 4 additions & 0 deletions src/vt/vrt/collection/balance/model/composed_model.cc
@@ -102,4 +102,8 @@ int ComposedModel::getNumSubphases() const {
return base_->getNumSubphases();
}

CommMapType ComposedModel::getComm(PhaseOffset when) const {
return base_->getComm(when);
}

}}}}
1 change: 1 addition & 0 deletions src/vt/vrt/collection/balance/model/composed_model.h
@@ -77,6 +77,7 @@ class ComposedModel : public LoadModel
bool hasUserData() const override;
ElmUserDataType getUserData(ElementIDStruct object, PhaseOffset when) const override;
unsigned int getNumPastPhasesNeeded(unsigned int look_back) const override;
CommMapType getComm(PhaseOffset offset) const override;

ObjectIterator begin() const override;
