#1934: Add parameter to control minimal retention of historical LB data #1996

Open: wants to merge 82 commits into base: develop
7c823e0
#1934: Add parameter to control minimal retention of historical LB data
thearusable Oct 17, 2022
932f1d6
#1934: Add UTs for minimal LB data retention
thearusable Oct 18, 2022
f2b2a6d
#1934: Store LB data in ordered map
thearusable Jan 27, 2023
d31709f
#1934: Update UTs to better reflect which phases are kept
thearusable Jun 6, 2023
7de6353
#1934: Add tests for DynamicCircularBuffer
thearusable Nov 10, 2023
18cf374
#1934: Add continuous dynamic circular buffer container
thearusable Nov 10, 2023
7d49646
#1934: Change node_data_ to new buffer type
thearusable Nov 10, 2023
f801dca
#1934: Use new container type in LB Data Holder
thearusable Nov 14, 2023
b7fb70d
#1934: Update usage of circular buffer after change in the operator[]
thearusable Nov 14, 2023
374aed0
#1934: Update implementation of setting correct buffer size in NodeLB…
thearusable Nov 14, 2023
cda9fd8
#1934: Remove trailing whitespaces
thearusable Nov 14, 2023
c6ec830
#1934: Add missing resize in linear model test
thearusable Nov 14, 2023
bbb3303
#1934: Allow contains to be called when size is zero
thearusable Nov 14, 2023
e2bcfd7
#1934: Add iterator support for buffer type
thearusable Nov 22, 2023
3182b05
#1934: Fix failing tests related to buffers being not initialized
thearusable Nov 23, 2023
a9d4b45
#1934: Add cosmetic changes and reverse some unneded changes
thearusable Nov 24, 2023
8070c2b
#1934: Use CircularPhasesBuffer in ElementLBData
thearusable Nov 28, 2023
0a7f8eb
#1934: Pass the requested retention to ElementLBData
thearusable Nov 28, 2023
4a8ffe2
#1934: Update retention tests to expect correct amount of phases in E…
thearusable Nov 28, 2023
03669cd
#1934: Adjust buffers size in objs of the collection
thearusable Nov 28, 2023
133c88b
#1934: Reaply interface changes after rebase
thearusable Dec 11, 2023
6b6a0e5
#1934: Synchronize buffer size in TestCol before checking for persist…
thearusable Dec 12, 2023
f88eb33
#1934: Set initial timings buffer size to 1
thearusable Dec 12, 2023
5820529
#1934: Remove update of the buffers in the group
thearusable Dec 12, 2023
1c969c1
#1934: Use buffer alias in models
thearusable Dec 12, 2023
ebe1805
#1934: Add more tests for circular phases buffer
thearusable Dec 15, 2023
529659f
#1934: Add more test cases for CircularPhasesBuffer
thearusable Dec 19, 2023
86c60de
#1934: Avoid crashing when container was not resized yet
thearusable Dec 19, 2023
76a65ec
#1934: Finish first version of the CirculaPhasesBuffer
thearusable Dec 19, 2023
c5772d9
#1934: Update tests for CircularPhasesBuffer
thearusable Dec 20, 2023
2c2db3d
#1934: Update implementation of CirculaPhasesBuffer
thearusable Dec 20, 2023
0be51a4
#1934: Add documentation for CirculalPhasesBuffer
thearusable Dec 20, 2023
79d8a10
#1934: Fix compilation issue on Apple Clang
thearusable Dec 21, 2023
e473ec9
#1934: Fix NVCC warning related to unsigned variable
thearusable Dec 29, 2023
34cf664
#1934: Update implementation after resolving conflicts
thearusable Apr 23, 2024
96373a8
#1934: Refactor of the CircularPhasesBuffer to be dynamic in size unt…
thearusable Apr 30, 2024
e0eb2bc
#1934: Remove unnecessary resizes
thearusable Apr 30, 2024
7a43e69
#1934: Update unit tests to check all relevant fields in NodeLBData
thearusable Apr 30, 2024
376c516
#1934: Remove trailing whitespaces
thearusable Apr 30, 2024
0b1fb56
#1934: Remove usage of old LBDataHolder constructor
thearusable May 6, 2024
899cd3e
#1934: Enable serialization of queue in CircularPhasesBuffer
thearusable Jun 6, 2024
f7e26ff
#1934: Add missing include with unordered_map
thearusable Jun 6, 2024
9b066bc
#1934: Update units tests to fix CI failure
thearusable Jun 11, 2024
6ffedd8
#1934: Use common types in tests
thearusable Jun 11, 2024
5ca4a62
#1934: Undo not needed changes
thearusable Jun 11, 2024
3658c6a
#1934: Make CircularBuffer interface more like a map
thearusable Jun 11, 2024
8ec8fe2
#1934: Update collection creation in tests
thearusable Jun 11, 2024
e225d2b
#1934: Remove small changes
thearusable Jul 4, 2024
41f4541
#1934: Update CircularPhasesBuffer to follow vt style of naming varia…
thearusable Jul 15, 2024
3942cad
#1934: Avoid unnecessary copies of data when adding it to cache
thearusable Jul 17, 2024
5e2d8a7
#1934: Remove const from addToCache data parameter
thearusable Jul 18, 2024
9c86660
#1934: Update unit tests to work with new implementation of the buffer
thearusable Aug 22, 2024
ff860c8
#1934: Update implementation of the circular buffer to use std::vector
thearusable Aug 22, 2024
0fc1778
#1934: Add front and back methods to CircularPhasesBuffer
thearusable Aug 26, 2024
4af903f
#1934: Add tests for front and back methods
thearusable Aug 26, 2024
1822f20
#1934: Update usage of circular phases buffer
thearusable Aug 26, 2024
f252c53
#1934: Add resizing functionality to LBDataHolder buffers
thearusable Aug 27, 2024
f519dca
#1934: Update unit tests to use new constructor of LBDataHolder
thearusable Aug 27, 2024
fef230b
#1934: Disable resizing of the buffer to zero
thearusable Aug 28, 2024
35fc771
#1934: Update tests to work correctly with the new buffer type
thearusable Aug 28, 2024
5c34790
#1934: Update implementation of circular buffer to work based on modu…
thearusable Sep 17, 2024
14c17f3
#1934: Update tests for circular buffer to support the new implementa…
thearusable Sep 17, 2024
5a94672
#1934: Update usage of the buffer in the codebase
thearusable Sep 17, 2024
1d828ae
#1934: Update LBDataHolder to use default constructor
thearusable Sep 17, 2024
13b5d7f
#1934: Remove list initializer constructor
thearusable Sep 19, 2024
b36b933
#1934: Update LBDataHolder to resize containers when reading data fro…
thearusable Sep 19, 2024
7f3f0e1
#1934: Update tests to not use the removed constructor
thearusable Sep 19, 2024
77b8fb9
#1934: Update usage of LBDataHolder in the codebase
thearusable Sep 19, 2024
3bc5550
#1934: Fix compilation issues after the PR rebase
thearusable Sep 19, 2024
f4ce66b
#1934: Reset ElementLBData containers after resetting the current phase
thearusable Sep 19, 2024
c9c65e6
#1934: Remove leftovers from previous implementation of the circular …
thearusable Sep 19, 2024
9a056c0
#1934: Update CircularPhasesBuffer documentation
thearusable Sep 19, 2024
b9f2293
#1934: Add resizeHistory method back after resolving conflicts
thearusable Sep 23, 2024
57d0457
#1934: Modify ElementLBData::resetPhase() to reset the head phase of …
thearusable Sep 23, 2024
57b70f7
#1934: Adapt newly added functionality in LBDataHolder to support Cir…
thearusable Sep 23, 2024
ddc3c57
#1934: Remove comparison of signed and unsigned integers in retention…
thearusable Sep 23, 2024
c6db957
#1934: Add LB data retention tests for chckpointing case
thearusable Sep 24, 2024
a4ff0c6
#1934: Improve documentation for CircularPhasesBuffer
thearusable Sep 24, 2024
c9e9120
#1934: Modify test harness to allow for vt restarts
thearusable Sep 26, 2024
0623fc4
#1934: Prepare content of the containers for work after restore from …
thearusable Sep 26, 2024
37b719f
#1934: Update retention test to do a full recreation of the vt objects
thearusable Sep 26, 2024
0b2489f
#1934: Fix retention test checks when LB is disabled
thearusable Sep 26, 2024
2 changes: 2 additions & 0 deletions src/vt/configs/arguments/app_config.h
@@ -148,6 +148,7 @@ struct AppConfig {
   bool vt_lb_data = false;
   bool vt_lb_data_compress = true;
   bool vt_lb_data_in = false;
+  uint32_t vt_lb_data_retention = 0;
   std::string vt_lb_data_dir = "vt_lb_data";
   std::string vt_lb_data_file = "data.%p.json";
   std::string vt_lb_data_dir_in = "vt_lb_data_in";
@@ -325,6 +326,7 @@ struct AppConfig {
       | vt_lb_interval
       | vt_lb_data
       | vt_lb_data_compress
+      | vt_lb_data_retention
       | vt_lb_data_dir
       | vt_lb_data_file
       | vt_lb_data_in
3 changes: 3 additions & 0 deletions src/vt/configs/arguments/args.cc
@@ -911,6 +911,7 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
   auto lb_data = "Enable load balancing data";
   auto lb_data_in = "Enable load balancing data input";
   auto lb_data_comp = "Compress load balancing data output with brotli";
+  auto lb_data_hist = "Minimal number of historical LB data phases to retain";
   auto lb_data_dir = "Load balancing data output directory";
   auto lb_data_file = "Load balancing data output file name";
   auto lb_data_dir_in = "Load balancing data input directory";
@@ -934,6 +935,7 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
   auto ww = app.add_flag("--vt_lb_data", appConfig.vt_lb_data, lb_data);
   auto za = app.add_flag("--vt_lb_data_in", appConfig.vt_lb_data_in, lb_data_in);
   auto xz = app.add_flag("--vt_lb_data_compress", appConfig.vt_lb_data_compress, lb_data_comp);
+  auto dr = app.add_option("--vt_lb_data_retention", appConfig.vt_lb_data_retention, lb_data_hist);
   auto wx = app.add_option("--vt_lb_data_dir", appConfig.vt_lb_data_dir, lb_data_dir)->capture_default_str();
   auto wy = app.add_option("--vt_lb_data_file", appConfig.vt_lb_data_file, lb_data_file)->capture_default_str();
   auto xx = app.add_option("--vt_lb_data_dir_in", appConfig.vt_lb_data_dir_in, lb_data_dir_in)->capture_default_str();
@@ -967,6 +969,7 @@ void addLbArgs(CLI::App& app, AppConfig& appConfig) {
   xx->group(debugLB);
   xy->group(debugLB);
   xz->group(debugLB);
+  dr->group(debugLB);
   yx->group(debugLB);
   yy->group(debugLB);
   yz->group(debugLB);
23 changes: 14 additions & 9 deletions src/vt/elm/elm_lb_data.cc
@@ -195,16 +195,23 @@ void ElementLBData::updatePhase(PhaseType const& inc) {
 }

 void ElementLBData::resetPhase() {
+  // This method will become obsolete once VT gains full restart capability,
+  // allowing it to load all necessary data (like PhaseManager state, NodeLBData, etc.) from a checkpoint.
+
   cur_phase_ = fst_lb_phase;
+  // Resets the current phase in the containers.
+  phase_timings_.restartFrom(fst_lb_phase);
+  subphase_timings_.restartFrom(fst_lb_phase);
+  phase_comm_.restartFrom(fst_lb_phase);
+  subphase_comm_.restartFrom(fst_lb_phase);
 }

 PhaseType ElementLBData::getPhase() const {
   return cur_phase_;
 }

 LoadType ElementLBData::getLoad(PhaseType const& phase) const {
-  auto iter = phase_timings_.find(phase);
-  if (iter != phase_timings_.end()) {
+  if (phase_timings_.contains(phase)) {
     auto const total_load = phase_timings_.at(phase);

     vt_debug_print(
@@ -276,13 +283,11 @@ SubphaseType ElementLBData::getSubPhase() const {
   return cur_subphase_;
 }

-void ElementLBData::releaseLBDataFromUnneededPhases(PhaseType phase, unsigned int look_back) {
-  if (phase >= look_back) {
-    phase_timings_.erase(phase - look_back);
-    subphase_timings_.erase(phase - look_back);
-    phase_comm_.erase(phase - look_back);
-    subphase_comm_.erase(phase - look_back);
-  }
+void ElementLBData::setHistoryCapacity(unsigned int hist_lb_data_count) {
+  phase_timings_.resize(hist_lb_data_count);
+  subphase_timings_.resize(hist_lb_data_count);
+  phase_comm_.resize(hist_lb_data_count);
+  subphase_comm_.resize(hist_lb_data_count);
 }

std::size_t ElementLBData::getLoadPhaseCount() const {
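
For comparison, here is a minimal sketch of the pruning pattern used by the removed releaseLBDataFromUnneededPhases body. Each call erases exactly one key, phase - look_back, so the map stays bounded only if the function runs once per phase. The PhaseType and LoadType aliases and the free function releaseOnePhase are stand-ins invented for this illustration and are not part of vt.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumption: simple stand-ins for the vt type aliases of the same name.
using PhaseType = std::uint64_t;
using LoadType = double;

// Hypothetical free-function version of the removed member function,
// reduced to a single container. It erases one key per call.
void releaseOnePhase(
  std::unordered_map<PhaseType, LoadType>& timings,
  PhaseType phase, unsigned int look_back
) {
  if (phase >= look_back) {
    timings.erase(phase - look_back);
  }
}
```

Called after every phase with look_back = 2, this keeps the two most recent phases. Called only once at phase 5, it removes phase 3 and leaves everything older in place.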
18 changes: 9 additions & 9 deletions src/vt/elm/elm_lb_data.h
@@ -47,6 +47,7 @@
 #include "vt/elm/elm_id.h"
 #include "vt/elm/elm_comm.h"
 #include "vt/timing/timing.h"
+#include "vt/utils/container/circular_phases_buffer.h"

 namespace vt { namespace vrt { namespace collection { namespace balance {

@@ -122,24 +123,23 @@ struct ElementLBData {
   static const constexpr SubphaseType no_subphase =
     std::numeric_limits<SubphaseType>::max();

-protected:
   /**
-   * \internal \brief Release LB data from phases prior to lookback
+   * \brief Resize internal buffers
+   *
+   * \param[in] hist_lb_data_count the requested buffers capacity
    */
-  void releaseLBDataFromUnneededPhases(PhaseType phase, unsigned int look_back);
-
-  friend struct vrt::collection::balance::NodeLBData;
+  void setHistoryCapacity(unsigned int hist_lb_data_count);

 protected:
   bool cur_time_started_ = false;
   TimeType cur_time_ = TimeType{0.0};
   PhaseType cur_phase_ = fst_lb_phase;
-  std::unordered_map<PhaseType, LoadType> phase_timings_ = {};
-  std::unordered_map<PhaseType, CommMapType> phase_comm_ = {};
+  util::container::CircularPhasesBuffer<LoadType> phase_timings_ = {};
+  util::container::CircularPhasesBuffer<CommMapType> phase_comm_ = {};

   SubphaseType cur_subphase_ = 0;
-  std::unordered_map<PhaseType, std::vector<LoadType>> subphase_timings_ = {};
-  std::unordered_map<PhaseType, std::vector<CommMapType>> subphase_comm_ = {};
+  util::container::CircularPhasesBuffer<std::vector<LoadType>> subphase_timings_ = {};
+  util::container::CircularPhasesBuffer<std::vector<CommMapType>> subphase_comm_ = {};
 };

}} /* end namespace vt::elm */
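
The diff does not include circular_phases_buffer.h, so the following is only a rough sketch of how a phase-keyed, fixed-capacity buffer with contains, at and resize could behave. It assumes a std::vector of optional slots indexed by phase modulo capacity. The class name PhaseRingSketch, the store method, and the choice to ignore a resize to zero are all invented for this illustration and may differ from the actual CircularPhasesBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Assumption: stand-in for vt's PhaseType.
using PhaseType = std::uint64_t;

// Hypothetical illustration only; not the vt CircularPhasesBuffer.
template <typename T>
class PhaseRingSketch {
  using Entry = std::pair<PhaseType, T>;
  using Slot = std::optional<Entry>;

public:
  explicit PhaseRingSketch(std::size_t capacity = 1)
    : slots_(capacity == 0 ? 1 : capacity) {}

  // Store data for a phase, overwriting any older phase in the same slot.
  void store(PhaseType phase, T value) {
    slots_[phase % slots_.size()] = Entry{phase, std::move(value)};
  }

  // True only if the slot currently holds data for exactly this phase.
  bool contains(PhaseType phase) const {
    auto const& slot = slots_[phase % slots_.size()];
    return slot.has_value() && slot->first == phase;
  }

  T const& at(PhaseType phase) const {
    if (!contains(phase)) {
      throw std::out_of_range("phase is not retained");
    }
    return slots_[phase % slots_.size()]->second;
  }

  // Number of phases currently held.
  std::size_t size() const {
    std::size_t count = 0;
    for (auto const& slot : slots_) {
      if (slot.has_value()) { ++count; }
    }
    return count;
  }

  // Change the capacity, keeping only phases within new_capacity of the
  // newest stored phase. A request for zero is ignored in this sketch.
  void resize(std::size_t new_capacity) {
    if (new_capacity == 0) { return; }
    PhaseType newest = 0;
    for (auto const& slot : slots_) {
      if (slot.has_value() && slot->first > newest) { newest = slot->first; }
    }
    std::vector<Slot> next(new_capacity);
    for (auto& slot : slots_) {
      if (slot.has_value() && slot->first + new_capacity > newest) {
        next[slot->first % new_capacity] = std::move(slot);
      }
    }
    slots_ = std::move(next);
  }

private:
  std::vector<Slot> slots_;
};
```

With a capacity of 3, storing phases 0 through 4 leaves only phases 2, 3 and 4 retrievable. Lookups follow the same contains-then-at pattern that getLoad uses in the diff above.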