1265 lb data replay without collection #1720

Merged
merged 41 commits on Nov 9, 2023

@nlslatt nlslatt commented Mar 25, 2022

This PR allows replaying LB data (workloads) read from JSON files. It does not handle communications correctly and therefore will not work with comm-aware load models or load balancers. Handling communications correctly would likely be best done through a vt refactor in which communications are treated analogously to load data by the ProposedReassignment load model and the LBManager.

Closes #1265
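
To make the idea of load-only replay concrete, here is a minimal sketch, not the code in this PR. It assumes the per-phase workloads have already been read into memory, that every phase contains the same set of objects, and that communications are ignored (matching the limitation described above). All names (`PhaseLoads`, `Placement`, `greedyBalance`, `replay`) are hypothetical and are not part of vt's API; the greedy balancer is only a stand-in for whichever load balancer is being tested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; none of these are vt's API.
using ObjectId = std::uint64_t;
using Rank = int;
using PhaseLoads = std::map<ObjectId, double>;  // object -> load in one phase
using Placement = std::map<ObjectId, Rank>;     // object -> owning rank

// Sum the recorded loads per rank for one phase under a given placement.
std::vector<double> rankLoads(
  PhaseLoads const& loads, Placement const& where, int num_ranks
) {
  assert(num_ranks > 0);
  std::vector<double> totals(num_ranks, 0.0);
  for (auto const& kv : loads) {
    totals.at(where.at(kv.first)) += kv.second;
  }
  return totals;
}

// Imbalance metric: (max rank load / mean rank load) - 1.
double imbalance(std::vector<double> const& totals) {
  double sum = 0.0, max = 0.0;
  for (double t : totals) {
    sum += t;
    max = std::max(max, t);
  }
  if (sum == 0.0) return 0.0;
  return max / (sum / totals.size()) - 1.0;
}

// Stand-in balancer: heaviest object first, onto the least-loaded rank.
Placement greedyBalance(PhaseLoads const& loads, int num_ranks) {
  assert(num_ranks > 0);
  std::vector<std::pair<double, ObjectId>> sorted;
  for (auto const& kv : loads) sorted.emplace_back(kv.second, kv.first);
  std::sort(sorted.rbegin(), sorted.rend());  // descending by load
  std::vector<double> totals(num_ranks, 0.0);
  Placement result;
  for (auto const& item : sorted) {
    auto it = std::min_element(totals.begin(), totals.end());
    *it += item.first;
    result[item.second] = static_cast<Rank>(it - totals.begin());
  }
  return result;
}

// Replay: measure each recorded phase under the current placement, then
// rebalance from that phase's loads before moving on to the next phase.
std::vector<double> replay(
  std::vector<PhaseLoads> const& phases, Placement where, int num_ranks
) {
  std::vector<double> observed;
  for (auto const& loads : phases) {
    observed.push_back(imbalance(rankLoads(loads, where, num_ranks)));
    where = greedyBalance(loads, num_ranks);
  }
  return observed;
}
```

Calling `replay` with the recorded phases, an initial placement, and a rank count returns one imbalance value per phase, which is the kind of signal a load-only test driver would compare across balancers.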

@nlslatt nlslatt force-pushed the 1265-stats-replay-without-collection branch 5 times, most recently from 855587d to a765426 Compare March 25, 2022 18:53

github-actions bot commented Mar 25, 2022

PR tests (clang-3.9, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (gcc-5, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (gcc-7, ubuntu, mpich, trace runtime, LB)

Build for cd90005

Compilation - successful

Testing - passed

codecov bot commented Mar 25, 2022

Codecov Report

Merging #1720 (fa6c1bc) into develop (7be56f2) will decrease coverage by 1.50%.
Report is 201 commits behind head on develop.
The diff coverage is 94.43%.

❗ Current head fa6c1bc differs from pull request most recent head 794299d. Consider uploading reports for the commit 794299d to get more accurate results

@@             Coverage Diff             @@
##           develop    #1720      +/-   ##
===========================================
- Coverage    85.48%   83.98%   -1.50%     
===========================================
  Files          722      776      +54     
  Lines        25907    27297    +1390     
===========================================
+ Hits         22146    22925     +779     
- Misses        3761     4372     +611     
Files Coverage Δ
src/vt/configs/arguments/app_config.h 100.00% <ø> (ø)
src/vt/configs/arguments/args.cc 95.37% <100.00%> (+0.80%) ⬆️
src/vt/runtime/runtime_banner.cc 64.85% <ø> (+1.95%) ⬆️
src/vt/vrt/collection/balance/baselb/baselb.h 100.00% <ø> (ø)
.../vt/vrt/collection/balance/lb_invoke/lb_manager.cc 76.52% <100.00%> (-4.40%) ⬇️
...c/vt/vrt/collection/balance/lb_invoke/lb_manager.h 100.00% <ø> (ø)
src/vt/vrt/collection/balance/model/raw_data.cc 100.00% <100.00%> (ø)
src/vt/vrt/collection/balance/workload_replay.h 100.00% <100.00%> (ø)
...sts/unit/collection/test_workload_data_migrator.cc 100.00% <100.00%> (ø)
src/vt/vrt/collection/balance/workload_replay.cc 84.21% <84.21%> (ø)

... and 491 files with indirect coverage changes

github-actions bot commented Mar 25, 2022

PR tests (gcc-9, ubuntu, mpich, zoltan)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (clang-5.0, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (gcc-6, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (clang-9, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (nvidia cuda 10.1, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (nvidia cuda 11.0, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (clang-10, alpine, mpich)

Build for cd90005

Compilation - successful

Testing - passed

github-actions bot commented Mar 25, 2022

PR tests (clang-10, ubuntu, mpich)

Build for cd90005

Compilation - successful

Testing - passed

@nlslatt nlslatt marked this pull request as ready for review March 25, 2022 20:52

nlslatt commented Mar 25, 2022

@ppebay I would love to run one of your (load-only) toy problems through this if you can get it into exactly the right JSON format. Until some changes are made to the JSON reader, the data effectively needs to be dumped by vt itself, so that the special bits of the object IDs are consistent with vt's usage.

@nlslatt nlslatt marked this pull request as draft March 29, 2022 18:03
@nlslatt nlslatt force-pushed the 1265-stats-replay-without-collection branch from af9ad32 to 794299d Compare November 9, 2023 18:46
@nlslatt nlslatt requested a review from lifflander November 9, 2023 18:46
auto json = r.readFile();
auto sd = std::make_shared<LBDataHolder>(*json);

for (auto &phase_data : sd->node_data_) {

A minor improvement would be to make these loops conditional on if (theConfig()->vt_debug_replay) { so that they run only when replay debugging is enabled.
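
A rough sketch of what such a guard might look like, under stated assumptions: `Config`, `theConfig()`, and `PhaseData` below are simplified stand-ins written for this example and are not vt's actual types, and `describeLoads` is a hypothetical helper. Only the flag name `vt_debug_replay` and the loop over per-phase data come from the review comment and the snippet above it.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Stand-ins for illustration only; vt's real config object differs.
struct Config {
  bool vt_debug_replay = false;
};

inline Config* theConfig() {
  static Config cfg;
  return &cfg;
}

// Hypothetical shape: phase -> (object id -> load).
using PhaseData = std::map<std::size_t, std::map<unsigned long, double>>;

// Walk every (phase, object, load) entry only when replay debugging is on,
// so the loops cost nothing in a normal run.
std::string describeLoads(PhaseData const& node_data) {
  std::ostringstream out;
  if (theConfig()->vt_debug_replay) {
    for (auto const& phase_data : node_data) {
      for (auto const& obj : phase_data.second) {
        out << "phase=" << phase_data.first
            << " obj=" << obj.first
            << " load=" << obj.second << "\n";
      }
    }
  }
  return out.str();
}
```

With the flag left at its default, `describeLoads` returns an empty string without touching the data; setting the flag makes it emit one line per object per phase.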

@lifflander lifflander merged commit 65e7ebc into develop Nov 9, 2023
25 checks passed
Development

Successfully merging this pull request may close these issues.

Implement general test driver for LB testing
2 participants