1659: Improve communication statistics in VT #1993

Merged: 14 commits, Dec 14, 2022

cz4rs
Contributor

@cz4rs cz4rs commented Oct 12, 2022

fixes #1659

@github-actions

github-actions bot commented Oct 12, 2022

Pipelines results

PR tests (gcc-12, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-3.9, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-5, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

FAILED: src/CMakeFiles/vt.dir/Unity/unity_20_cxx.cxx.o 
/usr/bin/ccache /usr/lib/ccache/g++ -DJSON_USE_IMPLICIT_CONVERSIONS=1 -DVT_NO_COLOR_ENABLED -I/vt/lib/CLI -Irelease -I/vt/src -I/vt/lib/json/include -I/vt/lib/brotli/c/include -I/vt/lib/libfort/lib -isystem /vt/lib/fmt/include -isystem /vt/lib/EngFormat-Cpp/include -isystem /build/checkpoint/install/include -isystem /build/detector/install/include -O3 -DNDEBUG -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -ftemplate-backtrace-limit=100 -Werror -Wno-unused-variable -fPIC -fopenmp -std=c++14 -MD -MT src/CMakeFiles/vt.dir/Unity/unity_20_cxx.cxx.o -MF src/CMakeFiles/vt.dir/Unity/unity_20_cxx.cxx.o.d -o src/CMakeFiles/vt.dir/Unity/unity_20_cxx.cxx.o -c src/CMakeFiles/vt.dir/Unity/unity_20_cxx.cxx
In file included from /usr/include/c++/5/bits/hashtable.h:35:0,
                 from /usr/include/c++/5/unordered_map:47,
                 from /vt/src/vt/handler/handler.h:48,
                 from /vt/src/vt/handler/handler.cc:44,
                 from src/CMakeFiles/vt.dir/Unity/unity_20_cxx.cxx:3:
/usr/include/c++/5/bits/hashtable_policy.h: In instantiation of 'struct std::__detail::__is_noexcept_hash<vt::vrt::collection::balance::LBType, std::hash<vt::vrt::collection::balance::LBType> >':
/usr/include/c++/5/type_traits:137:12:   required from 'struct std::__and_<std::__is_fast_hash<std::hash<vt::vrt::collection::balance::LBType> >, std::__detail::__is_noexcept_hash<vt::vrt::collection::balance::LBType, std::hash<vt::vrt::collection::balance::LBType> > >'
/usr/include/c++/5/type_traits:148:38:   required from 'struct std::__not_<std::__and_<std::__is_fast_hash<std::hash<vt::vrt::collection::balance::LBType> >, std::__detail::__is_noexcept_hash<vt::vrt::collection::balance::LBType, std::hash<vt::vrt::collection::balance::LBType> > > >'
/usr/include/c++/5/bits/unordered_map.h:100:66:   required from 'class std::unordered_map<vt::vrt::collection::balance::LBType, std::__cxx11::basic_string<char> >'
/vt/src/vt/vrt/collection/balance/read_lb.h:123:36:   required from here
/usr/include/c++/5/bits/hashtable_policy.h:85:34: error: no match for call to '(const std::hash<vt::vrt::collection::balance::LBType>) (const vt::vrt::collection::balance::LBType&)'
  noexcept(declval<const _Hash&>()(declval<const _Key&>()))>
                                  ^
In file included from /usr/include/c++/5/bits/move.h:57:0,
                 from /usr/include/c++/5/bits/stl_pair.h:59,
                 from /usr/include/c++/5/bits/stl_algobase.h:64,
                 from /usr/include/c++/5/vector:60,
                 from /vt/src/vt/handler/handler.h:47,
                 from /vt/src/vt/handler/handler.cc:44,
                 from src/CMakeFiles/vt.dir/Unity/unity_20_cxx.cxx:3:
/usr/include/c++/5/type_traits: In instantiation of 'struct std::__not_<std::__and_<std::__is_fast_hash<st

==> And there is more. Read log. <==

Build log
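
The gcc-5 failure above is triggered by instantiating std::unordered_map with LBType as the key type (read_lb.h:123). As far as I know, libstdc++ only gained std::hash support for enumeration types (LWG 2148) in GCC 6.1, which would explain why this is the only job that fails. A minimal sketch of one common workaround follows. It assumes LBType is a scoped enum; the enum definition, its enumerators, the EnumClassHash functor and both helper functions are hypothetical stand-ins and not the actual vt code, and this is not necessarily how the PR resolved the failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>

// Hypothetical stand-in for vt::vrt::collection::balance::LBType.
enum struct LBType : signed char { NoLB = 0, GreedyLB = 1, TemperedLB = 2 };

// Hash functor usable with any enum: it hashes the underlying integer value,
// so it does not depend on a std::hash specialization for the enum itself.
struct EnumClassHash {
  template <typename EnumT>
  std::size_t operator()(EnumT value) const noexcept {
    using Underlying = typename std::underlying_type<EnumT>::type;
    return std::hash<Underlying>{}(static_cast<Underlying>(value));
  }
};

// Passing the functor as the third template argument avoids std::hash<LBType>.
inline std::unordered_map<LBType, std::string, EnumClassHash> makeLBNames() {
  return {
    {LBType::NoLB, "NoLB"},
    {LBType::GreedyLB, "GreedyLB"},
    {LBType::TemperedLB, "TemperedLB"}
  };
}

// Look up the display name for a load balancer type.
inline std::string const& lbName(LBType type) {
  static auto const names = makeLBNames();
  auto const it = names.find(type);
  assert(it != names.end() && "unknown LBType");
  return it->second;
}
```

The explicit hasher keeps the map declaration portable across standard library versions at the cost of one extra template argument at each use site.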


PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-7, ubuntu, mpich, trace runtime, LB)

Build for 317f055 (2022-11-09 20:30:46 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-5.0, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-9, ubuntu, mpich, zoltan)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-9, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-6, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-13, alpine, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-11, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.0, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (intel icpx, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-12, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 10.1, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-13, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-14, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-11, ubuntu, mpich, json schema test)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-10, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (intel icpc, ubuntu, mpich)

Build for 634004e (2022-11-14 20:53:30 UTC)

Compilation - successful

Testing - passed

Build log


cz4rs force-pushed the 1659-improve-lb-statistics branch 4 times, most recently from ee0a4b1 to 7ef11b7 (October 17, 2022 10:28)
cz4rs force-pushed the 1659-improve-lb-statistics branch from bd63f4f to 4d2b25d (November 8, 2022 16:11)
@cz4rs
Contributor Author

cz4rs commented Nov 9, 2022

Sample output generated with:

mpiexec -n 2 ./build/tests/collection_extended --gtest_filter=LoadBalancerExplodeOther/TestLoadBalancerOther.test_load_balancer_other_1/4
{
    "type": "LBStatsfile",
    "phases": [
        {
            "id": 0,
            "pre-LB": {
                "Object_comm": {
                    "avg": 466.0,
                    "car": 4.0,
                    "imb": 0.7682403433476395,
                    "kur": -2.4373563791593496,
                    "max": 824.0,
                    "min": 112.0,
                    "npr": 4.0,
                    "skw": 0.00012438128147181126,
                    "std": 354.01129925469894,
                    "sum": 1864.0,
                    "var": 125324.0
                },
                "Object_load_modeled": {
                    "avg": 4.64694857068285e-05,
                    "car": 70.0,
                    "imb": 8.919304963295684,
                    "kur": 28.781199587775358,
                    "max": 0.0004609450002135418,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 4.959831982971411,
                    "std": 6.113038713175178e-05,
                    "sum": 0.003252863999477995,
                    "var": 3.736924230877844e-09
                },
                "Object_load_raw": {
                    "avg": 4.64694857068285e-05,
                    "car": 70.0,
                    "imb": 8.919304963295684,
                    "kur": 28.781199587775358,
                    "max": 0.0004609450002135418,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 4.959831982971411,
                    "std": 6.113038713175178e-05,
                    "sum": 0.003252863999477995,
                    "var": 3.736924230877844e-09
                },
                "Rank_comm": {
                    "avg": 932.0,
                    "car": 2.0,
                    "imb": 0.0042918454935623185,
                    "kur": -2.75,
                    "max": 936.0,
                    "min": 928.0,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 4.0,
                    "sum": 1864.0,
                    "var": 16.0
                },
                "Rank_load_modeled": {
                    "avg": 0.0016264319997389975,
                    "car": 2.0,
                    "imb": 0.30121517548248056,
                    "kur": -2.75,
                    "max": 0.0021163379999507015,
                    "min": 0.0011365259995272936,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0004899060002117039,
                    "sum": 0.003252863999477995,
                    "var": 2.400078890434301e-07
                },
                "Rank_load_raw": {
                    "avg": 0.0016264319997389975,
                    "car": 2.0,
                    "imb": 0.30121517548248056,
                    "kur": -2.75,
                    "max": 0.0021163379999507015,
                    "min": 0.0011365259995272936,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0004899060002117039,
                    "sum": 0.003252863999477995,
                    "var": 2.400078890434301e-07
                }
            }
        },
        {
            "id": 1,
            "migration count": 8,
            "post-LB": {
                "Object_comm": {
                    "avg": 840.0,
                    "car": 4.0,
                    "imb": 0.06666666666666665,
                    "kur": -2.4375,
                    "max": 896.0,
                    "min": 784.0,
                    "npr": 4.0,
                    "skw": 0.0,
                    "std": 56.0,
                    "sum": 3360.0,
                    "var": 3136.0
                },
                "Object_load_modeled": {
                    "avg": 5.6572871423148694e-05,
                    "car": 70.0,
                    "imb": 4.605796416967453,
                    "kur": 17.54840783783443,
                    "max": 0.00031713599992144736,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 3.3960689438183804,
                    "std": 4.28437352241483e-05,
                    "sum": 0.003960100999620408,
                    "var": 1.835585647956926e-09
                },
                "Object_load_raw": {
                    "avg": 5.6572871423148694e-05,
                    "car": 70.0,
                    "imb": 4.605796416967453,
                    "kur": 17.54840783783443,
                    "max": 0.00031713599992144736,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 3.3960689438183804,
                    "std": 4.28437352241483e-05,
                    "sum": 0.003960100999620408,
                    "var": 1.835585647956926e-09
                },
                "Object_work_modeled": {
                    "avg": 5.6572871423148694e-05,
                    "car": 70.0,
                    "imb": 4.605796416967453,
                    "kur": 17.54840783783442,
                    "max": 0.00031713599992144736,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 3.396068943818379,
                    "std": 4.284373522414831e-05,
                    "sum": 0.003960100999620408,
                    "var": 1.8355856479569265e-09
                },
                "Rank_comm": {
                    "avg": 1680.0,
                    "car": 2.0,
                    "imb": 0.0,
                    "kur": 0.0,
                    "max": 1680.0,
                    "min": 1680.0,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0,
                    "sum": 3360.0,
                    "var": 0.0
                },
                "Rank_load_modeled": {
                    "avg": 0.001980050499810204,
                    "car": 2.0,
                    "imb": 0.0024769568420375254,
                    "kur": -2.75,
                    "max": 0.001984954999443289,
                    "min": 0.0019751460001771193,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 4.9044996330849244e-06,
                    "sum": 0.003960100999620408,
                    "var": 2.4054116650930158e-11
                },
                "Rank_load_raw": {
                    "avg": 0.001980050499810204,
                    "car": 2.0,
                    "imb": 0.0024769568420375254,
                    "kur": -2.75,
                    "max": 0.001984954999443289,
                    "min": 0.0019751460001771193,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 4.9044996330849244e-06,
                    "sum": 0.003960100999620408,
                    "var": 2.4054116650930158e-11
                },
                "Rank_work_modeled": {
                    "avg": 0.001980050499810204,
                    "car": 2.0,
                    "imb": 0.26022391825216284,
                    "kur": -2.75,
                    "max": 0.002495306999207969,
                    "min": 0.0014647940004124393,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0005152564993977649,
                    "sum": 0.003960100999620408,
                    "var": 2.6548926017163883e-07
                }
            },
            "pre-LB": {
                "Object_comm": {
                    "avg": 840.0,
                    "car": 4.0,
                    "imb": 0.06666666666666665,
                    "kur": -2.4375,
                    "max": 896.0,
                    "min": 784.0,
                    "npr": 4.0,
                    "skw": 0.0,
                    "std": 56.0,
                    "sum": 3360.0,
                    "var": 3136.0
                },
                "Object_load_modeled": {
                    "avg": 5.6572871423148694e-05,
                    "car": 70.0,
                    "imb": 4.605796416967453,
                    "kur": 17.54840783783442,
                    "max": 0.00031713599992144736,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 3.396068943818379,
                    "std": 4.284373522414831e-05,
                    "sum": 0.003960100999620408,
                    "var": 1.8355856479569265e-09
                },
                "Object_load_raw": {
                    "avg": 5.6572871423148694e-05,
                    "car": 70.0,
                    "imb": 4.605796416967453,
                    "kur": 17.54840783783442,
                    "max": 0.00031713599992144736,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 3.396068943818379,
                    "std": 4.284373522414831e-05,
                    "sum": 0.003960100999620408,
                    "var": 1.8355856479569265e-09
                },
                "Rank_comm": {
                    "avg": 1680.0,
                    "car": 2.0,
                    "imb": 0.0,
                    "kur": 0.0,
                    "max": 1680.0,
                    "min": 1680.0,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0,
                    "sum": 3360.0,
                    "var": 0.0
                },
                "Rank_load_modeled": {
                    "avg": 0.001980050499810204,
                    "car": 2.0,
                    "imb": 0.26022391825216284,
                    "kur": -2.75,
                    "max": 0.002495306999207969,
                    "min": 0.0014647940004124393,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0005152564993977649,
                    "sum": 0.003960100999620408,
                    "var": 2.6548926017163883e-07
                },
                "Rank_load_raw": {
                    "avg": 0.001980050499810204,
                    "car": 2.0,
                    "imb": 0.26022391825216284,
                    "kur": -2.75,
                    "max": 0.002495306999207969,
                    "min": 0.0014647940004124393,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0005152564993977649,
                    "sum": 0.003960100999620408,
                    "var": 2.6548926017163883e-07
                }
            }
        },
(...)
        {
            "id": 9,
            "migration count": 0,
            "post-LB": {
                "Object_comm": {
                    "avg": 4632.0,
                    "car": 4.0,
                    "imb": 0.07081174438687388,
                    "kur": -2.4375,
                    "max": 4960.0,
                    "min": 4304.0,
                    "npr": 4.0,
                    "skw": 0.0,
                    "std": 328.0,
                    "sum": 18528.0,
                    "var": 107584.0
                },
                "Object_load_modeled": {
                    "avg": 8.89744285359484e-05,
                    "car": 70.0,
                    "imb": 17.76672912338164,
                    "kur": 34.416259845604124,
                    "max": 0.0016697589992418216,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 5.845439578625132,
                    "std": 0.00022748406793370163,
                    "sum": 0.006228209997516387,
                    "var": 5.1749001163664976e-08
                },
                "Object_load_raw": {
                    "avg": 8.89744285359484e-05,
                    "car": 70.0,
                    "imb": 17.76672912338164,
                    "kur": 34.416259845604124,
                    "max": 0.0016697589992418216,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 5.845439578625132,
                    "std": 0.00022748406793370163,
                    "sum": 0.006228209997516387,
                    "var": 5.1749001163664976e-08
                },
                "Object_work_modeled": {
                    "avg": 8.89744285359484e-05,
                    "car": 70.0,
                    "imb": 17.76672912338164,
                    "kur": 34.416259845604124,
                    "max": 0.0016697589992418216,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 5.845439578625132,
                    "std": 0.00022748406793370163,
                    "sum": 0.006228209997516387,
                    "var": 5.1749001163664976e-08
                },
                "Rank_comm": {
                    "avg": 9264.0,
                    "car": 2.0,
                    "imb": 0.0,
                    "kur": 0.0,
                    "max": 9264.0,
                    "min": 9264.0,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0,
                    "sum": 18528.0,
                    "var": 0.0
                },
                "Rank_load_modeled": {
                    "avg": 0.0031141049987581937,
                    "car": 2.0,
                    "imb": 0.002597214844879403,
                    "kur": -2.75,
                    "max": 0.0031221929984894814,
                    "min": 0.003106016999026906,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 8.08799973128771e-06,
                    "sum": 0.006228209997516387,
                    "var": 6.541573965331006e-11
                },
                "Rank_load_raw": {
                    "avg": 0.0031141049987581937,
                    "car": 2.0,
                    "imb": 0.002597214844879403,
                    "kur": -2.75,
                    "max": 0.0031221929984894814,
                    "min": 0.003106016999026906,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 8.08799973128771e-06,
                    "sum": 0.006228209997516387,
                    "var": 6.541573965331006e-11
                },
                "Rank_work_modeled": {
                    "avg": 0.0031141049987581937,
                    "car": 2.0,
                    "imb": 0.002597214844879403,
                    "kur": -2.75,
                    "max": 0.0031221929984894814,
                    "min": 0.003106016999026906,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 8.08799973128771e-06,
                    "sum": 0.006228209997516387,
                    "var": 6.541573965331006e-11
                }
            },
            "pre-LB": {
                "Object_comm": {
                    "avg": 4632.0,
                    "car": 4.0,
                    "imb": 0.07081174438687388,
                    "kur": -2.4375,
                    "max": 4960.0,
                    "min": 4304.0,
                    "npr": 4.0,
                    "skw": 0.0,
                    "std": 328.0,
                    "sum": 18528.0,
                    "var": 107584.0
                },
                "Object_load_modeled": {
                    "avg": 8.89744285359484e-05,
                    "car": 70.0,
                    "imb": 17.76672912338164,
                    "kur": 34.416259845604124,
                    "max": 0.0016697589992418216,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 5.845439578625132,
                    "std": 0.00022748406793370163,
                    "sum": 0.006228209997516387,
                    "var": 5.1749001163664976e-08
                },
                "Object_load_raw": {
                    "avg": 8.89744285359484e-05,
                    "car": 70.0,
                    "imb": 17.76672912338164,
                    "kur": 34.416259845604124,
                    "max": 0.0016697589992418216,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 5.845439578625132,
                    "std": 0.00022748406793370163,
                    "sum": 0.006228209997516387,
                    "var": 5.1749001163664976e-08
                },
                "Object_work_modeled": {
                    "avg": 8.758099999275665e-05,
                    "car": 70.0,
                    "imb": 17.66915196946732,
                    "kur": 35.73043645952325,
                    "max": 0.0016350629985026899,
                    "min": 0.0,
                    "npr": 66.0,
                    "skw": 5.931927884228807,
                    "std": 0.00021919888106694175,
                    "sum": 0.006130669999492966,
                    "var": 4.804814946099927e-08
                },
                "Rank_comm": {
                    "avg": 9264.0,
                    "car": 2.0,
                    "imb": 0.0,
                    "kur": 0.0,
                    "max": 9264.0,
                    "min": 9264.0,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.0,
                    "sum": 18528.0,
                    "var": 0.0
                },
                "Rank_load_modeled": {
                    "avg": 0.0031141049987581937,
                    "car": 2.0,
                    "imb": 0.002597214844879403,
                    "kur": -2.75,
                    "max": 0.0031221929984894814,
                    "min": 0.003106016999026906,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 8.08799973128771e-06,
                    "sum": 0.006228209997516387,
                    "var": 6.541573965331006e-11
                },
                "Rank_load_raw": {
                    "avg": 0.0031141049987581937,
                    "car": 2.0,
                    "imb": 0.002597214844879403,
                    "kur": -2.75,
                    "max": 0.0031221929984894814,
                    "min": 0.003106016999026906,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 8.08799973128771e-06,
                    "sum": 0.006228209997516387,
                    "var": 6.541573965331006e-11
                },
                "Rank_work_modeled": {
                    "avg": 0.003065334999746483,
                    "car": 2.0,
                    "imb": 0.07422451348080084,
                    "kur": -2.75,
                    "max": 0.003292857998758336,
                    "min": 0.0028378120007346297,
                    "npr": 2.0,
                    "skw": 0.0,
                    "std": 0.00022752299901185324,
                    "sum": 0.006130669999492966,
                    "var": 5.176671507934777e-08
                }
            }
        }
    ]
}
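
Several fields in the sample appear to be related in simple ways: for the entries checked here, imb equals max / avg - 1, var equals std squared, and sum equals avg times car. The sketch below tests those relations against the phase 0 pre-LB Object_comm and Rank_comm values copied from the output above. The relations are inferred from the sample numbers rather than taken from the vt source, and StatEntry, nearlyEqual, imbalance and isConsistent are hypothetical names that do not exist in vt.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for a subset of the fields in one statistics entry.
struct StatEntry {
  double avg;
  double car;
  double imb;
  double max_value;
  double std_dev;
  double sum;
  double var;
};

// True when a and b agree to within a relative tolerance.
inline bool nearlyEqual(double a, double b, double rel_tol = 1e-9) {
  return std::fabs(a - b) <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

// Imbalance as inferred from the sample: how far the maximum exceeds the mean.
inline double imbalance(double max_value, double avg) {
  assert(avg != 0.0 && "imbalance is undefined for a zero average");
  return max_value / avg - 1.0;
}

// Check the three inferred relations for one entry.
inline bool isConsistent(StatEntry const& s) {
  return nearlyEqual(imbalance(s.max_value, s.avg), s.imb)
      && nearlyEqual(s.std_dev * s.std_dev, s.var)
      && nearlyEqual(s.avg * s.car, s.sum);
}

// Phase 0, pre-LB "Object_comm" values copied from the sample output.
inline StatEntry objectCommPhase0() {
  return {466.0, 4.0, 0.7682403433476395, 824.0,
          354.01129925469894, 1864.0, 125324.0};
}

// Phase 0, pre-LB "Rank_comm" values copied from the sample output.
inline StatEntry rankCommPhase0() {
  return {932.0, 2.0, 0.0042918454935623185, 936.0, 4.0, 1864.0, 16.0};
}
```

The kur, skw and npr fields are left out on purpose, since their definitions cannot be recovered from the sample alone.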

cz4rs marked this pull request as ready for review (November 9, 2022 20:30)
cz4rs force-pushed the 1659-improve-lb-statistics branch from 041dc16 to 317f055 (November 9, 2022 20:30)
@codecov

codecov bot commented Nov 9, 2022

Codecov Report

Merging #1993 (fe6c671) into develop (3ac0077) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head fe6c671 differs from pull request most recent head 52d4808. Consider uploading reports for the commit 52d4808 to get more accurate results

@@             Coverage Diff             @@
##           develop    #1993      +/-   ##
===========================================
+ Coverage    84.45%   84.47%   +0.01%     
===========================================
  Files          732      728       -4     
  Lines        25843    25850       +7     
===========================================
+ Hits         21826    21837      +11     
+ Misses        4017     4013       -4     
Impacted Files Coverage Δ
src/vt/elm/elm_comm.h 89.74% <ø> (ø)
src/vt/vrt/collection/balance/baselb/baselb.h 100.00% <ø> (ø)
src/vt/vrt/collection/balance/greedylb/greedylb.h 100.00% <ø> (ø)
src/vt/vrt/collection/balance/lb_common.h 57.89% <ø> (-2.11%) ⬇️
...c/vt/vrt/collection/balance/lb_invoke/lb_manager.h 100.00% <ø> (ø)
...vrt/collection/balance/temperedwmin/temperedwmin.h 100.00% <ø> (ø)
tests/unit/lb/test_lbargs_enum_conv.nompi.cc 100.00% <ø> (ø)
src/vt/vrt/collection/balance/baselb/baselb.cc 95.14% <100.00%> (+0.14%) ⬆️
src/vt/vrt/collection/balance/lb_common.cc 78.72% <100.00%> (ø)
.../vt/vrt/collection/balance/lb_invoke/lb_manager.cc 80.00% <100.00%> (+0.38%) ⬆️
... and 13 more

@PhilMiller
Member

I'm not a huge fan of this architecture, where strategy-specific stuff bleeds into the manager, but I understand the motivation well enough.

Two points about not presenting this as something that application developers should see or their code should call:

1. Instead of 'custom model', could we call it 'strategy specific model'?

2. It would be nice if setting this model were a private method that was only called from a friended method of BaseLB, that the derived strategy could call. Essentially, making BaseLB a small instance of an attorney pattern for LBManager as the client.
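
The arrangement suggested here can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the classes below only borrow the names LBManager and BaseLB, while LoadModel, ExampleLB and every method name are hypothetical. For brevity the sketch befriends the whole BaseLB class rather than a single member function.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a load model.
struct LoadModel {
  std::string name;
};

class BaseLB;  // forward declaration so LBManager can befriend it

// Client: application code can read the model but cannot set it.
class LBManager {
public:
  LoadModel const* getStrategySpecificModel() const { return model_.get(); }

private:
  friend class BaseLB;  // the attorney is the only class with access

  void setStrategySpecificModel(std::shared_ptr<LoadModel> model) {
    assert(model && "model must not be null");
    model_ = std::move(model);
  }

  std::shared_ptr<LoadModel> model_;
};

// Attorney: forwards exactly one operation to the manager's private setter.
class BaseLB {
public:
  explicit BaseLB(LBManager& manager) : manager_(manager) {}
  virtual ~BaseLB() = default;
  virtual void init() = 0;

protected:
  // Derived strategies call this; friendship is not inherited, so they
  // cannot reach LBManager's private members directly.
  void setStrategySpecificModel(std::shared_ptr<LoadModel> model) {
    manager_.setStrategySpecificModel(std::move(model));
  }

private:
  LBManager& manager_;
};

// Hypothetical derived strategy that installs its own model.
class ExampleLB : public BaseLB {
public:
  using BaseLB::BaseLB;

  void init() override {
    setStrategySpecificModel(
      std::make_shared<LoadModel>(LoadModel{"example-work-model"}));
  }
};
```

With this layout, a call such as manager.setStrategySpecificModel(...) from application code fails to compile, while a strategy derived from BaseLB can still install its model through the protected forwarding method.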

cz4rs marked this pull request as draft (November 14, 2022 17:18)
cz4rs force-pushed the 1659-improve-lb-statistics branch from 317f055 to 46b8f88 (November 14, 2022 17:22)
cz4rs marked this pull request as ready for review (November 14, 2022 20:53)
cz4rs requested a review from PhilMiller (November 14, 2022 20:59)
cz4rs force-pushed the 1659-improve-lb-statistics branch from 634004e to 525c35e (November 28, 2022 09:51)
cz4rs requested review from PhilMiller and removed request for jstrzebonski (November 28, 2022 12:05)
cz4rs force-pushed the 1659-improve-lb-statistics branch from 525c35e to af22bed (November 29, 2022 09:33)
cz4rs force-pushed the 1659-improve-lb-statistics branch from af22bed to 04e2408 (November 30, 2022 17:58)
Collaborator

@nlslatt nlslatt left a comment

Looks great in general. I do have one concern but it's definitely open to discussion.

src/vt/vrt/collection/balance/lb_invoke/lb_manager.cc (outdated, resolved)
cz4rs force-pushed the 1659-improve-lb-statistics branch from 04e2408 to f7c1611 (December 6, 2022 00:30)
@cz4rs
Contributor Author

cz4rs commented Dec 6, 2022

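The JSON schema validator fails (correctly) with a "Wrong keys 'Object_work_modeled', 'Rank_work_modeled' (...)" error: the schema printed in the log below does not list these two new keys, while the generated statistics file contains them.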

Running schema validator on: ./tests/vt_lb_statistics.2022-12-06-01-29-50.json.br
Validating file: /build/vt/tests/vt_lb_statistics.2022-12-06-01-29-50.json.br
Invalid JSON schema in /build/vt/tests/vt_lb_statistics.2022-12-06-01-29-50.json.br
[JSON_data_files_validator] SchemaError Key 'phases' error:
Or({'id': <class 'int'>, Optional('migration count'): <class 'int'>, Optional('post-LB'): {'Object_comm': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Object_load_modeled': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Object_load_raw': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Rank_comm': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Rank_load_modeled': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Rank_load_raw': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}}, 'pre-LB': {'Object_comm': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': 
<class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Object_load_modeled': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Object_load_raw': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Rank_comm': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Rank_load_modeled': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}, 'Rank_load_raw': {'avg': <class 'float'>, 'car': <class 'float'>, 'imb': <class 'float'>, 'kur': <class 'float'>, 'max': <class 'float'>, 'min': <class 'float'>, 'npr': <class 'float'>, 'skw': <class 'float'>, 'std': <class 'float'>, 'sum': <class 'float'>, 'var': <class 'float'>}}}) did not validate {'id': 1, 'migration count': 10, 'post-LB': {'Object_comm': {'avg': 840.0, 'car': 4.0, 'imb': 0.06666666666666665, 'kur': -2.4375, 'max': 896.0, 'min': 784.0, 'npr': 4.0, 'skw': 0.0, 'std': 56.0, 'sum': 3360.0, 'var': 3136.0}, 'Object_load_modeled': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269169, 'max': 4.950400034431368e-05, 'min': 0.0, 'npr': 66.0, 'skw': 1.3208459347226218, 'std': 8.071501322276251e-06, 'sum': 
0.0008492610027133196, 'var': 6.514913359550728e-11}, 'Object_load_raw': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269169, 'max': 4.950400034431368e-05, 'min': 0.0, 'npr': 66.0, 'skw': 1.3208459347226218, 'std': 8.071501322276251e-06, 'sum': 0.0008492610027133196, 'var': 6.514913359550728e-11}, 'Object_work_modeled': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269172, 'max': 4.950400034431368e-05, 'min': 0.0, 'npr': 66.0, 'skw': 1.3208459347226216, 'std': 8.071501322276253e-06, 'sum': 0.0008492610027133196, 'var': 6.514913359550729e-11}, 'Rank_comm': {'avg': 1680.0, 'car': 2.0, 'imb': 0.0, 'kur': 0.0, 'max': 1680.0, 'min': 1680.0, 'npr': 2.0, 'skw': 0.0, 'std': 0.0, 'sum': 3360.0, 'var': 0.0}, 'Rank_load_modeled': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.0011763182995736532, 'kur': -2.75, 'max': 0.00042513000198596274, 'min': 0.0004241310007273569, 'npr': 2.0, 'skw': 0.0, 'std': 4.995006293029292e-07, 'sum': 0.0008492610027133196, 'var': 2.4950087867402225e-13}, 'Rank_load_raw': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.0011763182995736532, 'kur': -2.75, 'max': 0.00042513000198596274, 'min': 0.0004241310007273569, 'npr': 2.0, 'skw': 0.0, 'std': 4.995006293029292e-07, 'sum': 0.0008492610027133196, 'var': 2.4950087867402225e-13}, 'Rank_work_modeled': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.358218497683666, 'kur': -2.75, 'max': 0.0005767410016233043, 'min': 0.0002725200010900153, 'npr': 2.0, 'skw': 0.0, 'std': 0.00015211050026664452, 'sum': 0.0008492610027133196, 'var': 2.3137604291368863e-08}}, 'pre-LB': {'Object_comm': {'avg': 840.0, 'car': 4.0, 'imb': 0.06666666666666665, 'kur': -2.4375, 'max': 896.0, 'min': 784.0, 'npr': 4.0, 'skw': 0.0, 'std': 56.0, 'sum': 3360.0, 'var': 3136.0}, 'Object_load_modeled': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269172, 'max': 4.950400034431368e-05, 'min': 0.0, 
'npr': 66.0, 'skw': 1.3208459347226216, 'std': 8.071501322276253e-06, 'sum': 0.0008492610027133196, 'var': 6.514913359550729e-11}, 'Object_load_raw': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269172, 'max': 4.950400034431368e-05, 'min': 0.0, 'npr': 66.0, 'skw': 1.3208459347226216, 'std': 8.071501322276253e-06, 'sum': 0.0008492610027133196, 'var': 6.514913359550729e-11}, 'Rank_comm': {'avg': 1680.0, 'car': 2.0, 'imb': 0.0, 'kur': 0.0, 'max': 1680.0, 'min': 1680.0, 'npr': 2.0, 'skw': 0.0, 'std': 0.0, 'sum': 3360.0, 'var': 0.0}, 'Rank_load_modeled': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.358218497683666, 'kur': -2.75, 'max': 0.0005767410016233043, 'min': 0.0002725200010900153, 'npr': 2.0, 'skw': 0.0, 'std': 0.00015211050026664452, 'sum': 0.0008492610027133196, 'var': 2.3137604291368863e-08}, 'Rank_load_raw': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.358218497683666, 'kur': -2.75, 'max': 0.0005767410016233043, 'min': 0.0002725200010900153, 'npr': 2.0, 'skw': 0.0, 'std': 0.00015211050026664452, 'sum': 0.0008492610027133196, 'var': 2.3137604291368863e-08}}}
Key 'post-LB' error:
Wrong keys 'Object_work_modeled', 'Rank_work_modeled' in {'Object_comm': {'avg': 840.0, 'car': 4.0, 'imb': 0.06666666666666665, 'kur': -2.4375, 'max': 896.0, 'min': 784.0, 'npr': 4.0, 'skw': 0.0, 'std': 56.0, 'sum': 3360.0, 'var': 3136.0}, 'Object_load_modeled': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269169, 'max': 4.950400034431368e-05, 'min': 0.0, 'npr': 66.0, 'skw': 1.3208459347226218, 'std': 8.071501322276251e-06, 'sum': 0.0008492610027133196, 'var': 6.514913359550728e-11}, 'Object_load_raw': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269169, 'max': 4.950400034431368e-05, 'min': 0.0, 'npr': 66.0, 'skw': 1.3208459347226218, 'std': 8.071501322276251e-06, 'sum': 0.0008492610027133196, 'var': 6.514913359550728e-11}, 'Object_work_modeled': {'avg': 1.213230003876171e-05, 'car': 70.0, 'imb': 3.080347517465974, 'kur': 4.413427772269172, 'max': 4.950400034431368e-05, 'min': 0.0, 'npr': 66.0, 'skw': 1.3208459347226216, 'std': 8.071501322276253e-06, 'sum': 0.0008492610027133196, 'var': 6.514913359550729e-11}, 'Rank_comm': {'avg': 1680.0, 'car': 2.0, 'imb': 0.0, 'kur': 0.0, 'max': 1680.0, 'min': 1680.0, 'npr': 2.0, 'skw': 0.0, 'std': 0.0, 'sum': 3360.0, 'var': 0.0}, 'Rank_load_modeled': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.0011763182995736532, 'kur': -2.75, 'max': 0.00042513000198596274, 'min': 0.0004241310007273569, 'npr': 2.0, 'skw': 0.0, 'std': 4.995006293029292e-07, 'sum': 0.0008492610027133196, 'var': 2.4950087867402225e-13}, 'Rank_load_raw': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.0011763182995736532, 'kur': -2.75, 'max': 0.00042513000198596274, 'min': 0.0004241310007273569, 'npr': 2.0, 'skw': 0.0, 'std': 4.995006293029292e-07, 'sum': 0.0008492610027133196, 'var': 2.4950087867402225e-13}, 'Rank_work_modeled': {'avg': 0.0004246305013566598, 'car': 2.0, 'imb': 0.358218497683666, 'kur': -2.75, 'max': 0.0005767410016233043, 'min': 
0.0002725200010900153, 'npr': 2.0, 'skw': 0.0, 'std': 0.00015211050026664452, 'sum': 0.0008492610027133196, 'var': 2.3137604291368863e-08}}
+ echo 'Invalid schema in ./tests/vt_lb_statistics.2022-12-06-01-29-50.json.br.. exiting'
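
For reference, the "Wrong keys" message above comes down to the `post-LB` entry carrying two statistics that the schema does not list. Below is a minimal sketch of that check using only the Python standard library. `EXPECTED_PHASE_KEYS` and `unexpected_keys` are hypothetical names for illustration and are not taken from the project's validation script. The key names are copied from the log; the per-statistic values are omitted.

```python
# Hypothetical stand-in for the validator's key check; not the project's script.
# The six statistic groups the schema in the log accepts for each phase.
EXPECTED_PHASE_KEYS = {
    "Object_comm", "Object_load_modeled", "Object_load_raw",
    "Rank_comm", "Rank_load_modeled", "Rank_load_raw",
}


def unexpected_keys(phase_stats, allowed=EXPECTED_PHASE_KEYS):
    """Return the sorted keys of phase_stats that are not in the allowed set."""
    return sorted(set(phase_stats) - set(allowed))


# Key names as they appear in the failing entry; values replaced by empty dicts.
post_lb = {
    name: {}
    for name in EXPECTED_PHASE_KEYS | {"Object_work_modeled", "Rank_work_modeled"}
}
pre_lb = {name: {} for name in EXPECTED_PHASE_KEYS}

print(unexpected_keys(post_lb))  # ['Object_work_modeled', 'Rank_work_modeled']
print(unexpected_keys(pre_lb))   # []
```

Under these assumptions, only `post-LB` is flagged, which matches the `Key 'post-LB' error` line in the log: the `pre-LB` entry contains just the six expected groups.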

@cz4rs cz4rs force-pushed the 1659-improve-lb-statistics branch from e7ad6e6 to fe6c671 Compare December 6, 2022 18:43
@cz4rs cz4rs force-pushed the 1659-improve-lb-statistics branch from fe6c671 to 484a4c4 Compare December 13, 2022 16:26
@cz4rs cz4rs requested review from PhilMiller and nlslatt December 14, 2022 11:01
@nlslatt nlslatt left a comment

Looks good

@nlslatt nlslatt merged commit 5d748ec into develop Dec 14, 2022
Development

Successfully merging this pull request may close these issues.

Improve communication statistics in VT
3 participants