
Feature 280 lb sim output only #281

Merged — 13 commits, merged Feb 27, 2019

Conversation
Conversation

lifflander (Collaborator):

This pull request adds functionality to VT that enables it to dump load information to file. This includes only the computation stats so far (not the communication graph).

It adds three new optional arguments that enable this behavior (opt-in):
mpirun -n 4 ./examples/lb_iter --vt_lb_stats --vt_lb_stats_dir=my-stats-dir --vt_lb_stats_file=stats

This creates the tree:

<current-dir>
    my-stats-dir
        stats.0.out
        stats.1.out
        stats.2.out
        stats.3.out

Each file is in CSV and follows this format:
<iter/phase>, <distributed-unique-object-id>, <time-in-seconds>

0,68719476736,0.023175
0,64424509440,0.0229769
0,60129542144,0.0236919
0,55834574848,0.022301
0,51539607552,0.0231979
0,47244640256,0.0263572
0,42949672960,0.024205
0,38654705664,0.029372
0,34359738368,3.79086e-05
0,30064771072,3.21865e-05
0,25769803776,3.19481e-05
0,21474836480,3.60012e-05
0,17179869184,3.19481e-05
0,12884901888,3.31402e-05
0,8589934592,3.69549e-05
0,4294967296,6.10352e-05
1,68719476736,0.0237269
1,64424509440,0.022706
1,60129542144,0.0231159
1,55834574848,0.023596
1,51539607552,0.0259681
1,47244640256,0.0247791
1,42949672960,0.0229402
1,38654705664,0.0230789
1,34359738368,3.79086e-05
1,30064771072,2.5034e-05
1,25769803776,2.47955e-05
1,21474836480,2.38419e-05
1,17179869184,4.60148e-05
1,12884901888,2.40803e-05
1,8589934592,4.1008e-05
1,4294967296,2.71797e-05


/*static*/ void ProcStats::closeStatsFile() {
  if (stats_file_) {
    fclose(stats_file_);

Member:

I think you need to also assign nullptr here, or else that condition can trigger multiple times, even though the function superficially looks idempotent

lifflander (Collaborator, Author):

Addressed problem

@@ -72,6 +73,10 @@ struct ProcStats {
  static void clearStats();
  static void releaseLB();

  static void createStatsFile();
  static void outputStatsFile();
  static void closeStatsFile();

Member:

If createStatsFile and closeStatsFile are going to be public, then outputStatsFile probably shouldn't be opening and (more importantly) closing it automatically. Someone might call open manually, and the output might close it on them.

lifflander (Collaborator, Author):

Addressed by making {close,create}StatsFile private

// Barrier: wait for node 0 to create directory before trying to put a file in
// the stats destination directory
if (curRT) {
  curRT->systemSync();

Member:

It looks like that calls sync(), but doesn't also call a barrier?

lifflander (Collaborator, Author):

sync() is a barrier under the hood.

Member:

I see. I read that as the POSIX sync(), not a call to the sibling member function.

@ppebay (Contributor) left a review comment:

Two questions to make sure I fully understand:

  1. Each of these files provides a list of task objects and associated timings, varying over time. For example:
    0,68719476736,0.023175
    ...
    1,68719476736,0.0237269
    in the first output file means that the same object (with unique ID 68719476736) resides on the same processor during steps 0 and 1, and that the compute time of that object varied between those steps.
  2. The LB operations would happen between the two steps. Therefore, if this object (with unique ID 68719476736) had been offloaded from, say, processor 0 to processor 2 during the LB stage, its time information for step 1:
    1,68719476736,0.0237269
    would no longer be found in the same output file stats.0.out, but in stats.2.out.

Is this a correct understanding?

@lifflander (Collaborator, Author):

Philippe,

You are exactly correct on both counts. The benchmark I'm running does approximately the same work in each iteration (it is just a micro-benchmark for LB), but it varies a little. I always dump the objects that are local to each processor, and the object ID is unique for the duration of the program, so an object can be cross-identified regardless of which processor it is mapped to.

@lifflander (Collaborator, Author):

@ppebay Do you have a suggestion for the format when I add the communication edges to the output? Or do you want to convert this to an Exodus format before I do that?

@ppebay (Contributor) commented Feb 23, 2019:

Thanks for confirming, this information layout is perfect for the "load-only" algorithm.

Regarding edges, here is how I understand things: the primary quantities in VT are object-to-object links. However, the ones that we would take into account for LB purposes would be processor-to-processor links (it is presumably reasonable to assume that if object A talks to object B on the same processor, this has zero communication cost, at least as a first-order approximation), whereas if A resides on processor M and B on processor N, then link A->B contributes to link M->N, which is the one that we want to consider. Other objects communicating similarly would also contribute to link M->N, which can thus be viewed as an aggregate; in other words, a derived quantity.

So, the question thus naturally becomes: do we want to (1) store the object to object links (in which case these will be aggregated into processor to processor links at ingestion time when the LB phase starts), or, alternatively, (2) compute those statistics inside VT and dump the already computed aggregates?

In terms of cost, although tallying the costs up front is no more expensive than aggregating them at read time, approach (1) would result in much larger output files, as many more links would be stored. That would in turn result in longer write/read times. So the only advantage that I can see of approach (1) over (2) is that it preserves the primary information (the object-to-object traffic) rather than derived aggregates, but I fail to see how that could be useful for the methodology we are considering.

So, I am therefore suggesting that VT tallies the object-to-object communication, per processor, and stores that aggregate information after the object lines, as follows:
<iter/phase>, <destination-processor>, <num-bytes-sent>
We could also store the incoming volume, per processor, but those values should all be implicit. For instance, if there were only processors 0 and 1 participating in the run at iteration 0, with 0 sending 1B to 1 and receiving 2B from it during that iteration, we would have the following line at the end of stats.0.out:
0,1,1
and the following line at the end of stats.1.out:
0,0,2
So, although only outflows would be explicit (0 sends 1B to 1; 1 sends 2B to 0), the entire traffic information would be implicitly available, because those lines directly imply the inflows (0 receives 2B from 1; 1 receives 1B from 0).
I thus think there is no point in saving the bi-directional traffic: we can save either out- or inflow, in order to reduce I/O time.

@PhilMiller (Member):

In modeling or simulating how each object's communication contributes to the load it represents, I do think we need to record the object-level communication graph, not the contracted process-level graph that arises from it.

When we evaluate the hypothetical re-assignment of an object to a different processor, we need to consider the change in overall communication resulting from moving that object. So, we would at least need to know what processes each object's communication involved.

When we evaluate re-assignment of multiple objects, then the information of which processes a given object was communicating with is no longer informative, because some of the communication partners may also have moved. So, I conclude that we need full object-level communication records to be available to the load balancing strategies, and to any simulations thereof.

@ppebay (Contributor) commented Feb 25, 2019:

@PhilMiller Agreed.
Sorry for not having mentioned it here earlier, but Jonathan and I spoke about this over the weekend and, yes indeed, we need to keep track of the object-to-object communications. The expectation is that for real cases this will not result in a quadratic or worse increase in output file size.
Thank you.

@PhilMiller (Member):

Ah, good to hear. Indeed, for most reasonable applications, each object is only expected to interact with a small, often constant, number of other objects, so there would be only a small multiplicative factor increase in the footprint of the object graph data.

@ppebay (Contributor) commented Feb 25, 2019:

So, continuing with the object-to-object communication: what does the weight (scalar) associated with an edge represent? Is it a flow rate through the link, or rather the amount of data that must be transferred between the two end-points?

@lifflander (Collaborator, Author):

The weight (scalar) for object-to-object communication can be any normalized scalar, but I think that bytes makes the most sense because it's easily compatible with information we obtain about the network (if/when we add topology as a factor).

What do you think of the following format?

<iter/phase>, <object-id>, <time-in-seconds>, [<comm-object-id>, <comm-bytes>]...

Each line will have the object, the time, and then a series of objects it communicates with, which may be empty. Each line will be variable in length.

Or we could put this on separate lines, so there would be two different formats: if there are 4 elements on a line it would imply communication; otherwise it would be a computation line.

<iter/phase>, <object-id>, <time-in-seconds>
<iter/phase>, <object-id1>, <object-id2>, <num-bytes>

@ppebay (Contributor) commented Feb 26, 2019:

I have a preference for the latter because it is closer to what we could easily map into an ExodusII format via the VTK I/O facilities; in our case, 3 elements on a line imply a node (vertex), and 4 an edge (line). In fact, with just a few extra boilerplate lines in the file, it could be used as-is in the old-style ASCII VTK readers. So, I am voting for the latter. TY.

@lifflander (Collaborator, Author):

@ppebay Ok, I will implement it with separate lines; that also makes it easy to ignore communication edges if you choose to do so. I am about 80% done with the implementation on #283.

@lifflander lifflander force-pushed the feature-280-lb-sim-output-only branch from d2004a4 to ea1e42a Compare February 27, 2019 03:04
@lifflander lifflander merged commit 82fb410 into develop Feb 27, 2019