Add by_rank map to graph store #674
Codecov Report
@@ Coverage Diff @@
## master #674 +/- ##
==========================================
+ Coverage 75.07% 75.09% +0.01%
==========================================
Files 78 78
Lines 8166 8220 +54
==========================================
+ Hits 6131 6173 +42
- Misses 2035 2047 +12
All checks are green and Codecov looks good. @dongahn you can give this a pass over when you're able, or I can add @SteVwonder as a reviewer. |
@milroy: this looks great.
- I have a couple of very minor in-line comments, but I'm approving without requiring them.
- The current testing with resource-query is limited, as all vertices will be mapped to only one rank (-1). My suggestion would be to add one more simple test by modifying sched-fluxion-resource's stat interface and flux ion-resource. But since a similar test can be added as we merge our PRs, and you will be busy with the other 2 PRs, I wasn't sure whether that should be part of this PR's scope. Please let me know your preference.
I've been poking at this and am struggling to deal with the need to iterate over multiple ranks. In the stat interface I could send multiple responses (one per rank to vector size association). That approach seems ugly and complex. Another possibility would be to iterate over the map keys and flatten the key-value pairs to a single string, and transmit that string in the response. What do you think @dongahn? |
I agree. stat is meant to report the overall statistics, and multiple responses would defeat the purpose.
This sounds good to me. Perhaps we can compactly flatten this using Doc for |
So the general way to do this is iterate through the ranks and populate an auxiliary map that's keyed by subgraph sizes, and takes vectors of ranks as values. Then I can iterate through the size keys and decode the vector of ranks with |
A more fundamental question is: how do I create a |
You can use the hwloc reader with sched-fluxion-resource (I think this is the default). As each hwloc file is unpacked into the resource graph, a distinct rank will be assigned.
Yes. But instead of using vectors of ranks as values, you can use "essentially" idsets as values to directly encode the rank set into an idset during the initial iteration. I say "essentially" because using a C++ wrapper around struct idset will make memory management a bit easier.
class idset_wrap_t {
public:
    ~idset_wrap_t() {
        if (m_idset)
            idset_destroy (m_idset);
    }
    struct idset *get_idset () const {
        return m_idset;
    }
    void set_idset (struct idset *idset) {
        if (idset)
            m_idset = idset;
    }
private:
    struct idset *m_idset{nullptr};
};
std::map<size_t, idset_wrap_t> s2r; // subgraph size to ranks
for (auto &kv : by_rank) {
    if (s2r.find (kv.second.size ()) == s2r.end ()) {
        struct idset *ids = NULL;
        if ( !(ids = idset_create (0, IDSET_FLAG_AUTOGROW)))
            error_check ();
        // hand the idset to the map's own entry; copying a wrapper
        // into the map would destroy the same idset twice
        s2r[kv.second.size ()].set_idset (ids);
    }
    if (idset_set (s2r[kv.second.size ()].get_idset (), kv.first) < 0)
        error_check ();
}
// once the above iteration is done, each value of s2r
// has the fully encoded rank idset. The code is not tested at all, though.
@milroy: if you are busy working on other PRs, I can add this support. Let me know. |
@dongahn unfortunately the implementation is taking me longer than I expected. Can you add the support? Also, I think you will need to make some adjustments to
class idset_wrap_t {
private:
    struct idset *m_idset{nullptr};
public:
    ~idset_wrap_t() {
        if (m_idset)
            idset_destroy (m_idset);
    }
    struct idset *get_idset () const {
        return m_idset;
    }
    void set_idset (struct idset *idset) {
        if (idset)
            m_idset = idset;
    }
};
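The ownership concern behind the wrapper can also be handled with std::unique_ptr and a custom deleter, which makes the wrapper move-only and rules out destroying the same idset twice. The following is a minimal sketch, not the code that landed in this PR: fake_idset, fake_idset_create, fake_idset_set, fake_idset_destroy, and live_idsets are hypothetical stand-ins for flux-core's idset API so that the example compiles without libflux-idset, and size_to_ranks is an illustrative name. Ranks are assumed to be non-negative here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>

// Hypothetical stand-ins for flux-core's struct idset and its
// create/set/destroy functions (assumptions, not the real API).
struct fake_idset {
    std::set<unsigned int> ids;
};
int live_idsets = 0;  // counts live fake idsets so leaks are observable

fake_idset *fake_idset_create ()
{
    ++live_idsets;
    return new fake_idset;
}

void fake_idset_destroy (fake_idset *p)
{
    if (p) {
        --live_idsets;
        delete p;
    }
}

int fake_idset_set (fake_idset *p, unsigned int id)
{
    if (!p)
        return -1;
    p->ids.insert (id);
    return 0;
}

// Deleter that lets unique_ptr release the idset exactly once.
struct idset_deleter {
    void operator() (fake_idset *p) const { fake_idset_destroy (p); }
};
using idset_ptr = std::unique_ptr<fake_idset, idset_deleter>;

// Map each subgraph size to the set of ranks that have that size.
std::map<std::size_t, idset_ptr> size_to_ranks (
    const std::map<unsigned int, std::size_t> &rank_sizes)
{
    std::map<std::size_t, idset_ptr> s2r;
    for (const auto &kv : rank_sizes) {
        idset_ptr &ids = s2r[kv.second];  // default-constructed as null
        if (!ids)
            ids.reset (fake_idset_create ());
        if (fake_idset_set (ids.get (), kv.first) < 0)
            return {};  // a real caller would report the error instead
    }
    return s2r;
}
```

Because unique_ptr cannot be copied, an accidental copy of a map entry becomes a compile error rather than a double free at run time.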
Sure thing. I will get to this soon so that you can focus on the other PRs! |
d495453 to b22cfd9
I thoughtlessly pulled your new commits @dongahn and pushed some new ones of my own. That added me as the committer to commits authored by you. Is there some way to fix this? |
@milroy, IMO it is fine if you are committer, this happens quite often. As long as authorship is preserved it is somewhat standard practice when cherry-picking as you are doing. If you really need to reset the committer, I think you can amend the commit with |
Don't worry about it. Also, this didn't change the author anyway.
@milroy: your new changes LGTM. @SteVwonder: if you can take a quick look at this PR, this should be ready to go in.
- update resource status via depth first traversal starting at a specified subtree root. The functions find the subtree's parent vertex and mark it black to avoid upward traversal.
LGTM! I did have one question below, but it is not a blocker to this getting merged. Feel free to set MWP after you squash the fixup commit.
test_expect_success 'qmanager: loading resource and qmanager modules works' '
    flux module remove sched-simple &&
    load_resource prune-filters=ALL:core subsystems=containment policy=low
(Mainly a note to myself)
We may need to add a whitelist/allowlist here to make this test work under both hwloc1 and hwloc2, since the number of resources under each execution target may differ (e.g., numanode vs no numanode). Since this PR is on the critical path, I wouldn't worry about it right now. If it does turn out to be an issue, I can handle it in my upcoming hwloc2 PR.
Thanks, this makes sense. Yes, let's target this work when the hwloc2 PR is posted. We can solve this either with an allowlist or by checking that number of ranks x per-rank vertices = overall vertex count.
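The second option can be sketched as a small consistency check over a by_rank-style map. This is only an illustration under stated assumptions: vtx_t is a hypothetical stand-in for the graph's vertex descriptor, vertex_counts_consistent is an illustrative name, and every vertex is assumed to belong to exactly one rank in the map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using vtx_t = std::size_t;  // hypothetical stand-in for the vertex descriptor

// Return true when every rank owns the same number of vertices and
// (number of ranks) x (per-rank vertices) equals the overall vertex count.
bool vertex_counts_consistent (
    const std::map<std::int64_t, std::vector<vtx_t>> &by_rank,
    std::size_t overall)
{
    if (by_rank.empty ())
        return overall == 0;
    const std::size_t per_rank = by_rank.begin ()->second.size ();
    for (const auto &kv : by_rank) {
        if (kv.second.size () != per_rank)
            return false;
    }
    return by_rank.size () * per_rank == overall;
}
```

A check of this shape does not hard-code the per-rank count, so it would not need to change if hwloc1 and hwloc2 produce different numbers of vertices per execution target.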
@dongahn thank you very much for jumping in and creating the additional functionality. @SteVwonder thanks for the quick review! I'm going to set MWP after flux-sched gets tagged.
@milroy, in case you missed the slack message, flux-sched v0.9.0 has been tagged. |
add a map to graph store to keep track of the subgraph vertices associated with each rank. This map will be used to mark resource subgraphs with a new status.
enable population of by_rank map in readers while loading the resource graph
add console debug output to print the size of each vertex vector (subgraph) associated with each rank in the resource graph.
add tests to check the number of vertices associated with rank -1 (the default value) for GRUG, JGF, and hwloc test graphs.
Add by_rank support to stat request callback. "by_rank" key in the response is a JSON dictionary whereby each key is a rank idset that contains the same number of graph vertices. Make the idset lib a dependency of fluxion resource.
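The shape of the "by_rank" dictionary described in the last commit message can be sketched as follows: ranks with the same vertex count are grouped, and each group is encoded as a compact rank-set key. This is a rough illustration, not the code in the commit. encode_ranges is a hypothetical stand-in for an idset encoder, summarize_by_rank is an illustrative name, the exact key format produced by flux-core may differ, and ranks are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an idset encoder: turns ascending ranks
// such as {0, 1, 2, 5} into a range string such as "0-2,5".
std::string encode_ranges (const std::vector<std::int64_t> &sorted)
{
    std::string out;
    for (std::size_t i = 0; i < sorted.size ();) {
        std::size_t j = i;
        // extend j to the end of the current run of consecutive ranks
        while (j + 1 < sorted.size () && sorted[j + 1] == sorted[j] + 1)
            ++j;
        if (!out.empty ())
            out += ",";
        out += std::to_string (sorted[i]);
        if (j > i)
            out += "-" + std::to_string (sorted[j]);
        i = j + 1;
    }
    return out;
}

// Convert "rank -> vertex count" into "encoded rank set -> vertex count".
std::map<std::string, std::size_t> summarize_by_rank (
    const std::map<std::int64_t, std::size_t> &rank_sizes)
{
    // std::map iterates keys in ascending order, so each vector of
    // ranks is already sorted when it reaches encode_ranges.
    std::map<std::size_t, std::vector<std::int64_t>> size_to_ranks;
    for (const auto &kv : rank_sizes)
        size_to_ranks[kv.second].push_back (kv.first);

    std::map<std::string, std::size_t> summary;
    for (const auto &kv : size_to_ranks)
        summary[encode_ranges (kv.second)] = kv.first;
    return summary;
}
```

With this grouping, a single stat response can describe many ranks in a few entries instead of one entry (or one response) per rank.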
This PR addresses item II.3 from issue #662 by introducing a by_rank map to resource_graph_metadata_t. The map is keyed by the int64_t rank and takes a std::vector of vtx_t corresponding to the subgraph associated with the rank. The changes include populating the map in the GRUG, JGF, and hwloc readers and the addition of testsuite checks for correct subgraph sizes.
This PR should be merged before #665 and #667 as those depend on the features created here.
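As a rough picture of the data structure described above, the map and the way a reader might populate it could look like the sketch below. It is written under assumptions: vtx_t is a hypothetical stand-in for the graph's vertex descriptor, and graph_metadata_sketch_t and record_vertex are illustrative names rather than the actual members of resource_graph_metadata_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using vtx_t = std::size_t;  // hypothetical stand-in for the vertex descriptor

// Illustrative stand-in for the metadata struct that holds by_rank.
struct graph_metadata_sketch_t {
    // rank -> vertices of the subgraph associated with that rank
    std::map<std::int64_t, std::vector<vtx_t>> by_rank;
};

// What a reader might do as it adds each vertex. Vertices with no
// assigned rank fall under -1, the default value used in the tests.
void record_vertex (graph_metadata_sketch_t &md,
                    vtx_t v,
                    std::int64_t rank = -1)
{
    md.by_rank[rank].push_back (v);
}
```

Looking up md.by_rank[rank] then yields every vertex that a later change could mark with a new status for that rank.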