-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise Sched's find() and select() operations to leverage visitor/matcher design patterns #193
Comments
[Rough Idea] Wondering if we can use an one-pass algorithm using matcher/visitor patterns to merge Rough sketch:
|
That sounds good Dong, and is rather like what I meant to be getting at, though much more fleshed out. The only nit I have is it would be more efficient to use a preorder rather than postorder walk because of the way exclusivity works in our resource model. If it's preorder, you can look at the exclusive flag on a resource to determine whether it's worth looking at any of its children, while the postorder would have already visited the children before seeing the parent. This is also quite easy to parallelize across threads without races except on the matches construct, which would be pretty easy to protect. |
Yeah. It should have been preorder. That order has the pruning benefit from the perspective of both exclusivity and aggregate evaluation. |
Also, |
I agree that's useful, but would rather be able to traverse it by On 16 Sep 2016, at 11:28, Dong H. Ahn wrote:
|
I see your point about memory. This came from wanting to manage the potential complexities of checking ancestor vertices across potentially many hierarchies. Perhaps a matcher can create a sub-matcher to walk up various hierarchies and check the sub-constraints. |
Like
|
I have to think some more about this but if the scheduler limits the interaction with the new Resource API at the matcher and submatchers level, which uses WalkerAPI set, then perhaps this is coarse grained enough that merits a message interface... Of course, the scheduler needs to push the matcher code into the Resource service, but this is already being demonstrated with @grondo's reduction work... |
I thought about this some more. I am pretty positive that we can do this as a message interface and we can even also take care of reservations with an one-pass algorithm. Essentially matchers can evaluate not only the current state but also the next availble windows upto k windows (this should be a function of the resevation depth). The rsulting score of each match should reflect how early the resource will be avaiable. The score should be higher for any match whose availble time window is earlier. Precedence of the score: available time#better match - when resource is idle, the time will be 0, so they must be selected before choosing a reource that becomes available in 10 mins from now. This way, one pass query will always produce an enough number of resources sorted by the match critetia, so the scheduler only needs to dequeue N items from the priority queue. |
For IO and Power, I think the pre-order is pretty close to optimal. As you descend the tree, you can determine the maximum number of processes that you could spawn in each subtree given the amount of available BW/Power. You can bail-out early from a subtree traversal if the number of processes you can spawn is 0 (or less than the total in the case of the root). My only question is how would a locality-aware scheduler work under this scheme? For example, if you wanted to make a scheduler that creates jobs under as few switches as possible (to limit the number of hops necessary for MPI comms), how would that work under this one-pass scheme? I could see how to write it under the two-pass scheme: (1) find all the nodes available and then (2) select the nodes under the switches with the most available nodes. |
@SteVwonder: For locality-aware scheduling, I can think of two schemes:
The first one is a top down and the other bottom up scheme. Depending on what you do, either should work better. But it seems an important thing is to identify representative operations and package them as a set of abstractions. In any case, matcher's signature can be something like:
|
An interesting thing to think about is the main hierarchy vs. auxiliary hierarchies. I can see a scheme where the scheduler's main tree is the default hierarchy and then use |
A locality-aware scheduler would probably want to compute the densest clique that matches all other constraints, it could be done as a function of some pretty common graph algorithms, but now we're talking about a graph walking interface rather than just a tree walker. We'll definitely want that, but it's a much more complicated animal than we're talking about here initially. This is part of why I think it's a good idea to build on a library or framework that has graph traversal functions built in, the trivial tree cases remain trivial, but the complicated stuff becomes much more tractable. |
Did you happen to look at some of the already available frameworks we can possibly leverage? |
Many, this is why I was having trouble with C as the language. There's only one that offers a C interface, and igraph is not particularly suitable. If we had to, we could make it work, but it's an O(n) operation on the size of the graph as a whole to add a node or edge. Our local graph analysis experts suggested Boost::Graph for its flexibility and support for completely custom data structures, there are various other good C++ libraries too, or there's the cayley graph library/database in go. Obviously none of these is a perfect fit to the project... That said, having an abstract interface over one of them to get started quickly seems less painful than trying to build an entire toolkit from scratch without a graph analysis expert on hand... |
FYI -- @lipari, @trws, @SteVwonder and I talked about this at yesterday's meeting. Making a long story short, we agreed that our short-term effort will be to add some walker and priority queue support around resrc and friends and to rewrite the scheduler plugins as matchers. The cleaner separation between resrc and scheduler will then be a basis for performing a later surgery when resrc becomes a part of job and resource service via message interfaces. @lipari and I will closely work together on this one -- this can be viewed as scalability work as well. And we will get constant feedback form @trws and @SteVwonder to make later surgery more successful. |
FYI -- To do some minimal amount of work for this issue, I am first trying to abstract out various resource generator methods like these lines into a common factory/generator class. The idea is then for this class to generate a resrc "database" fully given a generation method and make all of the resource hierarchies available (through some accessors of the opaque data type (
Now, I find it a bit awkward having to specialize the code whether the target resrc is physical or flow, not being able to write the most generic code: e.g.,
And macher and walker API
I will keep working on this to demo the new ways in a small but easily modifiable way but it seems clear to me we want to represent any resource hierarchy as a generic tree and also have a common resource abstract data type covering both |
This has been addressed by resource-query PR. |
Spinning off the issue from the discussions in PR 189.
@dongahn said:
I believe providing matcher and visitor patterns from resrc and the friends can make resrc APIs more ease to use and robust, and make it be better adaptable when @trws's job spec design comes along, as mentioned in this comment
ROSE's AstPrePostProcessing class is one of the examples. It is an abstract class that defines
preOrderVisit
andpostOrderVisit
interfaces for an Abstract Syntax Tree (AST). One can override (e.g., C++ lingo -- meaning, implement) these interfaces in his or her AST analysis class to plug their analysis into the system. Then, this class takes care of an AST traversal and lets the user-defined analysis to be invokedon post visit
andon pre visit
during this depth first traversal.I actually used this interface for the STAT project. One example code can be found in STAT's TemporarlOderAnalysisAPI in our internal bitbucket repo (here). I used this pattern to implement a way to identify all of the lexical scopes in a C/C++ source file, which contain a specific program point or points. This static information, along with the runtime stack trace, then allowed me to temporally order points of execution among MPI processes (i.e., determining relative progress).
Our case can be viewed in a similar way. Our scheduling plugins traverse each resrc hierarchy and act on a vertex that they wants to operate on. But plugins don' t need to implement raw traversal logic by themselves. Since our language is C, we can't use a OOP concept. But I think a callback that gets invoked on each node visit can bring us similar benefits.
If we abstract out data structures and visitor/match patterns from resource details (e.g., node type, PDU type, switch type and etc), a scheduler plugin writer can focus on scheduling operations like matched/unmatched, allocate/deallocate etc while leaving the traversal across potentially many different resource hierarchy to the visitor pattern provider. Good separations of concerns.
If we were to do this, I would see if we can turn some of the tree traversing logic in our current scheduler plugin to use such visitor/matcher patterns.
BTW, Clang also has Matcher patterns in here though I don't have hands-on experience.
@trws said:
Clang matchers are, for the record, one of the most awesome features ever written for manipulating C/C++.
Anyway, actual on-topic comment, the classic twalk style pattern could work well for this, but I'd probably want to simplify it to a DFS only where the function pointer's return value informs the walker as to whether to process children, skip children and continue, or abort completely. I say DFS specifically because a DFS from the root of the hardware tree, restricted to the hardware tree, can most rapidly identify branches to skip in that ordering IIRC.
The text was updated successfully, but these errors were encountered: