Update stats output to include data for failed matches #1187
resource/modules/resource_match.cpp
Outdated
if (flux_respond_pack (h, msg, "{s:I s:I s:o s:f s:I s:I s:I s:I s:f s:f"
                               " s:f s:f s:I s:I s:f s:f s:f s:f}",
                       "V", num_vertices (ctx->db->resource_graph),
                       "E", num_edges (ctx->db->resource_graph),
                       "by_rank", o,
                       "load-time", ctx->perf.load,
                       "graph-uptime", graph_uptime_s,
                       "time-since-reset", time_since_reset_s,
                       "njobs", ctx->perf.njobs,
                       "njobs-reset", ctx->perf.njobs_reset,
                       "min-match", min,
                       "max-match", ctx->perf.max,
                       "avg-match", mean,
                       "match-variance", variance,
                       "njobs-failed", ctx->perf.njobs_failed,
                       "njobs-reset-failed", ctx->perf.njobs_reset_failed,
                       "min-match-failed", min_failed,
                       "max-match-failed", ctx->perf.max_failed,
                       "avg-match-failed", mean_failed,
                       "match-variance-failed", variance_failed) < 0) {
Quick thought: it might make sense to break the flat list up into some objects like
match.success.min
match.success.avg
match.success.max
match.failed.min
match.failed.avg
match.failed.max
etc., or wherever grouping makes sense. That would make it a little easier to access specific data with queries, e.g.
flux module stats sched-fluxion-resource | jq .match.failed
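As a rough illustration of the grouping suggested above, the following C++ sketch serializes two summaries into the nested layout, so that a query such as jq .match.failed would select a single sub-object. The MatchSummary type and both helper functions are hypothetical names made up for this example; the module itself builds its response with flux_respond_pack, not by string concatenation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical summary of match times for one outcome (succeeded or failed).
struct MatchSummary {
    double min;
    double avg;
    double max;
};

// Render one summary as a JSON object using default stream formatting.
std::string summary_to_json (const MatchSummary &s)
{
    std::ostringstream os;
    os << "{\"min\": " << s.min << ", \"avg\": " << s.avg
       << ", \"max\": " << s.max << "}";
    return os.str ();
}

// Nest both summaries under "match" so each can be addressed by path,
// for example match.failed.max.
std::string grouped_stats_json (const MatchSummary &success,
                                const MatchSummary &failed)
{
    return "{\"match\": {\"success\": " + summary_to_json (success)
           + ", \"failed\": " + summary_to_json (failed) + "}}";
}
```

With this shape, consumers that only care about failed matches no longer need to know every flat key name such as min-match-failed.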
Good suggestion! Let me know if the implementation makes sense.
02587f1 to 9497cdd
resource/modules/resource_match.cpp
Outdated
    goto error;
}

if (flux_respond_pack (h, msg, "{s:I s:I s:o s:f s:I s:I s:{s:o s:o}}",
Up to you but it might be nice to have the min/max/avg values in sub-objects.
Also it looks like the sub-objects could be leaked on error.
Packing objects with little o (which "steals" the reference) complicates object reference count management on error because the original reference is not always restored when pack fails (at least that was found to be the case in some jansson versions). So a suggestion is to either pack the response in one go or use big O to pack the sub objects and decrement their reference counts on both success and failure paths.
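The second option (pack with big O, then drop the caller's reference on both paths) can be sketched with a toy model. This is deliberately not jansson code: Obj, obj_new, obj_decref, pack_shared, and respond are hypothetical stand-ins that only mimic the ownership rule described above, and live_objects exists solely so a leak would be observable.

```cpp
#include <cassert>

// Toy reference-counted object; a stand-in for illustration, NOT jansson's json_t.
struct Obj {
    int refs = 1;          // the creator holds the first reference
    Obj *child = nullptr;  // at most one packed sub-object
};

static int live_objects = 0;  // allocation counter so leaks are visible

Obj *obj_new ()
{
    ++live_objects;
    return new Obj;
}

void obj_decref (Obj *o)
{
    if (o && --o->refs == 0) {
        obj_decref (o->child);  // releasing a parent releases its reference on the child
        --live_objects;
        delete o;
    }
}

// Models packing with big "O": on success the parent takes its OWN reference
// to the child; on failure the child's reference count is left untouched.
Obj *pack_shared (Obj *child, bool simulate_failure)
{
    if (simulate_failure)
        return nullptr;
    Obj *parent = obj_new ();
    ++child->refs;
    parent->child = child;
    return parent;
}

// The caller always drops its own reference, whether or not packing succeeded.
bool respond (bool simulate_failure)
{
    Obj *sub = obj_new ();
    Obj *resp = pack_shared (sub, simulate_failure);
    obj_decref (sub);  // same cleanup on success and failure paths
    if (!resp)
        return false;
    obj_decref (resp);  // stands in for sending and then freeing the response
    return true;
}
```

Because the caller's cleanup is identical on both paths, nothing depends on whether a failed pack restores a stolen reference.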
@garlick, I think I implemented your first suggestion as you intended. Let me know what you think and I'll discard or squash the fixup commits.
48cfae7 to 4f75e31
resource/modules/resource_match.cpp
Outdated
               "avg-match", mean,
               "match-variance", variance) < 0) {

if (flux_respond_pack (h, msg, "{s:I s:I s:o s:f s:I s:I"
I get why the indentation, and why this is like this, but my goodness this is complex. I'm guessing you had it making two sub-objects earlier and backed away based on @garlick's comment. This is clearly right, so I'm ok with taking it, but breaking deep ones like this down would be nice.
Thanks for the second round of feedback!
I'm happy. Thanks for the updates @milroy!
    errno = ENOMEM;
    goto error_free;
}
if (!(match_failed = json_pack ("{s:I s:I s:{s:f s:f s:f s:f}}",
This is a lot easier for me to follow, good stuff.
6c8b606 to 98cbee2
@garlick, would you like one more pass here or are we ready for MWP?
Problem: failed matches for job specifications with unsatisfiable constraints have been observed restricting the rate at which valid requests are scheduled. The resource module does not currently track stats on failed matches. Add tracking and reporting of stats on failed matches.
Problem: flux ion does not report the stats on failed matches collected and returned via RPC from the resource module. Add the ability to output the stats.
Go ahead. Nice work!
Ready when you are then @milroy!
Thanks for the reviews! I added a tiny commit to test populating the failed stats path. Once that succeeds I'll set MWP.
Problem: the current stats tests don't test populating failed stats. Submit two unsatisfiable jobs to generate failed job stats.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1187 +/- ##
======================================
Coverage 73.9% 73.9%
======================================
Files 102 102
Lines 14565 14595 +30
======================================
+ Hits 10766 10790 +24
- Misses 3799 3805 +6
Performance testing with Fluxion has revealed the need to save and output data on failed matches. This PR updates the resource module to track and report these stats.
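A minimal sketch of what tracking separate statistics for successful and failed matches could look like follows. It is not the PR's code: MatchStats, MatchPerf, and record are hypothetical names, and the running mean and variance use Welford's online update, which is an assumption, since the thread does not say how the module computes its variance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical running statistics for one outcome (succeeded or failed).
struct MatchStats {
    std::int64_t njobs = 0;
    double min = std::numeric_limits<double>::max ();
    double max = 0.0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean

    // Welford's online update: no per-job samples need to be stored.
    void update (double elapsed)
    {
        ++njobs;
        min = std::min (min, elapsed);
        max = std::max (max, elapsed);
        double delta = elapsed - mean;
        mean += delta / static_cast<double> (njobs);
        m2 += delta * (elapsed - mean);
    }

    // Sample variance; defined as 0 until there are at least two jobs.
    double variance () const
    {
        return njobs > 1 ? m2 / static_cast<double> (njobs - 1) : 0.0;
    }
};

// Hypothetical container keeping the two outcomes apart.
struct MatchPerf {
    MatchStats succeeded;
    MatchStats failed;

    void record (double elapsed, bool matched)
    {
        (matched ? succeeded : failed).update (elapsed);
    }
};
```

Keeping the two accumulators separate means a burst of unsatisfiable requests shows up in the failed statistics without skewing the figures for successful matches.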