Resource fuzzing to assess job performance variation #76

cmelone · 2024-07-31T20:59:11Z

Given the goal of reducing costs on a per-job basis, we would like to understand the effects of limiting CPU cycles available to a build job. This process would add variance to the resource allocation algorithm.

This would take "gantry in the direction of a full genetic algorithm to optimize the resource requests of jobs to build applications in the least expensive way possible" - Alec.

This is essentially a scaling study that balances the number of cycles allocated to a build against the wall time of the job, ultimately optimizing cost. The plot of interest is the efficiency curve, where efficiency is defined as cores/build time.

We would do this by selecting 10-15% of all incoming prediction requests to "fuzz", purposefully limiting the CPU resources allocated to them so we can understand the impact on different types of applications and on the variety of build options available in Spack.
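A minimal sketch of the selection step described above. The names `FUZZ_RATE`, `CPU_SCALES`, `MIN_CPU`, and `maybe_fuzz` are hypothetical and not part of gantry; the 12.5% rate is just the midpoint of the 10-15% range, and the reduction factors and one-core floor are assumptions.

```python
from __future__ import annotations

import random

FUZZ_RATE = 0.125               # assumed: midpoint of the 10-15% range
CPU_SCALES = (0.25, 0.5, 0.75)  # assumed reduction factors, not from the issue
MIN_CPU = 1.0                   # assumed floor so a job is never starved entirely


def maybe_fuzz(cpu_request: float, rng: random.Random | None = None) -> tuple[float, bool]:
    """Return (cpu_allocation, fuzzed) for one incoming prediction request."""
    rng = rng or random.Random()
    if rng.random() >= FUZZ_RATE:
        # most requests keep the predicted allocation unchanged
        return cpu_request, False
    scaled = cpu_request * rng.choice(CPU_SCALES)
    # fuzzing only ever lowers the allocation, and never below the floor
    return min(cpu_request, max(MIN_CPU, scaled)), True
```

The returned flag is what would be stored alongside the job so that fuzzed samples can be identified later.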

This fuzzing would occur a few times for each given spec, until we can determine the optimal efficiency for the job, which would be used to define future CPU limits and the number of make jobs.
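One way to compare fuzzed runs of the same spec, as a rough sketch. The issue does not say how the optimum would be picked, so this assumes cost is proportional to core-seconds (cores × build time) and picks the cheapest trial; `Trial` and `cheapest_trial` are hypothetical names. The `efficiency` property is the cores/build time ratio as defined above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    cores: float
    build_seconds: float

    @property
    def efficiency(self) -> float:
        # ratio as defined in this issue: cores / build time
        return self.cores / self.build_seconds

    @property
    def core_seconds(self) -> float:
        # assumed cost proxy: billed cost grows with cores * wall time
        return self.cores * self.build_seconds


def cheapest_trial(trials: list[Trial]) -> Trial:
    """Pick the trial with the lowest assumed cost (core-seconds)."""
    if not trials:
        raise ValueError("need at least one trial")
    return min(trials, key=lambda t: t.core_seconds)


if __name__ == "__main__":
    # made-up numbers for illustration only, not measurements
    trials = [Trial(2, 900), Trial(4, 500), Trial(8, 400)]
    best = cheapest_trial(trials)
    print(best.cores, best.core_seconds, best.efficiency)
```

The `cores` value of the chosen trial is what could then feed the CPU limit and the number of make jobs.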

cmelone · 2024-09-17T18:23:24Z

fuzzing:

- when fuzzing, include an indicator variable and mark the job as fuzzed so it doesn't get used as a predictor in the future
- make sure we don't fuzz a retried job by checking whether the last job with the exact same spec in the db failed
  - would some sort of grace period be necessary? "not fuzzable until x time"
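A sketch of the checks above using `sqlite3`. The `jobs` table and its columns are a hypothetical schema, not gantry's actual one, and the grace period is one possible reading of "not fuzzable until x time": a failure older than `grace_seconds` no longer blocks fuzzing.

```python
from __future__ import annotations

import sqlite3
import time

# hypothetical schema for illustration only
JOBS_SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    spec TEXT NOT NULL,
    status TEXT NOT NULL,        -- 'success' or 'failed'
    end_time REAL NOT NULL,      -- unix timestamp
    fuzzed INTEGER NOT NULL DEFAULT 0
)
"""


def is_fuzzable(db: sqlite3.Connection, spec: str, now: float | None = None,
                grace_seconds: float | None = None) -> bool:
    """False if the most recent job for this exact spec failed (likely a retry)."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT status, end_time FROM jobs WHERE spec = ? "
        "ORDER BY end_time DESC LIMIT 1",
        (spec,),
    ).fetchone()
    if row is None:
        return True  # no history, so nothing suggests a retry
    status, end_time = row
    if status != "failed":
        return True
    if grace_seconds is None:
        return False  # no grace period: any trailing failure blocks fuzzing
    return (now - end_time) >= grace_seconds


def predictor_rows(db: sqlite3.Connection, spec: str) -> list[tuple]:
    """Rows usable as predictors: fuzzed jobs are excluded via the indicator."""
    return db.execute(
        "SELECT spec, status, end_time FROM jobs "
        "WHERE spec = ? AND fuzzed = 0 ORDER BY end_time",
        (spec,),
    ).fetchall()
```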

storing predictions:

- store a log of predictions in the db; it would be fuzzy searched just like the current system
- predictions would have a bool fuzzable indicator
- can invalidate them so they don't get used again
- not completely sure if this will increase efficiency... most of the computation is done by the search, not by computing the actual allocations
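A rough sketch of what such a prediction log could look like with `sqlite3`. The `predictions` table, its columns, and the helper names are all hypothetical; the lookup here is an exact match on the spec and only stands in for the fuzzy search the current system does.

```python
from __future__ import annotations

import sqlite3

# hypothetical schema for illustration only
PREDICTIONS_SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    id INTEGER PRIMARY KEY,
    spec TEXT NOT NULL,
    cpu_request REAL NOT NULL,
    fuzzable INTEGER NOT NULL DEFAULT 1,  -- bool fuzzable indicator
    valid INTEGER NOT NULL DEFAULT 1      -- set to 0 once invalidated
)
"""


def log_prediction(db: sqlite3.Connection, spec: str, cpu_request: float,
                   fuzzable: bool = True) -> int:
    """Store one prediction and return its row id."""
    cur = db.execute(
        "INSERT INTO predictions (spec, cpu_request, fuzzable) VALUES (?, ?, ?)",
        (spec, cpu_request, int(fuzzable)),
    )
    return cur.lastrowid


def invalidate(db: sqlite3.Connection, prediction_id: int) -> None:
    """Mark a prediction so it does not get used again."""
    db.execute("UPDATE predictions SET valid = 0 WHERE id = ?", (prediction_id,))


def latest_valid(db: sqlite3.Connection, spec: str) -> tuple | None:
    """Most recent still-valid prediction for a spec (exact match, not fuzzy)."""
    return db.execute(
        "SELECT id, cpu_request, fuzzable FROM predictions "
        "WHERE spec = ? AND valid = 1 ORDER BY id DESC LIMIT 1",
        (spec,),
    ).fetchone()
```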

cmelone · 2024-10-29T14:59:33Z

once we have fuzzed for a bit, need to figure out how to update the prediction algorithm to choose allocations based on the efficiency of resources/duration

cmelone added the feature label Jul 31, 2024

cmelone self-assigned this Jul 31, 2024

cmelone mentioned this issue Jul 31, 2024

Project roadmap #71

