Prediction API #5
Merged
Conversation
cmelone force-pushed the add/predict branch 2 times, most recently from dcb99d4 to aab0a7d on January 31, 2024 07:28
alecbcs requested changes on Feb 5, 2024
This is looking really good! Just a few initial impressions and a small suggestion.
cmelone commented on Feb 9, 2024
alecbcs reviewed on Feb 16, 2024
cmelone force-pushed the add/predict branch 2 times, most recently from 790dbdf to 87eebb8 on March 7, 2024 21:08
alecbcs reviewed on Apr 25, 2024
Rather than grabbing the 4-5 rows that match on the other params (name and version for both pkg and compiler) and then filtering by variant, these queries are now split in two. The program first tries an exact match on variants; if that doesn't work, it matches on expensive variants only. This increases the number of queries being made, but it's a more robust system and will result in more matches.
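A rough sketch of that two-step lookup, assuming an aiosqlite-style connection; the table and column names, and the `extract_expensive()` helper, are hypothetical stand-ins, not Gantry's actual schema:

```python
# Hypothetical base query; schema names are assumptions for illustration.
BASE = (
    "SELECT cpu_mean, mem_mean FROM jobs "
    "WHERE pkg_name=? AND pkg_version=? AND compiler_name=? AND compiler_version=?"
)

async def match_samples(db, spec: dict) -> list:
    params = [spec["pkg_name"], spec["pkg_version"],
              spec["compiler_name"], spec["compiler_version"]]

    # Query 1: exact match on the full variant string.
    async with db.execute(BASE + " AND pkg_variants=?",
                          params + [spec["pkg_variants"]]) as cur:
        rows = await cur.fetchall()
    if rows:
        return rows

    # Query 2: fall back to matching only the expensive variants (e.g. +cuda),
    # which dominate resource usage. extract_expensive() is hypothetical.
    expensive = extract_expensive(spec["pkg_variants"])
    async with db.execute(BASE + " AND expensive_variants=?",
                          params + [expensive]) as cur:
        return await cur.fetchall()
```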
We have two options for accepting multiple predictions: 1) instructing the client to send a request for each prediction, or 2) allowing the client to pass a list of specs to predict. The second option presents a problem when we get into the weeds of GET request length limits. Since specs can vary widely in length when running json.dumps() on them, it would be difficult either to instruct the client to limit their length or to somehow validate it. When we were allowing bulk prediction in one GET request, it was handled through an asyncio.gather() call. That approach is approximately 2x faster than individual HTTP requests when testing 5000 consecutive calls to the API (4s vs. 8s). In this case, 8 seconds is not very long, so it's not a problem, but we can revisit this in the future if we run into performance issues.
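For reference, a minimal sketch of the asyncio.gather() bulk pattern described above; `predict()` here stands in for the per-spec prediction coroutine and is not Gantry's actual function name:

```python
import asyncio

async def predict(spec: dict) -> dict:
    ...  # look up samples and compute a suggested allocation for one spec

async def predict_bulk(specs: list[dict]) -> list[dict]:
    # One coroutine per spec, all run concurrently within a single request.
    return await asyncio.gather(*(predict(s) for s in specs))
```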
Co-authored-by: Alec Scott <[email protected]>
alecbcs approved these changes on May 1, 2024
Looks good to me. Thanks @cmelone!
This PR implements an endpoint that accepts a spec payload via a GET request to /v1/allocation. It will also accept an array of these dictionaries.
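The exact schema isn't captured in this extract; the following is a hypothetical example of a single payload dictionary, with field names inferred from the covariate list below and the variants discussion, and with made-up values:

```python
# Hypothetical payload; field names inferred, values invented for illustration.
payload = {
    "pkg_name": "gmake",
    "pkg_version": "4.4.1",
    "pkg_variants": "~guile",
    "compiler_name": "gcc",
    "compiler_version": "11.4.0",
}
```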
The program will then search for 4 ≤ x ≤ 5 samples with the following covariates (in priority order):
("pkg_name", "pkg_version", "compiler_name", "compiler_version")
("pkg_name", "compiler_name", "compiler_version")
("pkg_name", "pkg_version", "compiler_name")
("pkg_name", "compiler_name")
("pkg_name", "pkg_version")
("pkg_name",)
We've chosen this order after running simulations of this prediction algorithm and finding that (1) is the best predictor of resource usage.
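A sketch of that priority-ordered fallback: try each covariate combination in order until enough samples turn up. `query_samples()` is an assumed lookup helper, not part of the actual codebase:

```python
# Covariate combinations, mirroring the priority order listed above.
COMBOS = [
    ("pkg_name", "pkg_version", "compiler_name", "compiler_version"),
    ("pkg_name", "compiler_name", "compiler_version"),
    ("pkg_name", "pkg_version", "compiler_name"),
    ("pkg_name", "compiler_name"),
    ("pkg_name", "pkg_version"),
    ("pkg_name",),
]

async def collect_sample(db, spec: dict, min_rows: int = 4):
    for combo in COMBOS:
        rows = await query_samples(db, {k: spec[k] for k in combo})
        if len(rows) >= min_rows:
            return rows
    return None  # no usable history; leave the current allocation in place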
After collecting the sample, the program returns a mean of past CPU and memory usage to suggest an appropriate k8s allocation. For the moment, we are only targeting requests.
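A minimal sketch of that step; the row keys and units are assumptions for illustration:

```python
from statistics import mean

def suggest_requests(rows: list[dict]) -> dict:
    return {
        "cpu_request": mean(r["cpu_mean"] for r in rows),  # cores
        "mem_request": mean(r["mem_mean"] for r in rows),  # bytes
    }
```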
Additionally, we have implemented a safeguard ensuring that the requests will not be set below what is currently allocated for each package. spack/spack#42351, which was merged a few days ago, increases the allocation for many packages (based on max memory per package, rather than the mean as we do here). My hypothesis is that this will render many of the predictions futile, but it's still a good opportunity for a Gantry trial run.
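The safeguard amounts to a per-resource clamp, roughly like this sketch (the current allocation would come from the existing CI config; names are illustrative):

```python
def apply_floor(predicted: dict, current: dict) -> dict:
    # Never suggest less than what is already allocated for the package.
    return {k: max(v, current.get(k, 0)) for k, v in predicted.items()}
```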
I'm looking for any ways to improve the code or make it more concise!