Previous Step: Prepare cluster
- For the Kind cluster provided by `script.sh`:
./script.sh deploy_cpe_operator
- For managed cluster:
CLUSTER_PROVIDER= # kind|k8s|ocp
kubectl apply -f https://raw.githubusercontent.com/IBM/cpe-operator/main/examples/deployment/${CLUSTER_PROVIDER}-deploy.yaml
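Because `CLUSTER_PROVIDER` is left empty in the snippet above, a typo or an unset value produces a URL that does not exist. The following is a minimal sketch of a guard; the function name `cpe_deploy_url` is hypothetical and not part of `script.sh`, and the accepted values are taken from the `kind|k8s|ocp` comment above.

```shell
# Hypothetical helper (not part of script.sh): build the manifest URL
# for a given provider and fail on any value other than kind, k8s or ocp.
cpe_deploy_url() {
  case "$1" in
    kind|k8s|ocp)
      echo "https://raw.githubusercontent.com/IBM/cpe-operator/main/examples/deployment/$1-deploy.yaml"
      ;;
    *)
      echo "CLUSTER_PROVIDER must be one of: kind, k8s, ocp (got '$1')" >&2
      return 1
      ;;
  esac
}
```

It could then be used as `url=$(cpe_deploy_url "$CLUSTER_PROVIDER") && kubectl apply -f "$url"`, so that `kubectl` is only invoked when the provider value is valid.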
2.1. For a managed cluster, make sure the cluster is idle, running only control-plane workloads.
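One way to check for idleness is to list the pods that run outside the control-plane namespaces. The sketch below is an assumption-laden illustration: the function name `non_control_plane_pods` and the `CONTROL_PLANE_NS` variable are hypothetical, and treating only `kube-system` as control plane is a default you will likely need to extend (for example on OpenShift).

```shell
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on stdin
# and print "namespace/pod" for every pod outside the control-plane namespaces.
# CONTROL_PLANE_NS is an assumed, comma-separated list; default is kube-system.
non_control_plane_pods() {
  awk -v ns_list="${CONTROL_PLANE_NS:-kube-system}" '
    BEGIN { n = split(ns_list, a, ","); for (i = 1; i <= n; i++) sys[a[i]] = 1 }
    NF >= 2 && !($1 in sys) { print $1 "/" $2 }
  '
}
```

Running `kubectl get pods -A --no-headers | non_control_plane_pods` prints nothing when only the listed namespaces have pods; any output points at workloads that may disturb the measurements.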
There are three available modes: `quick` for testing, `full` for pre-defined stressng, and `custom` for non-CPE benchmarks.
- Quick sample
./script.sh quick_collect
This is for testing purposes only.
- Stressng (standard workload)
./script.sh collect
It might take an hour to run and collect all benchmarks. The output, including the CPE CR and the Prometheus query response, is written to the `data` folder by default.
- Custom Benchmark
With the CPE operator, the start and end time of each pod are recorded. However, users might want to run a custom benchmark outside of `kepler-model-server` and collect metrics for training the model using `kepler-model-server`. In that case, the user can define either the `interval` option or the [`start_time`, `end_time`] options to set the desired time window for metrics collection from Prometheus.
./script.sh custom_collect
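When the benchmark runs outside the CPE operator, the time window has to be recorded by hand. The sketch below wraps an arbitrary command and captures the window around it. The function name `run_timed` and the variables `START_TIME`, `END_TIME`, and `INTERVAL` are illustrative, and the use of epoch seconds is an assumption: the format that the `interval`, `start_time`, and `end_time` options expect is not stated here, so convert the values as needed.

```shell
# Hypothetical helper: run a command and record the time window around it.
# Sets START_TIME and END_TIME (epoch seconds, an assumed format) and
# INTERVAL (elapsed seconds); returns the exit status of the command.
run_timed() {
  START_TIME=$(date -u +%s)
  status=0
  "$@" || status=$?
  END_TIME=$(date -u +%s)
  INTERVAL=$((END_TIME - START_TIME))
  return "$status"
}
```

For example, `run_timed ./my_benchmark.sh` (a placeholder command) leaves the window in `START_TIME` and `END_TIME`, which can then be used to fill in either the `interval` option or the [`start_time`, `end_time`] pair.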
Validation of metrics happens by default at the time of their collection. It is also possible to validate the collected metrics explicitly.
- Quick sample
./script.sh validate sample
- Full run
./script.sh validate stressng
- Custom benchmark
./script.sh validate customBenchmark
You can train the model either with the Docker image, which requires no environment setup but is subject to Docker's limitations, or natively, after setting up your Python environment as described further below.
- Quick sample
./script.sh quick_train
- Full run
./script.sh train
- Custom benchmark
./script.sh custom_train
Training output will be in the `/data` folder by default. The folder contains:
- preprocessed data from Prometheus query response
- profiles
- models in a pipeline hierarchy
Compatible version: Python 3.10
- Install `hatch`
- Prepare environment:
hatch shell
- Run
NATIVE="true" ./script.sh train
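Since native training depends on the interpreter version, it can help to check it before starting a long run. This is a minimal sketch, assuming "compatible" means exactly Python 3.10.x; the function name `is_compatible_python` is hypothetical.

```shell
# Hypothetical helper: succeed only for a Python 3.10.x version string,
# e.g. "3.10.12". Assumes that only the 3.10 series is compatible.
is_compatible_python() {
  case "$1" in
    3.10|3.10.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Inside the `hatch shell` environment it could be used as `is_compatible_python "$(python3 -c 'import platform; print(platform.python_version())')" && NATIVE="true" ./script.sh train`.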
- Fork `kepler-model-db`.
- Validate and make a copy with the `export_models` command. You need to specify the `machine id`, the local path to the forked `kepler-model-db/models`, the `author github account`, and the `benchmark type`.
./script.sh export_models <machine id> <path to kepler-model-db/models> <author github account> <benchmark type>
If you also agree to share the raw data (preprocessed data and archived file of full pipeline), run
./script.sh export_models_with_raw <machine id> <path to kepler-model-db/models> <author github account> <benchmark type>
- Set `NATIVE="true"` to export natively.
- Benchmark type accepts one of the values `sample`, `stressng`, or `customBenchmark`.
- Add information about your machine to `./models/README.md` in `kepler-model-db`. You may omit any column as needed.
- Push a PR to `kepler-model-db`.
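The `export_models` call takes four positional arguments, and the benchmark type must be one of three fixed values, so a small dry-run check can catch mistakes before anything is copied into the fork. The sketch below is illustrative only: `export_models_cmd` is a hypothetical helper that prints the command instead of running it, and it does not handle arguments containing spaces.

```shell
# Hypothetical dry-run helper: check the argument count and the benchmark
# type, then print the export command instead of executing it.
export_models_cmd() {
  if [ "$#" -ne 4 ]; then
    echo "usage: export_models_cmd <machine id> <models path> <github account> <benchmark type>" >&2
    return 2
  fi
  case "$4" in
    sample|stressng|customBenchmark) ;;
    *) echo "unknown benchmark type: $4" >&2; return 1 ;;
  esac
  echo "./script.sh export_models $1 $2 $3 $4"
}
```

Once the printed command looks right, it can be run as is, or with `export_models_with_raw` in place of `export_models` if you also want to share the raw data.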