⚠ This is not a complete doc, just a sketch. We still need to add more tracers, and maybe a new tool to analyse tracking data. If you are interested in this feature, please follow #36.

Tracking ML jobs in kata containers

In run-with-kata-containers we showed how to run duetector in a kata container. In this section, we show a use case: tracking ML jobs in kata containers.

Prepare work dir

First, create a work dir on the host machine. We will mount it into the kata container so that tracking data is saved there.

mkdir ./duetector-kata

Start a kata container

sudo nerdctl run \
-it --rm \
-p 8888:8888 \
-p 8120:8120 \
-e DUETECTOR_DAEMON_WORKDIR=/duetector-kata \
-v "$(pwd)/duetector-kata":/duetector-kata \
--runtime=io.containerd.kata.v2 \
--cap-add=sys_admin \
dataucon/duetector

Note:

JupyterLab is the default entrypoint of dataucon/duetector, acting as the user application. See the Dockerfile and start-script if you want to change it.
--cap-add=sys_admin is required for eBPF to run properly.
You can use --entrypoint bash to enter the container and run duetector manually.
- you need to mount debugfs manually: mount -t debugfs debugfs /sys/kernel/debug
You can learn how to use GPUs with kata containers in the official kata containers doc, and configure the container's CPU and memory in the config file.
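
For the manual path (`--entrypoint bash`), the debugfs mount from the note above can be made safe to repeat. The following is a minimal sketch, assuming it runs as root inside the container. The function names `debugfs_mounted` and `ensure_debugfs` and the `MOUNTS_FILE` override are hypothetical helpers introduced here for illustration; they are not part of duetector.

```shell
#!/bin/sh
# Sketch: mount debugfs only if it is not mounted yet.
# MOUNTS_FILE is a hypothetical override for illustration; default is /proc/mounts.

debugfs_mounted() {
    # Succeeds when a debugfs filesystem is mounted at /sys/kernel/debug.
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk '$2 == "/sys/kernel/debug" && $3 == "debugfs" { found = 1 } END { exit !found }' \
        "${MOUNTS_FILE:-/proc/mounts}"
}

ensure_debugfs() {
    if debugfs_mounted; then
        echo "debugfs already mounted"
    else
        # Same command as in the note above; needs root and sys_admin.
        mount -t debugfs debugfs /sys/kernel/debug
    fi
}
```

Source this snippet and call `ensure_debugfs` once before starting duetector by hand. Running it a second time only prints a message instead of attempting another mount.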

Train a model

We use an example that trains a model with PyTorch and saves it. You can find the code here.

Training the model takes a while. You can reduce NUM_EPOCHS and the size of the dataset to make it faster.
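
Since the work dir is mounted from the host, a quick check on the host can tell whether anything has been written there during or after training. This is only a sketch: the helper name `workdir_has_data` is hypothetical, and it makes no assumption about the names or format of the files duetector writes; it only tests whether the directory contains at least one regular file.

```shell
#!/bin/sh
# Sketch: succeed if the work dir contains at least one regular file.
# The default path matches the work dir created earlier in this doc.

workdir_has_data() {
    dir="${1:-./duetector-kata}"
    # find prints matching files; head keeps only the first one.
    [ -n "$(find "$dir" -type f 2>/dev/null | head -n 1)" ]
}
```

For example, `workdir_has_data ./duetector-kata && echo "tracking data present"` prints the message only once some file exists in the work dir.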

Analyse tracking data

TBD
