⚠ This is not a complete doc, just a sketch. We still need to add more tracers and possibly a new tool for analysing tracking data. If you are interested in this feature, please follow #36
In run-with-kata-containers we showed how to run duetector in a kata container. In this section, we walk through a use case of tracking ML jobs in kata containers.
First, create a work dir on the host machine; we will mount it into the kata container so that tracking data is saved there (see the inspection sketch after the notes below).

```bash
mkdir ./duetector-kata
```
```bash
sudo nerdctl run \
    -it --rm \
    -p 8888:8888 \
    -p 8120:8120 \
    -e DUETECTOR_DAEMON_WORKDIR=/duetector-kata \
    -v $(pwd)/duetector-kata:/duetector-kata \
    --runtime=io.containerd.kata.v2 \
    --cap-add=sys_admin \
    dataucon/duetector
```
Note:

- JupyterLab is the default entrypoint of `dataucon/duetector` as the user application; see the Dockerfile and start script if you want to change it yourself.
- `--cap-add=sys_admin` is required for eBPF to run properly.
- You can use `--entrypoint bash` to enter the container and run `duetector` manually.
- You need to mount debugfs manually: `mount -t debugfs debugfs /sys/kernel/debug`
- You can learn how to use GPUs with kata containers in the official kata containers doc, and configure its CPU and memory in the config file.
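
Once duetector is running in the container, the tracking data it collects is written under the mounted work dir, so it is also visible on the host at `./duetector-kata`. The sketch below is only an illustration: it assumes the default SQLite collector and that the database files end in `.sqlite3`; the actual file names and table layout depend on your duetector config, so adjust the glob pattern as needed.

```python
# Sketch: enumerate SQLite files duetector wrote into the mounted work dir
# and print their tables with row counts. The "*.sqlite3" pattern is an
# assumption about the default collector; adjust it to match your config.
import sqlite3
from pathlib import Path

workdir = Path("./duetector-kata")

for db_path in workdir.rglob("*.sqlite3"):
    print(f"== {db_path} ==")
    conn = sqlite3.connect(str(db_path))
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    ).fetchall()
    for (table,) in tables:
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        print(f"{table}: {count} rows")
    conn.close()
```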
We use an example that trains a model with PyTorch and saves it. You can find the code here. Training the model takes a while; you can reduce `NUM_EPOCHS` and the size of the dataset to make it faster.
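
The linked code is the authoritative example; the following is only a minimal sketch of the same idea, using a synthetic dataset and a tiny model so it runs anywhere. `NUM_EPOCHS` and `NUM_SAMPLES` here play the same role as the knobs mentioned above, but their names and values in the real example may differ.

```python
# Minimal sketch of the kind of ML job being traced: train a tiny PyTorch
# model on synthetic data and save it. The real example in the repo differs.
import torch
from torch import nn

NUM_EPOCHS = 5        # lower this to finish faster
NUM_SAMPLES = 1024    # shrink the dataset to finish faster

# Synthetic regression data, standing in for the real dataset.
x = torch.randn(NUM_SAMPLES, 16)
y = torch.randn(NUM_SAMPLES, 1)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(NUM_EPOCHS):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")

# Save the trained weights; file writes like this are the kind of activity
# duetector's tracers can observe.
torch.save(model.state_dict(), "model.pt")
```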
TBD