docs/ec2-debug-and-manual-cleanup #240

Merged on May 23, 2022 (7 commits)

content/docs/self-hosted-runners.md: 45 additions, 0 deletions
@@ -391,6 +391,51 @@ provisioned through environment variables instead of files.
</tab>
</toggle>

#### Cloud Compute Resource Manual Cleanup

In very rare cases, you may need to clean up CML cloud resources manually.
One example of such a problem is
[an EC2 instance running out of storage space](https://github.com/iterative/cml/issues/1006).

The following sections list all the resources you may need to clean up
manually in case of a failure.

<toggle>
<tab title="AWS">

- The running EC2 instance (named with the pattern `cml-{random-id}`)
- The volume attached to the running EC2 instance
  (this should be deleted automatically when the EC2 instance is terminated)
- The generated key pair (named with the pattern `cml-{random-id}`)
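
If you have the AWS CLI configured, a read-only sketch like the one below can help you find these leftovers before deleting anything. It assumes that the instance name is stored in the standard `Name` tag and that both the instance and the key pair use the `cml-` prefix described above; the function name `list_cml_resources` is hypothetical, and the region comes from your AWS CLI configuration.

```bash
# Hypothetical helper: list (never delete) AWS resources that look like leftovers
# of a CML runner. Assumes a configured AWS CLI and that CML resources use the
# "cml-" name prefix (for instances, via the standard Name tag).
# The body runs in a subshell so strict mode does not leak into your shell.
list_cml_resources() (
  set -euo pipefail

  # Instances named cml-* that have not been terminated yet.
  instance_ids="$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=cml-*" \
              "Name=instance-state-name,Values=pending,running,stopping,stopped" \
    --query "Reservations[].Instances[].InstanceId" \
    --output text)"
  echo "Instances: ${instance_ids:-none}"

  # Volumes still attached to each candidate instance.
  for id in $instance_ids; do
    echo "Volumes attached to $id:"
    aws ec2 describe-volumes \
      --filters "Name=attachment.instance-id,Values=$id" \
      --query "Volumes[].VolumeId" \
      --output text
  done

  # Key pairs named cml-*.
  echo "Key pairs:"
  aws ec2 describe-key-pairs \
    --filters "Name=key-name,Values=cml-*" \
    --query "KeyPairs[].KeyName" \
    --output text
)
```

After confirming which resources belong to the failed runner, you can remove them one by one with `aws ec2 terminate-instances --instance-ids <instance-id>`, `aws ec2 delete-volume --volume-id <volume-id>` and `aws ec2 delete-key-pair --key-name <key-name>`. A volume that outlived its instance is no longer attached, so it will not show up in the per-instance listing; look for it among volumes in the `available` state instead. Keeping the helper read-only avoids deleting resources that belong to runners which are still healthy.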

If you keep encountering issues, please try to pull the logs from the running
instance before terminating it, and share them when you open a GitHub issue.

To do so, add a startup script to the runner that appends your public GitHub
SSH keys to the instance's `authorized_keys` file:

> `--cloud-startup-script=$(echo 'echo "$(curl https://github.com/'"$GITHUB_ACTOR"'.keys)" >> /home/ubuntu/.ssh/authorized_keys' | base64 -w 0)`
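
The value passed to `--cloud-startup-script` is a base64-encoded shell command. To double-check what will run on the instance before launching the runner, you can rebuild and decode it locally. This sketch assumes GNU coreutils `base64` (the BSD/macOS `base64` may not accept `-w 0`), and the `octocat` fallback is only a placeholder for running it outside GitHub Actions.

```bash
# In GitHub Actions, GITHUB_ACTOR holds the username of whoever triggered the
# run; "octocat" is only a placeholder fallback for trying this locally.
GITHUB_ACTOR="${GITHUB_ACTOR:-octocat}"

# The command that the --cloud-startup-script value encodes: append the user's
# public GitHub SSH keys to the ubuntu user's authorized_keys file.
STARTUP_SCRIPT='echo "$(curl https://github.com/'"$GITHUB_ACTOR"'.keys)" >> /home/ubuntu/.ssh/authorized_keys'

# Encode it the same way (GNU base64; -w 0 disables line wrapping).
ENCODED="$(echo "$STARTUP_SCRIPT" | base64 -w 0)"

# Decode it again to print exactly what the instance will run at boot.
echo "$ENCODED" | base64 -d
```

With `GITHUB_ACTOR` unset, this prints `echo "$(curl https://github.com/octocat.keys)" >> /home/ubuntu/.ssh/authorized_keys`. The single quotes keep `$(curl ...)` from running locally, so the keys are fetched on the instance itself at boot.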

Once the instance fails, connect to it over SSH and dump the logs with:

```bash
# Run from your local machine (replace instance_public_ip with the instance's public IP)
ssh ubuntu@instance_public_ip
# The following commands run on the instance and write the logs to the home directory
sudo journalctl -n all -u cml.service --no-pager > cml.log
sudo dmesg --ctime > system.log
sudo dmesg --ctime --userspace > userspace.log
```

After exiting the SSH session, copy those logs to your local machine with:

```bash
scp ubuntu@instance_public_ip:~/cml.log .
scp ubuntu@instance_public_ip:~/system.log .
scp ubuntu@instance_public_ip:~/userspace.log .
```
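
The dump and copy steps can also be combined into a single non-interactive helper, which is convenient if you have to repeat them, for example after a reboot. The function name `collect_cml_logs` is hypothetical, and the timeout and keepalive values are arbitrary choices meant to make the command fail instead of waiting indefinitely on an unresponsive instance.

```bash
# Hypothetical helper: dump the CML logs on the instance and copy them back.
# Usage: collect_cml_logs <instance_public_ip> [destination_dir]
# The body runs in a subshell so strict mode does not leak into your shell.
collect_cml_logs() (
  set -euo pipefail
  host="ubuntu@${1:?usage: collect_cml_logs <instance_public_ip> [destination_dir]}"
  dest="${2:-.}"

  # ConnectTimeout bounds connection setup; the ServerAlive* options drop the
  # session after ~45 s if the instance stops answering keepalives.
  ssh_opts=(-o ConnectTimeout=15 -o ServerAliveInterval=15 -o ServerAliveCountMax=3)

  # Run the same dump commands non-interactively in the remote home directory.
  ssh "${ssh_opts[@]}" "$host" '
    sudo journalctl -n all -u cml.service --no-pager > cml.log
    sudo dmesg --ctime > system.log
    sudo dmesg --ctime --userspace > userspace.log
  '

  # Copy the three files back; relative remote paths resolve to the home directory.
  mkdir -p "$dest"
  scp "${ssh_opts[@]}" \
    "$host:cml.log" "$host:system.log" "$host:userspace.log" "$dest/"
)
```

For example, `collect_cml_logs 203.0.113.10 ./cml-logs` (with a documentation-range placeholder IP) would leave `cml.log`, `system.log` and `userspace.log` in `./cml-logs`.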

If the SSH command hangs, the instance may be severely broken. In that case,
reboot it from the AWS web console and try the commands again.
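
If you prefer the AWS CLI to the web console, the reboot can be requested with a sketch like the one below; the function name `reboot_cml_instance` is hypothetical. A reboot, unlike a stop and start, keeps the instance's public IPv4 address, so the same SSH and `scp` commands still apply afterwards.

```bash
# Hypothetical helper: CLI alternative to rebooting from the AWS web console.
# Usage: reboot_cml_instance <instance_id>
# The body runs in a subshell so strict mode does not leak into your shell.
reboot_cml_instance() (
  set -euo pipefail
  id="${1:?usage: reboot_cml_instance <instance_id>}"

  # Request a reboot. The call is asynchronous: it returns once the request is
  # accepted, so give the instance a minute before retrying the SSH commands.
  aws ec2 reboot-instances --instance-ids "$id"
)
```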

#### On-premise (Local) Runners

The `cml runner` command can also be used to manually set up a local machine,