Debugging curtin from within MAAS

Juan Vela edited this page Feb 14, 2019 · 1 revision

Source: https://gist.github.com/smoser/2610e9b78b8d7b54319675d9e3986a1b

Deploying a node with MAAS cli

You can deploy a node with the MAAS CLI, which is often preferable to clicking a button in a web UI.

$ SYSTEM_ID=node-787b19d8-d25c-11e4-9f9e-00163eca91de
$ NAME="random-nodename"
$ MAASNAME="maaslocal"
$ maas $MAASNAME machine allocate "name=$NAME"
#  in maas < 2.2, this is: maas $MAASNAME nodes acquire "name=$NAME"

$ maas $MAASNAME machine deploy "$SYSTEM_ID"
#  in maas < 2.2, this is: maas $MAASNAME node start "$SYSTEM_ID"

$ maas $MAASNAME machine release "$SYSTEM_ID"
# in maas < 2.2, this is: maas $MAASNAME node release "$SYSTEM_ID"

Optionally, you can pass the following options to machine deploy:

  • hwe_kernel=hwe-x: to specify the Hardware Enablement (HWE) kernel that you want.
  • distro_series=$release: to specify the Ubuntu release you want to install.

Bug 1074317 is a request for improved usability in that command line.
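
As a minimal sketch of passing both options at once, the helper below only assembles the deploy command line so it can be reviewed before it is run. The function name deploy_cmd is hypothetical, and the values xenial and hwe-x are illustrative, not recommendations.

```shell
# Hypothetical helper: print (do not run) a deploy command that includes
# the optional distro_series and hwe_kernel parameters described above.
deploy_cmd() {
    # $1 = MAAS profile name, $2 = system id,
    # $3 = Ubuntu release, $4 = HWE kernel (optional)
    cmd="maas $1 machine deploy $2 distro_series=$3"
    if [ -n "${4:-}" ]; then
        cmd="$cmd hwe_kernel=$4"
    fi
    echo "$cmd"
}

# Example: print the command for review, then run it by hand.
deploy_cmd maaslocal node-787b19d8-d25c-11e4-9f9e-00163eca91de xenial hwe-x
```

Printing the command instead of executing it keeps the sketch safe to try on a machine that has no MAAS profile configured.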

Accessing a system during deployment

After the node is started and has booted far enough for ssh to be running, you can ssh in as the ubuntu user. (Bug 1462498 requests that this be changed to 'ephemeral' to avoid confusion with the deployed system.)

Given $HOST_IP as the host's IP address, which is shown in the MAAS UI or by the CLI (maas $MAASNAME machines list-allocated), you can then:

# The system will have a new host key fingerprint, so remove any old ones.
$ ssh-keygen -f ~/.ssh/known_hosts -R $HOST_IP

$ ssh ubuntu@$HOST_IP

Enabling Debug output in log

To enable debug output in the install log, modify the maas region controller's /etc/maas/preseeds/curtin_userdata file. At the bottom, simply add:

showtrace: true
verbosity: 3

Then, the deployment log will have curtin debug enabled, and any errors will have stack traces shown.
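
A small sketch of making that change repeatable is shown below. The helper name enable_curtin_debug is hypothetical; it assumes the two settings belong at the top level of the file, as in the fragment above, and that you run it as root on the region controller.

```shell
# Hypothetical helper: append the curtin debug settings to a preseed file
# unless they are already present. A backup copy of the file as it was
# before this call is kept next to it with a .bak suffix.
enable_curtin_debug() {
    preseed=$1
    cp "$preseed" "$preseed.bak"
    # Make sure the file ends with a newline before appending.
    [ -z "$(tail -c 1 "$preseed")" ] || echo >> "$preseed"
    grep -q '^showtrace:' "$preseed" || echo 'showtrace: true' >> "$preseed"
    grep -q '^verbosity:' "$preseed" || echo 'verbosity: 3' >> "$preseed"
}

# Usage, as root on the region controller (not run here):
#   enable_curtin_debug /etc/maas/preseeds/curtin_userdata
```

Checking for an existing key first means the helper can be run more than once without duplicating the entries.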

Stopping deployment shutdown

There are two ways to stop a system from shutting down after deployment has failed or succeeded.

  1. Modify the MAAS region controller's /etc/maas/preseeds/curtin_userdata file.

    Doing it this way is global to your MAAS installation, and requires root access on the MAAS region controller to make the change.

    In /etc/maas/preseeds/curtin_userdata, there is a section that looks like:

     {{if third_party_drivers and driver}}
     early_commands:
       {{py: key_string = ''.join(['\\x%x' % x for x in driver['key_binary']])}}
       ...
       {{endif}}
     {{endif}}
     late_commands:
    

    You will need to pull the early_commands entry out of the third_party_drivers section so that it is always rendered. Adjust the file so that it looks more like the example below (Bug 1683465 requests making this more obvious). Several files are touched so that this works across different MAAS versions.

     early_commands:
       disable_reboot: touch /run/block-curtin-poweroff /tmp/block-poweroff /tmp/block-reboot
       {{if third_party_drivers and driver}}
       ...
    
  2. Stop the single node from rebooting.

    During the deployment process, you can ssh in as described above and touch the same files that the early command above would touch.

     $ ssh ubuntu@$HOST_IP sudo touch /run/block-curtin-poweroff /tmp/block-poweroff /tmp/block-reboot
    

    Because you have to be quick, you can use the 'block-reboot' script to do this.
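
The 'block-reboot' script itself is not reproduced on this page. As a rough sketch of the same idea, the loop below keeps retrying a command until it succeeds, which lets you start it before ssh is reachable on the node. The retry helper is hypothetical, and the attempt count and delay are arbitrary choices.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds,
# giving up after a fixed number of attempts.
retry() {
    attempts=$1   # maximum number of tries
    delay=$2      # seconds to sleep between tries
    shift 2
    n=0
    while [ "$n" -lt "$attempts" ]; do
        "$@" && return 0
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (not run here): up to 120 attempts, 5 seconds apart.
#   retry 120 5 ssh -o ConnectTimeout=5 -o BatchMode=yes "ubuntu@$HOST_IP" \
#       sudo touch /run/block-curtin-poweroff /tmp/block-poweroff /tmp/block-reboot
```

BatchMode=yes makes ssh fail instead of prompting for a password, so the loop never stalls waiting for input.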

Logs

Information about the curtin installation should be available in the following files:

  • /tmp/install.log: When maas deploys with curtin it provides config that specifies the log should go to this file. This file is also posted back to MAAS so the user can see it there. Extra debug output will be included here.

  • /var/log/cloud-init.log: cloud-init's log won't include much about curtin specifically, but might include other useful information. In particular, searching the log for WARN will highlight likely causes of problems.

  • /var/log/cloud-init-output.log: For any process executed by cloud-init (such as curtin) the standard output and standard error are sent to this file. Thus, this will likely have errors written by curtin.
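
A minimal sketch of the WARN search suggested above follows. The helper name show_warnings is hypothetical; it is a thin wrapper around grep.

```shell
# Hypothetical helper: print WARN lines (with line numbers) from a log file.
# Prints nothing and still succeeds when there are no matching lines.
show_warnings() {
    grep -n 'WARN' "$1" || true
}

# Usage on the node being deployed (not run here):
#   show_warnings /var/log/cloud-init.log
#   tail -n 50 /var/log/cloud-init-output.log
```

The line numbers from grep -n make it easier to jump to the surrounding context in the full log.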

Re-running curtin

MAAS runs curtin by sending it as user-data to cloud-init. cloud-init puts that user-data into /var/lib/cloud/instance/scripts/part-001 and then executes it. The content there is a self-extracting shell archive that extracts itself to /curtin and then executes itself.

As root, you can then re-use the /curtin directory to run curtin again.

ubuntu$ sudo su -
$ cd /curtin
$ ls
bin  configs  curtin  helpers

$ /var/lib/cloud/instance/scripts/part-001 info
LABEL='curtin'
PREFIX='curtin'
COMMAND=( 'curtin' '--install-deps' 'install'
    '--config=configs/config-000.cfg' '--config=configs/config-001.cfg'
    '--config=configs/config-002.cfg' '--config=configs/config-003.cfg'
    '--config=configs/config-004.cfg'
    'http://10.7.10.1:5248/images/ubuntu/amd64/generic/xenial/daily/root-tgz' )
CREATE_TIME='Wed, 25 May 2016 15:21:05 +0000'

The command shown there by 'info' is the command that is run to do the install. The configs referenced are viewable in the configs/ directory.

You can then re-run the same command, or edit the configs and run it again.

$ ./bin/curtin --install-deps install \
  --config=configs/config-000.cfg --config=configs/config-001.cfg \
  --config=configs/config-002.cfg --config=configs/config-003.cfg \
  --config=configs/config-004.cfg \
  http://10.7.10.1:5248/images/ubuntu/amd64/generic/xenial/daily/root-tgz

To aid in debugging, I suggest adding the following flags to the command line:

  • -vvv: be more verbose
  • --showtrace: on failure, allow a python stack trace to show through.
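
As a sketch, the wrapper below inserts both flags right after the program name. Placing them before the install subcommand, next to --install-deps, is an assumption about how curtin parses its arguments, and the name curtin_debug is hypothetical.

```shell
# Hypothetical wrapper: run a curtin command with the debug flags inserted
# right after the program name, before any other arguments.
curtin_debug() {
    prog=$1
    shift
    "$prog" -vvv --showtrace "$@"
}

# Usage from /curtin as root (not run here):
#   curtin_debug ./bin/curtin --install-deps install \
#     --config=configs/config-000.cfg --config=configs/config-001.cfg \
#     --config=configs/config-002.cfg --config=configs/config-003.cfg \
#     --config=configs/config-004.cfg \
#     http://10.7.10.1:5248/images/ubuntu/amd64/generic/xenial/daily/root-tgz
```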

Other Tips

Enabling -proposed and upgrading system

If you want to install a system with -proposed enabled and upgrade it so that the first boot tests the new packages, that can be accomplished by editing /etc/maas/preseeds/curtin_userdata (more info). Add the following:

system_upgrade: {enabled: True}
apt:
  sources:
    proposed.list:
      source: deb $MIRROR $RELEASE-proposed main universe

Filing a bug

If you need to file a bug, please collect and attach the storage configuration for the node. See the file 'bug-request-for-info.txt' for information on how to do that.