Add ignition-virtio-dump.service #146

cgwalters · 2020-01-04T16:29:12Z

Debugging failures in the initrd is annoying; this code
looks for a virtio-serial port named com.coreos.ignition.journal,
and runs as part of emergency.target.

I plan to change mantle to set up this port by default, so if
something fails in the initramfs we'll at least reliably get
the journal in a sane parsable format.

This is a special targeted subset of
coreos/ignition#585
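
To illustrate what "a sane parsable format" could buy the host side, here is a minimal sketch, assuming the dump arrives as `journalctl -o json` output (one JSON object per line, with field values as strings). The helper names and the sample entries are hypothetical and are not taken from this PR or from mantle.

```python
import json


def parse_journal_dump(raw):
    """Parse newline-delimited JSON journal entries into a list of dicts."""
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entries.append(json.loads(line))
    return entries


def error_messages(entries, max_priority=3):
    """Return MESSAGE for entries at syslog priority 'err' (3) or more severe."""
    messages = []
    for entry in entries:
        priority = entry.get("PRIORITY")
        if priority is not None and int(priority) <= max_priority:
            messages.append(entry.get("MESSAGE", ""))
    return messages


# Made-up sample data standing in for what the virtio port would deliver.
sample = "\n".join([
    json.dumps({"PRIORITY": "6", "MESSAGE": "starting", "_SYSTEMD_UNIT": "example.service"}),
    json.dumps({"PRIORITY": "3", "MESSAGE": "failed to fetch config", "_SYSTEMD_UNIT": "example.service"}),
])
print(error_messages(parse_journal_dump(sample)))
```

A real consumer would likely also need to handle `MESSAGE` values that are not plain strings, which this sketch ignores.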

jlebon · 2020-01-06T16:15:25Z

We'll need something like this for rpm-ostree's CI too at least, where right now the journal kola collects stops on the first reboot (again, because kola doesn't know the node is being rebooted).

I had thought of something similar though possibly using another console instead, and ForwardToConsole=. Using virtio-serial channels is a neat idea! We made use of them in SystemTap to support targeting probing processes inside VMs. Though hmm, the downside is we'll probably need a custom unit to proxy the messages through.

Anyway, WDYT about having this functionality in https://github.com/coreos/fedora-coreos-config directly, and just conditionalizing the unit on ConditionPathExists=/dev/virtio-ports/com.coreos.journal?

jlebon · 2020-01-06T16:16:44Z

(To clarify, what I'm suggesting here is making this a streaming thing instead, installing it in both the initrd and the real root, and making it more of a "host API" than something Ignition-specific.)

cgwalters · 2020-01-06T16:21:33Z

I think these are strongly related but still orthogonal things. We don't need to stream the journal from the initrd - assuming the initrd works fine, if we do journal streaming in the real root we'll get the logs we need then.

Hence, I'd propose merging this PR mostly as is, and do what you're suggesting as a separate virtio channel indeed owned by fedora-coreos-config (since it's not really related to Ignition).

cgwalters · 2020-01-06T16:23:02Z

BTW, I wrote exactly what you're suggesting for gnome-continuous for several reasons, but one of the most interesting is that the default for desktop systems is not to have ssh on.

(It could make sense to change mantle to default to 'exec over virtio' but that's a separate discussion)

cgwalters · 2020-01-27T21:59:28Z

Any further thoughts on this one?

jlebon · 2020-01-28 (review)

> I think these are strongly related but still orthogonal things. We don't need to stream the journal from the initrd - assuming the initrd works fine, if we do journal streaming in the real root we'll get the logs we need then.

Hmm, I don't follow. If the goal is to have a debugging hook like this, IMO it'd be even more useful if it streamed starting from the initrd too. E.g. systemd.log_level=debug systemd.log_target=console works on both the initrd systemd and real root systemd. And if we do that, I think it can be used in place of this.

But yeah, this is clearly useful to have today, so no issues from me getting this in meanwhile.

Anyway, a few optional comments, but LGTM as is too.

dracut/99emergency-timeout/ignition-virtio-dump.service

dracut/99emergency-timeout/module-setup.sh

cgwalters · 2020-01-28T21:47:46Z

> If the goal is to have a debugging hook like this, IMO it'd be even more useful if it streamed starting from the initrd too.

Yeah...though it would duplicate then what kola is doing with gathering the journals (we could replace that only on qemu of course).

We'd also need to handle being killed and restarted across the switchroot and think about how that appears in logs.

I guess again my main concern is getting the journal when things go wrong - when things go "right" (at least up till ssh) one has a ton of options.

Arguably, we should have a similar service in the real root that also handles failure to reach the default target.

jlebon · 2020-01-29T20:21:20Z

> Yeah...though it would duplicate then what kola is doing with gathering the journals (we could replace that only on qemu of course).

Yeah, the goal would definitely be to make kola use that for qemu (and fixing the rpm-ostree vmcheck test logs case).

> We'd also need to handle being killed and restarted across the switchroot and think about how that appears in logs.

This would be tricky to do but not unsolvable I think. E.g. the proxy service could just write out on shutdown the cursor of the last message it proxied?
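
A rough sketch of that cursor idea, assuming each journal entry carries a `__CURSOR` field as in `journalctl -o json` output. The `proxy` and `entries_after_cursor` helpers are hypothetical; a real proxy could probably lean on `journalctl --after-cursor=` instead of filtering in-process.

```python
def entries_after_cursor(entries, last_cursor):
    """Return the entries that follow the one whose __CURSOR equals last_cursor."""
    if last_cursor is None:
        return list(entries)
    for index, entry in enumerate(entries):
        if entry.get("__CURSOR") == last_cursor:
            return list(entries[index + 1:])
    # Cursor not found (e.g. journal rotated): resend everything rather than drop logs.
    return list(entries)


def proxy(entries, last_cursor, send):
    """Send not-yet-proxied entries and return the cursor to persist on shutdown."""
    for entry in entries_after_cursor(entries, last_cursor):
        send(entry)
        last_cursor = entry.get("__CURSOR", last_cursor)
    return last_cursor


# Simulated run: one proxy instance in the initrd, a second one after switchroot.
journal = [
    {"__CURSOR": "c1", "MESSAGE": "initrd: a"},
    {"__CURSOR": "c2", "MESSAGE": "initrd: b"},
]
sent = []
cursor = proxy(journal, None, sent.append)    # initrd instance, writes cursor on exit
journal.append({"__CURSOR": "c3", "MESSAGE": "real root: c"})
cursor = proxy(journal, cursor, sent.append)  # real-root instance resumes from cursor
print([entry["MESSAGE"] for entry in sent])
```

The cursor strings here are placeholders; real journal cursors are opaque strings and should only be compared for equality.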

> I guess again my main concern is getting the journal when things go wrong - when things go "right" (at least up till ssh) one has a ton of options.

The way I'm thinking of it, the contexts in which you would have this set up are also where you want to be ready for things to go wrong (e.g. Ignition debugging, test harnesses, etc.). I don't see it as re-implementing e.g. systemd-journal-remote but something lower level than that and situational.

But again, I definitely see the value of just something that fires on emergency in the initrd. So this WFM!

cgwalters · 2020-03-26T21:36:36Z

Now pairs with coreos/coreos-assembler#1290 and tested to work (or I guess successfully fail?) together.

Will merge both when both are approved.

cgwalters · 2020-03-26T22:20:21Z

Actually, now that I play with this more... it might be nice if we also wrote just `{}` to the channel when we succeeded - that would make the flow in mantle/kola saner, because we could synchronously wait for either success or failure rather than waiting for (Ignition failure or ssh works).
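
One way the host side could tell the two outcomes apart, as a sketch only: it assumes a lone empty JSON object signals success and anything else is a dumped journal. The function name and the `"no-data"` state are hypothetical and do not describe what mantle actually does.

```python
import json


def classify_initramfs_result(raw):
    """Classify the channel contents as success, failure, or no data yet."""
    objects = [json.loads(line) for line in raw.splitlines() if line.strip()]
    if not objects:
        return ("no-data", [])  # nothing written yet, keep waiting
    if objects == [{}]:
        return ("success", [])  # the proposed empty-object success marker
    # Anything else is treated as a journal dump from emergency.target.
    return ("failure", [obj for obj in objects if obj])


print(classify_initramfs_result("{}\n"))
print(classify_initramfs_result('{"MESSAGE": "Ignition failed"}\n'))
```

With a marker like this the caller can block on a single read of the channel instead of racing a failure watcher against an ssh probe.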

cgwalters · 2020-03-26T23:18:30Z

I thought about the "generalize this to post-initramfs" and realized we don't necessarily need to bake it into CoreOS by default - it could be injected via Ignition.

jlebon · 2020-03-27T19:07:01Z

> I thought about the "generalize this to post-initramfs" and realized we don't necessarily need to bake it into CoreOS by default - it could be injected via Ignition.

The way I think of it is that a generalized version of this would be like the serial console output; it just streams from start to end of the VM on the same port. The same output you get from journalctl -o json really: logs there start from before switchroot.

cgwalters · 2020-03-27T19:23:52Z

> The same output you get from journalctl -o json really: logs there start from before switchroot.

Sure, but post-switchroot any code injected via Ignition to write to a port or do whatever is going to get those logs too - it'll just be delayed until the switchroot happens.

Another important thing is that instead of getting all logs the calling code can also use e.g. journalctl -u or whatever to filter to specific units to avoid transferring all the data, etc.
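
As a sketch of that filtering point: the caller could either build a narrower `journalctl` invocation or filter already-parsed entries. Both helpers below are hypothetical; the in-process filter only looks at `_SYSTEMD_UNIT`, which is a rough approximation of what `journalctl -u` matches.

```python
def journalctl_argv(units):
    """Build a journalctl command line limited to the given units, JSON output."""
    argv = ["journalctl", "-o", "json"]
    for unit in units:
        argv += ["-u", unit]
    return argv


def filter_by_unit(entries, units):
    """Keep only parsed entries whose _SYSTEMD_UNIT is one of the given units."""
    wanted = set(units)
    return [entry for entry in entries if entry.get("_SYSTEMD_UNIT") in wanted]


# Made-up entries; unit names are illustrative only.
entries = [
    {"_SYSTEMD_UNIT": "sshd.service", "MESSAGE": "Server listening"},
    {"_SYSTEMD_UNIT": "other.service", "MESSAGE": "noise"},
    {"MESSAGE": "kernel message without a unit"},
]
print(journalctl_argv(["sshd.service"]))
print(filter_by_unit(entries, ["sshd.service"]))
```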

cgwalters · 2020-03-30T19:31:42Z

OK last call on this one...if there aren't any further objections/thoughts I plan to merge.

Pairs with coreos/ignition-dracut#146 This way, we error out fast if something went wrong in the initramfs rather than timing out. And further, we get the journal as JSON, so we can do something intelligent in the future to analyze it. And add a test case for this.

This is similar to: coreos/ignition-dracut#146 For our test system, it generally works really well to inject things via Ignition. That PR was about handling failures in the initramfs *before* Ignition runs. This PR is trying to help us test the scenario where no Ignition is injected into the Live ISO. Let's also use the virtio-channel approach.

This finally unifies the advantages of `cosa run` and `kola spawn`. I kept getting annoyed by how serial console sizing is broken (e.g. trying to use `less` etc.). Using `ssh` via `kola spawn` addresses that, but it means you can't debug the initramfs. Now things work in an IMO pretty cool way; if you do e.g. `cosa run --kargs ignition.config.url=blah://` (or inject a bad Ignition config) to cause a failure in the initramfs, you'll see a nice error (building on coreos/ignition-dracut#146 ) telling you to rerun with `cosa run --devshell-console`. Things are also wired up cleanly so that we support rebooting with the equivalent of `kola spawn --reconnect` (which we should probably remove now). You can exit via *either* quitting SSH cleanly or using `poweroff`, and the lifecycle of ssh and qemu is wired together. And finally, if we detect a cosa workdir we also bind it in by default. More to come here, such as auto-injecting debugging tools and containers.

This came up in coreos/ignition-dracut#146 and since then we've been doing more "ad hoc unit writing to virtio" in mantle, but let's add a general API that streams the journal. This is just better for what devshell wants - we can more precisely watch for sshd starting. And more code in e.g. `testiso.go` could use it too which can come later. The immediate motivation here is I may add another kola test which could use this.

jlebon approved these changes Jan 28, 2020


cgwalters mentioned this pull request Mar 23, 2020

Teach mantle/kola how to automatically find cosa builds coreos/coreos-assembler#1259

Merged

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Mar 26, 2020

qemu: Support monitoring for failures in the initramfs

8fed2af

Pairs with coreos/ignition-dracut#146 What we really want is to use this in kola, will do as a separate followup.

cgwalters mentioned this pull request Mar 26, 2020

qemu: Support monitoring for failures in the initramfs coreos/coreos-assembler#1290

Merged

cgwalters force-pushed the failure-qemu-dump branch from 554e8af to 84c89f4 Compare March 27, 2020 19:44

cgwalters merged commit 6136be3 into coreos:master Mar 31, 2020

cgwalters mentioned this pull request Apr 6, 2020

Lacking troubleshooting method for ignition coreos/fedora-coreos-tracker#453

Closed

cgwalters mentioned this pull request Apr 8, 2020

overlay: Add code to verify interactive login for Live ISO coreos/fedora-coreos-config#339

Merged

cgwalters mentioned this pull request Apr 10, 2020

Introduce kola qemuexec --devshell, make cosa run use it coreos/coreos-assembler#1338

Merged

cgwalters mentioned this pull request Apr 13, 2020

99emergency-timeout: Dump all failed services #168

Merged

cgwalters mentioned this pull request Apr 16, 2020

providers: setup network kernel arguments in initrd coreos/afterburn#390

Merged

cgwalters mentioned this pull request Apr 17, 2020

cosa run silently hangs with initramfs failure and no virtio dump patched ignition-dracut coreos/coreos-assembler#1367

Closed

cgwalters mentioned this pull request Apr 27, 2020

WIP: Enable initramfs-etc.img as an appended initramfs coreos/fedora-coreos-config#364

Closed

cgwalters mentioned this pull request Apr 29, 2020

Add qemu API to stream journal, use it in devshell coreos/coreos-assembler#1416

Merged

sohankunkerkar mentioned this pull request Jul 10, 2020

mantle/kola/tests/ignition: don't skip ignitionFailure test coreos/coreos-assembler#1581

Merged
