Alpha/Beta: corner-case dbus segfault in PXE image boot without first_boot=1 due to missing rootfs population #944

Closed
pothos opened this issue Jan 9, 2023 · 6 comments · Fixed by flatcar-archive/coreos-overlay#2371
Labels
kind/bug Something isn't working

Comments

pothos (Member) commented Jan 9, 2023

Description

The login hangs until a timeout is hit, and the kernel log shows a dbus segfault:

[   96.721359] dbus-daemon[750]: segfault at 1 ip 00007f2c8b78ba19 sp 00007ffc76fe3d98 error 4 in libc.so.6[7f2c8b659000+172000]
[   96.721774] Code: fe 7f 5c 17 e1 c5 f8 77 c3 0f 1f 84 00 00 00 00 00 89 f8 48 89 fa c5 f9 ef c0 25 ff 0f 00 00 3d e0 0f 00 00 0f 87 37 01 00 00 <c5> fd 74 0f c5 fd d7 c1 85 c0 74 5b f3 0f bc c0 c5 f8 77 c3 0f 1f

Impact

The machine is slow to reach a usable login, and other things may be broken.

Environment and steps to reproduce

./flatcar_production_pxe.sh -append 'flatcar.autologin'
# alternatively, log in via SSH:
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p 2222 [email protected]
pothos added the kind/bug (Something isn't working) label Jan 9, 2023

pothos (Member, Author) commented Jan 9, 2023

I can't reproduce it with the regular (non-PXE) image, which is good, but we should add tests for the PXE images to catch this.

pothos (Member, Author) commented Jan 9, 2023

Stable 3374 is not affected; Beta 3432 is affected.

pothos changed the title from "Alpha: dbus segfault (at least with PXE image)" to "Alpha/Beta: dbus segfault (at least with PXE image)" Jan 9, 2023

jepio (Member) commented Jan 9, 2023

This is the backtrace:

#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x00007f5237561661 in __vfprintf_internal (s=s@entry=0x55c5fae992a0, format=format@entry=0x55c5f9b027ed "Unknown class %s", ap=ap@entry=0x7fff045a1168,
    mode_flags=mode_flags@entry=2) at vfprintf-internal.c:1517
#2  0x00007f52375f8ad6 in __vsyslog_internal (pri=<optimized out>, fmt=0x55c5f9b027ed "Unknown class %s", ap=0x7fff045a1168, mode_flags=2) at syslog.c:229
#3  0x000055c5f9af0ee9 in vsyslog (__ap=0x7fff045a1168, __fmt=0x55c5f9b027ed "Unknown class %s", __pri=14) at /build/amd64-usr/usr/include/bits/syslog.h:47
#4  log_callback (type=type@entry=0, fmt=fmt@entry=0x55c5f9b027ed "Unknown class %s")
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/selinux.c:147
#5  0x000055c5f9af106b in bus_selinux_check (sender_sid=sender_sid@entry=0x55c5faea4d00, override_sid=<optimized out>, requested=requested@entry=0x55c5f9b02853 "send_msg",
    auxdata=auxdata@entry=0x7fff045a32f0, target_class=0x55c5f9b027e8 "dbus") at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/selinux.c:409
#6  0x000055c5f9af189d in bus_selinux_check (auxdata=0x7fff045a32f0, requested=0x55c5f9b02853 "send_msg", target_class=0x55c5f9b027e8 "dbus", override_sid=<optimized out>,
    sender_sid=0x55c5faea4d00) at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/selinux.c:402
#7  bus_selinux_allows_send (sender=sender@entry=0x55c5faeb0660, proposed_recipient=proposed_recipient@entry=0x0, msgtype=0x7f52378baa57 "method_call",
    interface=interface@entry=0x55c5fae890e8 "org.freedesktop.DBus", member=member@entry=0x55c5fae890d8 "Hello", error_name=error_name@entry=0x0,
    destination=0x55c5fae89108 "org.freedesktop.DBus", activation_entry=0x0, error=0x7fff045a34a0)
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/selinux.c:634
#8  0x000055c5f9ae2b16 in bus_context_check_security_policy (context=context@entry=0x55c5fae785e0, transaction=transaction@entry=0x55c5fae89080,
    sender=sender@entry=0x55c5faeb0660, addressed_recipient=addressed_recipient@entry=0x0, proposed_recipient=proposed_recipient@entry=0x0,
    message=message@entry=0x55c5fae87ed0, activation_entry=0x0, error=0x7fff045a34a0)
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/bus.c:1762
#9  0x000055c5f9aeb009 in bus_dispatch (message=0x55c5fae87ed0, connection=0x55c5faeb0660)
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/dispatch.c:394
#10 bus_dispatch_message_filter (connection=0x55c5faeb0660, message=0x55c5fae87ed0, user_data=<optimized out>)
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/dispatch.c:559
#11 0x00007f5237897737 in dbus_connection_dispatch (connection=0x55c5faeb0660)
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/dbus/dbus-connection.c:4703
#12 dbus_connection_dispatch (connection=connection@entry=0x55c5faeb0660)
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/dbus/dbus-connection.c:4574
#13 0x000055c5f9af6309 in _dbus_loop_dispatch (loop=<optimized out>) at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/dbus/dbus-mainloop.c:532
#14 _dbus_loop_dispatch (loop=0x55c5fae78730) at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/dbus/dbus-mainloop.c:513
#15 _dbus_loop_iterate (loop=loop@entry=0x55c5fae78730, block=block@entry=1)
    at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/dbus/dbus-mainloop.c:862
#16 0x000055c5f9af66e5 in _dbus_loop_run (loop=0x55c5fae78730) at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/dbus/dbus-mainloop.c:888
#17 0x000055c5f9adfbfa in main (argc=<optimized out>, argv=<optimized out>) at /build/amd64-usr/var/tmp/portage/sys-apps/dbus-1.14.4/work/dbus-1.14.4/bus/main.c:750

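In frame 4 a va_list is reused, and that could be the cause of the segfault: the fault at address 1 inside __strlen_avx2 fits vfprintf reading a %s argument that is not a valid pointer. D-Bus is attempting to log "Unknown class dbus" (format "Unknown class %s" with target_class "dbus"), so we might have an SELinux issue as well.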
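The va_list hypothesis can be illustrated with a minimal, generic C sketch. Per the C standard, once a va_list has been passed to a v*printf-style function its state is indeterminate, so handing the same va_list to a second consumer (for example vsnprintf followed by vsyslog) is undefined behavior, and a %s conversion can then pick up a garbage pointer. The usual fix is to take a va_copy before the first consumer. The function log_twice below is hypothetical and only stands in for a logger with two sinks; it is not the actual dbus log_callback from bus/selinux.c.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logger (illustration only): formats the same variadic
 * arguments into two buffers, e.g. an audit message and a syslog line. */
static int log_twice(char *out1, size_t n1, char *out2, size_t n2,
                     const char *fmt, ...)
{
    va_list ap, ap_copy;
    int r;

    va_start(ap, fmt);
    /* Copy BEFORE the first consumer walks the list; reusing `ap`
     * after vsnprintf() would be undefined behavior. */
    va_copy(ap_copy, ap);

    vsnprintf(out1, n1, fmt, ap);          /* consumes ap */
    r = vsnprintf(out2, n2, fmt, ap_copy); /* uses the untouched copy */

    va_end(ap_copy);
    va_end(ap);
    return r; /* length of the second formatted string */
}
```

Each consumer gets its own va_list, so both sinks see the same arguments regardless of how the platform implements va_list.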

pothos (Member, Author) commented Jan 9, 2023

It seems we only hit this error path because the first-boot kernel command-line argument is missing, and in that case the initrd does not do the tmpfiles setup. I'm changing this in flatcar/bootengine#50 so that it always runs. We could backport this to Beta/Alpha as a workaround.

pothos changed the title from "Alpha/Beta: dbus segfault (at least with PXE image)" to "Alpha/Beta: corner-case dbus segfault in PXE image boot without first_boot=1 due to missing rootfs population" Jan 9, 2023

pothos (Member, Author) commented Jan 9, 2023

Passing flatcar.first_boot=1, as in ./flatcar_production_pxe.sh -append 'flatcar.autologin flatcar.first_boot=1', ensures that initrd-setup-root runs and the rootfs is populated correctly. This is the normal way to boot the PXE image anyway, since one usually wants Ignition to run.
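Whether the flag actually reached the kernel can be checked on the booted machine by looking at /proc/cmdline. As a minimal sketch, here is a hypothetical C helper, cmdline_has_param, that reports whether a command-line string contains a parameter either as a bare flag or as name=value. It assumes plain whitespace-separated tokens (no quoted values containing spaces) and is not how bootengine itself detects flatcar.first_boot.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (illustration only): true if `cmdline` contains
 * `name` as a bare token or as `name=value`. Quoting is NOT handled. */
static bool cmdline_has_param(const char *cmdline, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = cmdline;

    while (*p != '\0') {
        /* Skip separators; /proc/cmdline ends with a newline. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (*p == '\0')
            break;

        /* Find the end of the current token. */
        const char *end = p;
        while (*end != '\0' && *end != ' ' && *end != '\t' && *end != '\n')
            end++;

        size_t tok_len = (size_t)(end - p);
        /* Match "name" exactly or "name=..."; reject e.g. "name_x=1". */
        if (tok_len >= name_len && strncmp(p, name, name_len) == 0 &&
            (tok_len == name_len || p[name_len] == '='))
            return true;

        p = end;
    }
    return false;
}
```

For example, read /proc/cmdline with fgets into a buffer and call cmdline_has_param(buf, "flatcar.first_boot").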

pothos added a commit to flatcar/bootengine that referenced this issue Jan 9, 2023
In flatcar/Flatcar#944 we noticed that dbus
failed because of missing files in the rootfs when Ignition isn't
running in a PXE environment. This worked before by chance, but the
underlying problem is that initrd-setup-root is required but wasn't
running because the "wants" symlinks in the module setup weren't
taking effect.

This is a backport of the relevant changes in
#50 to always run initrd-setup-root instead of only when Ignition
runs. It is pulled in from both possible targets: ignition-complete,
or ignition-subsequent, which runs in the other case.
The RequiresMountsFor=/sysroot/usr/share/oem directive didn't have any
effect because there was no mount unit defined. The directive
After=sysroot-usr.mount is already implied by
RequiresMountsFor=/sysroot/usr/.
pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 9, 2023
This pulls in
c8399e42bb9651c3c108f916f6645557ab41884b which is a backport of the
relevant parts of flatcar/bootengine#50 to fix
flatcar/Flatcar#944
The github-project-automation bot moved this from Upcoming / Backlog to Implemented in "Flatcar tactical, release planning, and roadmap" Jan 10, 2023
pothos (Member, Author) commented Jan 10, 2023

This was backported after the new releases were tagged, so it will be fixed in future point releases (see the changelog).

t-lo pushed a commit to flatcar/scripts that referenced this issue Apr 17, 2023
This pulls in
c8399e42bb9651c3c108f916f6645557ab41884b which is a backport of the
relevant parts of flatcar/bootengine#50 to fix
flatcar/Flatcar#944