Coding ip=169.254.3.14 hangs during boot if there's no cable connected #589
Comments
But all the ip setting stuff is part of CONFIG_ROOT_NFS, e.g. the ip kernel command line parameter is part of nfsroot (https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt), though it is (ab)used for other purposes, e.g. 9P2000 root. Ultimately, you should be assigning an IP with userspace configuration (e.g. /etc/network/interfaces). I personally wouldn't consider the behaviour you've encountered a bug.
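For reference, the nfsroot documentation linked above describes the ip= value as a colon-separated list of fields. Below is a rough Python sketch of how such a value splits into named fields. It is a simplified illustration, not the kernel's parser: the function name `parse_ip_param` and the dictionary keys are made up for this example, and the field order follows the older nfsroot.txt layout (client, server, gateway, netmask, hostname, device, autoconf, two DNS servers).

```python
import ipaddress

# Field order as documented in nfsroot.txt for the ip= parameter
# (names are hypothetical labels chosen for this sketch).
IP_FIELDS = ("client_ip", "server_ip", "gw_ip", "netmask",
             "hostname", "device", "autoconf", "dns0_ip", "dns1_ip")

# Values nfsroot.txt lists for the autoconf field.
AUTOCONF_KEYWORDS = {"on", "any", "off", "none", "dhcp", "bootp", "rarp", "both"}


def parse_ip_param(value):
    """Split the value of an ip= kernel parameter into named fields."""
    # A lone keyword such as ip=dhcp selects the autoconfiguration method.
    if ":" not in value and value in AUTOCONF_KEYWORDS:
        return {"autoconf": value}
    parts = value.split(":")
    # Empty fields (e.g. "a::b") are simply left out of the result.
    return {name: part for name, part in zip(IP_FIELDS, parts) if part}


fields = parse_ip_param("169.254.3.14")
addr = ipaddress.ip_address(fields["client_ip"])
print(fields, "link-local:", addr.is_link_local)
```

With the value from this issue, only the client address is set, and `ipaddress` confirms that 169.254.3.14 falls in the link-local range.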
It's a bug because it's well documented that you can assign an IP using the cmdline.txt ip=vvv.xxx.yyy.zzz and there's no checking that it's being used for an NFS rootfs. If the parm is only to be used with NFS then it should barf earlier or (probably not a good idea) silently ignore it. I agree that if the rootfs was on an NFS device there's no point continuing if the connection doesn't come active, but that should bail out with an oops after a few more turns round the loop (with retries set to a higher value). It's never sensible to solidly hang the boot in an endless loop. It's a convenience thing to be able to set an IP address for the ethernet interface when you can't access the ext4 filesystem. My RPis normally run with fixed addresses from the 10.1.1.0/24 block so I could fiddle with the Windows side to fix the IP address there, but popping the SD card and updating cmdline.txt is easier (and if it didn't hang I could set it and forget it).
Wait a minute, the code you posted doesn't really explain the issue you're seeing. Why would ROOT_DEV == Root_NFS be true?
That appears to be the only place in the code where we loop back to try_try_again without decrementing retries.
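For anyone following along, here is a minimal Python model of the control flow being discussed: one path loops back without touching the retry counter when the root device is NFS, the other decrements it and eventually gives up. This is a sketch of the behaviour described in this thread, not the kernel source. The function name `ip_config`, the `dynamic_ok` callback, the default of 2 retries and the `max_iterations` cap are all assumptions made so the example runs and terminates.

```python
def ip_config(dynamic_ok, root_is_nfs, retries=2, max_iterations=10):
    """Model of the retry flow discussed in this thread.

    dynamic_ok:     callable returning True when autoconfiguration succeeds.
    root_is_nfs:    stands in for the ROOT_DEV == Root_NFS check.
    max_iterations: exists only so this model terminates (hypothetical).
    """
    iterations = 0
    while iterations < max_iterations:   # stands in for the try_try_again label
        iterations += 1
        if dynamic_ok():
            return ("configured", iterations)
        if root_is_nfs:
            # "Retrying forever (NFS root)" path: loop back, retries untouched.
            continue
        retries -= 1
        if retries:
            # Bounded path: loop back only while retries remain.
            continue
        return ("gave up", iterations)
    return ("still looping", iterations)


never = lambda: False
print(ip_config(never, root_is_nfs=False))  # bounded path
print(ip_config(never, root_is_nfs=True))   # unbounded path, stopped by the cap
```

In the model, the bounded path returns after the retry counter reaches zero, while the NFS path only stops because of the artificial cap. Note that neither path is reached unless autoconfiguration is attempted and fails, which is the point raised in the question above.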
Sure, though if ROOT_DEV is set to Root_NFS without passing root=/dev/nfs on the kernel command line it seems that's where the bug really is (and it seems it would be an upstream bug).
Removing CONFIG_ROOT_NFS is not an option. I do all my development with an NFS-mounted rootfs, and it appears to be a very common configuration. I assume (from your description) you are not seeing the message "IP-Config: Retrying forever (NFS root)..."? Are you sure that is where it gets stuck?
I'm going to build a new kernel with debugging set (I may even put some extra messages in). I thought I'd found the hang from reading the code (which handles the ip=vvv.xxx.yyy.zzz parm). I'd have thought giving it five minutes (rather than endless) before doing something to tell the user the boot isn't going to complete would have been a better design. Perhaps I'm too used to seeing IBM mainframe operating systems set disabled wait states when their initial program load can't continue.
It's amazing what you see when you add some debugging with
There's a delay loop (120 seconds) before ipconfig.c gives up the ghost and carries on. I guess I've never been patient enough to wait that long before pulling the power and giving up.
to be made smaller or I could just accept that boot is going to hang for two minutes when I'm stupid enough to define ip=vvv.xxx.yyy.zzz but not connect a wire.
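The behaviour found here, waiting a fixed time for the link and then carrying on, can be illustrated with a small bounded polling loop. This is only a Python sketch of the pattern: the 120 second default comes from the observation above, while the function name `wait_for_carrier`, the `has_carrier` callback, the polling interval and the injectable `clock`/`sleep` parameters are assumptions for this example and say nothing about how the kernel implements it.

```python
import time


def wait_for_carrier(has_carrier, timeout_s=120.0, poll_s=0.01,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll has_carrier() until it returns True or timeout_s elapses.

    Returns True if a carrier was seen, False if the wait timed out.
    clock and sleep are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if has_carrier():
            return True
        sleep(poll_s)
    return False


# Cable connected: returns immediately.
print(wait_for_carrier(lambda: True))

# No cable, with a short timeout so the demo finishes quickly.
print(wait_for_carrier(lambda: False, timeout_s=0.05))
```

With no cable attached the caller is blocked for the full timeout before boot continues, which matches the two-minute stall seen here. A smaller timeout shortens the stall at the cost of giving slow links less time to come up.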
commit 09712f5 upstream. When resuming from s2ram on an SMP system without cpufreq operating points (e.g. there's no "operating-points" property for the CPU node in DT, or the platform doesn't use DT yet), the kernel crashes when bringing CPU 1 online: Enabling non-boot CPUs ... CPU1: Booted secondary processor Unable to handle kernel NULL pointer dereference at virtual address 0000003c pgd = ee5e6b00 [0000003c] *pgd=6e579003, *pmd=6e588003, *pte=00000000 Internal error: Oops: a07 [#1] SMP ARM Modules linked in: CPU: 0 PID: 1246 Comm: s2ram Tainted: G W 3.18.0-rc3-koelsch-01614-g0377af242bb175c8-dirty #589 task: eeec5240 ti: ee704000 task.ti: ee704000 PC is at __cpufreq_add_dev.isra.24+0x24c/0x77c LR is at __cpufreq_add_dev.isra.24+0x244/0x77c pc : [<c0298efc>] lr : [<c0298ef4>] psr: 60000153 sp : ee705d48 ip : ee705d48 fp : ee705d84 r10: c04e0450 r9 : 00000000 r8 : 00000001 r7 : c05426a8 r6 : 00000001 r5 : 00000001 r4 : 00000000 r3 : 00000000 r2 : 00000000 r1 : 20000153 r0 : c0542734 Verify that policy is not NULL before dereferencing it to fix this. Signed-off-by: Geert Uytterhoeven <[email protected]> Fixes: 8414809 (cpufreq: Preserve policy structure across suspend/resume) Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
I quite often connect my RPi to my Windows system with a direct cable using the 169.254.xxx.xxx address scheme. By assigning 169.254.3.14 I can easily find my RPi. It's a convenient way to get connected when working away from home, before the WiFi (which often needs a password or web page interaction) is running.
If I bring the machine home and don't connect a cat5 cable to my home router (or connect to my laptop) then the kernel hangs during boot. I get the splash screen and the Raspberry logo and nothing more.
Looking in the code and /proc/config.gz I think I've found the cause.
The kernel config has CONFIG_ROOT_NFS=y
So when we run in net/ipv4/ipconfig.c the retries count is ignored and we loop round
which causes the boot to hang.
The quick resolution is simple
The permanent fix is to unset CONFIG_ROOT_NFS and rebuild the kernel.