Coding ip=169.254.3.14 hangs during boot if there's no cable connected #589

Closed
DougieLawson opened this issue May 9, 2014 · 8 comments

Comments

@DougieLawson

I quite often connect my RPi to my Windows system with a direct cable using the 169.254.xxx.xxx address scheme. By assigning 169.254.3.14 I can easily find my RPi. When I'm working away from home, it's a convenient way to get connected before the WiFi (which often needs a password or web page interaction) is running.

If I bring the machine home and don't connect a cat5 cable to my home router (or connect to my laptop) then the kernel hangs during boot. I get the splash screen and the Raspberry logo and nothing more.

Looking in the code and /proc/config.gz I think I've found the cause.

The kernel config has CONFIG_ROOT_NFS=y, so when we run the following code in net/ipv4/ipconfig.c the retries count is ignored and we loop round:

#ifdef CONFIG_ROOT_NFS
                        if (ROOT_DEV ==  Root_NFS) {
                                pr_err("IP-Config: Retrying forever (NFS root)...\n");
                                goto try_try_again;
                        }
#endif

which causes the boot to hang.

The quick resolution is simple

  1. Pull the card, edit cmdline.txt to drop the ip=169.254.3.14 and reboot
  2. Wire the ethernet to my laptop

The permanent fix is to disable CONFIG_ROOT_NFS and rebuild the kernel.

@asb

asb commented May 9, 2014

But all the ip setting stuff is part of CONFIG_ROOT_NFS, e.g. the ip kernel command line parameter is part of nfsroot (https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt), though it is (ab)used for other purposes, e.g. 9P2000 root.

Ultimately, you should be assigning an IP with userspace configuration (e.g. /etc/network/interfaces). I personally wouldn't consider the behaviour you've encountered a bug.
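
As a rough illustration of the userspace approach, a static stanza in /etc/network/interfaces might look like the following. This is only a sketch: it assumes the Debian-style ifupdown format, an interface named eth0, and the /16 netmask conventional for the 169.254.0.0 link-local range. Only the address itself comes from this thread.

```
# Assumed interface name; adjust to match the system.
auto eth0
iface eth0 inet static
    address 169.254.3.14
    # Assumed netmask for the 169.254.0.0/16 link-local range.
    netmask 255.255.0.0
```

Configured this way, the address is applied by userspace after the root filesystem is mounted, so the kernel's ip= handling is not involved during boot.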

@DougieLawson
Author

It's a bug because it's well documented that you can assign an IP using ip=vvv.xxx.yyy.zzz in cmdline.txt, and there's no checking that it's being used for an NFS rootfs. If the parm is only to be used with NFS then it should barf earlier or (probably not a good idea) silently ignore it.

I agree that if the rootfs were on an NFS device there'd be no point continuing if the connection doesn't become active, but that should bail out with an oops after a few more turns round the loop (with retries set to a higher value). It's never sensible to solidly hang the boot in an endless loop.

It's a convenience thing to be able to set an IP address for the ethernet interface when you can't access the ext4 filesystem. My RPis normally run with fixed addresses from the 10.1.1.0/24 block, so I could fiddle with the Windows side to fix the IP address there, but popping the SD card and updating cmdline.txt is easier (and if it didn't hang I could set it and forget it).

@asb

asb commented May 9, 2014

Wait a minute, the code you posted doesn't really explain the issue you're seeing. Why would ROOT_DEV == Root_NFS be true?

@DougieLawson
Author

That appears to be the only place in the code where we loop back to try_try_again without decrementing retries.

@asb

asb commented May 9, 2014

Sure, though if ROOT_DEV is set to Root_NFS without passing root=/dev/nfs on the kernel command line it seems that's where the bug really is (and it seems it would be an upstream bug).

@popcornmix
Collaborator

Removing CONFIG_ROOT_NFS is not an option. I do all my development with an NFS-mounted rootfs, and it appears to be a very common configuration.

I assume (from your description) you are not seeing the message "IP-Config: Retrying forever (NFS root)..."? Are you sure that is where it gets stuck?

@DougieLawson
Author

I'm going to build a new kernel with debugging set (I may even put some extra messages in). I thought I'd found the hang from reading the code (which handles the ip=vvv.xxx.yyy.zzz parm).

I'd have thought giving it five minutes (rather than endless) before doing something to tell the user the boot isn't going to complete would have been a better design. Perhaps I'm too used to seeing IBM mainframe operating systems set disabled wait states when their initial program load can't continue.

@DougieLawson
Author

It's amazing what you see when you add some debugging with

#define IPCONFIG_DEBUG

There's a delay loop (120 seconds) before ipconfig.c gives up the ghost and carries on. I guess I've never been patient enough to wait that long before pulling the power and giving up.
I could petition for

#define CONF_CARRIER_TIMEOUT    120000  /* Wait for carrier timeout */

to be made smaller or I could just accept that boot is going to hang for two minutes when I'm stupid enough to define ip=vvv.xxx.yyy.zzz but not connect a wire.

popcornmix pushed a commit that referenced this issue Nov 22, 2014
commit 09712f5 upstream.

When resuming from s2ram on an SMP system without cpufreq operating
points (e.g. there's no "operating-points" property for the CPU node in
DT, or the platform doesn't use DT yet), the kernel crashes when
bringing CPU 1 online:

    Enabling non-boot CPUs ...
    CPU1: Booted secondary processor
    Unable to handle kernel NULL pointer dereference at virtual address 0000003c
    pgd = ee5e6b00
    [0000003c] *pgd=6e579003, *pmd=6e588003, *pte=00000000
    Internal error: Oops: a07 [#1] SMP ARM
    Modules linked in:
    CPU: 0 PID: 1246 Comm: s2ram Tainted: G        W      3.18.0-rc3-koelsch-01614-g0377af242bb175c8-dirty #589
    task: eeec5240 ti: ee704000 task.ti: ee704000
    PC is at __cpufreq_add_dev.isra.24+0x24c/0x77c
    LR is at __cpufreq_add_dev.isra.24+0x244/0x77c
    pc : [<c0298efc>]    lr : [<c0298ef4>]    psr: 60000153
    sp : ee705d48  ip : ee705d48  fp : ee705d84
    r10: c04e0450  r9 : 00000000  r8 : 00000001
    r7 : c05426a8  r6 : 00000001  r5 : 00000001  r4 : 00000000
    r3 : 00000000  r2 : 00000000  r1 : 20000153  r0 : c0542734

Verify that policy is not NULL before dereferencing it to fix this.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Fixes: 8414809 (cpufreq: Preserve policy structure across suspend/resume)
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
julianscheel pushed a commit to julianscheel/linux that referenced this issue Mar 10, 2015