Possible strange behaviour with HDMI on first boot #51

Open
sihil opened this issue Mar 14, 2017 · 39 comments

Comments

@sihil

sihil commented Mar 14, 2017

Apologies for duplicating my post on the Pine64 forum. Unfortunately I'm unable to reply further due to an anti-spam measure that they have introduced on the forums (according to my IRC conversation, as a new user I have to wait three days before I can make my second post).

For completeness I'm going to include my original text:

I've observed a weird issue with the xenial-pine64-bspkernel-20161218-1.img image whilst trying to get it to run headless on my Pine64. Based on an evening of flashing and re-flashing SD cards I have concluded that:
If an HDMI display is connected on the first ever boot then it seems that the OS will NEVER boot without an HDMI display.
If NO HDMI display is connected on the first ever boot then the OS will boot happily - with or without a display for ever more.
This has tripped me up on an OpenHABian derivative image that exhibits the same behaviour (see issue at openhab/openhabian#105).

I figure there is a script that is running on the first ever boot that sets a piece of configuration differently depending on whether a display is connected or not. Thus far I've not figured out what that is or how to fix it so that a system booted with HDMI the first time can later be booted headless.

Sadly I do not have a serial cable for my P64 so am unable to see the console and figure out what's happening.

Sounds suspiciously like unintended behaviour - if anyone has any suggestions then I'd be glad to hear them.

@longsleep kindly replied thus:

Well, this sounds strange. The only thing that happens on the first boot is generating keys, which takes a lot of computing power. Maybe the power supply is not sufficient for this, and when HDMI is connected extra power is available through HDMI.

If the board does not boot, do you know what the error is? Where is it stuck? How did you find out that it did not boot?

@sihil
Author

sihil commented Mar 14, 2017

I'm using a Raspberry Pi 2A PSU that I had to hand so I'm reasonably confident that power is not an issue. Also, the issue only occurs on subsequent boots if an HDMI display was attached on the first boot - and it doesn't sound like it should be generating keys on subsequent boots.

My testing setup has been brutally simple: the board is plugged into an Ethernet port. My criterion for whether it has booted is whether the interface comes up and I see traffic on the port. I've been leaving my laptop pinging the IP address. Crude, but effective and reproducible many times.
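The ping-based check can be tightened slightly by waiting for a TCP connection to the SSH port instead, which only succeeds once userspace is running. A minimal Python sketch, assuming the board's address is known and sshd listens on port 22; the function name and defaults are illustrative, not part of any image:

```python
import socket
import time


def wait_for_tcp(host: str, port: int = 22, timeout_s: float = 180.0,
                 interval_s: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Succeeds only once something (e.g. sshd) is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, unreachable or timed out: the board is not up yet.
            time.sleep(interval_s)
    return False
```

For example, `wait_for_tcp("192.168.1.50")` (a placeholder address) returns True once SSH accepts connections, or False after three minutes without one.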

I was looking at dmesg output and noticed that the sunxi disp2 is initialised once on first boot and twice on subsequent boots. I have no idea if that's connected.
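Comparing initialisation counts across boots is easy to script. The sketch below counts dmesg lines matching a regular expression; the pattern is an assumption, since the exact disp2 log line is not quoted in this thread:

```python
import re
import subprocess


def count_matches(log_text: str, pattern: str) -> int:
    """Count lines in log_text that match the regular expression pattern."""
    rx = re.compile(pattern)
    return sum(1 for line in log_text.splitlines() if rx.search(line))


def read_dmesg() -> str:
    """Return the kernel ring buffer; run on the board itself.

    May need root if kernel.dmesg_restrict is enabled.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

Running `count_matches(read_dmesg(), r"disp")` on the first boot and on a later one would show whether the count really differs; `r"disp"` is a deliberately loose placeholder to be narrowed to the actual line.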

Sadly it's impossible to tell where it is stuck without a display or console attached. I've just ordered a USB/UART cable so I can do that (been regretting not buying the Pine64 adaptor in the first place). I might try connecting it to the serial port of a Raspberry Pi tonight rather than waiting for that delivery.

I'd be intrigued to know if anyone else was able to reproduce it (or not able to reproduce it). That would give me more confidence that this is actually a thing rather than something silly that I've done or an issue with my particular board.

I'll write more when I discover anything new.

@longsleep
Owner

Well, just to be clear: I have flashed my images many times and usually never have HDMI connected at all. I guess the issue is specific to your particular setup.

@sihil
Author

sihil commented Mar 14, 2017

Yes, and that works. Unfortunately I built a machine that happened to be connected to HDMI on first boot and now I can't unplug the display to hide it in a cupboard as it won't boot :(

The simplest answer for me is to rebuild it and start over (which is now my plan for tonight), but that won't solve it for future users and violates the principle of least surprise.

@pfeerick
Contributor

I'll test that tonight, as I can't say with certainty that I've done exactly that: connected with HDMI in the first instance, and then run the pine64 headless afterwards. I have mostly run it either with HDMI connected all the time, as it was a GUI image, or with no HDMI connected right from the start, as I had a console cable connected for the initial configuration.

btw, you should be able to post 1 message per day during the settling in period. If not, please send me a PM (same handle on the forum), as it means something has been misconfigured.

@sihil
Author

sihil commented Mar 15, 2017

@pfeerick I am able to post again. It would be really helpful if you could add another line of text to the error page that indicates that rate limiting might be the reason.

I'm really interested to hear what your results are :)

@pfeerick
Contributor

pfeerick commented Mar 15, 2017

I wasn't able to reproduce that behaviour. Here was my test methodology so we can verify we are on the same page.

I have booted a fresh image of Ubuntu (https://www.stdin.xyz/downloads/people/longsleep/pine64-images/ubuntu/xenial-pine64-bspkernel-20161218-1.img.xz). I plugged in a wireless USB keyboard/mouse dongle, ethernet, and HDMI. Powered up the pine64, let it boot up, logged in, rebooted. I pulled the HDMI as the pine64 was shutting down. Watched the ethernet lights, the pine64 came back up again, and I was able to log in via SSH.

So it booted up with HDMI in the first instance and had no problems. Booting up without HDMI also appears to be fine. I tried power-cycling the pine64 a few times, and it continued to start up flawlessly, so it wasn't a one-off brought about by rebooting it.

My power supply is a 5 A-capable 12 V to quad-USB converter, and it is tuned to the slightly higher voltage of 5.2 V. Hopefully that helps narrow down the cause of the problem. If you have a similar setup apart from the power supply, then it does start to sound like it is power related.

@sihil
Author

sihil commented Mar 15, 2017

Hmmm, curious. That does sound similar, except that I have not plugged in a mouse or keyboard, just HDMI (that sounds ridiculous now that I'm writing it down, but nonetheless).

I'll have another go tonight.

@longsleep
Owner

Thanks for testing this. I am very interested in getting this resolved. @sihil, do you have an alternative power supply which you could try? Preferably power it via the pins on the Euler connector.

Also, connecting any extra USB devices like a keyboard or mouse requires even more power, unless they are connected via a powered USB hub, which might in turn feed power to the Pine64.

@pfeerick
Contributor

Doesn't sound too ridiculous... you can always plug in the keyboard/mouse after the pine64 has booted and you can see stuff on the screen... or you might have the screen connected just to see boot messages ;)

Another thing to consider is kernel/U-Boot updates. If you had done one on the first boot and something went wrong (it can happen, most likely due to power problems or SD card corruption), that could be the cause rather than the first boot with HDMI. In other words, don't do it (just in case that is the issue). And as longsleep said, an alternative power supply on the Euler pins would be great too, as that will provide more reliable power to the pine64.

@sihil
Author

sihil commented Mar 16, 2017

I experienced the same issue again. I'll see if I can borrow a workbench PSU and do as you suggest.

@RyanRamchandar

RyanRamchandar commented Mar 31, 2017

I am seeing the same behaviour as you, @sihil, with xenial-pine64-bspkernel-20161218-1.img. My goal is to run headless and only access the board over SSH.

After flashing the image, I did not connect any cables except power (5V 2A) and Ethernet. The board would sometimes come up, but other times it would not. I read in your forum post that you had some success when connecting an HDMI display, so I tried that, and luckily it came up just fine. I then unplugged the HDMI cable and used it headless.

However, if I reboot the board or power is lost, there is a good chance it won't come back up unless I connect an HDMI monitor and power cycle it a few times.

Note about power draw [1]:

On the 1GB and 2GB Pine64+ variants a DC5V/BAT POWER switch can be used to bypass the MT3608 boost converter (input voltage to 5V). If the board is powered from DC-IN (micro-USB or Euler connector), the DC5V setting connects the input voltage to the USB power supply rails, in BAT setting 5V is generated from any of the connected power sources (e.g. battery or DC-IN). The USB ports are current-limited to about 650mA per port in either setting.

Please be aware that when using the jumper in DC5V position an insufficient supply voltage is directly visible on the USB ports. If the Pine64+ is running on battery, the USB ports are only powered when the BAT setting is used.

[1] http://linux-sunxi.org/Pine64#DC5V.2FBAT_POWER_jumper

@longsleep
Owner

longsleep commented Apr 1, 2017

@RyanRamchandar - so far I have seen no indication that there is a general issue with my image. I strongly suggest you get a better power supply or a lower AWG (thicker) cable, as I still think you are suffering from a voltage drop which makes things go sideways on boot, and HDMI just gives the extra juice to cope with that.
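The cable-gauge point can be put in numbers: the resistance of both conductors times the current gives the voltage lost in the cable. A rough Python sketch, assuming solid copper conductors at room temperature and ignoring connector contact resistance (which in practice adds more):

```python
import math

COPPER_RESISTIVITY_OHM_M = 1.68e-8  # annealed copper, roughly 20 degrees C


def awg_diameter_mm(awg: int) -> float:
    """Standard AWG diameter formula: d = 0.127 mm * 92 ** ((36 - n) / 39)."""
    return 0.127 * 92 ** ((36 - awg) / 39)


def cable_drop_volts(awg: int, length_m: float, current_a: float) -> float:
    """Voltage lost across both conductors (supply and return) of a cable."""
    d_m = awg_diameter_mm(awg) / 1000.0
    area_m2 = math.pi * (d_m / 2) ** 2
    ohms_per_m = COPPER_RESISTIVITY_OHM_M / area_m2
    # Factor 2: current flows out on one conductor and back on the other.
    return current_a * ohms_per_m * 2 * length_m
```

Under these assumptions a 1 m cable carrying 2 A loses roughly 0.83 V with 28 AWG conductors but only roughly 0.13 V with 20 AWG, which is why the gauge matters at these currents.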

@TinkerBear

I didn't want to think it was a power supply issue either, but when running off a bench power supply (5A, good filtering), my previously 100% repro crash went away.

Possible solution: A 10µF tantalum (low ESR) capacitor soldered between the DC IN and GND pins of the Euler connector (via a 2x3 female header). Result: It's not 100% successful, but I've had 4 successful boots out of 5 now. Maybe a bigger cap will do it.
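With only a handful of boots, a rate like 4 out of 5 is weak evidence. A Wilson score interval makes that concrete; this is a standard binomial confidence interval, not something used in the thread:

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)
```

`wilson_interval(4, 5)` gives roughly (0.38, 0.96), so 4/5 is consistent with anything from a setup that fails most of the time to one that almost always works; many more boots per configuration would be needed to compare fixes reliably.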

@longsleep
Owner

So what are you saying? It does not crash with your bench PSU? What is the reason for the capacitor? Did you try slightly increasing the voltage on the bench PSU to 5.1V or 5.2V?

@TinkerBear

Yes, with my bench supply (set as close to 5.00 V as possible) there is no crash. With all my other power supplies it crashed. I didn't try a higher voltage on the bench supply, because it works fine.

Adding a capacitor between DC IN and GND on the Euler connector gets booting working on several of those supplies... most of the time (roughly 80%).

@whongx

whongx commented Apr 11, 2017

Hi, I encounter the same issue using the headless image with kernel 3.10.105. However, in my case it is not HDMI but Ethernet that matters: without Ethernet plugged in it cannot boot at all and shows "BUG: soft lockup - CPU#0 stuck for 22s!", but with Ethernet plugged in it sometimes boots successfully. Is this also related to the power supply?

@longsleep
Owner

@whongx yes - Ethernet draws quite some power and Gigabit Ethernet even more.

@whongx

whongx commented Apr 11, 2017

@longsleep ok! But it cannot boot up when Ethernet is not plugged in. I forgot to mention that the issue does not occur with kernel 3.10.104.

@longsleep
Owner

@whongx what do you mean by "cannot boot up"? Do you have logs or at least an error message?

@zador-blood-stained

@longsleep
Most likely related: a similar issue can be reproduced with Armbian builds (your BSP kernel source with a slightly different configuration). The kernel randomly stalls on boot, with the stall-to-success ratio depending on connected/disconnected Ethernet, connected/disconnected HDMI display, etc., but there is no clear connection between these factors.
Dmesg logs with stack traces can be found in attachments in this thread, I'm attaching one of them here:
BOOTFail_2017-04-15-C1.txt

According to my understanding it locks up somewhere here when setting up IRQ for the DE2 HDMI driver:

[   45.232803] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[   45.241520] [<ffffffc000125844>] __setup_irq+0x318/0x3e0
[   45.250792] [<ffffffc000125a84>] request_threaded_irq+0xe0/0x124
[   45.260858] [<ffffffc00041280c>] disp_sys_register_irq+0x88/0x98
[   45.270936] [<ffffffc000420610>] disp_hdmi_enable+0x1d4/0x278
[   45.280724] [<ffffffc000414540>] disp_device_attached_and_enable+0x1bc/0x1d4
[   45.291985] [<ffffffc0004146f8>] bsp_disp_device_switch+0xbc/0xe4
[   45.302194] [<ffffffc00040b50c>] start_work+0x174/0x1f0
[   45.311445] [<ffffffc0000cb788>] process_one_work+0x27c/0x42c
[   45.321274] [<ffffffc0000cc76c>] worker_thread+0x208/0x320
[   45.330810] [<ffffffc0000d27ec>] kthread+0xb4/0xbc

Part of the stack trace above this must be related to the watchdog that detects the lockup, but in case it doesn't it may be related to the arch timer bug referenced in longsleep/linux-pine64#44

I am using modified ATX power supply for tests connected to the pin header, so underpowering should not be an issue in my setup.
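Each frame in these traces is printed as an address followed by `symbol+offset/size`. Subtracting the offset from the address gives the symbol's start in that particular build, and given a vmlinux built with debug info, `addr2line -e vmlinux <address>` can map an address to a source line. A small Python decoder for the frame format shown above; the regular expression is written for these logs and may need adjusting for other kernels:

```python
import re
from typing import Optional

# Matches e.g. "[   45.232803] [<ffffffc000083dc0>] el1_irq+0x80/0xe4"
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def decode_frame(line: str) -> Optional[dict]:
    """Split one backtrace line into its parts; None if it is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr = int(m["addr"], 16)
    off = int(m["off"], 16)
    return {
        "timestamp": float(m["ts"]),
        "function": m["func"],
        "address": addr,
        "offset": off,
        "func_start": addr - off,  # symbol base address in this build
        "func_size": int(m["size"], 16),
    }
```

For the first frame above, `func_start` comes out as `0xffffffc000083d40`, i.e. `el1_irq` begins 0x80 bytes before the printed address.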

@longsleep
Owner

I was able to reproduce a boot-up panic with a specific USB device connected. PR longsleep/linux-pine64#56 seems to fix that. If you can, please check whether that change also fixes your particular issue.

@zador-blood-stained

I'm getting these lockups with no USB devices connected (I even got one today with another good power supply while testing u-boot changes). While the problem may be power related, the stack traces look too strange to me.
Also one time I got this log pine64-lockup-debug3.txt - it didn't happen in initrd as usual but much later in the boot process.

Anyway I'll try to test the PR changes later.

@longsleep
Owner

Yes - I doubt that the USB change fixes lock-ups which happen later. I will also merge your backport-fsl-errata.patch now after reading up on the issue. But as you probably use a kernel with that patch already, this also does not fix every issue. That FSL fix might resolve longsleep/linux-pine64#44 though.

@zador-blood-stained

The stack traces for the "stuck" kworker look too similar in both cases, so it looks like the same issue. And since I enabled a lot of debugging options for spinlocks and mutexes, each time HDMI lock was still held by disp_hdmi_enable() function.
Unfortunately it's still not clear what IRQs correspond to lines like el1_irq+0x84/0xec.
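To check whether two traces follow the same call path, it helps to drop timestamps and absolute addresses (which shift between kernel builds) and compare only the `function+offset/size` entries. A Python sketch:

```python
import re

# Captures "function+0xOFF/0xSIZE" after the "[<address>]" part of a frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+)")


def frame_signature(trace: str) -> list[str]:
    """Reduce a backtrace to its ordered 'function+offset/size' entries."""
    return FRAME_RE.findall(trace)


def same_call_path(trace_a: str, trace_b: str) -> bool:
    """True if both traces contain identical frames in identical order."""
    return frame_signature(trace_a) == frame_signature(trace_b)
```

Applied to the disp_hdmi_enable trace above and the one longsleep posts later in this thread, the signatures match frame for frame even though every address differs.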

@longsleep
Owner

longsleep/linux-pine64#56 makes USB crash less often but it still crashes a lot on boot with "MOSART Semi. Rapoo 2.4G Wireless Touch Desktop" plugged in. Also the FSL fix does not help.

@longsleep
Owner

Btw, on Pinebook with exactly same Kernel - it works just fine every time.

@zador-blood-stained

@longsleep
Are you getting lockups with stack traces similar to posted previously with disp2 HDMI functions in them?

@longsleep
Owner

longsleep commented May 14, 2017

@zador-blood-stained - Yes, very similar to pine64-lockup-debug3.txt - it has

[   39.838477] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:30]                       
[   39.851912] Modules linked in:                                                             
[   39.861726]                                                                                
[   39.869831] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 3.10.105-- #35                    
[   39.883727] Workqueue: events start_work                                                   
[   39.894722] task: ffffffc078b52f80 ti: ffffffc078b54000 task.ti: ffffffc078b54000          
[   39.909764] PC is at __do_softirq+0xb4/0x2d8                                               
[   39.921341] LR is at __do_softirq+0x30/0x2d8 

and

[   44.313504] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[   44.323414] [<ffffffc00012584c>] __setup_irq+0x318/0x3e0
[   44.333885] [<ffffffc000125a8c>] request_threaded_irq+0xe0/0x124
[   44.345147] [<ffffffc00040f004>] disp_sys_register_irq+0x88/0x98
[   44.356431] [<ffffffc00041cf9c>] disp_hdmi_enable+0x1d4/0x278
[   44.367423] [<ffffffc000410d38>] disp_device_attached_and_enable+0x1bc/0x1d4
[   44.379876] [<ffffffc000410ef0>] bsp_disp_device_switch+0xbc/0xe4
[   44.391253] [<ffffffc000407d04>] start_work+0x174/0x1f0
[   44.401655] [<ffffffc0000cb784>] process_one_work+0x27c/0x42c
[   44.412623] [<ffffffc0000cc768>] worker_thread+0x208/0x320
[   44.423315] [<ffffffc0000d27f0>] kthread+0xb4/0xbc
[   44.433240] kworker/1:1     S ffffffc0000853b8     0  

and

[   45.225365] [<ffffffc0000853b8>] __switch_to+0x7c/0x88
[   45.235455] [<ffffffc0007244f4>] __schedule+0x4fc/0x714
[   45.245628] [<ffffffc000724780>] schedule+0x74/0x7c
[   45.255409] [<ffffffc000722564>] schedule_timeout+0x34/0x27c
[   45.266012] [<ffffffc000723cbc>] wait_for_common+0x118/0x158
[   45.276588] [<ffffffc000723d24>] wait_for_completion+0x28/0x34
[   45.287325] [<ffffffc0000cb108>] flush_work+0xf8/0x11c
[   45.297312] [<ffffffc0000cccd4>] schedule_on_each_cpu+0xf8/0x124
[   45.308281] [<ffffffc00016c5f0>] lru_add_drain_all+0x1c/0x24
[   45.318875] [<ffffffc0001a4d54>] migrate_prep+0x14/0x20
[   45.328979] [<ffffffc000167d78>] alloc_contig_range+0xb8/0x26c
[   45.339729] [<ffffffc000493884>] dma_alloc_from_contiguous+0xa4/0x12c
[   45.351152] [<ffffffc0000928cc>] __dma_alloc_coherent+0xb0/0x118
[   45.362088] [<ffffffc000092a00>] __dma_alloc_noncoherent+0xcc/0x158
[   45.373319] [<ffffffc00019979c>] dma_pool_alloc+0xf0/0x1c4
[   45.383705] [<ffffffc0004ef388>] ehci_qh_alloc+0x4c/0xc4
[   45.393894] [<ffffffc0004f1408>] ehci_init+0x13c/0x3b8
[   45.403875] [<ffffffc0004f16a4>] sunxi_ehci_setup+0x20/0x38
[   45.414303] [<ffffffc0004de7a8>] usb_add_hcd+0x1c8/0x5a8
[   45.424417] [<ffffffc0004f5560>] sunxi_insmod_ehci+0x118/0x218
[   45.435096] [<ffffffc0004f56d8>] sunxi_usb_enable_ehci+0x78/0x88
[   45.445982] [<ffffffc00051144c>] usb_msg_center+0x88/0x104
[   45.456307] [<ffffffc00051057c>] usb_host_scan_thread+0x54/0x68
[   45.467110] [<ffffffc0000d27f0>] kthread+0xb4/0xbc

and

[   47.357995] [<ffffffc0000853b8>] __switch_to+0x7c/0x88
[   47.368085] [<ffffffc0007244f4>] __schedule+0x4fc/0x714
[   47.378228] [<ffffffc000724780>] schedule+0x74/0x7c
[   47.387959] [<ffffffc000722564>] schedule_timeout+0x34/0x27c
[   47.398562] [<ffffffc000723cbc>] wait_for_common+0x118/0x158
[   47.409169] [<ffffffc000723d24>] wait_for_completion+0x28/0x34
[   47.419962] [<ffffffc0000cb108>] flush_work+0xf8/0x11c
[   47.429992] [<ffffffc0000cccd4>] schedule_on_each_cpu+0xf8/0x124
[   47.440953] [<ffffffc00016c5f0>] lru_add_drain_all+0x1c/0x24
[   47.451515] [<ffffffc0001e5b24>] invalidate_bdev+0x30/0x4c
[   47.461872] [<ffffffc0002453b4>] ext4_put_super+0x264/0x2ec
[   47.472336] [<ffffffc0001b24d8>] generic_shutdown_super+0x68/0xd4
[   47.483396] [<ffffffc0001b27c0>] kill_block_super+0x30/0x7c
[   47.493872] [<ffffffc0001b2b44>] deactivate_locked_super+0x44/0x74
[   47.505016] [<ffffffc0001b2fb4>] deactivate_super+0x68/0x74
[   47.515443] [<ffffffc0001cdbd0>] mntput_no_expire+0x158/0x168
[   47.526039] [<ffffffc0001cef48>] SyS_umount+0x34c/0x36c

I have a rather reliable setup to reproduce this. With the new USB drivers it is less likely to trigger. I boot to initrd only (have simpleimage without rootfs). It just booted 4 times in a row without issue and then crashed twice in a row like this.

I am powering through Euler and have HDMI connected (but that does not seem to matter). When I disconnect the USB keyboard/mouse dongle it never crashes. I can also connect the dongle at any time later and it does not crash.
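When boot-looping a board with a serial console attached, scanning each captured log for the kernel's soft lockup report gives a crude crash tally. On kernels of this era the report fires once a CPU has not scheduled for roughly twice `kernel.watchdog_thresh` (10 s by default), which fits the 22 s in these logs. A Python sketch; the regex follows the message format quoted in this thread, and capturing the console itself is left out:

```python
import re

# Format as quoted in this thread, e.g.
# "BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:30]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"(?: \[(?P<task>[^\]]+)\])?"
)


def find_soft_lockups(console_log: str) -> list[dict]:
    """Return cpu, duration and (if printed) task for each soft lockup report."""
    return [
        {"cpu": int(m["cpu"]), "seconds": int(m["secs"]), "task": m["task"]}
        for m in LOCKUP_RE.finditer(console_log)
    ]


def tally(logs: list[str]) -> tuple[int, int]:
    """Return (boots with at least one soft lockup, total boots)."""
    crashed = sum(1 for log in logs if find_soft_lockups(log))
    return crashed, len(logs)
```

A run like the one described above (four clean boots, then two lockups) would come out as `(2, 6)`.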

@longsleep
Owner

I tested this in detail yesterday. It can still crash in exactly the same way even when powered at 5.2V via Euler. It never draws more than 400mA during bootup either.

@zador-blood-stained

I did some more tests and compiled the kernel with debug info. It looks like it's actually stuck in a softirq, but it's relatively hard to debug since the stack trace is incomplete in this case, and I'm not sure whether the info I got after applying an extra patch is correct:

[   42.584359] Last softirq was rcu_process_callbacks+0x0/0x3f8

@Icenowy

Icenowy commented Mar 5, 2018

P.S. It seems that this behavior also occurred on my SoPine with Baseboard, running a mainline kernel with a patched HDMI driver. Strange.

@skjaeve

skjaeve commented Oct 25, 2018

I am experiencing an HDMI bug too: if an HDMI cable is plugged into the HDMI port, the A64 boots fine after a power cycle. If there is no HDMI cable, it may or may not boot.

There is nothing connected at the other end of the HDMI cable. I am running Xenial with the Longsleep kernel.

Workaround: keep an HDMI cable plugged in.

@longsleep
Owner

Most likely the HDMI cable feeds enough extra power to the device that the voltage does not drop under load. That would mean your power supply is insufficient and to blame.

@skjaeve

skjaeve commented Oct 31, 2018

Unlikely, since there's nothing plugged in at the other end of the HDMI cable.

The power supply is the model recommended in the Pine64 store at the time of purchase.

@mitchmitchell

I don't think this is a power supply issue: I see this happening on two of my boards (bought from separate lots), sometimes with only about a 30% successful boot rate. Both boards exhibit this behavior while running off a bench supply powered through the Euler bus at as high as 6 volts (I've not risked going any higher). The crashes happen on all the images I've tried, though the behavior is different on each one. Sometimes I can get an image to boot more reliably, and it will stay about 80% reliable once it has booted successfully a few times. I can post output from the serial console if there is any interest.

@longsleep
Owner

Well, this still is an issue, so feel free to post your findings here in case someone is willing to take a detailed look. If it is HDMI related, it might be an idea to get rid of this driver and everything related to it.

@mitchmitchell

Let me try some experiments and see what I come up with. Is there a way to turn off the HDMI driver completely? The most reliable boot image has been Android, but I've been using debian and xubuntu since I want to run a headless server with these units. I have successfully upgraded one unit to bionic beaver (haven't tried with the other one) but the /boot partition has to be enlarged for the do-release-upgrade to work (I can open another issue to cover that if you like). The bionic beaver image also exhibits this behavior.

@zsolt67

zsolt67 commented Jan 17, 2019

I have the same problem. Is there any solution?

@mitchmitchell

I think I may have taken care of the problem on my two boards by manually setting the monitor resolution to a valid value using the Mate desktop app. I was always seeing an error message from the HDMI driver about invalid resolution right before the boot would hang. Now that I have set the resolution value I don't see the error message anymore and my boards have been booting ok -- I THINK -- I have that caveat because my boards have been up and running continuously over last few weeks so I have not done much testing yet.
