Bootloop: Unable to handle kernel NULL pointer dereference at virtual address 000002d2 #1
Comments
Hi @ananjaser1211, could you please take a look at this?
Please un-pin the other people you mentioned, as they do not work on this platform or this kernel. Thank you for the excellent bug report. However, according to dmesg, your WiFi chip is failing bring-up over PCIe. To me this screams hardware fault, but I suggest you try running the stock ROM and see if the issue persists. If it does, you have some fault in your WiFi chip, possibly when it goes to sleep and disconnects from the PCIe bus.

If, however, the issue is not present in the stock MM ROM, please back up your existing ROM and try this LineageOS build, and see whether the issue shows up there. It uses the "stock" WiFi driver and blobs. I am not 100% sure, as RefinedNougat was made almost 5 years ago and I'm not sure what WiFi driver the ROM itself uses, but the kernel uses a newer WiFi driver which we recently (about 3 months ago) found causes WiFi issues on bring-up, though never a kernel panic.

Please try the above and let me know. I will see if I can get the source compiling again in case we need to change the driver. ROM-wise, I don't have any suggestions for now.
This is the country issue I was referencing; more details can be found here: universal5433/android_device_samsung_trelte-common#33 (comment). However, this code (the setter) runs at a much later stage than your logs reach, i.e. your phone is not able to bring up the PCIe device itself.
I will try the stock While I do agree that may be the case, independent of any physical failure there shouldn't be a null pointer dereference at all. The code should prevent it from happening. Why would you want to dereference a null pointer in the first place? At the very least, the code should let me use everything on the phone except the Wi-Fi. After flashing the Stock In the meantime, could you please fix this pointer issue? I've seen that the driver code has thousands of lines; is it really too costly to fix this one?
From what I see, the NPE happens because BCM does not fully fail to initialize: it loads, and then fails halfway through initialization. It seems Samsung did not implement any safeguards for your particular case in the driver. Between the bcmdhd driver and exynos_pcie there are thousands of lines of code. Your particular error is that dhd_host_recover_link itself hits the NPE; the BCM chip failed to initialize much earlier, in a different piece of code (dhd_wifi_platform_load_pcie). I added a basic NPE check to dhd_host_recover_link, but there is no guarantee it will work: as far as bcm is concerned, the WiFi PCIe configuration is either loaded or not, never halfway.

I've also switched to the "should be" more stable BCMDHD4358 driver instead of the 4358a3 driver. While the latter is newer, it has been problematic in LineageOS; perhaps it also causes issues that went undiscovered until now.

Helios_Kernel-V3.2-N910C.H-20231227.zip

Unpack this zip and flash the image file through recovery (Install > img file > select the kernel > BOOT partition).

Regarding your device rebooting on the Samsung logo, I'm not sure. It's not really a good sign, but hopefully it's just a bad battery. The same can be said about BCM, by the way: I've seen similar failures with the MODEM not initializing due to weak batteries, but that did not result in an NPE, just loss of signal at around 20% charge.

I personally don't have any Samsung documentation to fully understand how these drivers talk to each other; this is common with lousy Samsung code. The NPE check is just a standard NPE > return, but I suspect other parts of the bcm/pcie stack will show errors. We will see.
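For readers unfamiliar with the "NPE > return" pattern mentioned above, here is a minimal userspace sketch of the idea. It is not the actual patch: `struct wifi_dev`, `struct wifi_bus`, and `recover_link` are hypothetical stand-ins invented for illustration, and the real dhd_host_recover_link signature is not shown in this thread. The point is only that a recovery routine checks for a half-initialized device before touching its bus.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's real structures (assumption). */
struct wifi_bus {
    int link_up;
};

struct wifi_dev {
    struct wifi_bus *bus; /* stays NULL if attach failed halfway */
};

/*
 * Hypothetical recovery routine. Returns 0 on success, -1 when there is
 * no usable bus, instead of dereferencing a NULL pointer.
 */
int recover_link(struct wifi_dev *dev)
{
    if (dev == NULL || dev->bus == NULL) {
        fprintf(stderr, "recover_link: no bus attached, skipping\n");
        return -1;
    }

    dev->bus->link_up = 1;
    return 0;
}
```

A caller would treat the `-1` return as "WiFi unavailable" and carry on, which matches the caveat above that the guard only avoids the panic and does not make the chip work.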
@ananjaser1211 Okay, the battery has arrived. I'd like to test the stability of the Wi-Fi, but with the new kernel on RefinedNougat there is a kernel panic that causes a bootloop again.
last_kmsg_2024-01-07T20.01.44.log
Odd that even the Lineage build is looping, hmm. With the 3.2 kernel on Refined, does it boot the system at all, or does it kernel-panic instantly on the splash screen? I don't remember seeing that hotplug crash. I see Android services in the log, so I assume it boots but then crashes.
It did boot. I was able to see the home screen, then it rebooted itself.
Thank you. I'm wiping my Note 4 right now to install RN and see if it also crashes here. From the hotplug code, it's trying to bring up a CPU, failing, and going into the PANIC state:
_cpu_up: attempt to bring up CPU 1 failed
I'm hopeful this is some shenanigans due to my build system, as I had not compiled Helios in years prior to 3.2. If so, it should also crash on my phone and I can directly try some things, unless it's some bizarre hardware problem. I'll be flashing 7.1.1 T2.
@Unknown78 Well, bad news: the kernel is running fine and hotplug is fine. I locked and unlocked the display a lot after finishing the setup wizard and everything was in order. Nonetheless, I have compiled a kernel with hotplug disabled, plus some patches we made in Lineage for hotplug (they disable HP after the device has booted). I have tried both Helios_Kernel-V3.2-N910C.H-20231227 and Helios_Kernel-V3.2-N910C.H-20240107, and both work as expected. Regarding WiFi, I also connected just fine.

Helios_Kernel-V3.2-N910C.H-20240107.zip

I disabled "HOTPLUG_POWERSAVING" mode, which turns off all but one CPU core when the display is off; it might be that your board is too sensitive to that operation.
Thank you, and sorry for the wait; I finally have more free time. So in summary: for LineageOS, I've flashed:
That results in a bootloop right after the Samsung Galaxy Note 4 logo. Also, when I turn the phone off and then plug in my charger, it either gets stuck on the charging logo or bootloops right after it. I'd like to log it, but And for the RefinedNote8 with all of As for the WiFi, it gets stuck when I toggle the slider; it failed with last_kmsg_2024-01-07T22.40.19.log Now one of the bizarre things is that this doesn't apply only to system mode but also to TWRP's recovery mode. I wonder why I sometimes get a bootloop when entering recovery mode and only get in after a few tries. These are the logs with twrp-3.7.0_9-0-treltexx.img.tar twrp_last_kmsg_2024-01-08T19.10.35.log
@ananjaser1211 It seems I also need a recovery image with a proper kernel for recovery mode.
@Unknown78 Regarding recovery: it uses the same kernel that the system uses. All hardware gets parsed and initialized, WiFi included, and it seems to be crashing just like Helios 3.1, so it needs the NPE patch too. At the very least this confirms that what is going on is not a ROM- or kernel-specific issue but a hardware initialization failure, as the TWRP kernel is based on stock Samsung source (more or less). Once I have some time I will apply the patches and send you a TWRP build, but it's clear that to boot any ROM, stock included, you need to patch the NPE in WiFi, as that code is the same for all kernels including stock. This of course won't make WiFi work, because of "dhdpcie_bus_attach"; we just add a check that prevents the kernel panic and continue operating without WiFi when PCIe is unable to connect to it.

This is the patch I added: 485d53e

The other bug was entering LOW_POWER_CMD mode when the screen is locked, aka hotplugging, which is disabled with this ramdisk entry: 2472c78

These two patches are needed on any kernel you want to use on your phone. I'm still not sure how such a breakage happened, as it's the first time I've seen a partial hardware failure. Regardless, when I get more time I will build a TWRP kernel with those patches applied; for RefinedNougat you can use the kernel sent above.
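The "continue operation without WiFi" behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the content of commit 485d53e: `board_state`, `wifi_attach`, and `board_init` are hypothetical names, and the `hw_ok` flag merely simulates whether the chip answers on the bus. The sketch treats an attach failure as non-fatal and records it, so the rest of initialization still completes.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary of what came up during init (assumption). */
struct board_state {
    bool wifi_available;
    bool booted;
};

/* Hypothetical attach step: 0 on success, -1 if the chip does not respond. */
int wifi_attach(bool hw_ok)
{
    return hw_ok ? 0 : -1;
}

/* Initialization that degrades gracefully instead of aborting on WiFi failure. */
void board_init(struct board_state *st, bool wifi_hw_ok)
{
    st->wifi_available = (wifi_attach(wifi_hw_ok) == 0);
    if (!st->wifi_available)
        fprintf(stderr, "wifi: attach failed, continuing without WiFi\n");

    /* Everything else proceeds regardless of the WiFi result. */
    st->booted = true;
}
```

Later code paths would then consult `wifi_available` before touching any WiFi state, which is the role the NULL check plays in the recovery function.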
Hi @ananjaser1211, are you still busy?
Apologies, @Unknown78, it's been an extremely busy month. Unpack the zip, and flash the img in TWRP or the .tar in Odin as normal.
@ananjaser1211 With the Helios_Kernel-V3.2-N910C.H-20240107.zip
last_kmsg_2024-02-14T09.25.11.log There are at least 42 occurrences of the same exception in that log. The issue seems to be related to TCP, so is it a network thing? P.S. After the crash and reboot, my SIM card became invalid/undetected; then I rebooted again, and it was fine. Might be related:
It seems unwise to continue here since it clutters the thread a lot; I'll create proper new issues. @ananjaser1211 14th February, 2024: #2 Exception at net/ipv4/tcp_output.c:2026 tcp_send_loss_probe
@ananjaser1211
Background
Device: Samsung Galaxy Note 4 SM-N910H
Architecture: armeabi-v7a
OS Version: Android Nougat 7.1.1 (API 25)
System:
Boot:
Bootloader:
Modem:
Partition Table:
Viewer PIT_Viewer_v1.04.7z
Recovery:
Steps to reproduce
Just use the phone as a daily driver as usual; eventually you get this.
There is usually an "optimizing apps" phase at the beginning (sometimes not); once it completes, the phone reboots instantly.
Changing the Magisk version has no effect.
Using pure Helios Kernel without Magisk has no effect.
P.S. Sometimes entering TWRP recovery mode fails and the phone boots back to system.
P.S. Sometimes, when charging while powered off, it fails to show the percentage and the charging logo keeps rebooting.
P.S. Sometimes, if WiFi is enabled, it disables itself later and gets stuck, unable to toggle; eventually it also bootloops. This also happened on the stock ROM and kernel.
Log
last_kmsg_2023-12-17T19.26.22.log
boot_logcat_2023-12-17T19.18.42.log
dmesg_2023-12-17T19.26.20.log
proc_mounts_2023-12-17T19.26.21.log
magisk_2023-12-25T07.34.00.log
Some highlights: