FTL crashed latest master branch (newer than my last post, different Pihole) #1124
Comments
A completely different Pihole server is crashing now. Same setup though. [2021-04-24 10:56:35.059 87271/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1 |
This is likely the same issue as #1119, so I'm closing the former to keep things focused. The issue seems to be with many (> 1500) clients being fairly active on TCP DNS. However, some clients do not behave correctly and just time out instead of properly closing the connection:
instead of
Could you try
and check if this resolves the issue you're seeing? |
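For readers unfamiliar with the distinction drawn above, here is a minimal sketch of a DNS-over-TCP client that closes its connection as soon as it has the reply, rather than leaving the server-side worker to wait for a timeout. The helper names (`build_query`, `recv_exact`, `query_tcp`) are made up for illustration and are not part of FTL or dnsmasq. The framing assumed here is the standard two-byte length prefix used for DNS over TCP.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record (class IN), recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def query_tcp(host, port, name, timeout=2.0):
    """Send one query over TCP, read one reply, then close the socket.

    Leaving the with-block closes the socket, so the server sees EOF
    right away instead of waiting for its idle timeout.
    """
    msg = build_query(name)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(msg)) + msg)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

A call such as `query_tcp("127.0.0.1", 53, "example.com")` would return the raw reply bytes. A misbehaving client differs only in that it keeps the socket open after reading the reply, which is what produces the `TCP worker terminating (timeout)` lines instead of `(client disconnected)`.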
Copy, added to the servers in question to see if issues crop up again. |
Sadly, I had another crash of some sort but no log information which is strange. The server was responsive and I don't see anything within server logs itself. I noticed the outage and restarted pihole at 12:52. I realize this is 99.99% unhelpful but just relaying. [2021-04-28 07:41:19.028 28405/F4942] TCP worker terminating (client disconnected) |
Thanks for coming back to me with this log excerpt. Do you have a feeling for how long it was unresponsive until you restarted? Do you know if the activity at If you still have logging enabled, could you please also check |
I believe it was unresponsive since the last query until I was told that DNS was super slow. Saw that this piHole (one of the two on the network) was down. I think a number of devices just use this one only for whatever reason so makes sense that things weren't loading. Not much in the log. I figured this is super unhelpful but worth mentioning that an outage occurred. Apr 28 07:41:19 dnsmasq[28202]: query[A] googleads.g.doubleclick.net from 127.0.0.1 |
If you have enough free disk space, please add
to the file
Next time the issue happens, please include again the last few log lines. |
Copy, we shall see. Got about 30GB - 50GB free on the two I'm testing on. |
One of the two servers that experience this issue has stopped responding again. UI just spins as well, doesn't show stats or claim FTL is offline. [2021-04-29 11:13:00.964 4788/T4792] Removed lock in add_FTL_clients_to_network_table() (/root/project/src/database/network-table.c:757) Last pihole.log actions: Apr 29 11:13:05 dnsmasq[21475]: query[A] dm16-useast1a.byteoversea.com from 127.0.0.1 |
Sorry quick followup of not much but still.. pihole restartdns[✗] Job for pihole-FTL.service failed because the control process exited with error code. SystemCTL output: Apr 29 13:09:16 pihole pihole-FTL[22569]: Not stopped; may still be shutting down or shutdown may have failed, killing now |
Okay, so the problem is actually somewhere else. Can you also add
to your conf file and try again? I'm sure we'll get down to it in the end. |
Done. My other testing unit was unresponsive. The only line in the FTL log was: [2021-04-29 17:00:00.830 62997/T63003] Waiting for SHM lock in resolveClients() (/root/project/src/resolve.c:364) Huge gap for sure there. The pihole.log was equally nothing fun: Apr 29 16:21:05 dnsmasq[34240]: query[A] gjapplog.ucweb.com from 127.0.0.1 |
Quick update, no crash as of yet on either testing units. I check twice a day to ensure there's eyes on it. Whooo knows. |
Got another crash. Not much different in the logs... Pihole.conf May 4 09:02:47 dnsmasq[40875]: query[A] mail.gandi.net from 127.0.0.1 ...FTL.conf [2021-05-04 09:02:47.619 40875/F1057] Removed lock in _FTL_reply() (/root/project/src/dnsmasq_interface.c:1138) |
The issue here is that the fork is terminated while it is serving a query.
We have countermeasures in place but they don't work reliably in your case. I'll try reproducing this locally next week. |
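To illustrate the general hazard described above (a worker being terminated while it is in the middle of serving a query, possibly while holding a lock), here is a small sketch of one common mitigation: deferring termination signals for the duration of a critical section. This is an assumption for illustration only, not FTL's actual countermeasure. The `defer_signals` helper and the `events` list are hypothetical names, and the example is Unix-only.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def defer_signals(signals=(signal.SIGTERM,)):
    """Block the given signals inside the with-block; deliver them afterwards."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signals)
    try:
        yield
    finally:
        # Restoring the old mask lets any pending signal be delivered now.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


events = []


def on_term(signum, frame):
    # Stand-in for "worker shuts down"; a real handler would exit.
    events.append("terminated")


signal.signal(signal.SIGTERM, on_term)

with defer_signals():
    events.append("lock acquired")
    os.kill(os.getpid(), signal.SIGTERM)  # termination request arrives mid-query
    events.append("lock released")

print(events)
```

The termination request is only acted upon once the critical section has finished, so the (simulated) lock is never left held by a dead worker.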
Sounds good, I can leave debugging on if you want and test as you find things. Just let me know! |
@derekcentrico Please update the fix and try again. |
Sadly now the FTL daemon won't start. |
@derekcentrico Do you have some log lines for me? |
I just pushed another change that may give more helpful log lines. |
Not with a laptop for a few days and trying to do it via mobile. Basically it installs and then hangs on restarting service. When trying to start the service manually it wants systemctl daemon-reload. I do that and it still fails. Says failed to start LSB for timeout. Journalctl isn't helpful. |
Maybe you can try
via the phone's SSH client. A screenshot would be fine, too, no need to copy the text if it is too complicated. |
Not the best from mobile probably, but it seems to just be stuck in this infinite loop when the service tries to start. [2021-05-11 07:19:13.921 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434) |
I don't think this is an infinite loop; the routine is called very often when importing queries from the database. The issue will happen thereafter. However, I also see that you can import 15 queries within about 70 msec. Assuming you may have on the order of 15,000 queries in 24 hours, this means FTL needs more than one minute to start. Maybe try without debug logging for now to see if it works at all. We can continue debugging when you are in a better position. |
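The startup estimate above can be checked with a few lines of arithmetic. The function name is made up for illustration. The inputs are the figures quoted in the comment (15 queries per roughly 70 msec, on the order of 15,000 queries in 24 hours), and the result assumes the import rate stays constant.

```python
def estimated_import_seconds(total_queries, batch_queries=15, batch_seconds=0.070):
    """Scale the observed batch time linearly to the full query count."""
    return total_queries / batch_queries * batch_seconds


estimate = estimated_import_seconds(15_000)
print(f"about {estimate:.0f} seconds")  # 1000 batches of ~70 msec each
```

That works out to roughly 70 seconds, which matches the "more than one minute" figure.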
Sadly it didn’t work still w/ that disabled. Crashes. Won’t start. Gotta revert to master each time. Nothing much in logs. root@dns:/home/dns# nano /etc/pihole/pihole-FTL.conf [✓] Branch fix/lock_gravityClose exists |
Okay, this does not suggest a crash. So let's wait until you're in a better position to look at the debug output once you're back. Enjoy your computer-free time! |
Hey, I'll be back in the morning. What would you like me to do? Currently on master with all debug disabled so we're on the same page. |
Allrighty, we are operational with the dev build. Not sure what was going on. Master branch crashed overnight and I pulled the FTL update without issue this time. pihole checkout ftl fix/lock_gravityClose [✓] Branch fix/lock_gravityClose exists |
@DL6ER my second testing unit crashed overnight. FTL tail (both .log and .log.1):
.1:
PIhole.log:
Config file:
|
Sorry for the delay in responding.
Was it using the same version of the fix branch? You can check this with |
Looks like I wasted your time on that. Amusingly test unit 2 crashed again. Will pull now and move forward. Test unit 1: vDev-d5a33a4 |
Okay, so no crashes on either test unit since the last post, where we figured out an update didn't go through. Lesson learned: don't run automated update scripts with pihole when testing. Apparently, it failed to download FTL because FTL had crashed and resolv.conf pointed to 127.0.0.1. Doh! |
Okay, no crashes is very good news. |
Thanks! The fix has been merged to |
Cool beans. Got a timeframe for release? Debating pushing developmental to my others to be safe. They crash but much more rarely. |
No real timeframe but shouldn't take long. But saying it will be hours or days or rather weeks is beyond what I can do right now because more people are involved in the process. Using |
Versions
Pi-hole: 5.3.1
AdminLTE: 5.5
FTL: 5.8.1
Platform
OS and version: Debian Buster
Platform: VM KMS
Expected behavior
Not to crash.
Actual behavior / bug
FTL crashed at some point
Steps to reproduce
Unsure, it died.
Debug Token
Debug log:
[2021-04-23 12:40:02.721 25660/F34373] Reopening Gravity database for this fork
[2021-04-23 12:40:02.725 25660/F34373] Closing Telnet socket for this fork
[2021-04-23 12:40:02.725 25660/F34373] Closing Unix socket for this fork
[2021-04-23 12:40:03.434 25660/F34373] TCP worker terminating (client disconnected)
[2021-04-23 12:40:08.628 25509/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:08.628 25509/F34373] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2021-04-23 12:40:08.628 25509/F34373] ----------------------------> FTL crashed! <----------------------------
[2021-04-23 12:40:08.629 25509/F34373] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2021-04-23 12:40:08.629 25509/F34373] Please report a bug at https://github.com/pi-hole/FTL/issues
[2021-04-23 12:40:08.629 25509/F34373] and include in your report already the following details:
[2021-04-23 12:40:08.629 25509/F34373] FTL has been running for 158727 seconds
[2021-04-23 12:40:08.629 25509/F34373] FTL branch: master
[2021-04-23 12:40:08.629 25509/F34373] FTL version: v5.8.1
[2021-04-23 12:40:08.629 25509/F34373] FTL commit: b90ab8b
[2021-04-23 12:40:08.629 25509/F34373] FTL date: 2021-04-21 20:03:47 +0100
[2021-04-23 12:40:08.629 25509/F34373] FTL user: started as pihole, ended as pihole
[2021-04-23 12:40:08.629 25509/F34373] Compiled for x86_64 (compiled on CI) using gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
[2021-04-23 12:40:08.629 25509/F34373] Process details: MID: 34373
[2021-04-23 12:40:08.629 25509/F34373] PID: 25509
[2021-04-23 12:40:08.629 25509/F34373] TID: 25509
[2021-04-23 12:40:08.629 25509/F34373] Name: pihole-FTL
[2021-04-23 12:40:08.629 25509/F34373] Received signal: Segmentation fault
[2021-04-23 12:40:08.629 25509/F34373] at address: 0x7fd25821f000
[2021-04-23 12:40:08.629 25509/F34373] with code: SEGV_MAPERR (Address not mapped to object)
[2021-04-23 12:40:08.629 25509/F34373] Backtrace:
[2021-04-23 12:40:08.630 25509/F34373] B[0000]: /usr/bin/pihole-FTL(+0x5a80b) [0x55e22cc5a80b]
[2021-04-23 12:40:08.655 25509/F34373] L[0000]: /root/project/src/signals.c:197
[2021-04-23 12:40:08.656 25509/F34373] B[0001]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fd258562730]
[2021-04-23 12:40:08.656 25509/F34373] B[0002]: /usr/bin/pihole-FTL(_getClient+0x43) [0x55e22cc5a353]
[2021-04-23 12:40:08.665 25509/F34373] L[0002]: /root/project/src/shmem.c:1018
[2021-04-23 12:40:08.667 25509/F34373] B[0003]: /usr/bin/pihole-FTL(+0x624fd) [0x55e22cc624fd]
[2021-04-23 12:40:08.678 25509/F34373] L[0003]: /root/project/src/database/gravity-db.c:896
[2021-04-23 12:40:08.679 25509/F34373] B[0004]: /usr/bin/pihole-FTL(+0x7ed61) [0x55e22cc7ed61]
[2021-04-23 12:40:08.688 25509/F34373] L[0004]: /root/project/src/dnsmasq/dnsmasq.c:1275
[2021-04-23 12:40:08.688 25509/F34373] B[0005]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fd258562730]
[2021-04-23 12:40:08.688 25509/F34373] B[0006]: /lib/x86_64-linux-gnu/libpthread.so.0(__read+0x44) [0x7fd258561544]
[2021-04-23 12:40:08.688 25509/F34373] B[0007]: /usr/bin/pihole-FTL(read_write+0x6e) [0x55e22ccbfa4e]
[2021-04-23 12:40:08.698 25509/F34373] L[0007]: /root/project/src/dnsmasq/util.c:700
[2021-04-23 12:40:08.699 25509/F34373] B[0008]: /usr/bin/pihole-FTL(tcp_request+0x17b) [0x55e22cc8899b]
[2021-04-23 12:40:08.710 25509/F34373] L[0008]: /root/project/src/dnsmasq/forward.c:1942
[2021-04-23 12:40:08.711 25509/F34373] B[0009]: /usr/bin/pihole-FTL(+0x7e811) [0x55e22cc7e811]
[2021-04-23 12:40:08.723 25509/F34373] L[0009]: /root/project/src/dnsmasq/dnsmasq.c:2009 (discriminator 4)
[2021-04-23 12:40:08.724 25509/F34373] B[0010]: /usr/bin/pihole-FTL(main_dnsmasq+0x12d9) [0x55e22cc807e9]
[2021-04-23 12:40:08.735 25509/F34373] L[0010]: /root/project/src/dnsmasq/dnsmasq.c:1227
[2021-04-23 12:40:08.736 25509/F34373] B[0011]: /usr/bin/pihole-FTL(main+0x11f) [0x55e22cc452af]
[2021-04-23 12:40:08.748 25509/F34373] L[0011]: /root/project/src/main.c:98
[2021-04-23 12:40:08.750 25509/F34373] B[0012]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fd2583b309b]
[2021-04-23 12:40:08.750 25509/F34373] B[0013]: /usr/bin/pihole-FTL(_start+0x2a) [0x55e22cc4538a]
[2021-04-23 12:40:08.779 25509/F34373] L[0013]: ??:?
[2021-04-23 12:40:08.782 25509/F34373] ------ Listing content of directory /dev/shm ------
[2021-04-23 12:40:08.782 25509/F34373] File Mode User:Group Size Filename
[2021-04-23 12:40:08.782 25509/F34373] rwxrwxrwx root:root 260 .
[2021-04-23 12:40:08.782 25509/F34373] rwxr-xr-x root:root 3K ..
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 16K FTL-per-client-regex
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 512K FTL-dns-cache
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 8K FTL-overTime
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 19M FTL-queries
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 20K FTL-upstreams
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 1M FTL-clients
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 467K FTL-domains
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 614K FTL-strings
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 12 FTL-settings
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 224 FTL-counters
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 48 FTL-lock
[2021-04-23 12:40:08.783 25509/F34373] ---------------------------------------------------
[2021-04-23 12:40:08.783 25509/F34373] Please also include some lines from above the !!!!!!!!! header.
[2021-04-23 12:40:08.783 25509/F34373] Thank you for helping us to improve our FTL engine!
[2021-04-23 12:40:08.783 25509/F34373] Asking parent pihole-FTL (PID 34373) to shut down
[2021-04-23 12:40:08.783 25509/F34373] FTL fork terminated!
[2021-04-23 12:40:08.783 34373M] Received: Real-time signal 2 (36 -> 2)
[2021-04-23 12:40:08.783 34373M] Shutting down...
[2021-04-23 12:40:08.783 25637/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:08.784 25550/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:08.784 25639/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:09.061 34373M] Finished final database update
[2021-04-23 12:40:09.061 34373M] Waiting for threads to join
[2021-04-23 12:40:09.061 34373M] Thread telnet-IPv4 (0) is idle, terminating it.
[2021-04-23 12:40:09.061 34373M] Thread telnet-IPv6 (1) is idle, terminating it.
[2021-04-23 12:40:09.061 34373M] Thread telnet-socket (2) is idle, terminating it.
[2021-04-23 12:40:09.061 34373M] Thread database (3) is idle, terminating it.
[2021-04-23 12:40:09.062 34373M] Thread housekeeper (4) is idle, terminating it.
[2021-04-23 12:40:09.062 34373M] Thread DNS client (5) is idle, terminating it.
[2021-04-23 12:40:09.062 34373M] All threads joined