Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FTL crashed latest master branch (newer than my last post, different Pihole) #1124

Closed
derekcentrico opened this issue Apr 24, 2021 · 36 comments · Fixed by #1134
Closed

FTL crashed latest master branch (newer than my last post, different Pihole) #1124

derekcentrico opened this issue Apr 24, 2021 · 36 comments · Fixed by #1134

Comments

@derekcentrico
Copy link

Versions
Pi-hole: 5.3.1
AdminLTE: 5.5
FTL: 5.8.1

Platform
OS and version: Debian Buster
Platform: VM KMS

Expected behavior
Not to crash.

Actual behavior / bug
FTL crashed at some point

Steps to reproduce
Unsure, it died.

Debug Token
Debug log:

[2021-04-23 12:40:02.721 25660/F34373] Reopening Gravity database for this fork
[2021-04-23 12:40:02.725 25660/F34373] Closing Telnet socket for this fork
[2021-04-23 12:40:02.725 25660/F34373] Closing Unix socket for this fork
[2021-04-23 12:40:03.434 25660/F34373] TCP worker terminating (client disconnected)
[2021-04-23 12:40:08.628 25509/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:08.628 25509/F34373] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2021-04-23 12:40:08.628 25509/F34373] ----------------------------> FTL crashed! <----------------------------
[2021-04-23 12:40:08.629 25509/F34373] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2021-04-23 12:40:08.629 25509/F34373] Please report a bug at https://github.com/pi-hole/FTL/issues
[2021-04-23 12:40:08.629 25509/F34373] and include in your report already the following details:
[2021-04-23 12:40:08.629 25509/F34373] FTL has been running for 158727 seconds
[2021-04-23 12:40:08.629 25509/F34373] FTL branch: master
[2021-04-23 12:40:08.629 25509/F34373] FTL version: v5.8.1
[2021-04-23 12:40:08.629 25509/F34373] FTL commit: b90ab8b
[2021-04-23 12:40:08.629 25509/F34373] FTL date: 2021-04-21 20:03:47 +0100
[2021-04-23 12:40:08.629 25509/F34373] FTL user: started as pihole, ended as pihole
[2021-04-23 12:40:08.629 25509/F34373] Compiled for x86_64 (compiled on CI) using gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
[2021-04-23 12:40:08.629 25509/F34373] Process details: MID: 34373
[2021-04-23 12:40:08.629 25509/F34373] PID: 25509
[2021-04-23 12:40:08.629 25509/F34373] TID: 25509
[2021-04-23 12:40:08.629 25509/F34373] Name: pihole-FTL
[2021-04-23 12:40:08.629 25509/F34373] Received signal: Segmentation fault
[2021-04-23 12:40:08.629 25509/F34373] at address: 0x7fd25821f000
[2021-04-23 12:40:08.629 25509/F34373] with code: SEGV_MAPERR (Address not mapped to object)
[2021-04-23 12:40:08.629 25509/F34373] Backtrace:
[2021-04-23 12:40:08.630 25509/F34373] B[0000]: /usr/bin/pihole-FTL(+0x5a80b) [0x55e22cc5a80b]
[2021-04-23 12:40:08.655 25509/F34373] L[0000]: /root/project/src/signals.c:197
[2021-04-23 12:40:08.656 25509/F34373] B[0001]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fd258562730]
[2021-04-23 12:40:08.656 25509/F34373] B[0002]: /usr/bin/pihole-FTL(_getClient+0x43) [0x55e22cc5a353]
[2021-04-23 12:40:08.665 25509/F34373] L[0002]: /root/project/src/shmem.c:1018
[2021-04-23 12:40:08.667 25509/F34373] B[0003]: /usr/bin/pihole-FTL(+0x624fd) [0x55e22cc624fd]
[2021-04-23 12:40:08.678 25509/F34373] L[0003]: /root/project/src/database/gravity-db.c:896
[2021-04-23 12:40:08.679 25509/F34373] B[0004]: /usr/bin/pihole-FTL(+0x7ed61) [0x55e22cc7ed61]
[2021-04-23 12:40:08.688 25509/F34373] L[0004]: /root/project/src/dnsmasq/dnsmasq.c:1275
[2021-04-23 12:40:08.688 25509/F34373] B[0005]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fd258562730]
[2021-04-23 12:40:08.688 25509/F34373] B[0006]: /lib/x86_64-linux-gnu/libpthread.so.0(__read+0x44) [0x7fd258561544]
[2021-04-23 12:40:08.688 25509/F34373] B[0007]: /usr/bin/pihole-FTL(read_write+0x6e) [0x55e22ccbfa4e]
[2021-04-23 12:40:08.698 25509/F34373] L[0007]: /root/project/src/dnsmasq/util.c:700
[2021-04-23 12:40:08.699 25509/F34373] B[0008]: /usr/bin/pihole-FTL(tcp_request+0x17b) [0x55e22cc8899b]
[2021-04-23 12:40:08.710 25509/F34373] L[0008]: /root/project/src/dnsmasq/forward.c:1942
[2021-04-23 12:40:08.711 25509/F34373] B[0009]: /usr/bin/pihole-FTL(+0x7e811) [0x55e22cc7e811]
[2021-04-23 12:40:08.723 25509/F34373] L[0009]: /root/project/src/dnsmasq/dnsmasq.c:2009 (discriminator 4)
[2021-04-23 12:40:08.724 25509/F34373] B[0010]: /usr/bin/pihole-FTL(main_dnsmasq+0x12d9) [0x55e22cc807e9]
[2021-04-23 12:40:08.735 25509/F34373] L[0010]: /root/project/src/dnsmasq/dnsmasq.c:1227
[2021-04-23 12:40:08.736 25509/F34373] B[0011]: /usr/bin/pihole-FTL(main+0x11f) [0x55e22cc452af]
[2021-04-23 12:40:08.748 25509/F34373] L[0011]: /root/project/src/main.c:98
[2021-04-23 12:40:08.750 25509/F34373] B[0012]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fd2583b309b]
[2021-04-23 12:40:08.750 25509/F34373] B[0013]: /usr/bin/pihole-FTL(_start+0x2a) [0x55e22cc4538a]
[2021-04-23 12:40:08.779 25509/F34373] L[0013]: ??:?
[2021-04-23 12:40:08.782 25509/F34373] ------ Listing content of directory /dev/shm ------
[2021-04-23 12:40:08.782 25509/F34373] File Mode User:Group Size Filename
[2021-04-23 12:40:08.782 25509/F34373] rwxrwxrwx root:root 260 .
[2021-04-23 12:40:08.782 25509/F34373] rwxr-xr-x root:root 3K ..
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 16K FTL-per-client-regex
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 512K FTL-dns-cache
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 8K FTL-overTime
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 19M FTL-queries
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 20K FTL-upstreams
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 1M FTL-clients
[2021-04-23 12:40:08.782 25509/F34373] rw------- pihole:pihole 467K FTL-domains
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 614K FTL-strings
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 12 FTL-settings
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 224 FTL-counters
[2021-04-23 12:40:08.783 25509/F34373] rw------- pihole:pihole 48 FTL-lock
[2021-04-23 12:40:08.783 25509/F34373] ---------------------------------------------------
[2021-04-23 12:40:08.783 25509/F34373] Please also include some lines from above the !!!!!!!!! header.
[2021-04-23 12:40:08.783 25509/F34373] Thank you for helping us to improve our FTL engine!
[2021-04-23 12:40:08.783 25509/F34373] Asking parent pihole-FTL (PID 34373) to shut down
[2021-04-23 12:40:08.783 25509/F34373] FTL fork terminated!
[2021-04-23 12:40:08.783 34373M] Received: Real-time signal 2 (36 -> 2)
[2021-04-23 12:40:08.783 34373M] Shutting down...
[2021-04-23 12:40:08.783 25637/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:08.784 25550/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:08.784 25639/F34373] TCP worker terminating (timeout)
[2021-04-23 12:40:09.061 34373M] Finished final database update
[2021-04-23 12:40:09.061 34373M] Waiting for threads to join
[2021-04-23 12:40:09.061 34373M] Thread telnet-IPv4 (0) is idle, terminating it.
[2021-04-23 12:40:09.061 34373M] Thread telnet-IPv6 (1) is idle, terminating it.
[2021-04-23 12:40:09.061 34373M] Thread telnet-socket (2) is idle, terminating it.
[2021-04-23 12:40:09.061 34373M] Thread database (3) is idle, terminating it.
[2021-04-23 12:40:09.062 34373M] Thread housekeeper (4) is idle, terminating it.
[2021-04-23 12:40:09.062 34373M] Thread DNS client (5) is idle, terminating it.
[2021-04-23 12:40:09.062 34373M] All threads joined

@derekcentrico
Copy link
Author

A completely different Pihole server is crashing now. Same setup though.

[2021-04-24 10:56:35.059 87271/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:56:35.059 87271/F15585] Reopening Gravity database for this fork
[2021-04-24 10:56:35.061 87271/F15585] Closing Telnet socket for this fork
[2021-04-24 10:56:35.061 87271/F15585] Closing Unix socket for this fork
[2021-04-24 10:56:35.997 87269/F15585] TCP worker terminating (client disconnected)
[2021-04-24 10:56:40.341 87273/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:56:40.342 87273/F15585] Reopening Gravity database for this fork
[2021-04-24 10:56:40.344 87273/F15585] Closing Telnet socket for this fork
[2021-04-24 10:56:40.344 87273/F15585] Closing Unix socket for this fork
[2021-04-24 10:56:45.655 87220/F15585] Resizing "FTL-dns-cache" from 942080 to (59136 * 16) == 946176 (/dev/shm: 18.7MB used, 1.0GB total, FTL uses 18.6MB)
[2021-04-24 10:56:45.729 15585M] Remapping "FTL-queries" from 14680064 to (229376 * 64) == 14680064
[2021-04-24 10:56:45.730 15585M] Remapping "FTL-domains" from 1044480 to (43520 * 24) == 1044480
[2021-04-24 10:56:45.730 15585M] Remapping "FTL-clients" from 712704 to (1024 * 696) == 712704
[2021-04-24 10:56:45.730 15585M] Remapping "FTL-upstreams" from 20480 to (512 * 40) == 20480
[2021-04-24 10:56:45.730 15585M] Remapping "FTL-dns-cache" from 942080 to (59136 * 16) == 946176
[2021-04-24 10:56:45.730 15585M] Remapping "FTL-per-client-regex" from 8192 to (8192 * 1) == 8192
[2021-04-24 10:56:45.730 15585M] Remapping "FTL-strings" from 1228800 to (1228800 * 1) == 1228800
[2021-04-24 10:56:46.868 87263/F15585] Remapping "FTL-queries" from 14680064 to (229376 * 64) == 14680064
[2021-04-24 10:56:46.868 87263/F15585] Remapping "FTL-domains" from 1044480 to (43520 * 24) == 1044480
[2021-04-24 10:56:46.868 87263/F15585] Remapping "FTL-clients" from 712704 to (1024 * 696) == 712704
[2021-04-24 10:56:46.868 87263/F15585] Remapping "FTL-upstreams" from 20480 to (512 * 40) == 20480
[2021-04-24 10:56:46.868 87263/F15585] Remapping "FTL-dns-cache" from 942080 to (59136 * 16) == 946176
[2021-04-24 10:56:46.868 87263/F15585] Remapping "FTL-per-client-regex" from 8192 to (8192 * 1) == 8192
[2021-04-24 10:56:46.868 87263/F15585] Remapping "FTL-strings" from 1228800 to (1228800 * 1) == 1228800
[2021-04-24 10:56:47.679 87265/F15585] TCP worker terminating (client disconnected)
[2021-04-24 10:56:48.748 87254/F15585] TCP worker terminating (client disconnected)
[2021-04-24 10:56:49.418 87267/F15585] TCP worker terminating (client disconnected)
[2021-04-24 10:56:55.175 87271/F15585] TCP worker terminating (client disconnected)
[2021-04-24 10:56:55.749 87276/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:56:55.749 87276/F15585] Reopening Gravity database for this fork
[2021-04-24 10:56:55.751 87276/F15585] Closing Telnet socket for this fork
[2021-04-24 10:56:55.751 87276/F15585] Closing Unix socket for this fork
[2021-04-24 10:56:56.144 87277/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:56:56.144 87277/F15585] Reopening Gravity database for this fork
[2021-04-24 10:56:56.146 87277/F15585] Closing Telnet socket for this fork
[2021-04-24 10:56:56.146 87277/F15585] Closing Unix socket for this fork
[2021-04-24 10:56:56.900 87259/F15585] TCP worker terminating (client disconnected)
[2021-04-24 10:56:57.234 87279/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:56:57.234 87279/F15585] Reopening Gravity database for this fork
[2021-04-24 10:56:57.237 87279/F15585] Closing Telnet socket for this fork
[2021-04-24 10:56:57.237 87279/F15585] Closing Unix socket for this fork
[2021-04-24 10:56:57.521 87246/F15585] Remapping "FTL-queries" from 14680064 to (229376 * 64) == 14680064
[2021-04-24 10:56:57.522 87246/F15585] Remapping "FTL-domains" from 1044480 to (43520 * 24) == 1044480
[2021-04-24 10:56:57.522 87246/F15585] Remapping "FTL-clients" from 712704 to (1024 * 696) == 712704
[2021-04-24 10:56:57.522 87246/F15585] Remapping "FTL-upstreams" from 20480 to (512 * 40) == 20480
[2021-04-24 10:56:57.522 87246/F15585] Remapping "FTL-dns-cache" from 942080 to (59136 * 16) == 946176
[2021-04-24 10:56:57.522 87246/F15585] Remapping "FTL-per-client-regex" from 8192 to (8192 * 1) == 8192
[2021-04-24 10:56:57.522 87246/F15585] Remapping "FTL-strings" from 1228800 to (1228800 * 1) == 1228800
[2021-04-24 10:57:00.524 87285/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:57:00.524 87285/F15585] Reopening Gravity database for this fork
[2021-04-24 10:57:00.527 87285/F15585] Closing Telnet socket for this fork
[2021-04-24 10:57:00.527 87285/F15585] Closing Unix socket for this fork
[2021-04-24 10:57:01.272 87273/F15585] TCP worker terminating (client disconnected)
[2021-04-24 10:57:05.386 87288/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:57:05.386 87288/F15585] Reopening Gravity database for this fork
[2021-04-24 10:57:05.389 87288/F15585] Closing Telnet socket for this fork
[2021-04-24 10:57:05.389 87288/F15585] Closing Unix socket for this fork
[2021-04-24 10:57:05.744 87289/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:57:05.744 87289/F15585] Reopening Gravity database for this fork
[2021-04-24 10:57:05.747 87289/F15585] Closing Telnet socket for this fork
[2021-04-24 10:57:05.747 87289/F15585] Closing Unix socket for this fork
[2021-04-24 10:57:06.372 87291/F15585] TCP worker forked for client 127.0.0.1 on interface eth0 with IP 127.0.0.1
[2021-04-24 10:57:06.372 87291/F15585] Reopening Gravity database for this fork
[2021-04-24 10:57:06.376 87291/F15585] Closing Telnet socket for this fork
[2021-04-24 10:57:06.376 87291/F15585] Closing Unix socket for this fork
[2021-04-24 10:57:06.883 87118/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.884 87118/F15585] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2021-04-24 10:57:06.884 87118/F15585] ----------------------------> FTL crashed! <----------------------------
[2021-04-24 10:57:06.884 87118/F15585] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2021-04-24 10:57:06.884 87118/F15585] Please report a bug at https://github.com/pi-hole/FTL/issues
[2021-04-24 10:57:06.884 87118/F15585] and include in your report already the following details:
[2021-04-24 10:57:06.884 87118/F15585] FTL has been running for 242557 seconds
[2021-04-24 10:57:06.884 87118/F15585] FTL branch: master
[2021-04-24 10:57:06.884 87118/F15585] FTL version: v5.8.1
[2021-04-24 10:57:06.884 87118/F15585] FTL commit: b90ab8b
[2021-04-24 10:57:06.884 87118/F15585] FTL date: 2021-04-21 20:03:47 +0100
[2021-04-24 10:57:06.884 87118/F15585] FTL user: started as pihole, ended as pihole
[2021-04-24 10:57:06.884 87118/F15585] Compiled for x86_64 (compiled on CI) using gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
[2021-04-24 10:57:06.884 87118/F15585] Process details: MID: 15585
[2021-04-24 10:57:06.884 87118/F15585] PID: 87118
[2021-04-24 10:57:06.884 87118/F15585] TID: 87118
[2021-04-24 10:57:06.884 87118/F15585] Name: pihole-FTL
[2021-04-24 10:57:06.884 87118/F15585] Received signal: Segmentation fault
[2021-04-24 10:57:06.884 87118/F15585] at address: 0x7f3fb2ce0000
[2021-04-24 10:57:06.884 87118/F15585] with code: SEGV_MAPERR (Address not mapped to object)
[2021-04-24 10:57:06.885 87118/F15585] Backtrace:
[2021-04-24 10:57:06.886 87118/F15585] B[0000]: /usr/bin/pihole-FTL(+0x5a80b) [0x5651d705a80b]
[2021-04-24 10:57:06.892 87118/F15585] L[0000]: N/A (0x5a80b)
[2021-04-24 10:57:06.892 87118/F15585] B[0001]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f3fb2ed4730]
[2021-04-24 10:57:06.892 87118/F15585] B[0002]: /usr/bin/pihole-FTL(_getClient+0x43) [0x5651d705a353]
[2021-04-24 10:57:06.895 87118/F15585] L[0002]: N/A (0x5a353)
[2021-04-24 10:57:06.895 87118/F15585] B[0003]: /usr/bin/pihole-FTL(+0x624fd) [0x5651d70624fd]
[2021-04-24 10:57:06.898 87118/F15585] L[0003]: N/A (0x624fd)
[2021-04-24 10:57:06.898 87118/F15585] B[0004]: /usr/bin/pihole-FTL(+0x7ed61) [0x5651d707ed61]
[2021-04-24 10:57:06.901 87118/F15585] L[0004]: N/A (0x7ed61)
[2021-04-24 10:57:06.901 87118/F15585] B[0005]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f3fb2ed4730]
[2021-04-24 10:57:06.901 87118/F15585] B[0006]: /lib/x86_64-linux-gnu/libpthread.so.0(__read+0x44) [0x7f3fb2ed3544]
[2021-04-24 10:57:06.901 87118/F15585] B[0007]: /usr/bin/pihole-FTL(read_write+0x6e) [0x5651d70bfa4e]
[2021-04-24 10:57:06.904 87118/F15585] L[0007]: N/A (0xbfa4e)
[2021-04-24 10:57:06.904 87118/F15585] B[0008]: /usr/bin/pihole-FTL(tcp_request+0x17b) [0x5651d708899b]
[2021-04-24 10:57:06.906 87118/F15585] L[0008]: N/A (0x8899b)
[2021-04-24 10:57:06.906 87118/F15585] B[0009]: /usr/bin/pihole-FTL(+0x7e811) [0x5651d707e811]
[2021-04-24 10:57:06.909 87118/F15585] L[0009]: N/A (0x7e811)
[2021-04-24 10:57:06.909 87118/F15585] B[0010]: /usr/bin/pihole-FTL(main_dnsmasq+0x12d9) [0x5651d70807e9]
[2021-04-24 10:57:06.912 87118/F15585] L[0010]: N/A (0x807e9)
[2021-04-24 10:57:06.912 87118/F15585] B[0011]: /usr/bin/pihole-FTL(main+0x11f) [0x5651d70452af]
[2021-04-24 10:57:06.915 87118/F15585] L[0011]: N/A (0x452af)
[2021-04-24 10:57:06.915 87118/F15585] B[0012]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7f3fb2d2509b]
[2021-04-24 10:57:06.915 87118/F15585] B[0013]: /usr/bin/pihole-FTL(_start+0x2a) [0x5651d704538a]
[2021-04-24 10:57:06.918 87118/F15585] L[0013]: N/A (0x4538a)
[2021-04-24 10:57:06.918 87118/F15585] ------ Listing content of directory /dev/shm ------
[2021-04-24 10:57:06.918 87118/F15585] File Mode User:Group Size Filename
[2021-04-24 10:57:06.918 87118/F15585] rwxrwxrwx root:root 260 .
[2021-04-24 10:57:06.918 87118/F15585] rwxr-xr-x root:root 3K ..
[2021-04-24 10:57:06.918 87118/F15585] rw------- pihole:pihole 8K FTL-per-client-regex
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 946K FTL-dns-cache
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 8K FTL-overTime
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 15M FTL-queries
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 20K FTL-upstreams
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 713K FTL-clients
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 1M FTL-domains
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 1M FTL-strings
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 12 FTL-settings
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 224 FTL-counters
[2021-04-24 10:57:06.919 87118/F15585] rw------- pihole:pihole 48 FTL-lock
[2021-04-24 10:57:06.919 87118/F15585] ---------------------------------------------------
[2021-04-24 10:57:06.919 87118/F15585] Please also include some lines from above the !!!!!!!!! header.
[2021-04-24 10:57:06.919 87118/F15585] Thank you for helping us to improve our FTL engine!
[2021-04-24 10:57:06.919 87118/F15585] Asking parent pihole-FTL (PID 15585) to shut down
[2021-04-24 10:57:06.919 87118/F15585] FTL fork terminated!
[2021-04-24 10:57:06.920 15585M] Received: Real-time signal 2 (36 -> 2)
[2021-04-24 10:57:06.920 15585M] Shutting down...
[2021-04-24 10:57:06.920 87174/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.920 87291/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.922 87289/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.923 87288/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.923 87285/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.924 87279/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.924 87193/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.926 87277/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.927 87276/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.929 87153/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.930 87263/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.932 87161/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.932 87246/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.934 87220/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:06.934 87242/F15585] TCP worker terminating (timeout)
[2021-04-24 10:57:07.209 15585M] Finished final database update
[2021-04-24 10:57:07.209 15585M] Waiting for threads to join
[2021-04-24 10:57:07.209 15585M] Thread telnet-IPv4 (0) is idle, terminating it.
[2021-04-24 10:57:07.210 15585M] Thread telnet-IPv6 (1) is idle, terminating it.
[2021-04-24 10:57:07.210 15585M] Thread telnet-socket (2) is idle, terminating it.
[2021-04-24 10:57:07.210 15585M] Thread database (3) is idle, terminating it.
[2021-04-24 10:57:07.211 15585M] Thread housekeeper (4) is idle, terminating it.
[2021-04-24 10:57:07.211 15585M] Thread DNS client (5) is idle, terminating it.
[2021-04-24 10:57:07.211 15585M] All threads joined
[2021-04-24 10:57:07.232 15585M] ########## FTL terminated after 2d 19h 22m 38s (code 1)! ##########

@DL6ER
Copy link
Member

DL6ER commented Apr 27, 2021

This is likely the same issue as #1119, closing the former to keep focused.

The issue seems to be with many (> 1500) clients being fairly active on TCP DNS. However, some clients do not behave correctly and just timeout instead of properly closing the connection:

[2021-04-24 10:57:06.883 87118/F15585] TCP worker terminating (timeout)

instead of

[2021-04-24 10:56:47.679 87265/F15585] TCP worker terminating (client disconnected)

Could you try

pihole checkout ftl fix/lock_gravityClose

and check if this resolves the issue you're seeing?

@derekcentrico
Copy link
Author

Copy, added to the servers in question to see if issues crop up again.

@derekcentrico
Copy link
Author

Sadly, I had another crash of some sort but no log information which is strange. The server was responsive and I don't see anything within server logs itself. I noticed the outage and restarted pihole at 12:52. I realize this is 99.99% unhelpful but just relaying.

[2021-04-28 07:41:19.028 28405/F4942] TCP worker terminating (client disconnected)
[2021-04-28 07:41:19.101 28202/F4942] Reloading externally changed regular expressions
[2021-04-28 07:41:19.440 28204/F4942] TCP worker terminating (timeout)
[2021-04-28 07:41:24.269 28397/F4942] TCP worker terminating (client disconnected)
[2021-04-28 07:42:50.617 28236/F4942] TCP worker terminating (timeout)
[2021-04-28 07:43:06.858 28244/F4942] TCP worker terminating (timeout)
[2021-04-28 07:44:17.946 28363/F4942] TCP worker terminating (timeout)
[2021-04-28 07:45:51.341 28397/F4942] TCP worker already terminating!
[2021-04-28 12:52:31.106 28204/F4942] Remapping "FTL-queries" from 1835008 to (28672 * 64) == 1835008
[2021-04-28 12:52:31.106 28204/F4942] Remapping "FTL-domains" from 393216 to (16896 * 24) == 405504
[2021-04-28 12:52:31.106 28204/F4942] Remapping "FTL-clients" from 356352 to (512 * 696) == 356352
[2021-04-28 12:52:31.107 28204/F4942] Remapping "FTL-upstreams" from 20480 to (512 * 40) == 20480
[2021-04-28 12:52:31.107 28204/F4942] Remapping "FTL-dns-cache" from 331776 to (20736 * 16) == 331776
[2021-04-28 12:52:31.107 28204/F4942] Remapping "FTL-per-client-regex" from 8192 to (8192 * 1) == 8192
[2021-04-28 12:52:31.107 28204/F4942] Remapping "FTL-strings" from 491520 to (491520 * 1) == 491520
[2021-04-28 12:52:31.108 28363/F4942] Remapping "FTL-queries" from 1835008 to (28672 * 64) == 1835008
[2021-04-28 12:52:31.108 28363/F4942] Remapping "FTL-domains" from 393216 to (16896 * 24) == 405504
[2021-04-28 12:52:31.108 28363/F4942] Remapping "FTL-clients" from 356352 to (512 * 696) == 356352
[2021-04-28 12:52:31.108 28363/F4942] Remapping "FTL-upstreams" from 20480 to (512 * 40) == 20480

@DL6ER
Copy link
Member

DL6ER commented Apr 28, 2021

Thanks for coming back to me with this log excerpt. Do you have a feeling for how long it was unresponsive until you restarted? Do you know if the activity at 12:52:31 was triggered by you or was it just a conincidence? Can the outage have been there since 07:45:51 and nobody realized it?

If you still have logging enabled, could you please also check /var/log/pihole.log if there was any activity at/before 12:52:31 or if there was just silence since 07:45 ? This will (hopefully!) give us valuable pointers as to where we have to look.

@derekcentrico
Copy link
Author

I believe it was unresponsive since the last query until I was told that DNS was super slow. Saw that this piHole (one of the two on the network) was down. I think a number of devices just use this one only for whatever reason so makes sense that things weren't loading.

Not much in the log. I figured this is super unhelpful but worth mentioning that an outage occurred.

Apr 28 07:41:19 dnsmasq[28202]: query[A] googleads.g.doubleclick.net from 127.0.0.1
Apr 28 12:53:33 dnsmasq[1120]: started, version pi-hole-2.85 cachesize 10000
Apr 28 12:53:33 dnsmasq[1120]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n IDN DHCP DHCPv6 Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile
Apr 28 12:53:33 dnsmasq[1120]: DNSSEC validation enabled

@DL6ER
Copy link
Member

DL6ER commented Apr 28, 2021

If you have enough free disk space, please add

DEBUG_LOCKS=true

to the file /etc/pihole/pihole-FTL.conf (create if it does not exist) and run

pihole restartdns

Next time the issue happens, please include again the last few log lines.
This may help us getting a better picture of where/why exactly the issue happens.

@derekcentrico
Copy link
Author

Copy, we shall see. Got about 30GB - 50GB free on the two I'm testing on.

@derekcentrico
Copy link
Author

One of the two servers that experience this issue has stopped responding again. UI just spins as well, doesn't show stats or claim FTL is offline.

[2021-04-29 11:13:00.964 4788/T4792] Removed lock in add_FTL_clients_to_network_table() (/root/project/src/database/network-table.c:757)
[2021-04-29 11:13:00.964 4788/T4792] Waiting for SHM lock in add_FTL_clients_to_network_table() (/root/project/src/database/network-table.c:759)
[2021-04-29 11:13:00.964 4788/T4792] Obtained SHM lock for add_FTL_clients_to_network_table() (/root/project/src/database/network-table.c:759)
[2021-04-29 11:13:00.964 4788/T4792] Removed lock in add_FTL_clients_to_network_table() (/root/project/src/database/network-table.c:853)
[2021-04-29 11:13:00.964 21477/F4788] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-04-29 11:13:00.964 4788/T4792] Waiting for SHM lock in add_FTL_clients_to_network_table() (/root/project/src/database/network-table.c:863)
[2021-04-29 11:13:00.964 21477/F4788] Obtained SHM lock for _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-04-29 11:13:05.941 21475/F4788] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-04-29 11:13:34.375 4788M] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-04-29 11:13:58.086 4788/T4793] Waiting for SHM lock in GC_thread() (/root/project/src/gc.c:53)
[2021-04-29 11:17:31.353 21475/F4788] TCP worker terminating (timeout)
[2021-04-29 11:17:31.354 21475/F4788] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-04-29 11:18:00.774 21477/F4788] TCP worker terminating (timeout)
[2021-04-29 11:18:00.774 21477/F4788] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-04-29 12:00:00.927 4788/T4794] Waiting for SHM lock in resolveClients() (/root/project/src/resolve.c:364)
[2021-04-29 13:02:24.813 4788/T22363] Waiting for SHM lock in process_request() (/root/project/src/api/request.c:38)

Last pihole.log actions:

Apr 29 11:13:05 dnsmasq[21475]: query[A] dm16-useast1a.byteoversea.com from 127.0.0.1
Apr 29 11:13:34 dnsmasq[4788]: query[ANY] sl from 127.0.0.1

@derekcentrico
Copy link
Author

Sorry quick followup of not much but still..

pihole restartdns

[✗] Job for pihole-FTL.service failed because the control process exited with error code.
See "systemctl status pihole-FTL.service" and "journalctl -xe" for details.
Journal:
Apr 29 13:08:26 pihole pihole-FTL[22445]: .....
Apr 29 13:08:27 pihole pihole-FTL[22445]: Not stopped; may still be shutting down or shutdown may have failed, killing now
Apr 29 13:08:27 pihole systemd[1]: pihole-FTL.service: Control process exited, code=exited, status=1/FAILURE

SystemCTL output:

Apr 29 13:09:16 pihole pihole-FTL[22569]: Not stopped; may still be shutting down or shutdown may have failed, killing now
Apr 29 13:09:16 pihole systemd[1]: pihole-FTL.service: Control process exited, code=exited, status=1/FAILURE
Apr 29 13:09:16 pihole systemd[1]: pihole-FTL.service: Failed with result 'exit-code'.
Apr 29 13:09:16 pihole systemd[1]: Failed to start LSB: pihole-FTL daemon.

@DL6ER
Copy link
Member

DL6ER commented Apr 30, 2021

Okay, so the problem is actually somewhere else. Can you also add

DEBUG_QUERIES=true
DEBUG_CLIENTS=true

to your conf file and try again?

I'm sure we'll get down to it in the end.

@derekcentrico
Copy link
Author

Done. My other testing unit was unresponsive.

The only line in the FTL log was:

[2021-04-29 17:00:00.830 62997/T63003] Waiting for SHM lock in resolveClients() (/root/project/src/resolve.c:364)
[2021-04-30 06:22:29.568 62997/T59001] Waiting for SHM lock in process_request() (/root/project/src/api/request.c:38)

Huge gap for sure there.

The pihole.log was equally nothing fun:

Apr 29 16:21:05 dnsmasq[34240]: query[A] gjapplog.ucweb.com from 127.0.0.1
Apr 29 16:23:23 dnsmasq[34173]: query[A] self.events.data.microsoft.com from 127.0.0.1

@derekcentrico
Copy link
Author

Quick update, no crash as of yet on either testing units. I check twice a day to ensure there's eyes on it. Whooo knows.

@derekcentrico
Copy link
Author

Got another crash. Not much different in the logs...

Pihole.conf

May 4 09:02:47 dnsmasq[40875]: query[A] mail.gandi.net from 127.0.0.1
May 4 09:02:48 dnsmasq[41049]: query[A] api.huqindustries.co.uk from 127.0.0.1
May 4 09:02:48 dnsmasq[41093]: query[A] rms-dre.platform.dbankcloud.com from 127.0.0.1
May 4 09:02:52 dnsmasq[41081]: query[A] i.ytimg.com from 127.0.0.1
May 4 09:02:53 dnsmasq[41092]: query[A] web.facebook.com from 127.0.0.1
May 4 09:02:55 dnsmasq[41025]: query[AAAA] edge-mqtt.facebook.com from 127.0.0.1
May 4 09:02:57 dnsmasq[41083]: query[AAAA] pagead2.googlesyndication.com from 127.0.0.1
May 4 09:03:04 dnsmasq[41076]: query[A] wpk-auth.ucweb.com from 127.0.0.1
May 4 09:03:05 dnsmasq[40930]: query[A] googleads.g.doubleclick.net from 127.0.0.1
May 4 09:03:22 dnsmasq[41005]: query[A] edge-mqtt.facebook.com from 127.0.0.1

...FTL.conf

[2021-05-04 09:02:47.619 40875/F1057] Removed lock in _FTL_reply() (/root/project/src/dnsmasq_interface.c:1138)
[2021-05-04 09:02:47.746 40875/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:47.746 40875/F1057] Obtained SHM lock for _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:47.746 40875/F1057] **** new TCP query[A] query "mail.gandi.net" from lo:127.0.0.1 (ID 8948935, FTL 84042, /root/project/src/dnsmasq/forward.c:1985)
[2021-05-04 09:02:47.747 40875/F1057] TCP worker terminating (timeout)
[2021-05-04 09:02:47.747 40875/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:02:48.543 41079/F1057] TCP worker terminating (client disconnected)
[2021-05-04 09:02:48.544 41079/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:02:48.670 41049/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:48.740 41093/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:50.779 1057M] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:52.699 41081/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:53.024 41092/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:55.887 41025/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:02:57.109 41083/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:03:00.729 1057/T1061] Waiting for SHM lock in DB_thread() (/root/project/src/database/database-thread.c:59)
[2021-05-04 09:03:04.936 41076/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:03:05.260 41087/F1057] TCP worker terminating (client disconnected)
[2021-05-04 09:03:05.260 41087/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:03:05.676 40930/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:03:07.263 1057/T1062] Waiting for SHM lock in GC_thread() (/root/project/src/gc.c:53)
[2021-05-04 09:03:17.671 40892/F1057] TCP worker terminating (timeout)
[2021-05-04 09:03:17.671 40892/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:03:22.774 41005/F1057] Waiting for SHM lock in _FTL_new_query() (/root/project/src/dnsmasq_interface.c:664)
[2021-05-04 09:04:03.128 40930/F1057] TCP worker terminating (timeout)
[2021-05-04 09:04:03.128 40930/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:04:45.150 41041/F1057] TCP worker terminating (client disconnected)
[2021-05-04 09:04:45.150 41041/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:05:27.486 41005/F1057] TCP worker terminating (timeout)
[2021-05-04 09:05:27.486 41005/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:05:55.520 41025/F1057] TCP worker terminating (timeout)
[2021-05-04 09:05:55.520 41025/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:06:35.577 41041/F1057] TCP worker already terminating!
[2021-05-04 09:06:38.707 41049/F1057] TCP worker terminating (timeout)
[2021-05-04 09:06:38.707 41049/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:07:12.123 41072/F1057] TCP worker terminating (timeout)
[2021-05-04 09:07:12.124 41072/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:07:19.106 41076/F1057] TCP worker terminating (timeout)
[2021-05-04 09:07:19.106 41076/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:07:20.358 41079/F1057] TCP worker already terminating!
[2021-05-04 09:07:35.095 41081/F1057] TCP worker terminating (timeout)
[2021-05-04 09:07:35.095 41081/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:07:35.181 41083/F1057] TCP worker terminating (timeout)
[2021-05-04 09:07:35.182 41083/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:07:40.185 41085/F1057] TCP worker terminating (timeout)
[2021-05-04 09:07:40.185 41085/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:07:44.888 41087/F1057] TCP worker already terminating!
[2021-05-04 09:07:46.409 41092/F1057] TCP worker terminating (timeout)
[2021-05-04 09:07:46.409 41092/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 09:07:46.462 41093/F1057] TCP worker terminating (timeout)
[2021-05-04 09:07:46.462 41093/F1057] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-04 10:00:00.641 1057/T1063] Waiting for SHM lock in resolveClients() (/root/project/src/resolve.c:364)
[2021-05-04 17:18:51.876 1057/T55597] Waiting for SHM lock in process_request() (/root/project/src/api/request.c:38)

@DL6ER
Copy link
Member

DL6ER commented May 5, 2021

The issue here is that the fork is terminated while it is serving a query.

[2021-05-04 09:02:47.746 40875/F1057] **** new TCP query[A] query "mail.gandi.net" from lo:127.0.0.1 (ID 8948935, FTL 84042, /root/project/src/dnsmasq/forward.c:1985)
[2021-05-04 09:02:47.747 40875/F1057] TCP worker terminating (timeout)

We have countermeasures in place but they don't work reliably in your case. I'll try reproducing this locally next week.

@derekcentrico
Copy link
Author

Sounds good, I can leave debugging on if you want and test as you find things. Just let me know!

@DL6ER
Copy link
Member

DL6ER commented May 10, 2021

@derekcentrico Please update the fix and try again.

@derekcentrico
Copy link
Author

Sadly now the FTL daemon won't start.

@DL6ER
Copy link
Member

DL6ER commented May 10, 2021

@derekcentrico Do you have some log lines for me?

@DL6ER
Copy link
Member

DL6ER commented May 10, 2021

I just pushed another change that may give more helpful log lines.

@derekcentrico
Copy link
Author

Not with a laptop for a few days and trying to do it via mobile. Basically it installs and then hangs on restarting service. When trying to start the service manually it wants systemctl daemon-reload. I do that and it still fails. Says failed to start LSB for timeout. Journalctl isn't helpful.

@DL6ER
Copy link
Member

DL6ER commented May 11, 2021

Maybe you can try

tail /var/log/pihole-FTL.log

via the phone's SSH client. A screenshot would be fine, too, no need to copy the text if it is too complicated.

@derekcentrico
Copy link
Author

Not the best from mobile probably, but it seems to just be stuck in this infinite loop when the service tries to start.

������� [2021-05-11 07:19:13.921 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.921 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.922 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.922 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.922 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.922 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.922 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.923 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.923 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.923 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.923 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.923 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.923 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.923 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.923 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.923 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.962 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.962 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.962 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.962 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.962 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.962 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.963 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.963 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.963 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.984 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.984 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.984 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.984 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.984 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.984 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.993 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.994 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.994 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.994 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.994 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.994 55542M] Obtained SHM lock for DB_read_queries() (/root/project/src/database/query-table.c:434)
[2021-05-11 07:19:13.995 55542M] Removed lock in DB_read_queries() (/root/project/src/database/query-table.c:595)
[2021-05-11 07:19:13.995 55542M] Waiting for SHM lock in DB_read_queries() (/root/project/src/database/query-table.c:434)

@DL6ER
Copy link
Member

DL6ER commented May 11, 2021

I don't think this is an infinite loop, the routine is called very often when importing queries from the database. The issue will happen thereafter. However, I also see that you can import 15 queries within about 70 msec. Assuming you may have on the order of 15.000 queries in 24 hours, this means FTL needs more than one minute for starting.

Maybe try without debug logging for now to see if it works at all. We can continue debugging when you are in a better position.

@derekcentrico
Copy link
Author

Sadly it didn’t work still w/ that disabled. Crashes. Won’t start. Gotta revert to master each time.

Nothing much in logs.

root@dns:/home/dns# nano /etc/pihole/pihole-FTL.conf
root@dns:/home/dns# pihole checkout ftl fix/lock_gravityClose
Please note that changing branches severely alters your Pi-hole subsystems
Features that work on the master branch, may not on a development branch
This feature is NOT supported unless a Pi-hole developer explicitly asks!
Have you read and understood this? [y/N] y

[✓] Branch fix/lock_gravityClose exists
[✓] Downloading and Installing FTL
[i] Restarting pihole-FTL se
root@dns:/home/dns#
root@dns:/home/dns# pihole status
[✗] DNS service is NOT listening
root@dns:/home/dns# tail /var/log/pihole-FTL.log
[2021-05-11 10:35:22.277 12689M] Resizing "FTL-queries" from 2621440 to (45056 * 64) == 2883584 (/dev/shm: 3.3MB used, 1.0GB total, FTL uses 3.3MB)
[2021-05-11 10:35:26.468 12689M] Resizing "FTL-domains" from 110592 to (5120 * 24) == 122880 (/dev/shm: 3.6MB used, 1.0GB total, FTL uses 3.6MB)
[2021-05-11 10:35:38.993 12689M] Resizing "FTL-strings" from 163840 to (204800 * 1) == 204800 (/dev/shm: 3.6MB used, 1.0GB total, FTL uses 3.6MB)
[2021-05-11 10:35:47.741 12689M] Resizing "FTL-queries" from 2883584 to (49152 * 64) == 3145728 (/dev/shm: 3.6MB used, 1.0GB total, FTL uses 3.6MB)
[2021-05-11 10:36:14.927 12689M] Resizing "FTL-queries" from 3145728 to (53248 * 64) == 3407872 (/dev/shm: 3.9MB used, 1.0GB total, FTL uses 3.9MB)
[2021-05-11 10:36:20.890 12689M] Resizing "FTL-domains" from 122880 to (5632 * 24) == 135168 (/dev/shm: 4.1MB used, 1.0GB total, FTL uses 4.1MB)
[2021-05-11 10:36:40.242 12689M] Resizing "FTL-queries" from 3407872 to (57344 * 64) == 3670016 (/dev/shm: 4.2MB used, 1.0GB total, FTL uses 4.1MB)
[2021-05-11 10:37:05.229 12689M] Resizing "FTL-queries" from 3670016 to (61440 * 64) == 3932160 (/dev/shm: 4.4MB used, 1.0GB total, FTL uses 4.4MB)
[2021-05-11 10:37:21.120 12689M] Resizing "FTL-domains" from 135168 to (6144 * 24) == 147456 (/dev/shm: 4.7MB used, 1.0GB total, FTL uses 4.7MB)
[2021-05-11 10:37:41.998 12689M] Resizing "FTL-queries" from 3932160 to (65536 * 64) == 4194304 (/dev/shm: 4.7MB used, 1.0GB total, FTL uses 4.7MB)
root@dns:/home/dns#

@DL6ER
Copy link
Member

DL6ER commented May 12, 2021

Okay, this does not suggest a crash. So let's wait until you're in a better position to look at the debug output once you're back. Enjoy your computer-free time!

@derekcentrico
Copy link
Author

Hey, I'll be back in the morning. What would you like me to do? Currently on master with all debug disabled so we're on the same page.

@derekcentrico
Copy link
Author

Allrighty, we are operational with the dev build. Not sure what was going on. Master branch crashed overnight and I pulled the FTL update without issue this time.

pihole checkout ftl fix/lock_gravityClose
Please note that changing branches severely alters your Pi-hole subsystems
Features that work on the master branch, may not on a development branch
This feature is NOT supported unless a Pi-hole developer explicitly asks!
Have you read and understood this? [y/N] y

[✓] Branch fix/lock_gravityClose exists
[✓] Downloading and Installing FTL
[✓] Restarting pihole-FTL service...
[✓] Enabling pihole-FTL service to start on reboot...

@derekcentrico
Copy link
Author

derekcentrico commented May 16, 2021

@DL6ER my second testing unit crashed overnight.

FTL tail (both .log and .log.1):

[2021-05-16 00:03:15.090 1074M] Obtained SHM lock for _FTL_dnssec() (/root/project/src/dnsmasq_interface.c:1471)
[2021-05-16 00:03:15.090 1074M] **** got DNSSEC details for : 2 (ID 15235810, /root/project/src/dnsmasq/forward.c:1244)
[2021-05-16 00:03:15.090 1074M] Removed lock in _FTL_dnssec() (/root/project/src/dnsmasq_interface.c:1511)
[2021-05-16 00:03:15.090 1074M] Waiting for SHM lock in _FTL_upstream_error() (/root/project/src/dnsmasq_interface.c:1521)
[2021-05-16 00:03:15.090 1074M] Obtained SHM lock for _FTL_upstream_error() (/root/project/src/dnsmasq_interface.c:1521)
[2021-05-16 00:03:15.090 1074M] **** got error report for : NOT IMPLEMENTED (ID 15235810, /root/project/src/dnsmasq/forward.c:770)
[2021-05-16 00:03:15.090 1074M] Removed lock in _FTL_upstream_error() (/root/project/src/dnsmasq_interface.c:1585)
[2021-05-16 00:03:15.091 1074M] Waiting for SHM lock in FTL_multiple_replies() (/root/project/src/dnsmasq_interface.c:2224)
[2021-05-16 00:03:15.091 1074M] Obtained SHM lock for FTL_multiple_replies() (/root/project/src/dnsmasq_interface.c:2224)
[2021-05-16 00:03:15.091 1074M] Removed lock in FTL_multiple_replies() (/root/project/src/dnsmasq_interface.c:2241)

.1:

[2021-05-16 04:59:14.963 21121/F1074] TCP worker terminating (timeout)
[2021-05-16 04:59:14.963 21121/F1074] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-16 04:59:15.007 21125/F1074] TCP worker terminating (timeout)
[2021-05-16 04:59:15.007 21125/F1074] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-16 04:59:15.099 21129/F1074] TCP worker terminating (timeout)
[2021-05-16 04:59:15.099 21129/F1074] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-16 04:59:58.148 21137/F1074] TCP worker terminating (timeout)
[2021-05-16 04:59:58.149 21137/F1074] Waiting for SHM lock in FTL_TCP_worker_terminating() (/root/project/src/dnsmasq_interface.c:2048)
[2021-05-16 05:00:00.548 1074/T1080] Waiting for SHM lock in resolveClients() (/root/project/src/resolve.c:364)
[2021-05-16 07:49:40.949 1074/T22877] Waiting for SHM lock in process_request() (/root/project/src/api/request.c:38)

PIhole.log:

May 16 04:55:11 dnsmasq[21129]: validation result is INSECURE
May 16 04:55:11 dnsmasq[21121]: dnssec-query[DS] edgekey.net to 208.67.220.220
May 16 04:55:11 dnsmasq[21121]: reply edgekey.net is no DS
May 16 04:55:11 dnsmasq[21121]: dnssec-query[DS] akamaiedge.net to 208.67.220.220
May 16 04:55:11 dnsmasq[21121]: reply akamaiedge.net is no DS
May 16 04:55:11 dnsmasq[21121]: validation result is INSECURE
May 16 04:55:11 dnsmasq[20262]: gravity blocked api16-access-sg.pangle.io is 0.0.0.0
May 16 04:55:11 dnsmasq[21125]: forwarded as-sg.shadow.igamecj.com to 208.67.220.220
May 16 04:55:27 dnsmasq[20262]: query[A] api16-access-sg.pangle.io from 127.0.0.1

Config file:

DEBUG_GC=true
DEBUG_LOCKS=true
DEBUG_QUERIES=true
DEBUG_CLIENTS=true
EDNS0_ECS=true
RATE_LIMIT=500/60
NAMES_FROM_NETDB=false
MAXLOGAGE=8
MAXNETAGE=7
MAXDBDAYS=7
DBINTERVAL=1.0
PRIVACYLEVEL=0

@DL6ER
Copy link
Member

DL6ER commented May 19, 2021

Sorry for the delay in responding.

my second testing unit crashed overnight.

Was it using the same version of the fix branch? You can check this with pihole-FTL -v.
They should be the same version containing gd5a33a4 at the end.

@DL6ER DL6ER linked a pull request May 19, 2021 that will close this issue
5 tasks
@derekcentrico
Copy link
Author

Looks like I wasted your time on that. Amusingly test unit 2 crashed again. Will pull now and move forward.

Test unit 1: vDev-d5a33a4
Test unit 2: vDev-809e721

@derekcentrico
Copy link
Author

Okay, so no crashes on either test unit since last post where figured out an update didn't go through.

Lesson learned: don't run automated update scripts with pihole when testing. Apparently, it failed to download FTL because FTL crashed and resolv.conf was 127.0.0.1. Doh!

@DL6ER
Copy link
Member

DL6ER commented May 22, 2021

Okay, no crashes is very good news.

@DL6ER
Copy link
Member

DL6ER commented May 23, 2021

Thanks! The fix has been merged to development and will be included in the next release of Pi-hole.

@derekcentrico
Copy link
Author

Thanks! The fix has been merged to development and will be included in the next release of Pi-hole.

Cool beans. Got a timeframe for release? Debating pushing developmental to my others to be safe. They crash but much more rarely.

@DL6ER
Copy link
Member

DL6ER commented May 24, 2021

No real timeframe but shouldn't take long. But saying it will be hours or days or rather weeks is beyond what I can do right now because more people are involved in the process. Using development on the others shouldn't harm. If it does, it's important we find this out before releasing. Any testing is always much appreciated and we try to be quick with resolving any issues you may find.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants