Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase SQLite3's cache size #1489

Merged
merged 1 commit into from
Nov 18, 2022
Merged

Increase SQLite3's cache size #1489

merged 1 commit into from
Nov 18, 2022

Conversation

DL6ER
Copy link
Member

@DL6ER DL6ER commented Nov 17, 2022

What does this implement/fix?

Increase default SQLite3 cache size from ~2 MiB to 16 MiB

FTL heavily relies on the B-tree of the databases to be able to make extremely fast decisions if a domain is to be blocked. The speed-up between a B-tree and a conventional lookup is on the order of O(N/log(N)) so something like 7,000x for 120,000 gravity-blocked domains or roughly 100,000x for two million domains.

For this to be efficient, FTL needs to have sufficiently fast access to the B-tree. By default, SQLite3 uses an upper limit for the cache of roughly 2 MiB. This PR increases this limit to 16 MiB. Given how much memory FTL uses overall, this seems an acceptable amount.

One further technical detail here: One domain takes roughly 32 bytes of memory in the index (it's a bit more because of bookkeeping things). So the default cache is only sufficient for about 60,000 domains. Increasing this to 16 MiB will make enough room for at least 480,000 domains.

You may ask "but can FTL be as fast with a cache is too small at the moment?" and this is indeed a very good question. Even when the cache is too small to keep the entire tree in memory, the kernel sees that you're reading from the file gravity.db really really often and decides to cache the file for you. It's the kernel's way to make intelligent use of "free" memory. So - in a way - the table already completely resides in memory just now.
So what is this PR going to change? Well, when the entire file is kept in the kernel's cache, we still have to interact with the database file and read the file and do all the interactions using I/O whereas, when the cache is sufficiently large, SQLite3 will be able to cache the tree itself and does not need to read the file ever and ever again.

What's the benefit? It's really hard to measure. It can surely be seen as a kind of micro-optimization but it something that basically comes for free and allows us to notably reduce I/O that has to go through the kernel and, hence, reduces syscalls and overall latency.

Related issue or feature (if applicable): N/A

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code and I have tested my changes.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

Checklist:

  • The code change is tested and works locally.
  • I have read the above and my PR is ready for review. Check this box to confirm

@DL6ER DL6ER requested a review from a team November 17, 2022 17:59
@yubiuser
Copy link
Member

yubiuser commented Nov 17, 2022

So at the moment, the cache is not even big enough to hold the 158531 domains from the default adlist (https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts)?

src/CMakeLists.txt Outdated Show resolved Hide resolved
@DL6ER
Copy link
Member Author

DL6ER commented Nov 17, 2022

So at the moment, the cache is not even big enough to hold the 158531 domains from the default adlist ?

Yes...

@DL6ER DL6ER force-pushed the tweak/sqlite3_cachesize branch from f62f0f9 to d10887d Compare November 17, 2022 18:17
@yubiuser
Copy link
Member

Yes...

Maybe we should even increase it a bit more? 24MiB for a maximum of 750000?

@DL6ER
Copy link
Member Author

DL6ER commented Nov 17, 2022

I don't know, it's open for discussion. This is not a global cache but a per-connection (i.e., per-file) cache. So a connection to FTL's database will consume (up to) the same memory as does the gravity database connection. I also thought about making this configurable rather than hard-code anything but then this seemed too much effort given how small the actual performance advantage might be (noone really knows)

@rdwebdesign
Copy link
Member

My only concern is about low memory devices, like Raspberry Pi Zero or even Raspberry Pi 1 and 2.

Could this be a problem for any of those devices?

Copy link
Member

@yubiuser yubiuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if this is per file (gravity.db and pihole-FTL.db) we should keep it at 16MiB. The lowest RAM amount on a RPi is 256. So we take already 32 MiB from it.

__

I know it's only logged when debug is enabled, but I think this is logged to often. Esp. because the value does not change during runtime.

[2022-11-17 19:26:33.503 21146/T21162] FTL database cache size is 16.0 MiB
[2022-11-17 19:26:33.504 21146/T21162] Found database host name (same address) 10.0.1.26 -> Google-Pixel-4a.fritz.box
[2022-11-17 19:26:33.504 21146/T21162] Closing FTL database in getNameFromIP() (/__w/FTL/FTL/src/database/network-table.c:2074)
[2022-11-17 19:26:33.505 21146/T21162] Opening FTL database in getIfaceFromIP() (/__w/FTL/FTL/src/database/network-table.c:2153)
[2022-11-17 19:26:33.508 21146/T21162] FTL database cache size is 16.0 MiB
[2022-11-17 19:26:33.508 21146/T21162] getDatabaseHostname(): "SELECT interface FROM network JOIN network_addresses ON network_addresses.network_id = network.id WHERE network_addresses.ip = ? AND interface != 'N/A' AND interface IS NOT NULL;" with ? = "10.0.1.26"
[2022-11-17 19:26:33.509 21146/T21162] Found database interface 10.0.1.26 -> eth0
[2022-11-17 19:26:33.509 21146/T21162] Closing FTL database in getIfaceFromIP() (/__w/FTL/FTL/src/database/network-table.c:2223)
[2022-11-17 19:26:33.511 21146/T21162] Opening FTL database in getMACfromIP() (/__w/FTL/FTL/src/database/network-table.c:1834)
[2022-11-17 19:26:33.514 21146/T21162] FTL database cache size is 16.0 MiB
[2022-11-17 19:26:33.515 21146/T21162] Found database hardware address 10.0.1.57 -> 24:29:34:91:56:42
[2022-11-17 19:26:33.515 21146/T21162] Closing FTL database in getMACfromIP() (/__w/FTL/FTL/src/database/network-table.c:1904)
[2022-11-17 19:26:33.516 21146/T21162] Opening FTL database in getNameFromIP() (/__w/FTL/FTL/src/database/network-table.c:2011)
[2022-11-17 19:26:33.519 21146/T21162] FTL database cache size is 16.0 MiB
[2022-11-17 19:26:33.519 21146/T21162] Found database host name (same address) 10.0.1.57 -> Pixel-6a.fritz.box
[2022-11-17 19:26:33.519 21146/T21162] Closing FTL database in getNameFromIP() (/__w/FTL/FTL/src/database/network-table.c:2074)
[2022-11-17 19:26:33.520 21146/T21162] Opening FTL database in getIfaceFromIP() (/__w/FTL/FTL/src/database/network-table.c:2153)
[2022-11-17 19:26:33.523 21146/T21162] FTL database cache size is 16.0 MiB
[2022-11-17 19:26:33.523 21146/T21162] getDatabaseHostname(): "SELECT interface FROM network JOIN network_addresses ON network_addresses.network_id = network.id WHERE network_addresses.ip = ? AND interface != 'N/A' AND interface IS NOT NULL;" with ? = "10.0.1.57"
[2022-11-17 19:26:33.524 21146/T21162] Found database interface 10.0.1.57 -> eth0
[2022-11-17 19:26:33.524 21146/T21162] Closing FTL database in getIfaceFromIP() (/__w/FTL/FTL/src/database/network-table.c:2223)
[2022-11-17 19:26:33.527 21146/T21162] Opening FTL database in getMACfromIP() (/__w/FTL/FTL/src/database/network-table.c:1834)
[2022-11-17 19:26:33.529 21146/T21162] FTL database cache size is 16.0 MiB

@DL6ER
Copy link
Member Author

DL6ER commented Nov 17, 2022

As it is per-connection, we log it once for every opened connection. Any suggestions?

@yubiuser
Copy link
Member

Only log it for the first database connection (per-file).

E.g. during start-up somewhere along

[2022-11-17 19:27:10.768 21275M] Database version is 12
[2022-11-17 19:27:10.768 21275M] Imported 0 alias-clients
[2022-11-17 19:27:10.768 21275M] Database successfully initialized

and

[2022-11-17 19:27:12.754 21277/T21293] REGEX WARNING: Invalid regex blacklist filter "[[[[": Missing ']'
[2022-11-17 19:27:12.876 21277/T21293] Compiled 3 whitelist and 25 blacklist regex filters for 13 clients in 129.8 msec
[2022-11-17 19:27:12.876 21277/T21293] Adlist warning: Adlist with ID 58 (www.test.com/test) was inaccessible during last gravity run

@rdwebdesign
Copy link
Member

The only reason I see to log many times is if this value can vary for each connection during runtime.
Will the cache be always 16MB or it can be less?
Do we need to know its size every time?

@DL6ER DL6ER force-pushed the tweak/sqlite3_cachesize branch from d10887d to fe4077e Compare November 18, 2022 09:52
@DL6ER DL6ER merged commit e5bbab0 into development Nov 18, 2022
@DL6ER DL6ER deleted the tweak/sqlite3_cachesize branch November 18, 2022 10:00
@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/pi-hole-ftl-v5-20-and-web-v5-18-released/59959/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants