Speed up 10.4+ timezone initialization #320
Conversation
FYI mariadb releases have occurred. I didn't get a fix into upstream before the release. Further analysis indicates it's not just (fuse-)overlayfs that is affected, per the upstream MDEV. While disabling crash safety during initialization carries some risk, any errors will abort the start of the container because of the SQL errors. I do have a crash-safe performance optimization work in progress that will be ready for the next release (and consumes ~3s for the tz initialization). This change will help the default deployment of mariadb containers for the user base without penalty.
This is the bit that has me confused -- this image defines … The common thread we saw in the "slowness" discussions was spinning disks vs SSDs (or even SSDs with very low available IOPS), so I'd love to make sure we're testing the same thing before we merge a fix which is made assuming the two kinds of slowness are the same.
I was wrong about overlayfs being the cause. I generally saw problems even on my local NVMe. tmpfs as a VOLUME didn't seem to be an issue. Catch me as …
Oh, I'm aware there's a lot that's gone into this (I've been following your adventures in https://jira.mariadb.org/browse/MDEV-23326 😄), I just want to make sure you've done some tests on a non-NVMe (preferably spinning disk) drive as well to ensure the change is still dramatic there before we consider #262 fully "fixed" / closed. @yosifkit just ran a simple test on a spinning drive in his system with 10.5.4 and it took ~11s before it even started the temporary server, and it was a full three minutes later when the temporary server was stopped (doing nothing but loading timezone data and setting a root password), so it's significantly more dramatic on a spinning drive, and I just want to make sure your testing has covered that case (since that's the one that's the most common in #262). He's going to test this change on that same drive to get a simple comparison. 👍
He had to test this change against 10.5.5 (because 10.5.4 is no longer available thanks to the new version being published) but it went from ~3m down to ~7s, so I'd say that's pretty compelling. 😅
Model Family:  Western Digital Green
Device Model:  WDC WD40EZRX-00SPEB0
Serial Number: WD-WCC4E5000UCH

ext4 mounted on /home/dan/datadir; the rest of the SMART output showed it to be in not a great state.

Test script:

```sh
for v in 10.3 10.4
do
  podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass \
    --expose 3306 \
    --volume /home/dan/datadir/data$v:/var/lib/mysql:Z \
    --name maria$v mariadb_test:$v &
  sleep 1
  time grep -iq "ready for start up" <(podman logs -f maria$v 2>&1)
  podman logs maria$v
  sleep 1
  podman kill maria$v
  sleep 1
done
```

10.3 result:

```console
+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.3:/var/lib/mysql:Z --name maria10.3 mariadb_test:10.3
b5e35dbf6783dc0fffd3b41d755ddfae8617260f68abcde196287569a1b619f3
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.3

real	0m7.789s
user	0m0.000s
sys	0m0.002s
```

10.4 result:

```console
+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.4:/var/lib/mysql:Z --name maria10.4 mariadb_test:10.4
0b86680f45cc7f8af3e0e96e136ab2c6799187767e3517cc59f50dd15e065a61
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.4

real	0m13.793s
user	0m0.000s
sys	0m0.002s
+ podman logs maria10.4
```
and before change:

```console
+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.4:/var/lib/mysql:Z --name maria10.4 mariadb:10.4
7365495faf0f4767909ea1818b0290730a51f40db45011767ab5b34ab300b39e
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.4

real	1m36.864s
user	0m0.000s
sys	0m0.002s

+ podman run -d --rm -e MYSQL_ROOT_PASSWORD=pass --expose 3306 --volume /home/dan/datadir/data10.3:/var/lib/mysql:Z --name maria10.3 mariadb:10.3
c78d97c1889a0bdf37e87da7ef673046418bb5307cfab6c8265253445ecba2de
+ grep -iq 'ready for start up' /dev/fd/63
++ podman logs -f maria10.3

real	0m7.786s
user	0m0.002s
sys	0m0.000s
```

So the remaining question is whether you want to script in some Aria recovery.
On crash recovery: I managed to kill the startup of 10.3 (MyISAM) with a volume, and the restart detected errors in the tz tables. The same applies now in 10.4 (though I haven't got the timings right; from the MDEV it seems there's a ~1s window). As such I propose to leave that as is.
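If one did want to "script in some Aria recovery" as raised above, a hypothetical check-and-repair step could look like the sketch below; the mysqlcheck invocation and credentials are assumptions for illustration, not part of this change:

```sh
# Hypothetical, not part of this PR: check the timezone tables and repair
# them automatically if an interrupted initialization left them marked
# as crashed.
mysqlcheck --user=root --password="$MYSQL_ROOT_PASSWORD" \
    --check --auto-repair mysql \
    time_zone time_zone_leap_second time_zone_name \
    time_zone_transition time_zone_transition_type
```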
MariaDB-10.4 defaulted to Aria for system tables. This introduced crash safety under the name of "transactional" that was not previously in MyISAM. The Aria implementation of checkpointing incurs a significant penalty on fuse-overlayfs, which is common in container environments, especially those without a /var/lib/mysql volume. We work around this penalty by disabling the crash safety of the timezone tables for the period of timezone initialization. Analysis and timings are in https://jira.mariadb.org/browse/MDEV-23326 and local tests show that 10.4 is only 0.8 seconds slower than 10.3 on startup (6.8 seconds total). Version-specific comments are used to ensure that ALTER TABLE statements aren't run on < 10.4 server versions. closes MariaDB#262
Co-authored-by: Tianon Gravi <[email protected]>
force-pushed from 108b168 to 88ff4ee
Nice, thank you!! 🤘 ❤️ I did a rebase against master (and ran …)
Changes:

- MariaDB/mariadb-docker@83f552f: Merge pull request MariaDB/mariadb-docker#320 from grooverdan/MDEV-23326-issue262
- MariaDB/mariadb-docker@88ff4ee: reduce docker_process_sql runs
- MariaDB/mariadb-docker@3a151d9: Speed up 10.4+ timezone initialization
- MariaDB/mariadb-docker@846fe2f: Update to 1:10.4.14+maria~focal
- MariaDB/mariadb-docker@b847957: Update to 1:10.3.24+maria~focal
- MariaDB/mariadb-docker@a43b52d: Update to 1:10.2.33+maria~bionic
- MariaDB/mariadb-docker@f44c127: Update to 1:10.1.46+maria-1~bionic
- MariaDB/mariadb-docker@7c75646: Update to 1:10.5.5+maria~focal
Monty suggested after #320 was submitted that LOCK TABLES reduces the IO by making the fdatasync occur at the UNLOCK TABLES. The advantage of this is that the timezone tables are not written twice. The two impediments are that TRUNCATE TABLE is in the output of mysql_tzinfo_to_sql, which implicitly unlocks the tables, and START TRANSACTION, useful for Galera, which also implicitly unlocks them.

A comparison of the method prior to this commit (with TRUNCATE/START TRANSACTION removed for a fair comparison):

```console
real	0m1.865s
user	0m0.205s
sys	0m0.091s
```

To now:

```console
real	0m1.254s
user	0m0.193s
sys	0m0.120s
```

Lower-performing storage will show better gains. Further improving mysql_tzinfo_to_sql remains server task MDEV-23326.

https://bugs.mysql.com/bug.php?id=20545: MariaDB doesn't output "Local time" in combination with the bionic/focal images, so remove the handling of it. As tested by:

```console
$ podman run --rm mariadb:10.{2,6} mysql_tzinfo_to_sql /usr/share/zoneinfo | grep 'Local time'
```
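For illustration, a minimal sketch of the LOCK TABLES approach described above, assuming the generated SQL is piped to the server via a plain mysql client call (the entrypoint's actual plumbing, sed expressions, and table list may differ; on first initialization the tables are empty, so dropping TRUNCATE here loses nothing):

```sh
# Sketch only: hold write locks for the whole timezone load so Aria syncs
# once at UNLOCK TABLES rather than after each statement.
{
    echo "LOCK TABLES time_zone WRITE, time_zone_leap_second WRITE, time_zone_name WRITE, time_zone_transition WRITE, time_zone_transition_type WRITE;"
    # TRUNCATE TABLE and START TRANSACTION would implicitly release the
    # table locks, so filter them out of the generated SQL.
    mysql_tzinfo_to_sql /usr/share/zoneinfo \
        | sed -e '/^TRUNCATE TABLE/d' -e '/^START TRANSACTION/d'
    echo "UNLOCK TABLES;"
} | mysql --database=mysql
```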
MariaDB-10.4 defaulted to Aria for system tables.
This introduced crash safety under the name of "transactional"
that was not previously in MyISAM.
The Aria implementation of checkpointing incurs a significant
penalty on fuse-overlayfs, which is common in
container environments, especially those without a
/var/lib/mysql volume.
We work around this penalty by disabling the crash
safety of timezone tables for the period of timezone
initialization.
Analysis and timings are in https://jira.mariadb.org/browse/MDEV-23326
and local tests show that 10.4 is only 0.8 seconds slower
than 10.3 on startup (6.8 seconds total).
Version-specific comments are used to ensure that ALTER TABLE
statements aren't run on < 10.4 server versions.
closes #262
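To make the mechanism concrete, here is a rough sketch of how the entrypoint could emit those version-gated statements around the timezone data; the exact table list, gating comments, and how the SQL is fed to the server are paraphrased from the commit message rather than copied from the diff:

```sh
# Sketch: disable Aria crash safety ("transactional") on the timezone tables
# only for the duration of the bulk load, then restore it.  The
# /*M!100400 ... */ executable comments are only run by MariaDB >= 10.4.
tz_tables="time_zone time_zone_leap_second time_zone_name time_zone_transition time_zone_transition_type"
{
    for t in $tz_tables; do
        echo "/*M!100400 ALTER TABLE $t TRANSACTIONAL=0 */;"
    done
    mysql_tzinfo_to_sql /usr/share/zoneinfo
    for t in $tz_tables; do
        echo "/*M!100400 ALTER TABLE $t TRANSACTIONAL=1 */;"
    done
} | mysql --database=mysql
```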
I'm unconvinced I can get any significant fix into MariaDB before the next release, so this should close off a major issue for the next release(s).
This won't be the end of the story. Let's see if we can do all of docker_setup_db under docker_init_database_dir with a little upstream help and improve the startup time again.