Use link-time optimization for the production build #10942

Closed
mbautin opened this issue Dec 26, 2021 · 2 comments
mbautin commented Dec 26, 2021

Compiling all YB + postgres code with Clang's LTO (link-time optimization) would probably improve YugabyteDB performance significantly.

Preliminary results were produced with Clang 12, thin LTO, x86_64, using the Linuxbrew glibc and other libraries. The first and third experiments below use the LTO build; the middle one uses the non-LTO release build.

# java -jar ~/yb-sample-apps.jar --workload CassandraKeyValue --num_unique_keys 1000000 --nodes 127.0.0.1:9042 --num_threads_read 0 --num_threads_write 32 --uuid 1b3d9ad2-85cd-4aa2-93a2-f441449f4e9e --num_reads 0 --num_writes 10000000
0 [main] INFO com.yugabyte.sample.Main  - Starting sample app...
23 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Using given UUID : 1b3d9ad2-85cd-4aa2-93a2-f441449f4e9e
30 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - App: CassandraKeyValue
30 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Run time (seconds): -1
30 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Adding node: 127.0.0.1:9042
30 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num reader threads: 0, num writer threads: 32
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num unique keys to insert: 1000000
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num keys to update: 9000000
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num keys to read: 0
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Value size: 0
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Restrict values to ASCII strings: false
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Perform sanity check at end of app run: false
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Table TTL (secs): -1
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Local reads: false
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Read only load: false
32 [main] INFO com.yugabyte.sample.apps.AppBase  - Creating YCQL tables...
114 [main] INFO com.yugabyte.sample.apps.AppBase  - Connecting with 4 clients to nodes: /127.0.0.1:9042
758 [main] INFO com.yugabyte.sample.apps.AppBase  - Created a Cassandra table using query: [CREATE TABLE IF NOT EXISTS CassandraKeyValue (k varchar, v blob, primary key (k));]
5778 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 29927.23 ops/sec (0.90 ms/op), 149753 total ops  |  Uptime: 5020 ms | maxWrittenKey: 149758 | maxGeneratedKey: 149804 | 
10779 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 34702.05 ops/sec (0.92 ms/op), 323312 total ops  |  Uptime: 10021 ms | maxWrittenKey: 323317 | maxGeneratedKey: 323350 | 
15780 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 36894.89 ops/sec (0.87 ms/op), 507821 total ops  |  Uptime: 15022 ms | maxWrittenKey: 507821 | maxGeneratedKey: 507860 | 
20780 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 37778.57 ops/sec (0.85 ms/op), 696742 total ops  |  Uptime: 20022 ms | maxWrittenKey: 696745 | maxGeneratedKey: 696778 | 
25781 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 33203.34 ops/sec (0.96 ms/op), 862780 total ops  |  Uptime: 25023 ms | maxWrittenKey: 862763 | maxGeneratedKey: 862815 | 
30782 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 38821.92 ops/sec (0.82 ms/op), 1056909 total ops  |  Uptime: 30023 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000035 | 
35782 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 38607.49 ops/sec (0.83 ms/op), 1249967 total ops  |  Uptime: 35024 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000035 | 
40783 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 38846.92 ops/sec (0.82 ms/op), 1444220 total ops  |  Uptime: 40025 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000035 | 
45783 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 39106.31 ops/sec (0.82 ms/op), 1639771 total ops  |  Uptime: 45025 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000035 | 
^C^C23:32:47 12/25/2021 mb@hyperion ~/code/yugabyte-db20 (clang12_linuxbrew[*5]) # bin/yb-ctl stop
Stopping cluster.
23:32:54 12/25/2021 mb@hyperion ~/code/yugabyte-db20 (clang12_linuxbrew[*5]) # java -jar ~/yb-sample-apps.jar --workload CassandraKeyValue --num_unique_keys 1000000 --nodes 127.0.0.1:9042 --num_threads_read 0 --num_threads_write 32 --uuid 1b3d9ad2-85cd-4aa2-93a2-f441449f4e9e --num_reads 0 --num_writes 10000000
1 [main] INFO com.yugabyte.sample.Main  - Starting sample app...
24 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Using given UUID : 1b3d9ad2-85cd-4aa2-93a2-f441449f4e9e
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - App: CassandraKeyValue
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Run time (seconds): -1
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Adding node: 127.0.0.1:9042
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num reader threads: 0, num writer threads: 32
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num unique keys to insert: 1000000
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num keys to update: 9000000
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num keys to read: 0
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Value size: 0
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Restrict values to ASCII strings: false
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Perform sanity check at end of app run: false
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Table TTL (secs): -1
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Local reads: false
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Read only load: false
33 [main] INFO com.yugabyte.sample.apps.AppBase  - Creating YCQL tables...
116 [main] INFO com.yugabyte.sample.apps.AppBase  - Connecting with 4 clients to nodes: /127.0.0.1:9042
766 [main] INFO com.yugabyte.sample.apps.AppBase  - Created a Cassandra table using query: [CREATE TABLE IF NOT EXISTS CassandraKeyValue (k varchar, v blob, primary key (k));]
5785 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 23571.75 ops/sec (1.14 ms/op), 117907 total ops  |  Uptime: 5018 ms | maxWrittenKey: 117903 | maxGeneratedKey: 117947 | 
10786 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27413.72 ops/sec (1.17 ms/op), 255010 total ops  |  Uptime: 10019 ms | maxWrittenKey: 254998 | maxGeneratedKey: 255043 | 
15786 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 28203.55 ops/sec (1.13 ms/op), 396045 total ops  |  Uptime: 15019 ms | maxWrittenKey: 396045 | maxGeneratedKey: 396079 | 
20787 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27856.51 ops/sec (1.15 ms/op), 535343 total ops  |  Uptime: 20020 ms | maxWrittenKey: 535347 | maxGeneratedKey: 535384 | 
25788 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 28119.33 ops/sec (1.14 ms/op), 675963 total ops  |  Uptime: 25021 ms | maxWrittenKey: 675955 | maxGeneratedKey: 675998 | 
30788 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 28212.35 ops/sec (1.13 ms/op), 817040 total ops  |  Uptime: 30021 ms | maxWrittenKey: 817039 | maxGeneratedKey: 817074 | 
35789 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 26550.31 ops/sec (1.20 ms/op), 949804 total ops  |  Uptime: 35022 ms | maxWrittenKey: 949795 | maxGeneratedKey: 949838 | 
40789 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27485.51 ops/sec (1.16 ms/op), 1087246 total ops  |  Uptime: 40022 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000036 | 
45790 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27419.71 ops/sec (1.17 ms/op), 1224356 total ops  |  Uptime: 45023 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000036 | 
50790 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27551.38 ops/sec (1.16 ms/op), 1362123 total ops  |  Uptime: 50023 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000036 | 
55790 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27624.47 ops/sec (1.16 ms/op), 1500257 total ops  |  Uptime: 55023 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000036 | 
60791 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 28704.50 ops/sec (1.11 ms/op), 1643795 total ops  |  Uptime: 60024 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000036 | 
^C^C23:34:21 12/25/2021 mb@hyperion ~/code/yugabyte-db20 (clang12_linuxbrew[*5]
23:34:30 12/25/2021 mb@hyperion ~/code/yugabyte-db20 (clang12_linuxbrew[*5]) # 
23:34:30 12/25/2021 mb@hyperion ~/code/yugabyte-db20 (clang12_linuxbrew[*5]) # bin/yb-ctl start
Starting cluster with base directory /home/mbautin/yugabyte-data
Waiting for cluster to be ready.
----------------------------------------------------------------------------------------------------
| Node Count: 1 | Replication Factor: 1                                                            |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/yugabyte                                  |
| YSQL Shell          : build/latest/bin/ysqlsh                                                    |
| YCQL Shell          : build/latest/bin/ycqlsh                                                    |
| YEDIS Shell         : build/latest/bin/redis-cli                                                 |
| Web UI              : http://127.0.0.1:7000/                                                     |
| Cluster Data        : /home/mbautin/yugabyte-data                                                |
----------------------------------------------------------------------------------------------------

For more info, please use: yb-ctl status
23:34:36 12/25/2021 mb@hyperion ~/code/yugabyte-db20 (clang12_linuxbrew[*5]) # java -jar ~/yb-sample-apps.jar --workload CassandraKeyValue --num_unique_keys 1000000 --nodes 127.0.0.1:9042 --num_threads_read 0 --num_threads_write 32 --uuid 1b3d9ad2-85cd-4aa2-93a2-f441449f4e9e --num_reads 0 --num_writes 10000000
0 [main] INFO com.yugabyte.sample.Main  - Starting sample app...
24 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Using given UUID : 1b3d9ad2-85cd-4aa2-93a2-f441449f4e9e
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - App: CassandraKeyValue
31 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Run time (seconds): -1
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Adding node: 127.0.0.1:9042
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num reader threads: 0, num writer threads: 32
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num unique keys to insert: 1000000
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num keys to update: 9000000
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Num keys to read: 0
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Value size: 0
32 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Restrict values to ASCII strings: false
33 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Perform sanity check at end of app run: false
33 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Table TTL (secs): -1
33 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Local reads: false
33 [main] INFO com.yugabyte.sample.common.CmdLineOpts  - Read only load: false
34 [main] INFO com.yugabyte.sample.apps.AppBase  - Creating YCQL tables...
122 [main] INFO com.yugabyte.sample.apps.AppBase  - Connecting with 4 clients to nodes: /127.0.0.1:9042
795 [main] INFO com.yugabyte.sample.apps.AppBase  - Created a Cassandra table using query: [CREATE TABLE IF NOT EXISTS CassandraKeyValue (k varchar, v blob, primary key (k));]
5812 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Uptime: 5017 ms | maxWrittenKey: -1 | maxGeneratedKey: 31 | 
10813 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 12545.47 ops/sec (4.70 ms/op), 62739 total ops  |  Uptime: 10018 ms | maxWrittenKey: 62736 | maxGeneratedKey: 62773 | 
15814 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 31498.96 ops/sec (1.01 ms/op), 220261 total ops  |  Uptime: 15019 ms | maxWrittenKey: 220265 | maxGeneratedKey: 220300 | 
20815 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 35629.93 ops/sec (0.90 ms/op), 398438 total ops  |  Uptime: 20020 ms | maxWrittenKey: 398433 | maxGeneratedKey: 398472 | 
25815 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 35396.18 ops/sec (0.90 ms/op), 575443 total ops  |  Uptime: 25020 ms | maxWrittenKey: 575438 | maxGeneratedKey: 575476 | 
30816 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 33898.51 ops/sec (0.94 ms/op), 744953 total ops  |  Uptime: 30021 ms | maxWrittenKey: 744955 | maxGeneratedKey: 744988 | 
35816 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 34629.78 ops/sec (0.92 ms/op), 918120 total ops  |  Uptime: 35021 ms | maxWrittenKey: 918099 | maxGeneratedKey: 918151 | 
40817 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 35098.44 ops/sec (0.91 ms/op), 1093631 total ops  |  Uptime: 40022 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
45817 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 32220.56 ops/sec (0.99 ms/op), 1254750 total ops  |  Uptime: 45022 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
50818 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 36542.55 ops/sec (0.87 ms/op), 1437477 total ops  |  Uptime: 50023 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
55818 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 35305.74 ops/sec (0.91 ms/op), 1614022 total ops  |  Uptime: 55023 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
60819 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 34953.37 ops/sec (0.91 ms/op), 1788806 total ops  |  Uptime: 60024 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
65819 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 29875.24 ops/sec (1.07 ms/op), 1938196 total ops  |  Uptime: 65024 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
70819 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 33644.50 ops/sec (0.95 ms/op), 2106431 total ops  |  Uptime: 70024 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
75820 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 29036.93 ops/sec (0.99 ms/op), 2251631 total ops  |  Uptime: 75025 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
80820 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 35234.12 ops/sec (1.00 ms/op), 2427820 total ops  |  Uptime: 80025 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
85821 [Thread-2] INFO com.yugabyte.sample.common.metrics.MetricsTracker  - Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 35178.53 ops/sec (0.91 ms/op), 2603724 total ops  |  Uptime: 85026 ms | maxWrittenKey: 999999 | maxGeneratedKey: 1000037 | 
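
To compare the runs above numerically, the write throughput can be pulled out of the MetricsTracker lines. The following is a minimal sketch, not part of the original test setup: the helper name `write_throughputs` is hypothetical, and the sample intervals were picked by hand from the first (LTO) and second (non-LTO) runs, so the resulting ratio depends on which intervals are chosen.

```python
import re
from statistics import mean

# Captures the write throughput field of a MetricsTracker line.
WRITE_RE = re.compile(r"Write: ([\d.]+) ops/sec")


def write_throughputs(log_text):
    """Return the write ops/sec figure from every MetricsTracker line."""
    return [float(m.group(1)) for m in WRITE_RE.finditer(log_text)]


# Three intervals copied from the first (LTO) run, trimmed to the metrics fields.
lto_sample = """\
Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 38821.92 ops/sec (0.82 ms/op), 1056909 total ops
Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 38607.49 ops/sec (0.83 ms/op), 1249967 total ops
Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 38846.92 ops/sec (0.82 ms/op), 1444220 total ops
"""

# Three intervals copied from the second (non-LTO) run.
non_lto_sample = """\
Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27485.51 ops/sec (1.16 ms/op), 1087246 total ops
Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27419.71 ops/sec (1.17 ms/op), 1224356 total ops
Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 27551.38 ops/sec (1.16 ms/op), 1362123 total ops
"""

lto_mean = mean(write_throughputs(lto_sample))
non_lto_mean = mean(write_throughputs(non_lto_sample))
print(f"LTO: {lto_mean:.0f} ops/sec, non-LTO: {non_lto_mean:.0f} ops/sec, "
      f"ratio: {lto_mean / non_lto_mean:.2f}")
```

The third run shows lower LTO figures than the first, so averaging over a whole run (or over both LTO runs) gives a smaller ratio than these hand-picked intervals.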

The build is done with -fwhole-program. YB and postgres code is currently included in LTO (third-party libraries could be included as well). Only yb-tserver is compiled with LTO.

 # ldd ./yb-tserver
	linux-vdso.so.1 (0x00007fffae5ec000)
	libtcmalloc.so.4 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libtcmalloc.so.4 (0x00007f23ab3b8000)
	libunwind.so.1 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/libunwind.so.1 (0x00007f23ab9d3000)
	libboost_date_time.so.1.69.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libboost_date_time.so.1.69.0 (0x00007f23ab9c7000)
	libldap-2.4.so.2 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/libldap-2.4.so.2 (0x00007f23ab16f000)
	liblber-2.4.so.2 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/liblber-2.4.so.2 (0x00007f23aaf5f000)
	libprofiler.so.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libprofiler.so.0 (0x00007f23aad4a000)
	libz.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libz.so.1 (0x00007f23aab35000)
	libuuid.so.1 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/libuuid.so.1 (0x00007f23aa931000)
	libboost_atomic.so.1.69.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libboost_atomic.so.1.69.0 (0x00007f23ab9c0000)
	libboost_system.so.1.69.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libboost_system.so.1.69.0 (0x00007f23ab9bc000)
	libboost_thread.so.1.69.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libboost_thread.so.1.69.0 (0x00007f23ab99a000)
	libglog.so.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libglog.so.0 (0x00007f23ab962000)
	libgflags.so.2 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libgflags.so.2 (0x00007f23ab945000)
	libev.so.4 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/libev.so.4 (0x00007f23ab930000)
	libcds.so.2.3.3 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libcds.so.2.3.3 (0x00007f23ab924000)
	libicui18n.so.67 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libicui18n.so.67 (0x00007f23aa61e000)
	libicuuc.so.67 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libicuuc.so.67 (0x00007f23aa42a000)
	libcrcutil.so.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libcrcutil.so.0 (0x00007f23aa21a000)
	libprotobuf.so.15 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libprotobuf.so.15 (0x00007f23a9d5d000)
	librt.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/librt.so.1 (0x00007f23a9b55000)
	libbacktrace.so.0 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libbacktrace.so.0 (0x00007f23a9948000)
	libcurl.so.4 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/libcurl.so.4 (0x00007f23ab8b9000)
	libdl.so.2 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libdl.so.2 (0x00007f23a9744000)
	libsnappy.so.1 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libsnappy.so.1 (0x00007f23a953b000)
	libcrypto.so.1.1 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/libcrypto.so.1.1 (0x00007f23a926b000)
	libssl.so.1.1 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/common/lib/libssl.so.1.1 (0x00007f23ab824000)
	libm.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libm.so.6 (0x00007f23a8f68000)
	libcrypt.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libcrypt.so.1 (0x00007f23a8d31000)
	libicudata.so.67 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/lib/libicudata.so.67 (0x00007f23a7219000)
	libc++.so.1 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/lib/libc++.so.1 (0x00007f23a7145000)
	libc++abi.so.1 => /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064126-dd4872fe56-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/lib/libc++abi.so.1 (0x00007f23a7102000)
	libgcc_s.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libgcc_s.so.1 (0x00007f23a6eeb000)
	libpthread.so.0 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libpthread.so.0 (0x00007f23a6cce000)
	libc.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libc.so.6 (0x00007f23a6932000)
	libresolv.so.2 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libresolv.so.2 (0x00007f23a671b000)
	/opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/ld.so => /lib64/ld-linux-x86-64.so.2 (0x00007f23ab7ba000)
	libatomic.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libatomic.so.1 (0x00007f23a6513000)
@mbautin mbautin self-assigned this Dec 26, 2021
mbautin commented Dec 29, 2021

A preliminary, naive single-node performance test of a yb-tserver binary built with Clang 12 link-time optimization (LTO) shows about a 30% throughput improvement on a write-only CassandraKeyValue workload.

LTO works by putting LLVM bitcode into .o files instead of native code. At link time, the entire program is loaded into memory and optimized as a whole. This allows, for example, better inlining and devirtualization (replacing virtual function calls with direct calls when the class is known at compile time). For a dynamically linked program (the way we build code today), these optimizations are not possible because, in theory, any function could be replaced by a different implementation, e.g. through LD_PRELOAD.

https://gist.githubusercontent.com/mbautin/6d2debaef1286aa045afde0c08853760/raw -- and linking the remaining shared libraries statically could be even better (right now a few libraries are still dynamically linked: https://gist.githubusercontent.com/mbautin/bc8769a9ae93f8d6d1f2244591a08376/raw ). Potentially we could even link statically with glibc (we would have to rebuild it).

On the flip side, this statically linked binary is 480 MB with debug info (but only 61 MB without it). I was thinking of creating a "busybox style" binary (busybox is a single executable that provides lots of Unix utilities -- https://en.wikipedia.org/wiki/BusyBox ). We could create one binary that acts as yb-master, yb-tserver, or postgres, depending on argv[0]. The rest of the tools in our release tarball could still use dynamic linking the same way they do today.
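
The "busybox style" idea of selecting behavior from argv[0] can be illustrated with a small sketch. It is written in Python only to show the dispatch pattern; the real servers are C++ programs, and the entry-point functions and the `dispatch` helper here are hypothetical stand-ins.

```python
import os


# Hypothetical stand-ins for the real server entry points.
def run_master(args):
    return f"yb-master: {args}"


def run_tserver(args):
    return f"yb-tserver: {args}"


def run_postgres(args):
    return f"postgres: {args}"


ENTRY_POINTS = {
    "yb-master": run_master,
    "yb-tserver": run_tserver,
    "postgres": run_postgres,
}


def dispatch(argv):
    """Choose the entry point from the name the binary was invoked as."""
    name = os.path.basename(argv[0])
    if name not in ENTRY_POINTS:
        raise ValueError(f"unknown program name: {name}")
    return ENTRY_POINTS[name](argv[1:])


# The same executable behaves differently depending on its invocation name,
# e.g. when installed under several names via symlinks.
print(dispatch(["/opt/yb/bin/yb-tserver", "--some_flag"]))
print(dispatch(["/opt/yb/bin/postgres"]))
```

In such a layout, the single binary would typically be installed once and exposed under each name through symlinks or hard links.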

mbautin added a commit that referenced this issue Jan 16, 2022
…lang 12

Summary:
Link-time optimization ( https://llvm.org/docs/LinkTimeOptimization.html ) allows for more aggressive optimizations, including inlining, compared to the shared library based model that we currently use.

This diff enables link-time optimization for the Clang 12 Linuxbrew-based release build for the yb-tserver executable only, producing a binary that statically links all object files needed by yb-tserver, including those that are included in the yb_pgbackend library. Third-party libraries are being linked statically but they are not LTO-enabled yet.

The linking of the final LTO-enabled binary is currently being done outside of the CMake build system, using the dependency_graph.py tool that can access the dependency graph of targets and object files, and therefore has all the information needed to construct the linker command line. This also gives us more flexibility customizing the linker command line compared to attempts to do this in the CMake build system. Moving this linking step to CMake may be a future project.
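
As a rough illustration of how a dependency graph provides everything needed for the linker command line, the sketch below walks a toy graph and collects the reachable object files. It is not the logic in lto.py or dependency_graph.py: the graph contents, file names, and helper names are hypothetical, and the command uses only the standard Clang driver flags `-flto=thin` and `-fuse-ld=lld`, whereas the real command line is likely much longer.

```python
# Hypothetical dependency graph: each target maps to its direct dependencies.
DEPS = {
    "yb-tserver": ["server_main.o", "libtserver"],
    "libtserver": ["tablet_server.o", "libyb_util"],
    "libyb_util": ["status.o"],
}


def collect_object_files(target, deps):
    """Depth-first walk returning each .o file reachable from target once."""
    seen = set()
    objects = []
    stack = [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node.endswith(".o"):
            objects.append(node)
        # Reverse so that dependencies are visited in their listed order.
        stack.extend(reversed(deps.get(node, [])))
    return objects


def lto_link_command(target, deps, output):
    """Build a link command that passes all object files directly to lld."""
    return (["clang++", "-flto=thin", "-fuse-ld=lld", "-o", output]
            + collect_object_files(target, deps))


print(" ".join(lto_link_command("yb-tserver", DEPS, "yb-tserver-lto")))
```

Linking object files directly, rather than the shared libraries built from them, is what lets the optimizer see the whole program at once.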

Refactored the dependency_graph.py script into multiple modules: dependency_graph.py, dep_graph_common.py, source_files.py, as well as lto.py (with the new LTO logic).

Also refactored master_main.cc and tablet_server_main.cc and extracted common initialization code to tserver/server_main_util.cc. It is in the tserver directory because the master code currently uses the tserver code.

For building LTO-enabled binaries, we need to use LLVM's lld linker. It has issues with our distributed compilation framework ( #11034 ). This diff works around the problem by always running lld-based linking commands locally and not on a remote build worker.

Various static initialization issues were identified and fixed as part of this work. If not fixed, these would result in the yb-tserver binary crashing immediately with a core dump.
- In consensus_queue.cc, the RpcThrottleThresholdBytesValidator function for validating the rpc_throttle_threshold_bytes flag was trying to access other flags before they were fully initialized. Moved this validation to the main program.
- The webserver_doc_root flag was calling yb::GetDefaultDocumentRoot() to determine its default value. Moved that default value determination to where the flag is being used.
- [ #11033 ] The INTERNAL_TRACE_EVENT_ADD_SCOPED macro, when invoked during static initialization, led to a crash in std::string construction. Added a new atomic trace_events_enabled for enabling trace events and only turned it on after main() started executing. The INTERNAL_TRACE_EVENT_ADD_SCOPED is a no-op before trace_events_enabled is set to true.
- [ #10964 ] The kGlobalTransactionTableName global constant of the YBTableName type relied on the statically initialized string constant, kGlobalTransactionsTableName, which turned out to be empty during initialization. As a result, the transaction status table could not be properly located. Changed kGlobalTransactionsTableName to be a `const char*`.

In addition, in the LTO-enabled build, it became apparent that some symbols were duplicated between the gperftools library and the gutil part of YugabyteDB code ( #10956 ):
- AtomicOps_Internalx86CPUFeatures -- renamed to YbAtomicOps_Internalx86CPUFeatures
- RunningOnValgrind -- renamed to YbRunningOnValgrind
- ValgrindSlowdown -- renamed to YbValgrindSlowdown
- base::internal::SpinLockDelay, base::internal::SpinLockWake -- added a top-level yb namespace

To enable easily switching between regular and LTO binaries, we are updating yb-ctl to support YB_CTL_TSERVER_DAEMON_FILE_NAME and YB_CTL_MASTER_DAEMON_FILE_NAME environment variables. For example, by setting YB_CTL_TSERVER_DAEMON_FILE_NAME=yb-tserver-lto, you can tell yb-ctl to launch the tablet server using build/latest/bin/yb-tserver-lto. However, for the release package, the LTO-enabled yb-tserver executable will still be named yb-tserver, replacing the previous shared library based executable.
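
The environment-variable override described above might look roughly like the following. This is a sketch under assumptions, not the actual yb-ctl code: the function name `daemon_binary_path` and its parameters are hypothetical, and only the two environment variable names and the build/latest/bin location come from the description.

```python
import os

# Environment variables named in the description, keyed by daemon type.
ENV_VARS = {
    "tserver": "YB_CTL_TSERVER_DAEMON_FILE_NAME",
    "master": "YB_CTL_MASTER_DAEMON_FILE_NAME",
}


def daemon_binary_path(daemon, bin_dir="build/latest/bin", env=None):
    """Return the daemon executable path, honoring the override if it is set."""
    env = os.environ if env is None else env
    default_name = f"yb-{daemon}"
    file_name = env.get(ENV_VARS[daemon], default_name)
    return f"{bin_dir}/{file_name}"


# With the override set, the LTO binary is launched instead of the default.
print(daemon_binary_path(
    "tserver", env={"YB_CTL_TSERVER_DAEMON_FILE_NAME": "yb-tserver-lto"}))
# Without it, the regular executable name is used.
print(daemon_binary_path("master", env={}))
```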

Another tooling change in this diff is how we handle the `--no-tests` flag passed to `yb_build.sh`. That flag results in setting the YB_DO_NOT_BUILD_TESTS environment variable to 1, and our CMake scripts skip all the test targets. However, it is easy to forget to keep specifying that flag. In this diff, we are storing the variable BUILD_TESTS in CMake's build cache and reusing it during future CMake runs, without the developer having to specify `--no-tests` again. It can be reset by setting YB_DO_NOT_BUILD_TESTS=0.

Test Plan:
Jenkins

```
# ./yb_build.sh --clang12 release
# build-support/tserver_lto.sh
# ldd build/latest/bin/yb-tserver-lto
	linux-vdso.so.1 (0x00007fff535bf000)
	libm.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libm.so.6 (0x00007f1b85b7d000)
	libgcc_s.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libgcc_s.so.1 (0x00007f1b85966000)
	libc.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libc.so.6 (0x00007f1b855ca000)
	/opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/ld.so => /lib64/ld-linux-x86-64.so.2 (0x00007f1b85e80000)
	libdl.so.2 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libdl.so.2 (0x00007f1b853c6000)
	libpthread.so.0 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libpthread.so.0 (0x00007f1b851a9000)
	librt.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/librt.so.1 (0x00007f1b84fa1000)
```
The yb-tserver-lto binary is ~326 MiB.

Microbenchmark
--------------
The test was done on a dual-socket Xeon E5-2670 machine (16 cores total, 32 hyper-threads) running AlmaLinux 8.5.
Details: https://gist.githubusercontent.com/mbautin/7f9784fb2ea4173539d2e2656cfe117f/raw
Results (CassandraKeyValue workload): 78K ops/sec with GCC 5.5, 85K ops/sec with Clang 12 without LTO, 104K ops/sec with Clang 12 with LTO.
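
For reference, the relative improvements implied by these three figures can be worked out as below. This is plain arithmetic on the numbers quoted above; the helper name `improvement_pct` is introduced only for this example.

```python
# Throughput figures from the results line above, in thousands of ops/sec.
results_kops = {
    "GCC 5.5": 78,
    "Clang 12 without LTO": 85,
    "Clang 12 with LTO": 104,
}


def improvement_pct(new, old):
    """Relative throughput improvement of new over old, in percent."""
    return (new - old) / old * 100.0


lto = results_kops["Clang 12 with LTO"]
no_lto = results_kops["Clang 12 without LTO"]
gcc = results_kops["GCC 5.5"]

print(f"LTO vs. Clang 12 without LTO: {improvement_pct(lto, no_lto):.1f}%")  # 22.4%
print(f"LTO vs. GCC 5.5: {improvement_pct(lto, gcc):.1f}%")                  # 33.3%
print(f"Clang 12 without LTO vs. GCC 5.5: {improvement_pct(no_lto, gcc):.1f}%")  # 9.0%
```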

Reviewers: sergei

Reviewed By: sergei

Subscribers: sergei, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D14616
mbautin commented Jan 28, 2022

Enabled for yb-tserver. Will create follow-up issues for doing more LTO for yb-master and postgres.

@mbautin mbautin closed this as completed Jan 28, 2022