perf: Add workflow template informer to server #13672

jakkubu · 2024-09-27T13:46:15Z

Motivation

Improve the performance of creating workflows with a complex templateRef structure.

During template validation, the k8s API is called once for each templateRef. For complex workflows with many refs, this creates a huge overhead. Let's cache such templates.

Connected to issue #7418

This is a follow-up to PR #13633

Modifications

Added an informer to the server and used it in workflow validation.
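To illustrate why an informer helps here, below is a minimal, stdlib-only sketch rather than the code from this PR. It assumes that validation performs one lookup per templateRef and that an informer behaves like a local map kept in sync in the background (here it is simply preloaded). All type and function names (`Template`, `apiGetter`, `cachedGetter`, `resolveRefs`, `buildObjects`) are hypothetical, and the shape of the `20-echos` template (20 refs to `echo-1`) is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Template is a simplified, hypothetical stand-in for a WorkflowTemplate.
type Template struct {
	Name string
	Refs []string // names of templates referenced via templateRef
}

// TemplateGetter abstracts "fetch one template by name".
type TemplateGetter interface {
	Get(name string) (*Template, error)
}

// apiGetter simulates a direct API client: every Get counts as one round trip.
type apiGetter struct {
	mu      sync.Mutex
	objects map[string]*Template
	calls   int
}

func (a *apiGetter) Get(name string) (*Template, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.calls++
	t, ok := a.objects[name]
	if !ok {
		return nil, errors.New("template not found: " + name)
	}
	return t, nil
}

// cachedGetter simulates an informer-backed store: hits are served from a
// local map, misses fall back to the slower getter.
type cachedGetter struct {
	mu       sync.RWMutex
	store    map[string]*Template
	fallback TemplateGetter
}

func (c *cachedGetter) Get(name string) (*Template, error) {
	c.mu.RLock()
	t, ok := c.store[name]
	c.mu.RUnlock()
	if ok {
		return t, nil
	}
	return c.fallback.Get(name)
}

// resolveRefs walks templateRefs (assumed acyclic), doing one Get per ref,
// and returns the total number of lookups performed.
func resolveRefs(g TemplateGetter, name string) (int, error) {
	t, err := g.Get(name)
	if err != nil {
		return 0, err
	}
	n := 1
	for _, ref := range t.Refs {
		m, err := resolveRefs(g, ref)
		if err != nil {
			return 0, err
		}
		n += m
	}
	return n, nil
}

// buildObjects creates a root template with the given number of refs (assumed shape).
func buildObjects(refs int) map[string]*Template {
	root := &Template{Name: "20-echos"}
	for i := 0; i < refs; i++ {
		root.Refs = append(root.Refs, "echo-1")
	}
	return map[string]*Template{"20-echos": root, "echo-1": &Template{Name: "echo-1"}}
}

func main() {
	objs := buildObjects(20)

	direct := &apiGetter{objects: objs}
	n, _ := resolveRefs(direct, "20-echos")
	fmt.Printf("direct: %d lookups, %d API calls\n", n, direct.calls)

	backing := &apiGetter{objects: objs}
	cached := &cachedGetter{store: objs, fallback: backing}
	n, _ = resolveRefs(cached, "20-echos")
	fmt.Printf("cached: %d lookups, %d API calls\n", n, backing.calls)
}
```

In this sketch the number of lookups is the same in both cases; only the number of simulated API round trips changes, which is the overhead the PR targets.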

Verification

I ran tests similar to the ones in the first PR. The results are awesome: benchmarking results and details are in a separate comment.

pkg/apiclient/argo-kube-client.go

jakkubu · 2024-10-10T07:36:06Z

Benchmarking multiple-ref template creation

Setup

Branches:

Main (commit 5244064)
Rebased onto the above commit, with the changes from PR: perf: Add template validation caching #13633 (commit 9df7abf)

Using a fresh kind cluster v1.28.9

Argo server started with server --auth-mode=server --auth-mode=client --kube-api-burst=200 --kube-api-qps=200

Benchmark workflow templates are placed in test/benchmarks/*.yaml.

Before each test, the following procedure was followed:

Delete all workflows
Wait for all workflow pods to be removed
Restart the controller and server

Benchmarking tool: hey. By default it sends 200 requests in parallel using 50 workers. These values can be modified using:

-n: number of requests
-c: number of workers

Typical call is described in test/benchmarks/README.md.

Results

Requests	Workers	Template	No cache ART [s]	Manual cache ART [s]	Informer ART [s]
200	50	20-echos	deadline exceeded	9.1370	0.0833
50	2	20-echos	4.3682	0.3974	0.0119
16	8	20-echos	18.0247	1.0127	0.0290
50	1	20-echos	2.3204	0.2095	0.0273
200	50	echo-1	11.7437	4.3005	0.0362

*ART - Average Response Time. Note that the appendix output for the no-cache run with 2 workers shows -n 10 rather than -n 50.
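To put the table into relative terms, here is a small Go sketch that computes speedup ratios from the ART values above. The numbers are copied from the table; the `row` type and `speedup` helper are hypothetical names used only for this illustration, and a `noCache` value of 0 marks the run that ended with "deadline exceeded".

```go
package main

import "fmt"

// row holds one line of the results table; ART values are in seconds.
type row struct {
	requests, workers int
	template          string
	noCache           float64 // 0 means the run hit "deadline exceeded"
	manual            float64
	informer          float64
}

// speedup returns how many times faster improved is than baseline.
func speedup(baseline, improved float64) float64 {
	return baseline / improved
}

var results = []row{
	{200, 50, "20-echos", 0, 9.1370, 0.0833},
	{50, 2, "20-echos", 4.3682, 0.3974, 0.0119},
	{16, 8, "20-echos", 18.0247, 1.0127, 0.0290},
	{50, 1, "20-echos", 2.3204, 0.2095, 0.0273},
	{200, 50, "echo-1", 11.7437, 4.3005, 0.0362},
}

func main() {
	for _, r := range results {
		fmt.Printf("%-9s n=%-3d c=%-2d manual->informer: %.1fx", r.template, r.requests, r.workers, speedup(r.manual, r.informer))
		if r.noCache > 0 {
			fmt.Printf(", no cache->informer: %.1fx", speedup(r.noCache, r.informer))
		}
		fmt.Println()
	}
}
```

These ratios come from single runs on a local kind cluster, so they are best read as an order-of-magnitude indication rather than a precise measurement.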

Appendix

Manual Caching hey output

hey \
    -n 200 -c 50 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	38.5292 secs
  Slowest:	10.0132 secs
  Fastest:	0.2812 secs
  Average:	9.1370 secs
  Requests/sec:	5.1909


Response time histogram:
  0.281 [1]	|
  1.254 [0]	|
  2.228 [2]	|■
  3.201 [3]	|■
  4.174 [3]	|■
  5.147 [1]	|
  6.120 [0]	|
  7.094 [6]	|■■
  8.067 [8]	|■■
  9.040 [26]	|■■■■■■■
  10.013 [150]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 7.8264 secs
  25% in 9.0510 secs
  50% in 9.9969 secs
  75% in 9.9999 secs
  90% in 10.0010 secs
  95% in 10.0017 secs
  99% in 10.0090 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0086 secs, 0.2812 secs, 10.0132 secs
  DNS-lookup:	0.0008 secs, 0.0003 secs, 0.0028 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0009 secs
  resp wait:	9.1282 secs, 0.2608 secs, 10.0097 secs
  resp read:	0.0001 secs, 0.0001 secs, 0.0006 secs

Status code distribution:
  [200]	200 responses

hey \
    -n 50 -c 1 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	10.4736 secs
  Slowest:	2.8200 secs
  Fastest:	0.0116 secs
  Average:	0.2095 secs
  Requests/sec:	4.7739


Response time histogram:
  0.012 [1]	|■
  0.292 [43]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.573 [2]	|■■
  0.854 [2]	|■■
  1.135 [1]	|■
  1.416 [0]	|
  1.697 [0]	|
  1.977 [0]	|
  2.258 [0]	|
  2.539 [0]	|
  2.820 [1]	|■


Latency distribution:
  10% in 0.0215 secs
  25% in 0.0398 secs
  50% in 0.0904 secs
  75% in 0.2085 secs
  90% in 0.4544 secs
  95% in 0.9240 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0150 secs, 0.0116 secs, 2.8200 secs
  DNS-lookup:	0.0004 secs, 0.0003 secs, 0.0022 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0003 secs
  resp wait:	0.1941 secs, 0.0084 secs, 2.7740 secs
  resp read:	0.0003 secs, 0.0001 secs, 0.0037 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 16 -c 8 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	2.0559 secs
  Slowest:	1.9955 secs
  Fastest:	0.0570 secs
  Average:	1.0127 secs
  Requests/sec:	7.7826


Response time histogram:
  0.057 [1]	|■■■■■
  0.251 [7]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.445 [0]	|
  0.639 [0]	|
  0.832 [0]	|
  1.026 [0]	|
  1.220 [0]	|
  1.414 [0]	|
  1.608 [0]	|
  1.802 [0]	|
  1.996 [8]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 0.0593 secs
  25% in 0.0601 secs
  50% in 1.9162 secs
  75% in 1.9707 secs
  90% in 1.9955 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0140 secs, 0.0570 secs, 1.9955 secs
  DNS-lookup:	0.0017 secs, 0.0001 secs, 0.0032 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0003 secs
  resp wait:	0.9971 secs, 0.0330 secs, 1.9879 secs
  resp read:	0.0004 secs, 0.0000 secs, 0.0013 secs

Status code distribution:
  [200]	16 responses

hey \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "echo-1"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	18.5190 secs
  Slowest:	5.0074 secs
  Fastest:	0.0267 secs
  Average:	4.3005 secs
  Requests/sec:	10.7997

  Total data:	200600 bytes
  Size/request:	1003 bytes

Response time histogram:
  0.027 [1]	|
  0.525 [2]	|■
  1.023 [0]	|
  1.521 [8]	|■■
  2.019 [9]	|■■
  2.517 [7]	|■■
  3.015 [9]	|■■
  3.513 [10]	|■■■
  4.011 [3]	|■
  4.509 [2]	|■
  5.007 [149]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 2.0682 secs
  25% in 4.3698 secs
  50% in 4.9992 secs
  75% in 5.0003 secs
  90% in 5.0012 secs
  95% in 5.0018 secs
  99% in 5.0058 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0076 secs, 0.0267 secs, 5.0074 secs
  DNS-lookup:	0.0013 secs, 0.0002 secs, 0.0070 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0007 secs
  resp wait:	4.2928 secs, 0.0135 secs, 5.0043 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200]	200 responses

hey \
    -n 50 -c 2 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	9.9623 secs
  Slowest:	2.1692 secs
  Fastest:	0.0177 secs
  Average:	0.3974 secs
  Requests/sec:	5.0189


Response time histogram:
  0.018 [1]	|■■
  0.233 [21]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.448 [19]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.663 [0]	|
  0.878 [4]	|■■■■■■■■
  1.093 [1]	|■■
  1.309 [0]	|
  1.524 [1]	|■■
  1.739 [0]	|
  1.954 [2]	|■■■■
  2.169 [1]	|■■


Latency distribution:
  10% in 0.0415 secs
  25% in 0.0680 secs
  50% in 0.3625 secs
  75% in 0.4159 secs
  90% in 1.0482 secs
  95% in 1.8283 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0151 secs, 0.0177 secs, 2.1692 secs
  DNS-lookup:	0.0005 secs, 0.0001 secs, 0.0015 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0003 secs
  resp wait:	0.3813 secs, 0.0131 secs, 2.1538 secs
  resp read:	0.0009 secs, 0.0001 secs, 0.0175 secs

Status code distribution:
  [200]	50 responses

Caching OFF hey output

hey \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Error distribution:
  [1]	Post "https://localhost:2746/api/v1/workflows/argo-test": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

hey \
    -n 50 -c 1 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	116.0186 secs
  Slowest:	3.0743 secs
  Fastest:	0.8805 secs
  Average:	2.3204 secs
  Requests/sec:	0.4310


Response time histogram:
  0.881 [1]	|■
  1.100 [0]	|
  1.319 [1]	|■
  1.539 [0]	|
  1.758 [0]	|
  1.977 [0]	|
  2.197 [0]	|
  2.416 [46]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  2.636 [0]	|
  2.855 [1]	|■
  3.074 [1]	|■


Latency distribution:
  10% in 2.3432 secs
  25% in 2.3488 secs
  50% in 2.3497 secs
  75% in 2.3521 secs
  90% in 2.3539 secs
  95% in 2.8334 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0046 secs, 0.8805 secs, 3.0743 secs
  DNS-lookup:	0.0005 secs, 0.0003 secs, 0.0014 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0001 secs
  resp wait:	2.3155 secs, 0.8690 secs, 3.0706 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0007 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 10 -c 2 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	22.0162 secs
  Slowest:	5.2764 secs
  Fastest:	3.1682 secs
  Average:	4.3682 secs
  Requests/sec:	0.4542


Response time histogram:
  3.168 [1]	|■■■■■■■■■■■■■
  3.379 [1]	|■■■■■■■■■■■■■
  3.590 [0]	|
  3.801 [0]	|
  4.011 [0]	|
  4.222 [2]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■
  4.433 [1]	|■■■■■■■■■■■■■
  4.644 [1]	|■■■■■■■■■■■■■
  4.855 [1]	|■■■■■■■■■■■■■
  5.066 [0]	|
  5.276 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 3.2170 secs
  25% in 4.1037 secs
  50% in 4.4375 secs
  75% in 5.2165 secs
  90% in 5.2764 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0101 secs, 3.1682 secs, 5.2764 secs
  DNS-lookup:	0.0010 secs, 0.0004 secs, 0.0025 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0001 secs
  resp wait:	4.3578 secs, 3.1597 secs, 5.2671 secs
  resp read:	0.0002 secs, 0.0001 secs, 0.0006 secs

Status code distribution:
  [200]	10 responses

hey \
    -n 16 -c 8 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	36.8523 secs
  Slowest:	19.6692 secs
  Fastest:	16.5928 secs
  Average:	18.0247 secs
  Requests/sec:	0.4342


Response time histogram:
  16.593 [1]	|■■■■■■■■■■
  16.900 [0]	|
  17.208 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  17.516 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  17.823 [0]	|
  18.131 [0]	|
  18.439 [0]	|
  18.746 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  19.054 [2]	|■■■■■■■■■■■■■■■■■■■■
  19.362 [0]	|
  19.669 [2]	|■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 17.1318 secs
  25% in 17.2294 secs
  50% in 18.4704 secs
  75% in 18.7662 secs
  90% in 19.6692 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0123 secs, 16.5928 secs, 19.6692 secs
  DNS-lookup:	0.0013 secs, 0.0003 secs, 0.0021 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0002 secs
  resp wait:	18.0119 secs, 16.5720 secs, 19.6658 secs
  resp read:	0.0002 secs, 0.0001 secs, 0.0008 secs

Status code distribution:
  [200]	16 responses

hey \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "echo-1"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	48.5225 secs
  Slowest:	12.7513 secs
  Fastest:	6.5247 secs
  Average:	11.7437 secs
  Requests/sec:	4.1218

  Total data:	200600 bytes
  Size/request:	1003 bytes

Response time histogram:
  6.525 [1]	|
  7.147 [5]	|■■
  7.770 [2]	|■
  8.393 [1]	|
  9.015 [0]	|
  9.638 [7]	|■■
  10.261 [9]	|■■■
  10.883 [13]	|■■■■
  11.506 [12]	|■■■■
  12.129 [33]	|■■■■■■■■■■■
  12.751 [117]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 9.9773 secs
  25% in 12.0355 secs
  50% in 12.4015 secs
  75% in 12.4993 secs
  90% in 12.5020 secs
  95% in 12.5169 secs
  99% in 12.6785 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0083 secs, 6.5247 secs, 12.7513 secs
  DNS-lookup:	0.0008 secs, 0.0002 secs, 0.0030 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0013 secs
  resp wait:	11.7352 secs, 6.5076 secs, 12.7478 secs
  resp read:	0.0001 secs, 0.0001 secs, 0.0020 secs

Status code distribution:
  [200]	200 responses

Informer hey output

hey \
    -n 50 -c 1 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	1.3652 secs
  Slowest:	0.0959 secs
  Fastest:	0.0093 secs
  Average:	0.0273 secs
  Requests/sec:	36.6243


Response time histogram:
  0.009 [1]	|■■
  0.018 [11]	|■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.027 [17]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.035 [8]	|■■■■■■■■■■■■■■■■■■■
  0.044 [10]	|■■■■■■■■■■■■■■■■■■■■■■■■
  0.053 [2]	|■■■■■
  0.061 [0]	|
  0.070 [0]	|
  0.079 [0]	|
  0.087 [0]	|
  0.096 [1]	|■■


Latency distribution:
  10% in 0.0111 secs
  25% in 0.0185 secs
  50% in 0.0249 secs
  75% in 0.0377 secs
  90% in 0.0408 secs
  95% in 0.0524 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0096 secs, 0.0093 secs, 0.0959 secs
  DNS-lookup:	0.0005 secs, 0.0003 secs, 0.0013 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0001 secs
  resp wait:	0.0174 secs, 0.0060 secs, 0.0581 secs
  resp read:	0.0002 secs, 0.0001 secs, 0.0036 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 16 -c 8 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.0608 secs
  Slowest:	0.0422 secs
  Fastest:	0.0159 secs
  Average:	0.0290 secs
  Requests/sec:	263.1996


Response time histogram:
  0.016 [1]	|■■■■■■■■
  0.019 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■
  0.021 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■
  0.024 [1]	|■■■■■■■■
  0.026 [0]	|
  0.029 [0]	|
  0.032 [0]	|
  0.034 [0]	|
  0.037 [0]	|
  0.040 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.042 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 0.0163 secs
  25% in 0.0193 secs
  50% in 0.0389 secs
  75% in 0.0393 secs
  90% in 0.0422 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0091 secs, 0.0159 secs, 0.0422 secs
  DNS-lookup:	0.0012 secs, 0.0002 secs, 0.0020 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0005 secs
  resp wait:	0.0190 secs, 0.0098 secs, 0.0280 secs
  resp read:	0.0002 secs, 0.0000 secs, 0.0010 secs

Status code distribution:
  [200]	16 responses

hey \
    -n 50 -c 2 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.3063 secs
  Slowest:	0.0279 secs
  Fastest:	0.0083 secs
  Average:	0.0119 secs
  Requests/sec:	163.2477


Response time histogram:
  0.008 [1]	|■■
  0.010 [21]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.012 [17]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.014 [3]	|■■■■■■
  0.016 [2]	|■■■■
  0.018 [1]	|■■
  0.020 [3]	|■■■■■■
  0.022 [0]	|
  0.024 [0]	|
  0.026 [0]	|
  0.028 [2]	|■■■■


Latency distribution:
  10% in 0.0089 secs
  25% in 0.0096 secs
  50% in 0.0104 secs
  75% in 0.0122 secs
  90% in 0.0185 secs
  95% in 0.0279 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0043 secs, 0.0083 secs, 0.0279 secs
  DNS-lookup:	0.0004 secs, 0.0000 secs, 0.0021 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0001 secs
  resp wait:	0.0075 secs, 0.0054 secs, 0.0174 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0002 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 200 -c 50 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.3854 secs
  Slowest:	0.1707 secs
  Fastest:	0.0137 secs
  Average:	0.0833 secs
  Requests/sec:	518.9769


Response time histogram:
  0.014 [1]	|■
  0.029 [1]	|■
  0.045 [13]	|■■■■■■■■■■
  0.061 [21]	|■■■■■■■■■■■■■■■■
  0.076 [49]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.092 [54]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.108 [27]	|■■■■■■■■■■■■■■■■■■■■
  0.124 [14]	|■■■■■■■■■■
  0.139 [13]	|■■■■■■■■■■
  0.155 [6]	|■■■■
  0.171 [1]	|■


Latency distribution:
  10% in 0.0480 secs
  25% in 0.0686 secs
  50% in 0.0810 secs
  75% in 0.0956 secs
  90% in 0.1353 secs
  95% in 0.1381 secs
  99% in 0.1471 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0235 secs, 0.0137 secs, 0.1707 secs
  DNS-lookup:	0.0009 secs, 0.0000 secs, 0.0037 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0010 secs
  resp wait:	0.0592 secs, 0.0056 secs, 0.1481 secs
  resp read:	0.0005 secs, 0.0000 secs, 0.0089 secs

Status code distribution:
  [200]	200 responses

hey \
    -n 200 -c 50 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "echo-1"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.1556 secs
  Slowest:	0.0568 secs
  Fastest:	0.0171 secs
  Average:	0.0362 secs
  Requests/sec:	1285.0704

  Total data:	230400 bytes
  Size/request:	1152 bytes

Response time histogram:
  0.017 [1]	|■
  0.021 [2]	|■■
  0.025 [7]	|■■■■■■
  0.029 [29]	|■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.033 [44]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.037 [32]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.041 [33]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.045 [22]	|■■■■■■■■■■■■■■■■■■■■
  0.049 [13]	|■■■■■■■■■■■■
  0.053 [9]	|■■■■■■■■
  0.057 [8]	|■■■■■■■


Latency distribution:
  10% in 0.0277 secs
  25% in 0.0299 secs
  50% in 0.0356 secs
  75% in 0.0418 secs
  90% in 0.0480 secs
  95% in 0.0524 secs
  99% in 0.0566 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0184 secs, 0.0171 secs, 0.0568 secs
  DNS-lookup:	0.0008 secs, 0.0000 secs, 0.0024 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0011 secs
  resp wait:	0.0175 secs, 0.0063 secs, 0.0279 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0006 secs

Status code distribution:
  [200]	200 responses

Joibel · 2024-10-14T14:41:51Z

pkg/apiclient/argo-kube-client.go

+}
+
+func (a *argoKubeClient) startStores(restConfig *restclient.Config, namespace string) error {
+	if a.opts.UseCaching {


UseCaching appears to be always false

This was the intention: not to introduce a breaking change. At the same time, my team is using argoKubeClient in code and we would like to enable caching here. The code that depends on this is tested; it's basically server code.

I'm not sure why you consider the caching version a breaking change? What does it break?

This PR is marked as a performance improvement, but doesn't improve the performance of the product, only of your usage of it as a go-client? Why wouldn't everyone want this enabled? It uses more memory...

The problem I'm facing is that there is little testing happening in pkg/apiclient.
I could expose this option in the CLI to run e2e tests and make it more testable. However, I don't think this option makes sense in the CLI. An informer would simply make startup time longer; only in very specific conditions could this make some difference. Even in such a case, you could simply connect to a server that has caching enabled by default, instead of using a k8s connection.

Joibel · 2024-10-15T13:48:06Z

pkg/apiclient/argo-kube-client.go

@@ -37,14 +37,34 @@ var (
 	NoArgoServerErr               = fmt.Errorf("this is impossible if you are not using the Argo Server, see %s", help.CLI())
 )

+type ArgoKubeOpts struct {


This struct is never used as initialised in this code, nor are there any tests for UseCaching = true.

I believe this might be "for the future" but please could it not be included in this PR and saved for a future one until it's tested and used.

It's for code that uses the Argo API client from Go, so that we would be able to turn caching on. It will make a huge difference for us.
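The opt-in behaviour discussed in this thread can be sketched as follows. This is a simplified illustration, not the PR's code: only the `UseCaching` field name comes from the diff, while `Opts`, `TemplateStore`, `clientStore`, `informerStore`, and `newStore` are hypothetical names. The point is that the zero value of the options struct keeps the existing, non-caching behaviour.

```go
package main

import "fmt"

// Opts loosely mirrors the idea of ArgoKubeOpts; only UseCaching is taken from the diff.
type Opts struct {
	UseCaching bool
}

// TemplateStore is a hypothetical abstraction over "where templates are read from".
type TemplateStore interface {
	Kind() string
}

// clientStore stands for the existing path: every read goes to the k8s API.
type clientStore struct{}

func (clientStore) Kind() string { return "client" }

// informerStore stands for the cached path backed by an informer.
type informerStore struct{}

func (informerStore) Kind() string { return "informer" }

// newStore picks the store implementation; the zero-value Opts selects the
// uncached client store, so existing callers see no change in behaviour.
func newStore(opts Opts) TemplateStore {
	if opts.UseCaching {
		return informerStore{}
	}
	return clientStore{}
}

func main() {
	fmt.Println(newStore(Opts{}).Kind())
	fmt.Println(newStore(Opts{UseCaching: true}).Kind())
}
```

The trade-off raised in the review still applies: a default of false avoids extra memory use and startup time for existing users, but it also means the cached path is only exercised by callers who opt in.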

Joibel · 2024-10-15T13:56:28Z

test/benchmarks/README.md

@@ -0,0 +1,51 @@
+# Benchmarks


This is not too far off something that already happens in an e2e test. I'd like to see a new github action to test this.

That framework has the ability to time tests (and already does to show us slow tests), and this is somewhat akin to the CLI test suite in that you're wanting to invoke command line actions (albeit not the argocli) and monitor the result.

You can try all this out in github in your own fork and see whether a CI test is feasible. If you take the necessary github action file and add workflow_dispatch as a trigger event you can then also manually trigger it if you want to try the same thing over and over (as might be prudent here).

I see. I added it here as @agilgur5 suggested, but I didn't have time to automate tests. Do you think it's better to remove them or keep them here?

I was hoping that you'd consider automating the tests.

server/workflowtemplate/informer.go

server/workflowtemplate/wf_client_store.go

During template validation k8s API is called for each templateRef. For complex workflows with many refs it creates huge overhead. Let's use informer for getting templates and use old mechanism as fallback Signed-off-by: Jakub Buczak <[email protected]>



jakkubu changed the title ~~Add workflow template informer to server~~ perf: Add workflow template informer to server Sep 27, 2024

blkperl added the area/server label Oct 2, 2024

jakkubu force-pushed the add-server-informer branch 3 times, most recently from 9dac1ce to 3d90e33 Compare October 9, 2024 08:07

agilgur5 mentioned this pull request Oct 9, 2024

perf: Add template validation caching #13633

Closed

jakkubu force-pushed the add-server-informer branch from fbf13ba to 7221f93 Compare October 10, 2024 07:24

jakkubu force-pushed the add-server-informer branch 5 times, most recently from 653023b to f1f89a9 Compare October 11, 2024 10:07

jakkubu marked this pull request as ready for review October 11, 2024 10:50

jakkubu force-pushed the add-server-informer branch from f1f89a9 to 2659d1a Compare October 14, 2024 13:32

Joibel requested changes Oct 15, 2024


jakkubu added 12 commits October 17, 2024 11:56

perf: Add workflow template informer to workflow template server

3906015

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add workflow template informer to cron workflow server

cf08531

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add cluster workflow template informer

a0633d1

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add cluster workflow template informer to workflow template server

37c710f

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add cluster workflow template informer to cron workflow server

e3d99cd

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add cluster workflow template informer to cluster workflow temp…

0606c37

…late server Signed-off-by: Jakub Buczak <[email protected]>

perf: Add (Custer)WorkflowTemplateStore implementation using wfClient

41e5cc2

Signed-off-by: Jakub Buczak <[email protected]>

perf: Use template store for all viable get requests

5344a4c

Remove Lister() method (as informer don't support full k8s list options) Signed-off-by: Jakub Buczak <[email protected]>

perf: Add benchmarks workflows + instructions

dd878a9

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add kube-client-opts for enabling caching

31e05d8

fix not starting clusterWftmpl Informer in server add more descriptive client store naming Signed-off-by: Jakub Buczak <[email protected]>

perf: Remove default template store implementation

21892f1

Pass created client stores in tests Signed-off-by: Jakub Buczak <[email protected]>

jakkubu force-pushed the add-server-informer branch from 2659d1a to 21892f1 Compare October 17, 2024 11:12

fix: remove leftover comments and blank lines

12b9b94

Signed-off-by: Jakub Buczak <[email protected]>
