perf(ecs): Narrowing the cache search for the ECS provider on views #6256

christosarvanitis · 2024-08-06T14:45:02Z

Attempt to address some of the issues described in spinnaker/spinnaker#6084

Improving the response times on the following endpoints when ECS is enabled and a substantial number of accounts/services exist in cache:

- /clusters
- /applications
- /serverGroups

The perf issue with the Alarms still exists and will be addressed in a future PR

Results from a performance test of clouddriver response times:

GET {CLOUDDRIVER_URL}/applications
- Average: 104ms → 92.1ms (11% improvement)
- 95th Percentile: 130ms → 126ms
GET {CLOUDDRIVER_URL}/applications/{application_name}
- Average: 7.48s → 4.2s (43% improvement)
- 95th Percentile: 8.55s → 5.86s
GET {CLOUDDRIVER_URL}/applications/{application_name}/serverGroups
- Average: 2.72s → 2.16s (20% improvement)
- 95th Percentile: 3.17s → 3.11s
GET {CLOUDDRIVER_URL}/applications/{application_name}/clusters
- Average: 107ms → 43.3ms (59% improvement)
- 95th Percentile: 135ms → 88.4ms

christosarvanitis · 2024-08-07T07:40:13Z

@dbyron0 @deverton would appreciate your feedback on this change. There are still improvements to be made: the current ECS implementation goes through every region per account to retrieve the necessary data from cache, which is far from ideal when there are hundreds of accounts.
The main idea here is to limit the retrieval by application name when we can.

The perf of alarms is still a problem: right now it goes through all the alarms and tries to match each one with a service. This will be addressed in a future PR.

christosarvanitis · 2024-09-04T13:40:04Z

@dbyron-sf @jasonmcintosh Added some results from internal testing related to this change. Would appreciate any feedback!

...cs/src/main/java/com/netflix/spinnaker/clouddriver/ecs/cache/client/AbstractCacheClient.java

.../main/java/com/netflix/spinnaker/clouddriver/ecs/provider/view/EcsServerClusterProvider.java

...ver-ecs/src/main/java/com/netflix/spinnaker/clouddriver/ecs/view/EcsApplicationProvider.java

jasonmcintosh · 2024-09-06T22:56:05Z

Few minor things but overall looks good.

christosarvanitis · 2024-09-11T14:53:23Z

@jasonmcintosh planning to push the Alarm caching/lookup perf improvements as well tomorrow.

christosarvanitis · 2024-09-12T14:49:15Z

...n/java/com/netflix/spinnaker/clouddriver/ecs/cache/client/EcsCloudWatchAlarmCacheClient.java


-    Collection<EcsMetricAlarm> allMetricAlarms = getAll(accountName, region);


Before the change, all the alarms for an ECS account/region were fetched and iterated through to match the service. This is extremely costly.

After the change, the ECSCluster is added to the alarm cache key ID for the ECS provider during the caching cycles. We retrieve the IDs by ECS account/region/EcsClusterName and then try to match the service.
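
The idea of narrowing by key can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name, the `;` separator and the `account;region;cluster;alarmArn` layout are assumptions made for the example, and the real layout is whatever clouddriver's `Keys` class produces. In clouddriver the narrowing would be done by the cache's identifier filtering rather than by an in-memory loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustrative sketch only: the key layout below is an assumption,
// not clouddriver's actual Keys format.
final class AlarmKeyLookupSketch {
  private static final String SEP = ";";

  // Hypothetical key layout: account;region;cluster;alarmArn
  static String alarmKey(String account, String region, String cluster, String alarmArn) {
    return String.join(SEP, account, region, cluster, alarmArn);
  }

  // Keeps only the keys that belong to one account/region/cluster, so the
  // service matching afterwards runs over a much smaller candidate set.
  static List<String> keysForCluster(
      Collection<String> allKeys, String account, String region, String cluster) {
    String prefix = String.join(SEP, account, region, cluster) + SEP;
    List<String> matches = new ArrayList<>();
    for (String key : allKeys) {
      if (key.startsWith(prefix)) {
        matches.add(key);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> keys =
        Arrays.asList(
            alarmKey("acct-1", "us-west-2", "cluster-a", "arn:alarm-1"),
            alarmKey("acct-1", "us-west-2", "cluster-b", "arn:alarm-2"),
            alarmKey("acct-2", "us-west-2", "cluster-a", "arn:alarm-3"));
    System.out.println(keysForCluster(keys, "acct-1", "us-west-2", "cluster-a"));
  }
}
```

The trailing separator in the prefix keeps a cluster named `cluster-a` from also matching `cluster-ab`.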

christosarvanitis · 2024-09-12T14:49:37Z

...n/java/com/netflix/spinnaker/clouddriver/ecs/cache/client/EcsCloudWatchAlarmCacheClient.java

-          metricAlarms.add(metricAlarm);
-          continue outLoop;
-        }
+      if (metricAlarm.getAlarmActions().stream().anyMatch(action -> action.contains(serviceName))


Small refactoring here to make it more readable
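
For readers unfamiliar with the pattern, here is a small sketch of the kind of refactoring described: a labeled `continue` inside a nested loop replaced by `Stream.anyMatch`. The `Alarm` class and method names are hypothetical stand-ins, and only the alarm-actions check visible in the diff excerpt is modelled; the actual condition in the PR may include more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; not the classes used in clouddriver.
final class AlarmMatchSketch {
  static final class Alarm {
    final String name;
    final List<String> alarmActions;

    Alarm(String name, List<String> alarmActions) {
      this.name = name;
      this.alarmActions = alarmActions;
    }
  }

  // Older style: nested loop with a labeled continue.
  static List<Alarm> matchWithLabel(List<Alarm> alarms, String serviceName) {
    List<Alarm> matched = new ArrayList<>();
    outLoop:
    for (Alarm alarm : alarms) {
      for (String action : alarm.alarmActions) {
        if (action.contains(serviceName)) {
          matched.add(alarm);
          continue outLoop; // stop scanning this alarm's actions
        }
      }
    }
    return matched;
  }

  // Same behavior expressed with anyMatch, which also short-circuits.
  static List<Alarm> matchWithStream(List<Alarm> alarms, String serviceName) {
    List<Alarm> matched = new ArrayList<>();
    for (Alarm alarm : alarms) {
      if (alarm.alarmActions.stream().anyMatch(action -> action.contains(serviceName))) {
        matched.add(alarm);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<Alarm> alarms =
        Arrays.asList(
            new Alarm("a1", Arrays.asList("scale/service/my-svc", "notify")),
            new Alarm("a2", Arrays.asList("scale/service/other-svc")));
    System.out.println(matchWithStream(alarms, "my-svc").size());
  }
}
```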

christosarvanitis · 2024-09-12T14:51:45Z

...va/com/netflix/spinnaker/clouddriver/ecs/provider/agent/EcsCloudMetricAlarmCachingAgent.java

@@ -118,7 +118,13 @@ Map<String, Collection<CacheData>> generateFreshData(Set<MetricAlarm> cacheableM
    Map<String, Collection<CacheData>> newDataMap = new HashMap<>();

    for (MetricAlarm metricAlarm : cacheableMetricAlarm) {
-      String key = Keys.getAlarmKey(accountName, region, metricAlarm.getAlarmArn());
+      String cluster =
+          metricAlarm.getDimensions().stream()


Based on the AWS SDK, a CloudWatch alarm for ECS contains two dimensions, depending on its type:

A service alarm contains the dimensions ECSCluster and ServiceName.

An autoscaling group alarm of an ECS cluster contains the ECSCluster and the Capacity provider.

This change includes the ECSClusterName in the cached key ID to make the search less costly.
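
A rough sketch of that key construction, under stated assumptions: `Dimension` here is a local stand-in rather than the AWS SDK class, the `;`-joined key layout is invented for the example, and the dimension name is passed in as a parameter because the exact name should be confirmed against real alarms rather than taken from this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; Dimension is a local stand-in, not the AWS SDK type.
final class AlarmClusterKeySketch {
  static final class Dimension {
    private final String name;
    private final String value;

    Dimension(String name, String value) {
      this.name = name;
      this.value = value;
    }

    String getName() {
      return name;
    }

    String getValue() {
      return value;
    }
  }

  // Returns the value of the first dimension with the given name,
  // or an empty string when the alarm carries no such dimension.
  static String clusterOf(List<Dimension> dimensions, String dimensionName) {
    return dimensions.stream()
        .filter(d -> dimensionName.equals(d.getName()))
        .map(Dimension::getValue)
        .filter(Objects::nonNull)
        .findFirst()
        .orElse("");
  }

  // Assumed key layout for illustration: account;region;alarmArn;cluster
  static String alarmKey(String account, String region, String alarmArn, String cluster) {
    return String.join(";", account, region, alarmArn, cluster);
  }

  public static void main(String[] args) {
    List<Dimension> dims =
        Arrays.asList(
            new Dimension("ServiceName", "my-svc"), new Dimension("ClusterName", "cluster-a"));
    String cluster = clusterOf(dims, "ClusterName");
    System.out.println(alarmKey("acct-1", "us-west-2", "arn:alarm-1", cluster));
  }
}
```

Falling back to an empty cluster keeps alarms without that dimension cacheable; they simply will not be found by a cluster-scoped lookup.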

christosarvanitis · 2024-09-12T14:53:00Z

.../main/java/com/netflix/spinnaker/clouddriver/ecs/provider/view/EcsServerClusterProvider.java

-            .setMoniker(moniker);
-
+    EcsServerGroup serverGroup = new EcsServerGroup();
+    if (includeDetails) {


includeDetails is false only for getSummaries. The rest of the logic remains the same.
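
As a simplified illustration of the flag, the sketch below skips an expensive lookup when only a summary is needed. The `ServerGroup` class, `build` method and `detailsLoader` supplier are hypothetical names for this example and do not mirror `EcsServerGroup` or the provider's real signature.

```java
import java.util.function.Supplier;

// Hypothetical sketch of gating expensive work behind an includeDetails flag.
final class ServerGroupSketch {
  static final class ServerGroup {
    String name;
    String details; // stays null for summaries
  }

  static ServerGroup build(String name, boolean includeDetails, Supplier<String> detailsLoader) {
    ServerGroup serverGroup = new ServerGroup();
    serverGroup.name = name;
    if (includeDetails) {
      // Only pay for the extra cache lookups when the caller needs them.
      serverGroup.details = detailsLoader.get();
    }
    return serverGroup;
  }

  public static void main(String[] args) {
    ServerGroup summary = build("my-svc-v001", false, () -> "expensive details");
    ServerGroup full = build("my-svc-v001", true, () -> "expensive details");
    System.out.println(summary.details + " / " + full.details);
  }
}
```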

christosarvanitis · 2024-09-20T13:53:03Z

Following up with the perf improvements from the Alarm changes:

The GET serverGroups call on an application dropped from 7+ seconds to 3.5 seconds.

And timings on a single request before the change:

And after the change:

jasonmcintosh · 2024-09-20T20:18:05Z

Overall I think this looks good ;) Would like one more set of eyes given the changes to the way ECS operates. One concern is how the change to the cache IDs will be cleaned up, since this changes the storage IDs, but that MAY get taken care of by one of the cleanup jobs (need to confirm). Talked in Slack: the IDs should still fit the max column length, so adding that shouldn't impact any database stuff.

christosarvanitis · 2024-09-23T12:20:27Z

Thanks @jasonmcintosh! 🚀

> but that MAY get taken care of by one of the cleanup jobs (need to confirm)

I have added a test that validates this. The previously cached ECS keys will be evicted and recached with the appended ID for the alarms table.

jasonmcintosh · 2024-09-25T18:07:05Z

OK, will merge shortly. I'd LIKE to get the release notes updated for this, please!

perf(ecs): Narrowing the cache search for the ECS provider on views

6d6ec79

christosarvanitis force-pushed the perf-ecs branch from 36aa330 to 6d6ec79 Compare September 11, 2024 08:17

christosarvanitis requested a review from jasonmcintosh September 12, 2024 14:54

perf(ecs): ECS alarms to be cached/searched with EcsClusterName id

4ca4f4b

christosarvanitis force-pushed the perf-ecs branch from 3e714e2 to 4ca4f4b Compare September 20, 2024 13:04

jasonmcintosh approved these changes Sep 20, 2024

jasonmcintosh added the ready to merge label Sep 25, 2024

mergify bot added the auto merged label Sep 25, 2024

jasonmcintosh merged commit 3cdf32e into spinnaker:master Sep 25, 2024
21 of 23 checks passed

spinnakerbot added the target-release/1.36 label Sep 25, 2024
