perf(ecs): Narrowing the cache search for the ECS provider on views #6256
@dbyron0 @deverton I would appreciate your feedback on this change. There is still room for improvement: the current ECS implementation goes through every region of every account to retrieve the necessary data from the cache, which is far from ideal when there are hundreds of accounts. Alarm performance is also still a problem, because the code currently goes through all the alarms and tries to match each one with a service; this will be addressed in a future PR.
@dbyron-sf @jasonmcintosh I added some results from internal testing related to this change. I would appreciate any feedback!
...cs/src/main/java/com/netflix/spinnaker/clouddriver/ecs/cache/client/AbstractCacheClient.java
.../main/java/com/netflix/spinnaker/clouddriver/ecs/provider/view/EcsServerClusterProvider.java
...ver-ecs/src/main/java/com/netflix/spinnaker/clouddriver/ecs/view/EcsApplicationProvider.java
A few minor things, but overall this looks good.
36aa330 to 6d6ec79
@jasonmcintosh I'm planning to push the alarm caching/lookup perf improvements tomorrow as well.
Collection<EcsMetricAlarm> allMetricAlarms = getAll(accountName, region);
Before: all the alarms for an ECS account/region were fetched and iterated through to match the service, which is extremely costly.
After: the ECS cluster name is added to the alarm cache key id during the caching cycles. We retrieve the ids by ECS account/region/cluster name and then try to match the service.
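As a rough sketch of the idea (not the code in this PR), the narrowing amounts to filtering cache ids by an account/region/cluster prefix before any service matching is attempted. The class name, method names, and the `;`-separated key layout below are assumptions made for illustration; clouddriver's real `Keys` format may differ.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: the class, the method names, and the ';' key layout
// are assumptions, not clouddriver's actual Keys format.
class AlarmLookupSketch {

  // The trailing separator keeps cluster "prod" from also matching "prod-2".
  static String clusterPrefix(String account, String region, String cluster) {
    return account + ";" + region + ";" + cluster + ";";
  }

  // Narrow the candidate ids to a single cluster; only these ids would then
  // be loaded from the cache and matched against the service.
  static List<String> idsForCluster(
      Collection<String> allIds, String account, String region, String cluster) {
    String prefix = clusterPrefix(account, region, cluster);
    return allIds.stream()
        .filter(id -> id.startsWith(prefix))
        .sorted()
        .collect(Collectors.toList());
  }
}
```

With this shape, the per-service matching only touches the alarms of one cluster instead of every alarm in the account/region.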
metricAlarms.add(metricAlarm);
continue outLoop;
}
if (metricAlarm.getAlarmActions().stream().anyMatch(action -> action.contains(serviceName))
Small refactoring here to make it more readable
@@ -118,7 +118,13 @@ Map<String, Collection<CacheData>> generateFreshData(Set<MetricAlarm> cacheableM
Map<String, Collection<CacheData>> newDataMap = new HashMap<>();

for (MetricAlarm metricAlarm : cacheableMetricAlarm) {
String key = Keys.getAlarmKey(accountName, region, metricAlarm.getAlarmArn());
String cluster =
metricAlarm.getDimensions().stream()
According to the AWS SDK, a CloudWatch alarm for ECS carries two dimensions, depending on its type:
- A service alarm contains the dimensions ECSCluster and ServiceName.
- An Auto Scaling group alarm of an ECS cluster contains ECSCluster and the capacity provider.
This change includes the ECS cluster name in the cached key id to make the search less costly.
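For illustration, the key construction described above could look roughly like the sketch below. `Dimension` here is a hypothetical stand-in for the AWS SDK type, `alarmKey` is not the real `Keys.getAlarmKey`, and the `;` layout and empty-string fallback are assumptions. The dimension name is passed in because the exact string the PR filters on is not shown in this excerpt.

```java
import java.util.List;
import java.util.Optional;

// Illustrative only: Dimension is a hypothetical stand-in for the AWS SDK
// type, and the ';' key layout is an assumption, not clouddriver's format.
class AlarmKeySketch {

  static final class Dimension {
    final String name;
    final String value;

    Dimension(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  // Find the value of the cluster dimension, if the alarm carries one.
  static Optional<String> clusterName(List<Dimension> dimensions, String clusterDimension) {
    return dimensions.stream()
        .filter(d -> clusterDimension.equals(d.name))
        .map(d -> d.value)
        .findFirst();
  }

  // Put the cluster name ahead of the ARN so ids can be searched by
  // account/region/cluster. Falling back to an empty segment when the
  // dimension is missing is an assumption of this sketch.
  static String alarmKey(
      String account,
      String region,
      List<Dimension> dimensions,
      String clusterDimension,
      String alarmArn) {
    String cluster = clusterName(dimensions, clusterDimension).orElse("");
    return account + ";" + region + ";" + cluster + ";" + alarmArn;
  }
}
```

Because the cluster segment sits before the ARN, both alarm types listed above (service alarms and Auto Scaling group alarms) end up grouped under the same account/region/cluster prefix.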
.setMoniker(moniker);

EcsServerGroup serverGroup = new EcsServerGroup();
if (includeDetails) {
includeDetails is false only for getSummaries. The rest of the logic remains the same.
3e714e2 to 4ca4f4b
Overall I think this looks good ;) I would like one more set of eyes, given the changes to the way ECS operates. One concern is how the old cache ids will be cleaned up, since this changes the storage ids, but that MAY get taken care of by one of the cleanup jobs (need to confirm). We talked in Slack: the ids should still fit the max column length, so the addition shouldn't impact the database.
Thanks @jasonmcintosh! 🚀
I have added a test that validates this. The previously cached ECS keys will be evicted and recached with the appended id in the alarms table.
OK, will merge shortly. I'd LIKE to get the release notes updated for this, please!
Attempt to address some of the issues described in spinnaker/spinnaker#6084
This improves response times on endpoints when ECS is enabled and a substantial number of accounts/services exist in the cache.
The perf issue with the alarms still exists and will be addressed in a future PR.
Results from a performance test of clouddriver response times: