Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Migrate autoscaling (⚠️ devops) #3566

Merged
merged 155 commits into from
Nov 27, 2022

Conversation

sanderegg
Copy link
Member

@sanderegg sanderegg commented Nov 15, 2022

What do these changes do?

This PR finalizes the migration of @Surfict 's cron job script into the osparc-simcore stack.

Main current features are:

  • background task running:
    • every AUTOSCALING_POLL_INTERVAL (default 10seconds),
    • that checks for pending (insufficient resources) services if they live on nodes with NODES_MONITORING_NODE_LABELS and services with NODES_MONITORING_SERVICE_LABELS and/or using NODES_MONITORING_SERVICE_IMAGE_NAMES
    • it will create a new EC2 instance of EC2_INSTANCES_ALLOWED_TYPES,
    • that closely fits the pending task (this was the current behavior and was not changed),
    • until a maximum of EC2_INSTANCES_MAX_INSTANCES (default 10)
    • can be disabled if any of AUTOSCALING_NODES_MONITORING, AUTOSCALING_EC2_ACCESS or AUTOSCALING_EC2_INSTANCES is null
    • NOTE: Only 1 instance will be created at a time for now and it will wait until it is created before the next interval checks starts
  • tested against Moto (AWS mock library), >90% test coverage
  • upgraded libraries
  • mypy enabled

⚠️ devops changes:

  1. when autoscaling is disabled there is nothing to be done
  2. if enabled then the following must be defined for the Autoscaling
EC2_ACCESS_KEY_ID # defines the access key to EC2
EC2_SECRET_ACCESS_KEY # defines the secret key to EC2
# has default EC2_REGION_NAME="us-east-1"

EC2_INSTANCES_ALLOWED_TYPES # defines the allowed EC2 instance types (["t2.nano", ...])
EC2_INSTANCES_AMI_ID # defines the AMI ID to use
# has default EC2_INSTANCES_MAX_INSTANCES=10 maximum number of instances
EC2_INSTANCES_SECURITY_GROUP_IDS # defines the security groups for starting EC2s
EC2_INSTANCES_SUBNET_ID # defines the subnet ID to use for the EC2 instances
EC2_INSTANCES_KEY_NAME # defines the used Key name for the EC2 instances

NODES_MONITORING_NODE_LABELS # defines the labels to check for on the nodes (i.e. ["dynamicsidecar"])
NODES_MONITORING_SERVICE_LABELS # optional defines the label to check for on the monitored services
# has default NODES_MONITORING_NEW_NODES_LABELS=["io.osparc.autoscaled-node"] adds labels on newly created nodes besides the ones in NODES_MONITORING_NODE_LABELS

Next steps might be:

  • define a policy that closely fits what is needed for having a machine that can run 3 tasks if needed
  • connect with rabbitMQ to send information to UI
  • instead of polling docker engine, use docker events to check when services are started
  • allow creation of several machines in one call

Related issue/s

How to test

make devenv
source .venv/bin/activate
cd services/autoscaling
make install-dev
make test-dev

Checklist

@sanderegg sanderegg self-assigned this Nov 15, 2022
@sanderegg sanderegg added this to the Athena milestone Nov 15, 2022
@codecov
Copy link

codecov bot commented Nov 15, 2022

Codecov Report

Merging #3566 (a99ff1b) into master (3189f11) will increase coverage by 0.4%.
The diff coverage is 99.1%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #3566     +/-   ##
========================================
+ Coverage    83.6%   84.1%   +0.4%     
========================================
  Files         861     864      +3     
  Lines       36229   36401    +172     
  Branches      779     780      +1     
========================================
+ Hits        30319   30630    +311     
+ Misses       5706    5565    -141     
- Partials      204     206      +2     
Flag Coverage Δ
integrationtests 67.4% <ø> (+<0.1%) ⬆️
unittests 81.3% <99.1%> (+0.4%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ng/src/simcore_service_autoscaling/utils_docker.py 96.8% <96.7%> (+59.8%) ⬆️
...ckages/models-library/src/models_library/docker.py 100.0% <100.0%> (ø)
...models_library/generated_models/docker_rest_api.py 100.0% <100.0%> (ø)
.../service-library/src/servicelib/background_task.py 100.0% <100.0%> (ø)
...rc/simcore_service_autoscaling/core/application.py 100.0% <100.0%> (+100.0%) ⬆️
...ing/src/simcore_service_autoscaling/core/errors.py 100.0% <100.0%> (ø)
...g/src/simcore_service_autoscaling/core/settings.py 100.0% <100.0%> (+2.4%) ⬆️
...src/simcore_service_autoscaling/dynamic_scaling.py 100.0% <100.0%> (ø)
...imcore_service_autoscaling/dynamic_scaling_core.py 100.0% <100.0%> (ø)
...oscaling/src/simcore_service_autoscaling/models.py 100.0% <100.0%> (ø)
... and 16 more

@sanderegg sanderegg force-pushed the is3558/migrate_autoscaling branch 4 times, most recently from 9d651fe to e370d36 Compare November 17, 2022 21:51
@sanderegg sanderegg marked this pull request as ready for review November 22, 2022 09:21
Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice and readable.
Have some questions, please see them below.

Copy link
Member

@mrnicegyu11 mrnicegyu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome effort, this looks super sleek and the tight testing is mega appreciated! Sorry for the many questions and ideas in the review, this time I properly took quiet time to read the code ;) I am looking forward to your input and seeing this merged :--) Kudos!

Copy link
Member

@mrnicegyu11 mrnicegyu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome effort, this looks super sleek and the tight testing is mega appreciated! Sorry for the many questions and ideas in the review, this time I properly took quiet time to read the code ;) I am looking forward to your input and seeing this merged :--) Kudos!

@sanderegg sanderegg force-pushed the is3558/migrate_autoscaling branch 2 times, most recently from 385e497 to fe3d161 Compare November 24, 2022 07:43
@sanderegg sanderegg changed the title ✨ Migrate autoscaling ✨ Migrate autoscaling (⚠️ devops) Nov 24, 2022
@mrnicegyu11
Copy link
Member

Some more stuff we caught on aws-staging yesterday during a hotfix-to-staging release

  • 2 Autoscaled nodes were present on the system --> They should have deleted themselves I guess?
  • The 2 Autoscaled nodes both mismatched their docker-engine version with the manager. We had trouble with this in the past, the docker engine versions on the nodes need to always be equal!
  • Simcore-services were running and/or failing on the autoscaled nodes (identified by their hostname with prefix ip-0), and their presence made the hotfix-to-staging release fail. I guess docker placement labels are not correctly set yet?

9nAqRqmfMy
ve0Fnhj57c

Copy link
Member

@mrnicegyu11 mrnicegyu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot, approved!

@sonarcloud
Copy link

sonarcloud bot commented Nov 27, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

@sanderegg sanderegg merged commit 1791123 into ITISFoundation:master Nov 27, 2022
@sanderegg sanderegg deleted the is3558/migrate_autoscaling branch November 27, 2022 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants