-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve state and config changes in ec2_instance #550
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1443,24 +1443,30 @@ def get_default_subnet(ec2, vpc, availability_zone=None): | |
|
||
|
||
def ensure_instance_state(state, ec2): | ||
""" | ||
Sets return keys depending on the desired instance state | ||
""" | ||
results = dict() | ||
if state in ('running', 'started'): | ||
changed, failed, instances, failure_reason = change_instance_state(filters=module.params.get('filters'), desired_state='RUNNING', ec2=ec2) | ||
|
||
if failed: | ||
module.fail_json( | ||
msg="Unable to start instances: {0}".format(failure_reason), | ||
reboot_success=list(changed), | ||
reboot_failed=failed) | ||
|
||
module.exit_json( | ||
results = dict( | ||
msg='Instances started', | ||
reboot_success=list(changed), | ||
changed=bool(len(changed)), | ||
reboot_failed=[], | ||
instances=[pretty_instance(i) for i in instances], | ||
) | ||
elif state in ('restarted', 'rebooted'): | ||
changed, failed, instances, failure_reason = change_instance_state( | ||
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-reboot.html | ||
# The Ansible behaviour of issuing a stop/start has a minor impact on user billing | ||
# This will need to be changelogged if we ever change to ec2.reboot_instance | ||
_changed, _failed, _instances, _failure_reason = change_instance_state( | ||
filters=module.params.get('filters'), | ||
desired_state='STOPPED', | ||
ec2=ec2) | ||
|
@@ -1475,7 +1481,7 @@ def ensure_instance_state(state, ec2): | |
reboot_success=list(changed), | ||
reboot_failed=failed) | ||
|
||
module.exit_json( | ||
results = dict( | ||
msg='Instances restarted', | ||
reboot_success=list(changed), | ||
changed=bool(len(changed)), | ||
|
@@ -1487,14 +1493,13 @@ def ensure_instance_state(state, ec2): | |
filters=module.params.get('filters'), | ||
desired_state='STOPPED', | ||
ec2=ec2) | ||
|
||
if failed: | ||
module.fail_json( | ||
msg="Unable to stop instances: {0}".format(failure_reason), | ||
stop_success=list(changed), | ||
stop_failed=failed) | ||
|
||
module.exit_json( | ||
results = dict( | ||
msg='Instances stopped', | ||
stop_success=list(changed), | ||
changed=bool(len(changed)), | ||
|
@@ -1512,13 +1517,14 @@ def ensure_instance_state(state, ec2): | |
msg="Unable to terminate instances: {0}".format(failure_reason), | ||
terminate_success=list(terminated), | ||
terminate_failed=terminate_failed) | ||
module.exit_json( | ||
results = dict( | ||
msg='Instances terminated', | ||
terminate_success=list(terminated), | ||
changed=bool(len(terminated)), | ||
terminate_failed=[], | ||
instances=[pretty_instance(i) for i in instances], | ||
) | ||
return results | ||
|
||
|
||
def change_instance_state(filters, desired_state, ec2): | ||
|
@@ -1550,7 +1556,6 @@ def change_instance_state(filters, desired_state, ec2): | |
if module.check_mode: | ||
changed.add(inst['InstanceId']) | ||
continue | ||
|
||
resp = ec2.stop_instances(aws_retry=True, InstanceIds=[inst['InstanceId']]) | ||
[changed.add(i['InstanceId']) for i in resp['StoppingInstances']] | ||
if desired_state == 'RUNNING': | ||
|
@@ -1596,16 +1601,22 @@ def determine_iam_role(name_or_arn): | |
|
||
|
||
def handle_existing(existing_matches, changed, ec2, state): | ||
to_change_state = False | ||
state_results = [] | ||
alter_config_result = [] | ||
if state in ('running', 'started') and [i for i in existing_matches if i['State']['Name'] != 'running']: | ||
ins_changed, failed, instances, failure_reason = change_instance_state(filters=module.params.get('filters'), desired_state='RUNNING', ec2=ec2) | ||
if failed: | ||
module.fail_json(msg="Couldn't start instances: {0}. Failure reason: {1}".format(instances, failure_reason)) | ||
module.exit_json( | ||
changed=bool(len(ins_changed)) or changed, | ||
instances=[pretty_instance(i) for i in instances], | ||
instance_ids=[i['InstanceId'] for i in instances], | ||
) | ||
to_change_state = True | ||
elif state == 'stopped' and [i for i in existing_matches if i['State']['Name'] != 'stopped']: | ||
to_change_state = True | ||
elif state in ('restarted', 'rebooted'): | ||
to_change_state = True | ||
if to_change_state: | ||
# TODO: this breaks check_mode, because change_instance_state handles checkmode but not ensure | ||
# because ensure passes off to change... | ||
Comment on lines
+1614
to
+1615
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's scary, I hope check mode will work properly before the final version 🙏 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That one might be a lingering comment from an earlier stage of the work-in-progress actually, I'll test to confirm - thanks for catching that! |
||
state_results = ensure_instance_state(state, ec2) | ||
|
||
changes = diff_instance_and_params(existing_matches[0], module.params, ec2) | ||
|
||
for c in changes: | ||
if not module.check_mode: | ||
try: | ||
|
@@ -1616,24 +1627,21 @@ def handle_existing(existing_matches, changed, ec2, state): | |
changed |= add_or_update_instance_profile(existing_matches[0], module.params.get('instance_role'), ec2) | ||
changed |= change_network_attachments(existing_matches[0], module.params, ec2) | ||
altered = find_instances(ec2, ids=[i['InstanceId'] for i in existing_matches]) | ||
module.exit_json( | ||
alter_config_result = dict( | ||
changed=bool(len(changes)) or changed, | ||
instances=[pretty_instance(i) for i in altered], | ||
instance_ids=[i['InstanceId'] for i in altered], | ||
changes=changes, | ||
) | ||
|
||
if len(state_results): | ||
result = {**state_results, **alter_config_result} | ||
else: | ||
result = alter_config_result | ||
return result | ||
|
||
|
||
def ensure_present(existing_matches, changed, ec2, state): | ||
if len(existing_matches): | ||
try: | ||
handle_existing(existing_matches, changed, ec2, state) | ||
except (botocore.exceptions.BotoCoreError, botocore.exceptions.ClientError) as e: | ||
module.fail_json_aws( | ||
e, msg="Failed to handle existing instances {0}".format(', '.join([i['InstanceId'] for i in existing_matches])), | ||
# instances=[pretty_instance(i) for i in existing_matches], | ||
# instance_ids=[i['InstanceId'] for i in existing_matches], | ||
) | ||
try: | ||
instance_spec = build_run_instance_spec(module.params, ec2) | ||
# If check mode is enabled,suspend 'ensure function'. | ||
|
@@ -1733,6 +1741,7 @@ def main(): | |
], | ||
supports_check_mode=True | ||
) | ||
result = dict() | ||
|
||
if module.params.get('network'): | ||
if module.params.get('network').get('interfaces'): | ||
|
@@ -1748,10 +1757,6 @@ def main(): | |
# all states except shutting-down and terminated | ||
'instance-state-name': ['pending', 'running', 'stopping', 'stopped'] | ||
} | ||
if state == 'stopped': | ||
# only need to change instances that aren't already stopped | ||
filters['instance-state-name'] = ['stopping', 'pending', 'running'] | ||
|
||
if isinstance(module.params.get('instance_ids'), string_types): | ||
filters['instance-id'] = [module.params.get('instance_ids')] | ||
elif isinstance(module.params.get('instance_ids'), list) and len(module.params.get('instance_ids')): | ||
|
@@ -1799,11 +1804,15 @@ def main(): | |
tags['Name'] = name | ||
changed |= manage_tags(match, tags, module.params.get('purge_tags', False), ec2) | ||
|
||
if state in ('present', 'running', 'started'): | ||
ensure_present(existing_matches=existing_matches, changed=changed, ec2=ec2, state=state) | ||
elif state in ('restarted', 'rebooted', 'stopped', 'absent', 'terminated'): | ||
if state in ('present', 'running', 'started', 'restarted', 'rebooted', 'stopped'): | ||
# If we're not removing the instance, handle any config changes and update state | ||
if len(existing_matches): | ||
result = handle_existing(existing_matches, changed, ec2, state) | ||
else: | ||
ensure_present(existing_matches=existing_matches, changed=changed, ec2=ec2, state=state) | ||
elif state in ('absent', 'terminated'): | ||
if existing_matches: | ||
ensure_instance_state(state, ec2) | ||
result = ensure_instance_state(state, ec2) | ||
else: | ||
module.exit_json( | ||
msg='No matching instances found', | ||
|
@@ -1813,6 +1822,8 @@ def main(): | |
else: | ||
module.fail_json(msg="We don't handle the state {0}".format(state)) | ||
|
||
module.exit_json(**result) | ||
|
||
|
||
if __name__ == '__main__': | ||
main() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
# Test that configuration changes, like security groups and instance attributes, | ||
# are updated correctly when the instance has different states, and also when | ||
# changing the state of an instance. | ||
# https://github.com/ansible-collections/community.aws/issues/16 | ||
- block: | ||
- name: "Make instance with sg and termination protection enabled" | ||
ec2_instance: | ||
state: present | ||
name: "{{ resource_prefix }}-test-state-param-changes" | ||
image_id: "{{ ec2_ami_image }}" | ||
tags: | ||
TestId: "{{ ec2_instance_tag_TestId }}" | ||
security_groups: "{{ sg.group_id }}" | ||
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}" | ||
termination_protection: False | ||
instance_type: "{{ ec2_instance_type }}" | ||
wait: True | ||
register: create_result | ||
|
||
- assert: | ||
that: | ||
- create_result is not failed | ||
- create_result.changed | ||
- '"instances" in create_result' | ||
- '"instance_ids" in create_result' | ||
- '"spec" in create_result' | ||
- create_result.instances[0].security_groups[0].group_id == "{{ sg.group_id }}" | ||
- create_result.spec.DisableApiTermination == False | ||
|
||
- name: "Change sg and termination protection while instance is in state running" | ||
ec2_instance: | ||
state: running | ||
name: "{{ resource_prefix }}-test-state-param-changes" | ||
image_id: "{{ ec2_ami_image }}" | ||
tags: | ||
TestId: "{{ ec2_instance_tag_TestId }}" | ||
security_groups: "{{ sg2.group_id }}" | ||
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}" | ||
termination_protection: True | ||
instance_type: "{{ ec2_instance_type }}" | ||
register: change_params_result | ||
|
||
- assert: | ||
that: | ||
- change_params_result is not failed | ||
- change_params_result.changed | ||
- '"instances" in change_params_result' | ||
- '"instance_ids" in change_params_result' | ||
- '"changes" in change_params_result' | ||
- change_params_result.instances[0].security_groups[0].group_id == "{{ sg2.group_id }}" | ||
- change_params_result.changes[0].DisableApiTermination.Value == True | ||
- change_params_result.changes[1].Groups[0] == "{{ sg2.group_id }}" # TODO fix this to be less fragile | ||
|
||
|
||
- name: "Change instance state from running to stopped, and change sg and termination protection" | ||
ec2_instance: | ||
state: stopped | ||
name: "{{ resource_prefix }}-test-state-param-changes" | ||
image_id: "{{ ec2_ami_image }}" | ||
tags: | ||
TestId: "{{ ec2_instance_tag_TestId }}" | ||
security_groups: "{{ sg.group_id }}" | ||
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}" | ||
termination_protection: False | ||
instance_type: "{{ ec2_instance_type }}" | ||
register: change_state_params_result | ||
|
||
- assert: | ||
that: | ||
- change_state_params_result is not failed | ||
- change_state_params_result.changed | ||
- '"instances" in change_state_params_result' | ||
- '"instance_ids" in change_state_params_result' | ||
- '"changes" in change_state_params_result' | ||
- '"stop_success" in change_state_params_result' | ||
- '"stop_failed" in change_state_params_result' | ||
- change_state_params_result.instances[0].security_groups[0].group_id == "{{ sg.group_id }}" | ||
- change_state_params_result.changes[0].DisableApiTermination.Value == False | ||
|
||
- name: "Change sg and termination protection while instance is in state stopped" | ||
ec2_instance: | ||
state: stopped | ||
name: "{{ resource_prefix }}-test-state-param-changes" | ||
image_id: "{{ ec2_ami_image }}" | ||
tags: | ||
TestId: "{{ ec2_instance_tag_TestId }}" | ||
security_groups: "{{ sg2.group_id }}" | ||
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}" | ||
termination_protection: True | ||
instance_type: "{{ ec2_instance_type }}" | ||
register: change_params_stopped_result | ||
|
||
- assert: | ||
that: | ||
- change_params_stopped_result is not failed | ||
- change_params_stopped_result.changed | ||
- '"instances" in change_params_stopped_result' | ||
- '"instance_ids" in change_params_stopped_result' | ||
- '"changes" in change_params_stopped_result' | ||
- change_params_stopped_result.instances[0].security_groups[0].group_id == "{{ sg2.group_id }}" | ||
- change_params_stopped_result.changes[0].DisableApiTermination.Value == True | ||
|
||
- name: "Change instance state from stopped to running, and change sg and termination protection" | ||
ec2_instance: | ||
state: running | ||
name: "{{ resource_prefix }}-test-state-param-changes" | ||
image_id: "{{ ec2_ami_image }}" | ||
tags: | ||
TestId: "{{ ec2_instance_tag_TestId }}" | ||
security_groups: "{{ sg.group_id }}" | ||
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}" | ||
termination_protection: False | ||
instance_type: "{{ ec2_instance_type }}" | ||
register: change_params_start_result | ||
|
||
- assert: | ||
that: | ||
- change_params_start_result is not failed | ||
- change_params_start_result.changed | ||
- '"instances" in change_params_start_result' | ||
- '"instance_ids" in change_params_start_result' | ||
- '"changes" in change_params_start_result' | ||
- '"reboot_success" in change_params_start_result' | ||
- '"reboot_failed" in change_params_start_result' | ||
- change_params_start_result.instances[0].security_groups[0].group_id == "{{ sg.group_id }}" | ||
- change_params_start_result.changes[0].DisableApiTermination.Value == False | ||
|
||
always: | ||
|
||
- name: Set termination protection to false (so we can terminate instance) (cleanup) | ||
ec2_instance: | ||
filters: | ||
tag:TestId: "{{ ec2_instance_tag_TestId }}" | ||
termination_protection: False | ||
ignore_errors: yes | ||
|
||
- name: Terminate instance | ||
ec2_instance: | ||
filters: | ||
tag:TestId: "{{ ec2_instance_tag_TestId }}" | ||
state: absent | ||
wait: False | ||
ignore_errors: yes | ||
|
||
- include_tasks: env_cleanup.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be interesting if this behavior changed because stopping and starting an instance gives new underlying hardware, but a reboot call doesn't. That's an important distinction when there are hardware issues on the instance (if you run long running instances like I do, this is quite a common occurrence).
I hadn't realized that this is what the module did so in playbooks I have for recovering, I do actually call the module twice, once to stop it and once to start. I wonder if anyone has settled on using using the
restarted
/rebooted
statuses in Ansible to recover, only to find that it won't work anymore if this changes. Probably a niche thing...But it does make me think.. a dedicated integrated "
stop
thenstart
" mode (whatrestarted
actually does now) would be useful.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut inclination was to change this to actually call `reboot' since it's not really doing what it advertises right now, and we'll have a narrow window before 2.0 where we can make non-backwards-compatible changes. But I don't want to overload this PR with too many changes.
Maybe @tremble can weigh in if it would be useful to make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, if it advertises reboot and a reboot call is available, that's what should be called. I'm only mentioning a case (albeit one that might be rare) where that change could be breaking. And that another option that explicitly does what the current behavior is might have uses. The new option value should be introduced before changing the behavior imo so that there's an overlap where people could make forward-compatible changes if they wanted to keep that behavior.
But I dunno maybe that's all overthinking.
(I also agree it's not really related to this PR)