Improve state and config changes in ec2_instance #550

Closed
79 changes: 45 additions & 34 deletions plugins/modules/ec2_instance.py
@@ -1443,24 +1443,30 @@ def get_default_subnet(ec2, vpc, availability_zone=None):


def ensure_instance_state(state, ec2):
"""
Sets return keys depending on the desired instance state
"""
results = dict()
if state in ('running', 'started'):
changed, failed, instances, failure_reason = change_instance_state(filters=module.params.get('filters'), desired_state='RUNNING', ec2=ec2)

if failed:
module.fail_json(
msg="Unable to start instances: {0}".format(failure_reason),
reboot_success=list(changed),
reboot_failed=failed)

module.exit_json(
results = dict(
msg='Instances started',
reboot_success=list(changed),
changed=bool(len(changed)),
reboot_failed=[],
instances=[pretty_instance(i) for i in instances],
)
elif state in ('restarted', 'rebooted'):
changed, failed, instances, failure_reason = change_instance_state(
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-reboot.html
# The Ansible behaviour of issuing a stop/start has a minor impact on user billing
# This will need to be changelogged if we ever change to ec2.reboot_instance
_changed, _failed, _instances, _failure_reason = change_instance_state(
Comment on lines +1466 to +1469

Contributor
It would also be interesting if this behavior changed, because stopping and starting an instance gives you new underlying hardware, but a reboot call doesn't. That's an important distinction when there are hardware issues on the instance (if you run long-running instances like I do, this is quite a common occurrence).

I hadn't realized that this is what the module did, so in the playbooks I have for recovering I actually call the module twice, once to stop the instance and once to start it. I wonder if anyone has settled on using the restarted/rebooted states in Ansible to recover, only to find that it won't work anymore if this changes. Probably a niche thing...

But it does make me think... a dedicated, integrated "stop then start" mode (what restarted actually does now) would be useful.

Collaborator Author
My gut inclination was to change this to actually call `reboot`, since it's not really doing what it advertises right now, and we'll have a narrow window before 2.0 where we can make non-backwards-compatible changes. But I don't want to overload this PR with too many changes.

Maybe @tremble can weigh in if it would be useful to make this change?

Contributor
I think you're right: if it advertises reboot and a reboot call is available, that's what should be called. I'm only mentioning a case (albeit one that might be rare) where that change could be breaking, and suggesting that another option that explicitly does what the current behavior does might have uses. The new option value should be introduced before the behavior changes, imo, so that there's an overlap where people can make forward-compatible changes if they want to keep that behavior.

But I dunno, maybe that's all overthinking.

(I also agree it's not really related to this PR)

filters=module.params.get('filters'),
desired_state='STOPPED',
ec2=ec2)
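For reference, a minimal sketch of the direct reboot call discussed in the thread above. This is hypothetical and not part of this PR; `reboot_in_place` is an illustrative name, and `aws_retry=True` mirrors what the module already passes to `stop_instances`:

def reboot_in_place(ec2, instance_ids):
    # Illustrative only: boto3's EC2 client exposes reboot_instances(),
    # which restarts the guest OS on the same underlying hardware,
    # unlike the stop/start cycle the module performs today.
    ec2.reboot_instances(aws_retry=True, InstanceIds=instance_ids)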
@@ -1475,7 +1481,7 @@ def ensure_instance_state(state, ec2):
reboot_success=list(changed),
reboot_failed=failed)

module.exit_json(
results = dict(
msg='Instances restarted',
reboot_success=list(changed),
changed=bool(len(changed)),
@@ -1487,14 +1493,13 @@ def ensure_instance_state(state, ec2):
filters=module.params.get('filters'),
desired_state='STOPPED',
ec2=ec2)

if failed:
module.fail_json(
msg="Unable to stop instances: {0}".format(failure_reason),
stop_success=list(changed),
stop_failed=failed)

module.exit_json(
results = dict(
msg='Instances stopped',
stop_success=list(changed),
changed=bool(len(changed)),
@@ -1512,13 +1517,14 @@ def ensure_instance_state(state, ec2):
msg="Unable to terminate instances: {0}".format(failure_reason),
terminate_success=list(terminated),
terminate_failed=terminate_failed)
module.exit_json(
results = dict(
msg='Instances terminated',
terminate_success=list(terminated),
changed=bool(len(terminated)),
terminate_failed=[],
instances=[pretty_instance(i) for i in instances],
)
return results


def change_instance_state(filters, desired_state, ec2):
@@ -1550,7 +1556,6 @@ def change_instance_state(filters, desired_state, ec2):
if module.check_mode:
changed.add(inst['InstanceId'])
continue

resp = ec2.stop_instances(aws_retry=True, InstanceIds=[inst['InstanceId']])
[changed.add(i['InstanceId']) for i in resp['StoppingInstances']]
if desired_state == 'RUNNING':
@@ -1596,16 +1601,22 @@ def determine_iam_role(name_or_arn):


def handle_existing(existing_matches, changed, ec2, state):
to_change_state = False
state_results = []
alter_config_result = []
if state in ('running', 'started') and [i for i in existing_matches if i['State']['Name'] != 'running']:
ins_changed, failed, instances, failure_reason = change_instance_state(filters=module.params.get('filters'), desired_state='RUNNING', ec2=ec2)
if failed:
module.fail_json(msg="Couldn't start instances: {0}. Failure reason: {1}".format(instances, failure_reason))
module.exit_json(
changed=bool(len(ins_changed)) or changed,
instances=[pretty_instance(i) for i in instances],
instance_ids=[i['InstanceId'] for i in instances],
)
to_change_state = True
elif state == 'stopped' and [i for i in existing_matches if i['State']['Name'] != 'stopped']:
to_change_state = True
elif state in ('restarted', 'rebooted'):
to_change_state = True
if to_change_state:
# TODO: this breaks check_mode, because change_instance_state handles checkmode but not ensure
# because ensure passes off to change...
Comment on lines +1614 to +1615

Contributor
That's scary, I hope check mode will work properly before the final version 🙏

Collaborator Author
That one might actually be a lingering comment from an earlier stage of the work in progress; I'll test to confirm. Thanks for catching that!
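For what it's worth, one possible shape of such a check-mode guard, purely a sketch of the idea under discussion rather than this PR's code (it assumes `module` and the boolean `changed` are in scope, as elsewhere in this file):

if module.check_mode:
    # Hypothetical early exit: report what would have changed without
    # letting the rest of the ensure path re-query real instance state.
    module.exit_json(changed=bool(changed), msg='Check mode: state change not applied')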

state_results = ensure_instance_state(state, ec2)

changes = diff_instance_and_params(existing_matches[0], module.params, ec2)

for c in changes:
if not module.check_mode:
try:
@@ -1616,24 +1627,21 @@ def handle_existing(existing_matches, changed, ec2, state):
changed |= add_or_update_instance_profile(existing_matches[0], module.params.get('instance_role'), ec2)
changed |= change_network_attachments(existing_matches[0], module.params, ec2)
altered = find_instances(ec2, ids=[i['InstanceId'] for i in existing_matches])
module.exit_json(
alter_config_result = dict(
changed=bool(len(changes)) or changed,
instances=[pretty_instance(i) for i in altered],
instance_ids=[i['InstanceId'] for i in altered],
changes=changes,
)

if len(state_results):
result = {**state_results, **alter_config_result}
else:
result = alter_config_result
return result
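A side note on the merge above: with `{**state_results, **alter_config_result}`, keys from the config-change dict override overlapping keys from the state-change dict, and `changed` is one of the overlapping keys. A minimal illustration with made-up values:

state_results = {'changed': True, 'msg': 'Instances stopped', 'stop_success': ['i-0abc']}
alter_config_result = {'changed': False, 'changes': []}
result = {**state_results, **alter_config_result}
# result == {'changed': False, 'msg': 'Instances stopped',
#            'stop_success': ['i-0abc'], 'changes': []}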


def ensure_present(existing_matches, changed, ec2, state):
if len(existing_matches):
try:
handle_existing(existing_matches, changed, ec2, state)
except (botocore.exceptions.BotoCoreError, botocore.exceptions.ClientError) as e:
module.fail_json_aws(
e, msg="Failed to handle existing instances {0}".format(', '.join([i['InstanceId'] for i in existing_matches])),
# instances=[pretty_instance(i) for i in existing_matches],
# instance_ids=[i['InstanceId'] for i in existing_matches],
)
try:
instance_spec = build_run_instance_spec(module.params, ec2)
# If check mode is enabled, suspend the 'ensure' function.
@@ -1733,6 +1741,7 @@ def main():
],
supports_check_mode=True
)
result = dict()

if module.params.get('network'):
if module.params.get('network').get('interfaces'):
@@ -1748,10 +1757,6 @@
# all states except shutting-down and terminated
'instance-state-name': ['pending', 'running', 'stopping', 'stopped']
}
if state == 'stopped':
# only need to change instances that aren't already stopped
filters['instance-state-name'] = ['stopping', 'pending', 'running']

if isinstance(module.params.get('instance_ids'), string_types):
filters['instance-id'] = [module.params.get('instance_ids')]
elif isinstance(module.params.get('instance_ids'), list) and len(module.params.get('instance_ids')):
@@ -1799,11 +1804,15 @@
tags['Name'] = name
changed |= manage_tags(match, tags, module.params.get('purge_tags', False), ec2)

if state in ('present', 'running', 'started'):
ensure_present(existing_matches=existing_matches, changed=changed, ec2=ec2, state=state)
elif state in ('restarted', 'rebooted', 'stopped', 'absent', 'terminated'):
if state in ('present', 'running', 'started', 'restarted', 'rebooted', 'stopped'):
# If we're not removing the instance, handle any config changes and update state
if len(existing_matches):
result = handle_existing(existing_matches, changed, ec2, state)
else:
ensure_present(existing_matches=existing_matches, changed=changed, ec2=ec2, state=state)
elif state in ('absent', 'terminated'):
if existing_matches:
ensure_instance_state(state, ec2)
result = ensure_instance_state(state, ec2)
else:
module.exit_json(
msg='No matching instances found',
@@ -1813,6 +1822,8 @@
else:
module.fail_json(msg="We don't handle the state {0}".format(state))

module.exit_json(**result)


if __name__ == '__main__':
main()
1 change: 1 addition & 0 deletions tests/integration/targets/ec2_instance/inventory
@@ -12,6 +12,7 @@ termination_protection_wrapper
tags_and_vpc_settings
checkmode_tests
security_group
state_config_updates

[all:vars]
ansible_connection=local
@@ -0,0 +1,145 @@
# Test that configuration changes, like security groups and instance attributes,
# are updated correctly when the instance has different states, and also when
# changing the state of an instance.
# https://github.com/ansible-collections/community.aws/issues/16
- block:
- name: "Make instance with sg and termination protection enabled"
ec2_instance:
state: present
name: "{{ resource_prefix }}-test-state-param-changes"
image_id: "{{ ec2_ami_image }}"
tags:
TestId: "{{ ec2_instance_tag_TestId }}"
security_groups: "{{ sg.group_id }}"
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}"
termination_protection: False
instance_type: "{{ ec2_instance_type }}"
wait: True
register: create_result

- assert:
that:
- create_result is not failed
- create_result.changed
- '"instances" in create_result'
- '"instance_ids" in create_result'
- '"spec" in create_result'
- create_result.instances[0].security_groups[0].group_id == "{{ sg.group_id }}"
- create_result.spec.DisableApiTermination == False

- name: "Change sg and termination protection while instance is in state running"
ec2_instance:
state: running
name: "{{ resource_prefix }}-test-state-param-changes"
image_id: "{{ ec2_ami_image }}"
tags:
TestId: "{{ ec2_instance_tag_TestId }}"
security_groups: "{{ sg2.group_id }}"
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}"
termination_protection: True
instance_type: "{{ ec2_instance_type }}"
register: change_params_result

- assert:
that:
- change_params_result is not failed
- change_params_result.changed
- '"instances" in change_params_result'
- '"instance_ids" in change_params_result'
- '"changes" in change_params_result'
- change_params_result.instances[0].security_groups[0].group_id == "{{ sg2.group_id }}"
- change_params_result.changes[0].DisableApiTermination.Value == True
- change_params_result.changes[1].Groups[0] == "{{ sg2.group_id }}" # TODO fix this to be less fragile


- name: "Change instance state from running to stopped, and change sg and termination protection"
ec2_instance:
state: stopped
name: "{{ resource_prefix }}-test-state-param-changes"
image_id: "{{ ec2_ami_image }}"
tags:
TestId: "{{ ec2_instance_tag_TestId }}"
security_groups: "{{ sg.group_id }}"
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}"
termination_protection: False
instance_type: "{{ ec2_instance_type }}"
register: change_state_params_result

- assert:
that:
- change_state_params_result is not failed
- change_state_params_result.changed
- '"instances" in change_state_params_result'
- '"instance_ids" in change_state_params_result'
- '"changes" in change_state_params_result'
- '"stop_success" in change_state_params_result'
- '"stop_failed" in change_state_params_result'
- change_state_params_result.instances[0].security_groups[0].group_id == "{{ sg.group_id }}"
- change_state_params_result.changes[0].DisableApiTermination.Value == False

- name: "Change sg and termination protection while instance is in state stopped"
ec2_instance:
state: stopped
name: "{{ resource_prefix }}-test-state-param-changes"
image_id: "{{ ec2_ami_image }}"
tags:
TestId: "{{ ec2_instance_tag_TestId }}"
security_groups: "{{ sg2.group_id }}"
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}"
termination_protection: True
instance_type: "{{ ec2_instance_type }}"
register: change_params_stopped_result

- assert:
that:
- change_params_stopped_result is not failed
- change_params_stopped_result.changed
- '"instances" in change_params_stopped_result'
- '"instance_ids" in change_params_stopped_result'
- '"changes" in change_params_stopped_result'
- change_params_stopped_result.instances[0].security_groups[0].group_id == "{{ sg2.group_id }}"
- change_params_stopped_result.changes[0].DisableApiTermination.Value == True

- name: "Change instance state from stopped to running, and change sg and termination protection"
ec2_instance:
state: running
name: "{{ resource_prefix }}-test-state-param-changes"
image_id: "{{ ec2_ami_image }}"
tags:
TestId: "{{ ec2_instance_tag_TestId }}"
security_groups: "{{ sg.group_id }}"
vpc_subnet_id: "{{ testing_subnet_b.subnet.id }}"
termination_protection: False
instance_type: "{{ ec2_instance_type }}"
register: change_params_start_result

- assert:
that:
- change_params_start_result is not failed
- change_params_start_result.changed
- '"instances" in change_params_start_result'
- '"instance_ids" in change_params_start_result'
- '"changes" in change_params_start_result'
- '"reboot_success" in change_params_start_result'
- '"reboot_failed" in change_params_start_result'
- change_params_start_result.instances[0].security_groups[0].group_id == "{{ sg.group_id }}"
- change_params_start_result.changes[0].DisableApiTermination.Value == False

always:

- name: Set termination protection to false (so we can terminate instance) (cleanup)
ec2_instance:
filters:
tag:TestId: "{{ ec2_instance_tag_TestId }}"
termination_protection: False
ignore_errors: yes

- name: Terminate instance
ec2_instance:
filters:
tag:TestId: "{{ ec2_instance_tag_TestId }}"
state: absent
wait: False
ignore_errors: yes

- include_tasks: env_cleanup.yml