Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

providers/aws: rework instance block devices #1045

Merged
merged 2 commits into from
Mar 19, 2015
Merged

providers/aws: rework instance block devices #1045

merged 2 commits into from
Mar 19, 2015

Conversation

phinze
Copy link
Contributor

@phinze phinze commented Feb 25, 2015

Instance block devices are now managed by three distinct sub-resources:

  • root_block_device - introduced previously
  • ebs_block_device - all additional ebs-backed volumes
  • ephemeral_block_device - instance store / ephemeral devices

The AWS API support around BlockDeviceMapping is pretty confusing. It's
a single collection type that supports these three members each of which
has different fields and different behavior.

My biggest hiccup came from the fact that Instance Store volumes do not
show up in any response BlockDeviceMapping for any EC2 Describe* AWS API
calls. They're only available from the instance meta-data service as
queried from inside the node.

Also note this removes block_device altogether for a clean break from
old configs. New configs will need to sort their block_device
declarations into the three new types.

Fixes #858

@phinze
Copy link
Contributor Author

phinze commented Feb 25, 2015

Need to follow on with docs, but this is ready for code review I think.

if d.HasChange("source_dest_check") {
opts.SetSourceDestCheck = true
opts.SourceDestCheck = d.Get("source_dest_check").(bool)
modify = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this accidentally revert the set/unset case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah bad rebase there. Fixed locally at EOD but didn't push. On it!

@mitchellh
Copy link
Contributor

A couple questions but LGTM

@pmoust
Copy link
Contributor

pmoust commented Feb 25, 2015

Currently we attach ephemeral block devices that come in as a pack offered by AWS in certain instance types i.e. c3.2xlarge (they offer 2 gp2 type ephemeral_block_devices). The device mapping/mountpoint-set is done via cloud-init as follows:

resize_rootfs: true
mounts:
  - [ ephemeral0, /var/www, ext4, "defaults" ]
  - [ ephemeral1, /var/log, ext4, "defaults" ]

I would prefer if this was done via terraform of course.

Those ephemeral volumes are not are not in state, whereas root_block_device is i.e.

                "aws_instance.pph_webserver.1": {
                    "type": "aws_instance",
                    "depends_on": [
                        "aws_security_group.pph_admins",
                        "aws_security_group.pph_allow_elb",
                        "aws_security_group.pph_allow_jarvis",
                        "aws_security_group.pph_allow_vpn",
                        "aws_security_group.pph_webserver",
                        "aws_subnet.pph"
                    ],
                    "primary": {
                        "id": "i-587df0a2",
                        "attributes": {
                            "ami": "ami-08f8b160",
                            "associate_public_ip_address": "true",
                            "availability_zone": "us-east-1b",
                            "block_device.#": "0",
                            "id": "i-587df0a2",
                            "instance_type": "c3.2xlarge",
                            "key_name": "",
                            "private_dns": "<snipped>",
                            "private_ip": "10.30.1.227",
                            "public_dns": "<snipped>",
                            "public_ip": "<snipped>",
                            "root_block_device.#": "1",
                            "root_block_device.0.delete_on_termination": "true",
                            "root_block_device.0.device_name": "/dev/sda1",
                            "root_block_device.0.volume_size": "8",
                            "root_block_device.0.volume_type": "standard",
                            "security_groups.#": "5",
                            "security_groups.1610004074": "sg-d1b443b5",
                            "security_groups.1886755183": "sg-d9b443bd",
                            "security_groups.2391182664": "sg-d0b443b4",
                            "security_groups.390598185": "sg-dab443be",
                            "security_groups.3985752899": "sg-b4b443d0",
                            "subnet_id": "subnet-13b90264",
                            "tags.#": "1",
                            "tags.Name": "web01",
                            "tenancy": "default",
                            "user_data": "56901ddbd9cd1ba3940b0c3210f62abd54e340a1"
                        }
                    }
                },

Perhaps those type of cases should be handled in a manner similar to root_block_device and not perform any calls on AWS API to list their state after creation but keep them in state. After they are added the should be only linked to the aws_instance entity that summoned them and that's that.
I might be off, but I think plan/refresh rechecks the status of aforementioned ephemeral_block_devices?

@phinze
Copy link
Contributor Author

phinze commented Feb 25, 2015

Perhaps those type of cases should be handled in a manner similar to root_block_device and not perform any calls on AWS API to list their state after creation but keep them in state. After they are added the should be only linked to the aws_instance entity that summoned them and that's that.

That is definitely the intent in this PR.

I might be off, but I think plan/refresh rechecks the status of aforementioned ephemeral_block_devices?

In the guard here:

https://github.com/hashicorp/terraform/blob/f-block-devices/builtin/providers/aws/resource_aws_instance.go#L582-L585

I'm attempting to only set the ephemeral_block_device details in the state once, and only if they weren't already set in the config.

cc @mitchellh for a double check of the semantics here and because we're talking about changing GetOk soon, though i think this particular call would still work even with the changes we're considering.

My desired behavior for ephemeral_block_device is:

  • On create, if config provided, that gets written to the state
  • On create, if no config provided, we read from AMI details once for the state info
  • Plan, refresh, etc, should never touch/update the value in the state
  • Any update to the field in the config results in a ForceNew destroy/create, which just reapplies these same rules to a fresh instance

I'm not sure if that's what I've achieved, but that's what I was going for.

@mitchellh
Copy link
Contributor

@phinze I may be missing something but it looks like nothing in seenBlockDeviceNames is ever set.

Can you explain what that is guarding against? It isn't clear to me.

@phinze
Copy link
Contributor Author

phinze commented Feb 25, 2015

I may be missing something but it looks like nothing in seenBlockDeviceNames is ever set.

Can you explain what that is guarding against? It isn't clear to me.

Sure - so the rules for ephemeral device assignment are as follows:

  • If no runtime block device mapping (BDM), use BDM from AMI
  • If runtime BDM includes ephemeral devices, use that and ignore AMI's BDM
  • If runtime BDM does not include ephemeral devices, use AMI's BDM except when device names collide, in which case the runtime BDM trumps the AMI.

So seenDeviceNames is needed to track that last caveat.

We initialize a map[string]bool, record device names we see, grab the list out from the helper function's return value, then ignore devices we've already seen.

It's entirely possible I'm missing something though! Let me know if this makes sense or if there's a way for me to make the code clearer.

@mitchellh
Copy link
Contributor

@phinze Got it got it. This LGTM then.

I recommend just making sure we have good acceptance test coverage here and I think we do.

@phinze
Copy link
Contributor Author

phinze commented Feb 27, 2015

After talking with @sparkprime in IRC about this (and related changes he'd like to make in the Google Cloud provider), I don't think it's a good idea to merge this without state migration support in core.

Without state migration, UX around terraform 0.4 upgrade looks like:

  • Upgrade to terraform 0.4
  • Get config errors about block_device not being valid config
  • Read docs about breaking changes with instructions on how to update config
  • Make appropriate config updates
  • Watch as Terraform requires that you destroy and recreate every instance in your infrastructure 😲

I suppose we could add docs around manual state munging to prevent a massive rebuild from happening, but for my money this is a core feature we're going to need over and over again, so it's better to get it done sooner than later.

@pmoust
Copy link
Contributor

pmoust commented Feb 27, 2015

@phinze or, add a terraform version node in the JSON of tfstate that indicates which TF version built said state. If major and/or minor versions have mismatch with current terraform version that is executing a plan, then stop and prompt the user to check the changelog. That is if we keep a type of SemVer.
-f would allow performing a plan despite version mismatch.

When you changed added root_block_device our plan marked all our ec2 instaces to be recreated, thus removing them from plan. We have the state under revision control, so we stashed that state change, checked the docs and applied the new DSL change. New plan did not require recreation.

@phinze
Copy link
Contributor Author

phinze commented Feb 27, 2015

When you changed added root_block_device our plan marked all our ec2 instaces to be recreated, thus removing them from plan. We have the state under revision control, so we stashed that state change, checked the docs and applied the new DSL change. New plan did not require recreation.

Thanks for providing this real-world context! Can you elaborate a little more: in order to get back to a "clean plan" did you need to partially modify the state? I can't tell if "stashed that state change" refers to the whole file or just parts of it.

The thing I'd like to avoid is an upgrade story that requires manually munging the statefile.

@pmoust
Copy link
Contributor

pmoust commented Feb 27, 2015

@phinze by "We have the state under revision control, so we stashed that state change" I only mean I 've just issued git checkout -- terraform.tfstate after the breaking change removed our instances from plan. Changed to root_block_device where applicable, planned again and there was no need to change as the refresh matched the instructions of the .tf.

Of course this applies to our case specifically, one might not keep a state in VCS but use other provided ways of storing it I suppose.

If you can keep everything BC that would be great, but some stuff just add more complexity like this. Your teams engineering skills beat mine by miles though, and if a state migrator is something that is easily added then disregard my input. But I do believe that in some cases a proper state migration is easier said than done.

@phinze
Copy link
Contributor Author

phinze commented Mar 5, 2015

This is chilling until core gets support for deprecation, a feature that I'm starting today!

@phinze phinze force-pushed the f-block-devices branch 4 times, most recently from 00acd9b to 9c5bbe6 Compare March 13, 2015 21:01
@phinze
Copy link
Contributor Author

phinze commented Mar 13, 2015

I'm close I promise. 😀

Acceptance tests are passing, just need to get state migration finished up and this can land... stay tuned! 📺

@phinze
Copy link
Contributor Author

phinze commented Mar 17, 2015

Still sussing out a few remaining issues with ephemeral devices in the upgrade case. I haven't forgotten about you, oh block devices! I'll follow up at EOD today with an update.

We were previously only recording the schema version on refresh. This
caused the state to be incorrectly written after a `terraform apply`
causing subsequent commands to run the state through an unnecessary
migration.
@phinze
Copy link
Contributor Author

phinze commented Mar 19, 2015

😅 😅 😅 😅 😅 FINALLY I THINK IT MIGHT BE DONE 😅 😅 😅 😅 😅

@mitchellh @catsby give me some 👀

I'll make a few inline comments pointing out some details.

Type: schema.TypeMap,
Optional: true,
Removed: "Split out into three sub-types; see Changelog and Docs",
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First use of Removed - intended UX here for user that's upgraded:

  1. Receive this message
  2. Update config
  3. Catch state migration on next plan or apply which, if all goes well, should result in an empty plan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phinze Interesting idea: make a web page (unlinked from docs) dedicated to explaining the change and how to migrate, and just link to that in this message. i.e. /docs/providers/aws/changes/instance_1.html (resource type + version)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo I like this idea. I'll follow up with it in another PR.

@@ -151,7 +151,7 @@ func (r *Resource) Apply(
err = r.Update(data, meta)
}

return data.State(), err
return r.recordCurrentSchemaVersion(data.State()), err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the bug that wasted me a nice little bit of time. 😭 See the first commit's msg for details.

@mitchellh
Copy link
Contributor

LGTM! I made a few comments but none that really require any action.

Instance block devices are now managed by three distinct sub-resources:

 * `root_block_device` - introduced previously
 * `ebs_block_device` - all additional ebs-backed volumes
 * `ephemeral_block_device` - instance store / ephemeral devices

The AWS API support around BlockDeviceMapping is pretty confusing. It's
a single collection type that supports these three members each of which
has different fields and different behavior.

My biggest hiccup came from the fact that Instance Store volumes do not
show up in any response BlockDeviceMapping for any EC2 `Describe*` API
calls. They're only available from the instance meta-data service as
queried from inside the node.

This removes `block_device` altogether for a clean break from old
configs. New configs will need to sort their `block_device`
declarations into the three new types. The field has been marked
`Removed` to indicate this to users.

With the new block device format being introduced, we need to ensure
Terraform is able to properly read statefiles written in the old format.
So we use the new `helper/schema` facility of "state migrations" to
transform statefiles in the old format to something that the current
version of the schema can use.

Fixes #858
@phinze
Copy link
Contributor Author

phinze commented Mar 19, 2015

It's happening!

phinze added a commit that referenced this pull request Mar 19, 2015
providers/aws: rework instance block devices
@phinze phinze merged commit 46b6307 into master Mar 19, 2015
@phinze phinze deleted the f-block-devices branch March 19, 2015 14:09
blockDevices = append(blockDevices, rootBlockDevices...)
// if err := d.Set("ephemeral_block_device", vL); err != nil {
// return err
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah whoops I was too excited to land. Fixing right on master.

@ghost
Copy link

ghost commented May 4, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators May 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS ephemeral storage
4 participants