AWS assume role not working (regression?) #6566

Open
markchalloner opened this issue Nov 23, 2018 · 12 comments
Labels
bug Addresses a defect in current functionality. provider Pertains to the provider itself, rather than any interaction with AWS.

Comments

@markchalloner

markchalloner commented Nov 23, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

$ terraform -v
Terraform v0.11.10
+ provider.aws v1.46.0

Affected Resource(s)

  • aws_XXXXX

Terraform Configuration Files

Sourced from #472 (comment)

# Grab the ARN of the current logged in user
data "aws_caller_identity" "current" {}

# create a role which allows the current user to assume it
resource "aws_iam_role" "terraform_11270" {
  name = "terraform_11270"
  path = "/test/"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "${data.aws_caller_identity.current.arn}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "terraform_11270" {
  name = "terraform_11270"
  role = "${aws_iam_role.terraform_11270.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "ec2:*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# configure this provider alias to only use the IAM Role created above
provider "aws" {
  alias = "iamrole"

  assume_role {
    role_arn = "${aws_iam_role.terraform_11270.arn}"
  }
}

resource "aws_security_group" "primary" {
  name = "primary"
}

# Create a security group with the above IAM Role assumed
resource "aws_security_group" "secondary" {
  provider = "aws.iamrole"
  name     = "secondary"
}

Expected Behavior

Security group secondary should have been created.

Actual Behavior

Error thrown when trying to assume created role:

Error: Error applying plan:

1 error(s) occurred:

* provider.aws.iamrole: The role "arn:aws:iam::<account>:role/test/terraform_11270" cannot be assumed.

  There are a number of possible causes of this - the most common are:
    * The credentials used in order to assume the role are invalid
    * The credentials do not have appropriate permission to assume the role
    * The role ARN is not valid

Replaying the plan (after ~10 seconds) succeeds in creating the security group:

$ terraform apply
aws_security_group.primary: Refreshing state... (ID: sg-<primary_id>)
data.aws_caller_identity.current: Refreshing state...
aws_iam_role.terraform_11270: Refreshing state... (ID: terraform_11270)
aws_iam_role_policy.terraform_11270: Refreshing state... (ID: terraform_11270:terraform_11270)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + aws_security_group.secondary
      id:                     <computed>
      arn:                    <computed>
      description:            "Managed by Terraform"
      egress.#:               <computed>
      ingress.#:              <computed>
      name:                   "secondary"
      owner_id:               <computed>
      revoke_rules_on_delete: "false"
      vpc_id:                 <computed>


Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_security_group.secondary: Creating...
  arn:                    "" => "<computed>"
  description:            "" => "Managed by Terraform"
  egress.#:               "" => "<computed>"
  ingress.#:              "" => "<computed>"
  name:                   "" => "secondary"
  owner_id:               "" => "<computed>"
  revoke_rules_on_delete: "" => "false"
  vpc_id:                 "" => "<computed>"
aws_security_group.secondary: Creation complete after 1s (ID: sg-<secondary_id>)

Steps to Reproduce

  1. terraform apply

Important Factoids

References

@markchalloner
Author

For others running into the same issue: I worked around this by using an external data source to supply STS credentials:

#!/usr/bin/env python3

import json
import os
import select
import sys
from time import sleep

import boto3
import botocore.exceptions


def error(message):
    """
    Errors must create non-zero status codes and human-readable, ideally one-line, messages on stderr.
    """
    print(message, file=sys.stderr)
    sys.exit(1)


def validate(data):
    """
    Query data and result data must have keys whose values are strings.
    """
    if not isinstance(data, dict):
        error('Data must be a dictionary.')
    for value in data.values():
        if not isinstance(value, str):
            error('Values must be strings.')


def assume_role():
    if not select.select([sys.stdin,], [], [], 0.0)[0]:
        error("No stdin data.")

    query = json.loads(sys.stdin.read())

    if not isinstance(query, dict):
        error("Data must be a dictionary.")

    validate(query)

    if "role_arn" not in query:
        error("Data parameter must define 'role_arn'.")

    session = boto3.Session()
    if "access_key" in query and "secret_key" in query:
        session = boto3.Session(
            aws_access_key_id=query["access_key"],
            aws_secret_access_key=query["secret_key"],
        )

    if "wait" in query:
        sleep(int(query["wait"]))

    sts = session.client("sts")
    response = {}
    try:
        response = sts.assume_role(RoleArn=query["role_arn"], RoleSessionName=os.path.basename(sys.argv[0]))
    except botocore.exceptions.ClientError as e:
        error(f"Error from AWS API: {e.response['Error']['Message']}")

    sys.stdout.write(json.dumps({
        "access_key": response["Credentials"]["AccessKeyId"],
        "secret_key": response["Credentials"]["SecretAccessKey"],
        "token": response["Credentials"]["SessionToken"],
    }))


if __name__ == '__main__':
    assume_role()

And the following HCL configuration:

data "external" "aws_assume_role" {
  program = ["python3", "terraform_aws_assume_role.py"]
  query {
    role_arn = "${aws_iam_role.terraform_11270.arn}"
    wait = 10
  }
  depends_on = ["aws_iam_role.terraform_11270",  "aws_iam_role_policy.terraform_11270"]
}

# configure this provider alias to only use the IAM Role created above
provider "aws" {
  alias = "iamrole"

  access_key = "${data.external.aws_assume_role.result["access_key"]}"
  secret_key = "${data.external.aws_assume_role.result["secret_key"]}"
  token = "${data.external.aws_assume_role.result["token"]}"
}

@bflad bflad added the provider Pertains to the provider itself, rather than any interaction with AWS. label Nov 25, 2018
@ybiconviva

I hit the same issue with these versions:

/terraform-plan/dev/application # terraform -v
Terraform v0.11.11
+ provider.aws v1.59.0

However, I do not see the behavior described as "Replaying the plan (after ~10 seconds) succeeds in creating the security group"; the error occurs every time.

@YakDriver
Member

I believe this results from the same bug addressed here: hashicorp/aws-sdk-go-base#5

@drmason13

drmason13 commented Aug 15, 2019

I have had success using the Python program provided by @markchalloner - thank you :) I use profiles to choose which user to assume the role as, so I added the following check for query["profile"] before the default call to boto3.Session():

    if "profile" in query:
        session = boto3.Session(profile_name=query["profile"])

Seems to work for me with the following HCL configuration:

data "external" "aws_assume_role" {
  program = ["python3", "terraform_aws_assume_role.py"]
  query {
    role_arn = "<insert role_arn here>"
    profile = "<insert profile name to assume role with here>"
    wait = 3
  }
}

provider "aws" {
  access_key = "${data.external.aws_assume_role.result["access_key"]}"
  secret_key = "${data.external.aws_assume_role.result["secret_key"]}"
  token = "${data.external.aws_assume_role.result["token"]}"
}
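
One way the profile check could sit alongside the script's existing access-key branch is sketched below. It is only an illustration: the helper name session_kwargs and the precedence (profile first, then explicit keys, then the default credential chain) are assumptions, not something stated in the comment above. Note that in the original script a check placed literally before the session = boto3.Session() line would be overwritten by that line, so this sketch resolves the arguments first and builds the session once.

```python
def session_kwargs(query):
    """Map external-data-source query keys to boto3.Session keyword arguments.

    Hypothetical helper; the precedence below is an assumption:
    a named profile wins, then explicit keys, then boto3's default chain.
    """
    if "profile" in query:
        return {"profile_name": query["profile"]}
    if "access_key" in query and "secret_key" in query:
        return {
            "aws_access_key_id": query["access_key"],
            "aws_secret_access_key": query["secret_key"],
        }
    # Empty kwargs: boto3.Session() falls back to its default credential chain.
    return {}
```

In the script this would replace the two session assignments with a single call, session = boto3.Session(**session_kwargs(query)).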

@YakDriver
Member

YakDriver commented Aug 24, 2019

@aeschright @bflad I've reproduced this issue. It results from eventual consistency. After the creation of a role, it cannot be assumed for 10-30 seconds.

I messed with a wait state for this (see my branch) but the IAM role goes through 2 states before being ready. For 10-20 seconds, the API returns AccessDenied and then UnauthorizedOperation and finally you can successfully assume the role.

@markchalloner An easy, ugly workaround for this is to use a local-exec provisioner with a sleep (timeout on Windows): see a reproducible test of the workaround

resource "aws_iam_role" "tf-test-6d3868d9bed3" {
  name = var.role_name
  path = "/test/"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "${data.aws_caller_identity.current.arn}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

  provisioner "local-exec" {
    command = "sleep 30"
  }
}
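
For anyone using the Python workaround from earlier in this thread, the fixed wait could in principle be replaced by a retry loop that treats the two error codes reported above as transient. The following is a rough sketch, not what the provider does: the function names, the default attempt count, and the delay are all hypothetical, and it assumes the exception exposes a botocore-style response["Error"]["Code"] field.

```python
import time

# Error codes reported in this thread while a newly created role is still propagating.
TRANSIENT_CODES = {"AccessDenied", "UnauthorizedOperation"}


def error_code(exc):
    """Return the AWS error code from a botocore-style exception, or None."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def assume_with_retry(assume, attempts=8, delay=5.0, sleep=time.sleep):
    """Call assume() until it succeeds or fails with a non-transient error.

    assume: zero-argument callable, for example
        lambda: sts.assume_role(RoleArn=arn, RoleSessionName="terraform")
    attempts, delay: hypothetical defaults meant to span the 10-30 second
        window described above; tune them for your account.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return assume()
        except Exception as exc:
            # Re-raise anything that is not a known transient code,
            # and give up once the last attempt has failed.
            if error_code(exc) not in TRANSIENT_CODES or attempt == attempts:
                raise
            sleep(delay)
```

The trade-off is that a role which genuinely cannot be assumed also returns AccessDenied, so a real permissions problem only surfaces after all attempts are exhausted.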

@YakDriver
Member

I've created a repo with tests to easily reproduce credential-related issues. Visit and contribute. The test to reproduce this issue is here: https://github.com/YakDriver/terraform-cred-tests/tree/master/tests/assume_after_create

@aeschright aeschright added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Oct 24, 2019
@tg12

tg12 commented Apr 16, 2020

Is this still an issue? Do you have a link to the Python program provided by Mark? It would be of great use! Thank you.

@drmason13

It's higher up in the comments 😂
#6566 (comment)

Unsure if it's still an issue

@tg12

tg12 commented Apr 16, 2020

Sorry, so it is, haha! Anyway, yes, it appears to be an issue for me.

@GerardoGR

GerardoGR commented Jun 28, 2021

I'm experiencing a similar role-assumption issue, but with aws_iot_topic_rule:

Error: error creating IoT Topic Rule (redacted): InvalidRequestException: iot.amazonaws.com is unable to perform: sts:AssumeRole on resource arn:aws:iam::***:role/(redacted)

I initially tried a version >= 3.35.0 (specifically 3.47.0), which includes a fix for read-after-create eventual consistency:

resource/aws_iam_role: Handle read-after-create eventual consistency (#18435)

(3.35.0 changelog)

However, that didn't fix the issue, whereas the workaround proposed by @YakDriver works as expected: #6566 (comment). So I guess this issue (#6566) is still relevant(?).

Edit: Leaving this comment in case someone else goes through the same troubleshooting path.

@sstaley-sparkpost

I seem to be unable to assume a role with the following config:

provider "aws" {
  profile = var.aws_profile
  region  = "us-west-2"
  assume_role {
    role_arn     = var.assume_role_arn
    session_name = "sre_iam_mgmt-${timestamp()}"
  }
}

So this does appear to still be an issue. What's odd is that I'm able to run plan, just not apply. I'm thinking the issue is the role is assumed during the plan stage, and then during apply it's already been assumed and so cannot assume itself.

@NapalmCodes

Same issue occurring for IoT rules that need to assume the role. The workaround from @YakDriver seems to be working well but, as already discussed, is kinda "hacky".

10 participants