Error creating Integration on First Deploy #254

Open
jamie1911 opened this issue Jul 24, 2023 · 2 comments
Labels
kind/bug Some behavior is incorrect or out of spec

Comments

@jamie1911

What happened?

We always seem to get the following error during the first pulumi up. When I run pulumi up again after the failure, it completes fine.

My guess is that when these objects are created, Splunk performs additional steps in the background to set up its side of the AWS role, and this needs some extra time. Might it make sense to add a retry or something similar?

@ Updating....
    pulumi:pulumi:Stack splunk-99999999999  warning: use_get_metric_data_method is deprecated: This field will be removed
 +  pulumi:pulumi:Stack splunk-99999999999 creating (0s) warning: use_get_metric_data_method is deprecated: This field will be removed
@ Updating....
 +  signalfx:aws:ExternalIntegration aws-NAME_observability_external_integration creating (0s) 
@ Updating.....
 +  signalfx:aws:ExternalIntegration aws-NAME_observability_external_integration created (2s) 
@ Updating.......
 +  aws:iam:Role splunk-observability-role creating (0s) 
@ Updating....
 +  aws:iam:Role splunk-observability-role created (0.54s) 
@ Updating....
    signalfx:aws:Integration aws-NAME_observability_integration  warning: urn:pulumi:99999999999::splunk::signalfx:aws/integration:Integration::aws-NAME_observability_integration verification warning: "use_get_metric_data_method": [DEPRECATED] This field will be removed
 +  signalfx:aws:Integration aws-NAME_observability_integration creating (0s) warning: urn:pulumi:99999999999::splunk::signalfx:aws/integration:Integration::aws-NAME_observability_integration verification warning: "use_get_metric_data_method": [DEPRECATED] This field will be removed
 +  signalfx:aws:Integration aws-NAME_observability_integration creating (0s) error: 1 error occurred:
 +  signalfx:aws:Integration aws-NAME_observability_integration **creating failed** error: 1 error occurred:
 +  pulumi:pulumi:Stack splunk-99999999999 creating (9s) error: update failed
@ Updating....
 +  pulumi:pulumi:Stack splunk-99999999999 **creating failed (8s)** 1 error; 1 warning
Diagnostics:
  pulumi:pulumi:Stack (splunk-99999999999):
    warning: use_get_metric_data_method is deprecated: This field will be removed
    error: update failed
  signalfx:aws:Integration (aws-NAME_observability_integration):
    warning: urn:pulumi:99999999999::splunk::signalfx:aws/integration:Integration::aws-NAME_observability_integration verification warning: "use_get_metric_data_method": [DEPRECATED] This field will be removed
    error: 1 error occurred:
    	* creating urn:pulumi:99999999999::splunk::signalfx:aws/integration:Integration::aws-NAME_observability_integration: Unexpected status code: 400: {
      "code" : 400,
      "errorType" : "validation",
      "failedRegions" : [ "us-east-1" ],
      "message" : "Error validating AWS / Cloudwatch credentials\nValidation failed for following region(s):\nus-east-1\n[ec2] software.amazon.awssdk.services.sts.model.StsException: User: arn:aws:sts::562691491210:assumed-role/eks-us1-cloud-metric-syncer/aws-sdk-java-1690199570814 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::99999999999:role/splunk/splunk-observability (Service: Sts, Status Code: 403, Request ID: eac90aa6-0013-4b9a-9000-cc7be53ca1ea)\n[monitoring] software.amazon.awssdk.services.sts.model.StsException: User: arn:aws:sts::562691491210:assumed-role/eks-us1-cloud-metric-syncer/aws-sdk-java-1690199570814 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::99999999999:role/splunk/splunk-observability (Service: Sts, Status Code: 403, Request ID: 1c994551-421c-4672-a58e-21924cc1f6aa)",
      "successRegions" : [ ]
    }
    Please verify you are using an admin token when working with integrations
Outputs:
    splunk_observability_role_arn: "arn:aws:iam::99999999999:role/splunk/splunk-observability"
Resources:
    + 3 created
Duration: 11s
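
The 400 response in the log above carries a structured body, so a caller could tell this particular failure apart from other validation errors. The following is a minimal sketch, assuming the body is the JSON object shown in the log. The function name and the abbreviated sample are hypothetical, and none of this is part of the provider.

```python
import json


def is_assume_role_denied(body: str) -> bool:
    """Return True if a validation error body reports an sts:AssumeRole denial."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    message = payload.get("message", "")
    return (
        payload.get("errorType") == "validation"
        and bool(payload.get("failedRegions"))
        and isinstance(message, str)
        and "not authorized to perform: sts:AssumeRole" in message
    )


# Abbreviated copy of the body from the log above (message shortened).
sample = json.dumps(
    {
        "code": 400,
        "errorType": "validation",
        "failedRegions": ["us-east-1"],
        "message": "Error validating AWS / Cloudwatch credentials\n"
        "... is not authorized to perform: sts:AssumeRole on resource: "
        "arn:aws:iam::99999999999:role/splunk/splunk-observability",
        "successRegions": [],
    }
)

print(is_assume_role_denied(sample))
```

A check like this could decide whether a failed create is worth retrying, rather than retrying on every 400.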

Expected Behavior

The pulumi_signalfx.aws.ExternalIntegration and pulumi_signalfx.aws.Integration resources should both be created promptly and successfully.

Steps to reproduce

Code to reproduce minus some of the parameter setup for pulumi_signalfx.aws.Integration

import json

import pulumi
import pulumi_aws as aws
import pulumi_signalfx

config = pulumi.Config()
account_prefix = config.require("account-prefix")

observability_external_integration = pulumi_signalfx.aws.ExternalIntegration(
    f"{account_prefix}_observability_external_integration"
)

observability_role = aws.iam.Role(
    "splunk-observability-role",
    name="splunk-observability",
    path="/splunk/",
    assume_role_policy=pulumi.Output.all(
        observability_external_integration.signalfx_aws_account, observability_external_integration.external_id
    ).apply(
        lambda args: json.dumps(
            {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Sid": "SplunkAssumeRole",
                        "Effect": "Allow",
                        "Principal": {"AWS": args[0]},
                        "Action": "sts:AssumeRole",
                        "Condition": {"StringEquals": {"sts:ExternalId": args[1]}},
                    }
                ],
            }
        )
    ),
    inline_policies=[
        aws.iam.RoleInlinePolicyArgs(
            name="publishpolicy",
            policy=json.dumps(
                {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Action": [
                                "apigateway:GET",
                                "autoscaling:DescribeAutoScalingGroups",
                                "cloudcontrol:ListResources",
                                "cloudcontrol:GetResource",
                                "cloudfront:GetDistributionConfig",
                                "cloudfront:ListDistributions",
                                "cloudfront:ListTagsForResource",
                                "cloudwatch:DescribeAlarms",
                                "cloudwatch:GetMetricData",
                                "cloudwatch:GetMetricStatistics",
                                "cloudwatch:ListMetrics",
                                "directconnect:DescribeConnections",
                                "dynamodb:DescribeTable",
                                "dynamodb:ListTables",
                                "dynamodb:ListTagsOfResource",
                                "ec2:DescribeInstances",
                                "ec2:DescribeInstanceStatus",
                                "ec2:DescribeNatGateways",
                                "ec2:DescribeRegions",
                                "ec2:DescribeReservedInstances",
                                "ec2:DescribeReservedInstancesModifications",
                                "ec2:DescribeTags",
                                "ec2:DescribeVolumes",
                                "ecs:DescribeClusters",
                                "ecs:DescribeServices",
                                "ecs:DescribeTasks",
                                "ecs:ListClusters",
                                "ecs:ListServices",
                                "ecs:ListTagsForResource",
                                "ecs:ListTaskDefinitions",
                                "ecs:ListTasks",
                                "eks:DescribeCluster",
                                "eks:ListClusters",
                                "elasticache:DescribeCacheClusters",
                                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                                "elasticloadbalancing:DescribeLoadBalancers",
                                "elasticloadbalancing:DescribeTags",
                                "elasticloadbalancing:DescribeTargetGroups",
                                "elasticmapreduce:DescribeCluster",
                                "elasticmapreduce:ListClusters",
                                "es:DescribeElasticsearchDomain",
                                "es:ListDomainNames",
                                "kinesis:DescribeStream",
                                "kinesis:ListShards",
                                "kinesis:ListStreams",
                                "kinesis:ListTagsForStream",
                                "kinesisanalytics:ListApplications",
                                "kinesisanalytics:DescribeApplication",
                                "lambda:GetAlias",
                                "lambda:ListFunctions",
                                "lambda:ListTags",
                                "logs:DeleteSubscriptionFilter",
                                "logs:DescribeLogGroups",
                                "logs:DescribeSubscriptionFilters",
                                "logs:PutSubscriptionFilter",
                                "organizations:DescribeOrganization",
                                "rds:DescribeDBInstances",
                                "rds:DescribeDBClusters",
                                "rds:ListTagsForResource",
                                "redshift:DescribeClusters",
                                "redshift:DescribeLoggingStatus",
                                "s3:GetBucketLocation",
                                "s3:GetBucketLogging",
                                "s3:GetBucketNotification",
                                "s3:GetBucketTagging",
                                "s3:ListAllMyBuckets",
                                "s3:ListBucket",
                                "s3:PutBucketNotification",
                                "sqs:GetQueueAttributes",
                                "sqs:ListQueues",
                                "sqs:ListQueueTags",
                                "states:ListActivities",
                                "states:ListStateMachines",
                                "tag:GetResources",
                                "workspaces:DescribeWorkspaces",
                            ],
                            "Resource": "*",
                        }
                    ],
                }
            ),
        ),
    ],
    opts=pulumi.ResourceOptions(depends_on=[observability_external_integration])
)

pulumi_signalfx.aws.Integration(
        f"{account_prefix}_observability_integration",
        enabled=True,
        use_get_metric_data_method=True,
        named_token=token_name,
        integration_id=observability_external_integration.id,
        external_id=observability_external_integration.external_id,
        role_arn=observability_role.arn,
        regions=regions,
        poll_rate=300,
        enable_check_large_volume=True,
        import_cloud_watch=True,
        enable_aws_usage=False,
        namespace_sync_rules=namespace_sync_rules,
        sync_custom_namespaces_only=False,
        custom_namespace_sync_rules=custom_namespace_sync_rules,
        opts=pulumi.ResourceOptions(depends_on=[observability_role, observability_external_integration]),
)
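
Until the provider retries on its own, one possible workaround is to delay the role ARN before it reaches the Integration, giving the new role time to become assumable. This is a minimal sketch, assuming a fixed pause is enough. `wait_then_return` and the 15-second figure are hypothetical and have not been tested against Splunk.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_then_return(
    value: T,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Pause for delay_seconds, then hand the value back unchanged."""
    sleep(delay_seconds)
    return value


# Hypothetical use in the program above (not run here):
#   role_arn=observability_role.arn.apply(lambda arn: wait_then_return(arn, 15)),
```

Note that the pause would run whenever the ARN is known, not only on the first deployment, so this trades a slower `pulumi up` for avoiding the first-run failure.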

Output of pulumi about

pulumi-3.76.0 pulumi-aws-5.42.0 pulumi-docker-3.6.1 pulumi-gitlab-6.1.1 pulumi-signalfx-5.10.0

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

jamie1911 added the kind/bug and needs-triage labels on Jul 24, 2023
@rquitales
Member

@jamie1911 Thanks for reporting this issue, and sorry you're facing this. I'm still trying to repro this on my side and will update once I do. To clarify, does this failure to create an Integration occur frequently? From the logs you provided, it does appear to be a timeout-related issue, so a retry might be a potential solution.

rquitales removed the needs-triage label on Jul 26, 2023
@jamie1911
Author

Hello @rquitales, the issue happens only on the first deployment, when the first pulumi_signalfx.aws.ExternalIntegration and the first aws.iam.Role are created to support the first pulumi_signalfx.aws.Integration. We create AWS accounts somewhat regularly for different projects or developers. When we create an AWS account, someone adds it to Splunk Observability via a new Pulumi stack in the project that uses the code referenced in the issue.

It ALWAYS fails the first time we run pulumi up, normally once, with the error shown above. When we rerun pulumi up, it succeeds.

My guess is that right after the IAM role is first created in our account, Splunk cannot assume it yet. The Integration has role_arn, which tells Splunk what role to assume. I'm thinking there is some delay on the Splunk side while it sets up the role, but Pulumi or the provider checks whether it is complete too soon.
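
The retry suggested in this thread could look roughly like the following: repeat the create call with growing pauses until the role can be assumed. This is a sketch only. `retry_with_backoff`, its parameters, and the stand-in `create` function are hypothetical and do not reflect how the provider is implemented.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the pause after each retryable failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not retryable.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Stand-in for the create call: fails once with an AssumeRole denial, then succeeds.
calls = {"count": 0}


def create() -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("not authorized to perform: sts:AssumeRole")
    return "created"


delays = []  # records requested pauses instead of actually sleeping
result = retry_with_backoff(
    create,
    is_retryable=lambda exc: "sts:AssumeRole" in str(exc),
    sleep=delays.append,
)
print(result, delays)
```

Restricting retries to the AssumeRole denial keeps genuine misconfigurations, such as a wrong external ID, from being retried silently for long.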
