Resolving cloudwatch endpoint for all regions #4191

Merged
merged 1 commit into from
Jul 24, 2024

Contributor

@mye956 mye956 commented May 23, 2024

Summary

This PR has the agent resolve the correct AWS CloudWatch endpoint for all regions when a task uses awslogs as its log driver. Previously, we ran into an issue where the Docker version we rely on could not resolve the correct CloudWatch endpoints for new regions, and as an immediate remediation the agent now resolves the endpoints itself. This behavior should be consistent across all regions.

Note: We will be relying on AWS SDK Go V1 to attempt to resolve the correct endpoint (similar to moby). This will need to be updated once we upgrade to AWS SDK Go V2.

Implementation details

  • Removed the region-specific checks used when obtaining the CloudWatch endpoint
  • Relying on awslogs-region to obtain the correct region of the Cloudwatch endpoint
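
To illustrate the two points above, here is a minimal sketch, not the agent's actual code. It assumes the resolved endpoint is handed to Docker through the awslogs driver's `awslogs-endpoint` option. The helper name `withResolvedEndpoint` and its `dnsSuffix` parameter are hypothetical, and leaving an already configured endpoint untouched is an assumption of this sketch.

```go
package main

import "fmt"

// withResolvedEndpoint returns a copy of the awslogs driver options with
// "awslogs-endpoint" derived from "awslogs-region". The function name and the
// dnsSuffix parameter are hypothetical; the PR takes the suffix from the SDK.
func withResolvedEndpoint(opts map[string]string, dnsSuffix string) (map[string]string, error) {
	region := opts["awslogs-region"]
	if region == "" {
		return nil, fmt.Errorf("awslogs-region is not set")
	}

	// Copy the options so the caller's map is not mutated.
	out := make(map[string]string, len(opts)+1)
	for k, v := range opts {
		out[k] = v
	}

	// Assumption: an endpoint that is already configured is left untouched.
	if out["awslogs-endpoint"] == "" {
		out["awslogs-endpoint"] = fmt.Sprintf("https://logs.%s.%s", region, dnsSuffix)
	}
	return out, nil
}

func main() {
	opts := map[string]string{
		"awslogs-group":         "test-log-group",
		"awslogs-region":        "ca-central-1",
		"awslogs-stream-prefix": "cw-test",
	}
	resolved, err := withResolvedEndpoint(opts, "amazonaws.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(resolved["awslogs-endpoint"])
}
```

With the options from the task definition used in testing, this prints `https://logs.ca-central-1.amazonaws.com`, regardless of which region the agent itself runs in.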

Testing

Manual testing

Used the following task definition:

{
    "family": "test",
    "containerDefinitions": [
        {
            "name": "awslogs-test",
            "image": "busybox",
            "cpu": 256,
            "memory": 64,
            "portMappings": [],
            "essential": true,
            "command": [
                "sh",
                "-c",
                "echo hello world"
            ],
            "environment": [],
            "mountPoints": [],
            "volumesFrom": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "test-log-group",
                    "awslogs-region": "ca-central-1",
                    "awslogs-stream-prefix": "cw-test"
                }
            },
            "systemControls": []
        }
    ]
}

Task transitioned to running

level=debug time=2024-05-23T18:38:03Z msg="Transitioned container" task="78959252ccc04cdd88aec25340bb824d" container="awslogs-test" runtimeID="528be3477391117131c42722d6089a0430852f5177fad30a2e949907e2bf51be" nextState="RUNNING" error=<nil>
level=debug time=2024-05-23T18:38:03Z msg="Received non-transition events" task="78959252ccc04cdd88aec25340bb824d"
level=debug time=2024-05-23T18:38:03Z msg="Updating task's known status" task="78959252ccc04cdd88aec25340bb824d"
level=debug time=2024-05-23T18:38:03Z msg="Found container with earliest known status" knownStatus=RUNNING desiredStatus=RUNNING task="78959252ccc04cdd88aec25340bb824d" container="awslogs-test"
level=debug time=2024-05-23T18:38:03Z msg="Updating task's desired status" taskFamily="test" taskVersion="1" taskArn="arn:aws:ecs:us-west-2:113424923516:task/default/78959252ccc04cdd88aec25340bb824d" taskKnownStatus="RUNNING" taskDesiredStatus="RUNNING" nContainers=1 nENIs=0

Task container exited gracefully

[ec2-user@ip-172-31-46-255 amazon-ecs-agent]$ docker ps -a
CONTAINER ID   IMAGE                            COMMAND                  CREATED              STATUS                        PORTS     NAMES
528be3477391   busybox                          "sh -c 'echo hello w…"   15 seconds ago       Exited (0) 14 seconds ago               ecs-test-1-awslogs-test-82d4d4a0989bc08b1400

Task running in us-west-2 successfully wrote to the CloudWatch log group in ca-central-1

2024-05-23T18:38:03.408Z	hello world

New tests cover the changes: Yes

Description for the changelog

enhancement: Resolving Cloudwatch endpoint in all regions

Does this PR include breaking model changes? If so, have you added transformation functions?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mye956 mye956 force-pushed the aws-cw-endpoint branch 3 times, most recently from 5e34cef to 9ea4d3c Compare May 23, 2024 19:48
@mye956 mye956 marked this pull request as ready for review May 23, 2024 19:49
@mye956 mye956 requested a review from a team as a code owner May 23, 2024 19:49
@mye956 mye956 changed the title [WIP] Resolving cloudwatch endpoint for all regions Resolving cloudwatch endpoint for all regions May 23, 2024
}
if endpoint == "" {
	endpoint = fmt.Sprintf("https://logs.%s.%s", region, dnsSuffix)
Contributor

INFO: Just asking to learn: is this the default format of the endpoint if it's not found in partition.EndpointFor? Does this mean it's a special type of region or a non-opt-in region?

Contributor

Does this work in Non-Commercial regions?

Contributor Author

Yep, this is typically the format for all service endpoints. We're adding this for the case where AWS SDK Go V1 can't resolve it on its own (i.e. the information for a new region isn't present).

https://docs.aws.amazon.com/general/latest/gr/rande.html
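
The fallback described here can be sketched as follows. This is a simplified illustration rather than the PR's code: the `knownEndpoints` map is a hypothetical stand-in for the endpoint metadata bundled with AWS SDK Go V1, and `xx-example-1` is a made-up region name used only to exercise the fallback path.

```go
package main

import "fmt"

// knownEndpoints is a hypothetical stand-in for the endpoint metadata that
// ships with the SDK. A region released after the SDK version was cut would
// be missing from it.
var knownEndpoints = map[string]string{
	"us-west-2":  "https://logs.us-west-2.amazonaws.com",
	"cn-north-1": "https://logs.cn-north-1.amazonaws.com.cn",
}

// resolveLogsEndpoint prefers the known endpoint and otherwise builds one from
// the usual "https://logs.<region>.<dnsSuffix>" pattern.
func resolveLogsEndpoint(region, dnsSuffix string) string {
	if endpoint, ok := knownEndpoints[region]; ok {
		return endpoint
	}
	return fmt.Sprintf("https://logs.%s.%s", region, dnsSuffix)
}

func main() {
	// Found in the table, so the DNS suffix argument is not used.
	fmt.Println(resolveLogsEndpoint("cn-north-1", "amazonaws.com.cn"))
	// Not in the table, so the endpoint is constructed from the pattern.
	fmt.Println(resolveLogsEndpoint("xx-example-1", "amazonaws.com"))
}
```

The DNS suffix is what makes the constructed endpoint partition-aware: for example, the China partition uses `amazonaws.com.cn` rather than `amazonaws.com`.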

Contributor Author

> Does this work in Non-Commercial regions?

As in isolated regions? If so, then yes, it does. This is more of a follow-up to #4143.

"defaultDNSSuffix": ep.AwsPartition().DNSSuffix(),
})
dnsSuffix = ep.AwsPartition().DNSSuffix()
region := hostConfig.LogConfig.Config["awslogs-region"]
Contributor

Would hostConfig.LogConfig.Config["awslogs-region"] ever be empty or null? If so, should we fall back to region := engine.cfg.AWSRegion before moving on to line 1863?

Contributor Author

Right, so awslogs-region should never be empty or null. If a customer were to omit this option from the task definition, the task definition would be considered invalid.

Contributor Author

My original thought process was to also default to engine.cfg.AWSRegion, but after testing manually I couldn't find a scenario where this is not apparent from the task payload. That said, I'm open to having this as a safety net, or maybe failing the task.
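
For reference, the safety net discussed in this thread could look roughly like the sketch below. It is not part of the PR as described; `logRegion` is a hypothetical helper, and its `agentRegion` parameter stands in for engine.cfg.AWSRegion. Returning an error covers the "fail the task" alternative.

```go
package main

import (
	"errors"
	"fmt"
)

// logRegion picks the region used for endpoint resolution. It prefers the
// "awslogs-region" option, falls back to the agent's own region (hypothetical
// parameter standing in for engine.cfg.AWSRegion), and otherwise returns an
// error so the caller can fail the task.
func logRegion(opts map[string]string, agentRegion string) (string, error) {
	if region := opts["awslogs-region"]; region != "" {
		return region, nil
	}
	if agentRegion != "" {
		return agentRegion, nil
	}
	return "", errors.New("no region available for awslogs endpoint resolution")
}

func main() {
	// The option is present, so the agent's region is ignored.
	region, err := logRegion(map[string]string{"awslogs-region": "ca-central-1"}, "us-west-2")
	fmt.Println(region, err)

	// The option is missing, so the agent's region is used as a safety net.
	region, err = logRegion(map[string]string{}, "us-west-2")
	fmt.Println(region, err)
}
```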

},
{
name: "test container that uses awslogs log driver in BJS",
region: "cn-north-1",
Contributor

Note: cn-north-1 and us-gov-east-1 are both available/published in https://github.com/aws/aws-sdk-go-v2, so we should be ok to use them in our test cases.

@mye956 mye956 merged commit f0bbe10 into aws:dev Jul 24, 2024
40 checks passed

7 participants