Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ecs: service.serviceConnectConfiguration generates bogus CloudMap namespaces #25616

Closed
franck102 opened this issue May 16, 2023 · 17 comments · Fixed by #25891
Closed

ecs: service.serviceConnectConfiguration generates bogus CloudMap namespaces #25616

franck102 opened this issue May 16, 2023 · 17 comments · Fixed by #25891
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@franck102
Copy link

Describe the bug

This cdk code:

this.cluster = new ecs.Cluster(this, "MyCluster", {
            clusterName: cdk.PhysicalName.GENERATE_IF_NEEDED,
            vpc: this.vpc,
            defaultCloudMapNamespace: {
                name: "passmate.private",
                useForServiceConnect: true,
                type: cloudmap.NamespaceType.DNS_PRIVATE,
                vpc: this.vpc
            },
        });

... and in a different stack:

keycloakContainer.addPortMappings({
            name: "keycloak_http",
            containerPort: this.webPort});
...
this.service = new ecs.FargateService(this, 'Service', {
            cluster: props.cluster,
           ...
            serviceConnectConfiguration: {
                services: [{
                    portMappingName: "keycloak_http",
                    dnsName: "keycloak",
                    port: _keycloakPublicPort,
                }]
            }
        });

generates two namespaces in CloudMap:

Domain name       | Description | Instance discovery                           | Namespace ID
passmate.private | -                   | API calls and DNS queries in VPCs | ns-jfyvbvl33oocytvv
passmate.private | -                   | API calls                                             | ns-pge2n6bdb7z4k72x

The "DNS Queries" contains no services, the second namespace (API calls only) contains the keycloak service defined above.

Expected Behavior

I expected a private DNS entry to be created for the keycloak service, with the DNS name keycloak.passmate.private.

Current Behavior

No DNS record gets created.

The first namespace ("API calls and DNS queries in VPCs") has the expected aws:cloudformation:stack-id, aws:cloudformation:stack-name and aws:cloudformation:logical-id tags.

The second namespace ("API calls" only) which contains the service has a single "AmazonECSManaged" tag.

Reproduction Steps

Create a simple stack with cluster with a default CloudMap namespace & a ServiceConnect-enabled service as described above.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.79.1 (build 2e7f8b7)

Framework Version

"version": "2.79.0",

Node.js Version

v18.3.0

OS

macOS 12.6

Language

Typescript

Language Version

5.0.4

Other information

No response

@franck102 franck102 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 16, 2023
@github-actions github-actions bot added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label May 16, 2023
@pahud pahud changed the title aws-ecs/ServiceConnect: service.serviceConnectConfiguration generates bogus CloudMap namespaces ecs: service.serviceConnectConfiguration generates bogus CloudMap namespaces May 16, 2023
@pahud
Copy link
Contributor

pahud commented May 16, 2023

Hi

I tried this in my environment and looks like it only create one namespace?

export class Demo2Stack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = getOrCreateVpc(this);
    const cluster = new ecs.Cluster(this, "MyCluster", {
      clusterName: PhysicalName.GENERATE_IF_NEEDED,
      vpc,
      defaultCloudMapNamespace: {
          name: "passmate.private",
          useForServiceConnect: true,
          type: servicediscovery.NamespaceType.DNS_PRIVATE,
          vpc
      },
    });
    const task = new ecs.FargateTaskDefinition(this, 'Task', {
      cpu: 256,
      memoryLimitMiB: 512,
    });
    task.addContainer('keycloak', {
      image: ecs.ContainerImage.fromRegistry('quay.io/keycloak/keycloak'),
      portMappings: [
        { name: 'keycloak_http', containerPort: 8080 },
      ]
    });
    new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition: task,
      serviceConnectConfiguration: {
          services: [{
              portMappingName: "keycloak_http",
              dnsName: "keycloak",
              port: 80,
          }]
      }
    });
  };
};

Resources
[+] AWS::ECS::Cluster MyCluster MyCluster4C1BA579
[+] AWS::ServiceDiscovery::PrivateDnsNamespace MyCluster/DefaultServiceDiscoveryNamespace MyClusterDefaultServiceDiscoveryNamespace4A9016C4
[+] AWS::IAM::Role Task/TaskRole TaskTaskRoleE98524A1
[+] AWS::ECS::TaskDefinition Task Task79114B6B
[+] AWS::ECS::Service Service/Service ServiceD69D759B
[+] AWS::EC2::SecurityGroup Service/SecurityGroup ServiceSecurityGroupC96ED6A7

@pahud pahud added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels May 16, 2023
@pahud
Copy link
Contributor

pahud commented May 16, 2023

BTW, if you are building keycloak service on ECS or Fargate, you probably can refer to this aws-sample which does not use cloudmap but JDBC_PING and stickinessCookie instead. It's not perfect but worth a look.

https://github.com/aws-samples/cdk-keycloak/blob/main/src/keycloak.ts

@franck102
Copy link
Author

I suspect the issue arises because the cluster and the service are defined in different stacks (see the Fn::ImportValue below)?
Everything indeed looks correct in the generated template - a single private dns namespace in the cluster, and service refers to it by name.
However at deploy time the service configuration causes AWS to instantiate a new namespace instead of reusing the cluster's default one.

Cluster template:

 PassmateCluster89C7783C:
    Type: AWS::ECS::Cluster
    Properties:
      ServiceConnectDefaults:
        Namespace: passmate.private
    Metadata:
      aws:cdk:path: PassmateStack/Infra/PassmateCluster/Resource
  PassmateClusterDefaultServiceDiscoveryNamespace9D590D92:
    Type: AWS::ServiceDiscovery::PrivateDnsNamespace
    Properties:
      Name: passmate.private
      Vpc:
        Ref: PassmateVpcE47AF4D1
    Metadata:
      aws:cdk:path: PassmateStack/Infra/PassmateCluster/DefaultServiceDiscoveryNamespace/Resource

Service template:

KeycloakClusterServiceDC58B5EE:
    Type: AWS::ECS::Service
    Properties:
      Cluster:
        Fn::ImportValue: PassmateStackInfraECB4C246:ExportsOutputRefPassmateCluster89C7783C280B02BA
    ...
      NetworkConfiguration:
        AwsvpcConfiguration:
         ...
      ServiceConnectConfiguration:
        Enabled: true
        Namespace: passmate.private
        Services:
          - ClientAliases:
              - DnsName: keycloak
                Port: 8088
                PortName: keycloak_http
        TaskDefinition:
          Ref: KeycloakClusterTaskDefinitionE5269269

@franck102
Copy link
Author

BTW, if you are building keycloak service on ECS or Fargate, you probably can refer to this aws-sample which does not use cloudmap but JDBC_PING and stickinessCookie instead. It's not perfect but worth a look.

https://github.com/aws-samples/cdk-keycloak/blob/main/src/keycloak.ts

I am using DNS_PING for the cluster itself, I need Cloudmap for Keycloak integration with by backend services (Quarkus admin client). And I am trying to avoid the cost of an internal load balancer...

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 16, 2023
@pahud
Copy link
Contributor

pahud commented May 16, 2023

I think you probably should configure it like this(using nginx as a example):

    const vpc = getOrCreateVpc(this);
    const cluster = new ecs.Cluster(this, `MyCluster${id}`, {
      vpc,
      defaultCloudMapNamespace: {
          name: `${id}-ns`,
          useForServiceConnect: true,
          type: servicediscovery.NamespaceType.DNS_PRIVATE,
          vpc
      },
    });
    const task = new ecs.FargateTaskDefinition(this, 'Task', {
      cpu: 256,
      memoryLimitMiB: 512,
    });
    task.addContainer('nginx', {
      image: ecs.ContainerImage.fromRegistry('nginx'),
      portMappings: [
        { name: 'nginx_http', containerPort: 80 },
      ],
    });
    const service = new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition: task,
      serviceConnectConfiguration: {
          services: [{
              portMappingName: "nginx_http",
              dnsName: "nginx",
              port: 80,
          }],
      },
    });
    service.node.addDependency(cluster.defaultCloudMapNamespace!);

defaultCloudMapNamespace creates the default CloudMap Namespace and serviceConnectConfiguration by default uses the default namespace if name is undefined. The addDependency() ensures their dependency.

Let me know if it works for you.

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 16, 2023
@pahud
Copy link
Contributor

pahud commented May 16, 2023

OK this looks like a random bug.

I deployed 10 times and 5 out of 10 I got duplicate namespaces. This doesn't seem to be a CDK bug but probably ECS or cloudmap. I've reported internally. V907831514

image

@pahud pahud added blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels May 16, 2023
@franck102
Copy link
Author

Any workaround you can suggest?

If I use the CloudMap APIs directly to create the keycloak record, do I need to additionally configure something on the service to enable ServiceConnect?

@franck102
Copy link
Author

franck102 commented May 17, 2023

The duplicate namespace is in fact created by the Infra stack. If I manually delete the spurious "API only" namespace before deploying the service stack, the service deployment fails with this errror:

8:24:56 AM | CREATE_FAILED        | AWS::ECS::Service              | PassmateStack/Keyc...er/Service/Service
Resource handler returned message: "Failed to retrieve namespace for passmate.private (Service: AmazonECS; Status Code: 400; Error Code: ClientException; Request ID: f10b9440-e002-4254-809b-2fcd
913988d9; Proxy: null)" (RequestToken: d9130384-16a7-9dac-dd6a-c83475c10a3d, HandlerErrorCode: GeneralServiceException)
8:24:57 AM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack     | PassmateStackKeycloak9267A274
The following resource(s) failed to create: [KeycloakClusterTaskDefinitionExecutionRoleDefaultPolicyABB500FF, KeycloakClusterServiceDC58B5EE]. Rollback requested by user.
8:24:57 AM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack     | PassmateStackKeycloak9267A274

I can see the namespace in the console, its resource name though is "passmate.private-fm7n2lxfyexstqq5", not passmate.private.
Could it be that cluster.defaultCloudMapNamespace?.namespaceName, which currently contains "passmate.private", should instead contain the resource name "passmate.private-fm7n2lxfyexstqq5" ?

@franck102
Copy link
Author

franck102 commented May 17, 2023

Yep, that seems to be the issue, the service deployment works as expected if I configure service connect by manually specifying the namespace resource name:

this.service.enableServiceConnect({
            namespace: "passmate.private-fm7n2lxfyexstqq5",
            services: [{
                portMappingName: "keycloak_http",
                dnsName: "keycloak",
                port: _keycloakPublicPort,
            }]
        })

Is there any way to obtain that resource name from cdk after deploying the Infra stack but before deploying the service stack?

@franck102
Copy link
Author

The console is a bit of a mess when it comes to namespaces? For the same namespace, I have this in [Amazon Elastic Container Service] -> [Namespaces]:

ARN: arn:aws:servicediscovery:us-east-1:782843724418:namespace/ns-kpfljha2hj2yic3m
Name: passmate.private-fm7n2lxfyexstqq5 

... but in [AWS Cloud Map] -> [Namespaces] I have this instead:

Name:  passmate.private
Namespace ID:  ns-kpfljha2hj2yic3m
HTTP name: passmate.private-fm7n2lxfyexstqq5

So the ECS name becomes the CloudMap HTTP Name, would this be the origin of the bug?

I can't figure out how to lookup the actual namespace from the service stack even though I have its ARN, because PrivateDnsNamespace.fromPrivateDnsNamespaceAttributes wants arn, id and name all supplied.

@franck102
Copy link
Author

Using the ARN to setup the service ServiceConnect options works:

const namespace = this.cluster.addDefaultCloudMapNamespace({
            ...
        });
        this.namespaceArn = namespace.namespaceArn;

then

this.service.enableServiceConnect({
            namespace: props.namespaceArn,
            services: [{
                portMappingName: "keycloak_http",
                dnsName: "keycloak",
                port: _keycloakPublicPort,
            }]
        })

This was painful :(

@franck102
Copy link
Author

It looks like I may have to give up on ServiceConnect... with the steps above the PrivateDnsNamespace has the service, but it doesn't have any DNS record for it.
Some documentation and/or examples would really help...

@omrta-dev
Copy link

omrta-dev commented May 23, 2023

I would like to point out that @franck102 is not the only one facing this issue:

image

As shown in the image I am also getting two namespaces generated while using EC2 Services with very similar timestamps

My CDK code is below:
Cluster:

      defaultCloudMapNamespace: {
        type: NamespaceType.DNS_PUBLIC,
        name: this.domainZone.zoneName,
        useForServiceConnect: true,
      },

ApplicationLoadBalancedEc2Service Construct:

new ApplicationLoadBalancedEc2Service(
      this,
      'test'
      {
        cluster: this.cluster,
        serviceName: nginxPM,
        sslPolicy: SslPolicy.RECOMMENDED_TLS,
        certificate: this.certificate,
        domainName: `nginx.${this.domainZone.zoneName}`,
        domainZone: this.domainZone,
        cloudMapOptions: {
          cloudMapNamespace: this.cluster.defaultCloudMapNamespace,
          name: "nginx-dns",
        },
        targetProtocol: ApplicationProtocol.HTTP,
        taskDefinition: nginx.taskDefinition,
        memoryLimitMiB: 256,
        loadBalancer: this.alb,
        publicLoadBalancer: true,
        redirectHTTP: true,
      }
    );

Ec2Service:

new Ec2Service(
      this.serviceProps.scope,
      `${this.serviceProps.name}Service`,
      {
        cluster: this.serviceProps.cluster,
        taskDefinition: this.getTaskDefinition(),
        cloudMapOptions: {
          cloudMapNamespace: this.serviceProps.cluster.defaultCloudMapNamespace,
          name: `${this.serviceProps.name.toLowerCase()}-dns`,
        },
        enableExecuteCommand: true,
        serviceName: this.serviceProps.name,
      }
    );

Note that I first create the cluster, then use cluster.defaultCloudMapNamespace everywhere, so it should only have created 1 namespace and used 1 namespace, but for some reason I get both a Api Calls + Public DNS and API Calls namespaces.

I've checked the Cloud Formation Template and it also only shows that 1 namespace is getting created at the cft level:

ClusterDefaultServiceDiscoveryNamespace26741755:
    Type: AWS::ServiceDiscovery::PublicDnsNamespace
    Properties:
      Name: <xxxxx>dev.com

The API Call only namespace has the AmazonECSManaged = True Tag set while the other namespace(public DNS + api calls) has the actual records and services added to it.

I also +1 the need for documentation, I've spent many hours trying to figure out how service discovery, cloudmap, and service connect all fit into each other and how it all works with the CDK.

@chessels
Copy link

This question on StackOverflow explains a lot about how Service Connect works. Apparently Service Connect isn't meant to be used together with DNS.

@Seiya6329
Copy link
Contributor

Thanks for reporting a bug and we apologize for any inconvenience caused by an issue.

We are investigating this issue and will update this thread once we have a findings.
ETA for next update: 6/7 (Wed)

@bvtujo
Copy link
Contributor

bvtujo commented Jun 9, 2023

This issue looks like it's caused by Service Connect's default behavior of creating a namespace with the given name if one does not exist.

To mitigate this issue, I've cut a PR (#25891) which implicitly creates a dependency between the default Cloudmap namespace and the cluster by using the namespace ARN instead of it's string id. This will avoid the problem at cluster creation.

It's still possible to create a namespace accidentally by passing the string name to enableServiceConnect() on an ecs.*Service construct. I'd recommend that in this case you pass the namespace by ARN (namespace.namespaceArn) to create a dependency, or add an explicit dependency between the namespace and the service. For example:

declare const ns cloudmap.INamespace; // example.com
declare const svc ecs.FargateService;

svc.node.addDependency(ns);
svc.enableServiceConnect({
  namespace: 'example.com',
});

A good next step to stop this from happening in the future altogether would be to deprecate the namespace: string field in favor of a new cloudMapNamespace: INamespace field, allowing us to resolve this issue completely within the construct.

@mergify mergify bot closed this as completed in #25891 Jun 13, 2023
mergify bot pushed a commit that referenced this issue Jun 13, 2023
…ervice connect (#25891)

This PR should fix #25616, where service connect accidentally creates a duplicate HTTP namespace when a customer sets a service connect default namespace on the cluster. 

Closes #25616 

However, I think that a broader fix for this issue should include deprecation of the `namespace` parameter in `ServiceConnectProps` in favor of a `cloudmapNamespace: INamespace` parameter; that way, we can force resolution by ARN under the hood of the construct and never trigger the namespace duplication behavior. 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container blocked Work is blocked on this issue for this codebase. Other labels or comments may indicate why. bug This issue is a bug. effort/medium Medium work item – several days of effort p2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants