Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for AWS Cloud Map as service-discovery #53

Merged
merged 7 commits into from
Sep 5, 2019

Conversation

kiranmeduri
Copy link
Collaborator

Issue #, if available:
#52

Description of changes:
Add support for AWS Cloud Map as service-discovery

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kiranmeduri kiranmeduri force-pushed the cloudmap branch 8 times, most recently from 1027e64 to 444d6e3 Compare August 9, 2019 18:29
@kiranmeduri kiranmeduri marked this pull request as ready for review August 9, 2019 18:48
- Updated virtual-node controller to populate CloudMap service-discovery
and create service in CloudMap
- New pod controller to register/deregister endpoints behind cloudmap
service
AttrK8sPod = "k8s.io/pod"
//AttrK8sNamespace is a custom attribute injected by app-mesh controller
AttrK8sNamespace = "k8s.io/namespace"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some valuable instance (cloudmap) instance metadata would include:

  1. Region & AZ
  2. Subnet
  3. VPC
  4. Cluster Name
  5. EC2 instance ID
  6. Owners? I.e. following owner refs to get deployment/etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html

👍 for REGION and AVAILABILITY_ZONE. This will be helpful for AZ-aware routing: aws/aws-app-mesh-roadmap#94

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are useful. Should we look into adding them as labels to pods and let register instance take over from there?

Copy link
Contributor

@nckturner nckturner Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see those labels being useful for pods--how does the cluster admin decide which pods get them. All pods that are injected for App Mesh? We could do this at inject time, which involves adding API calls into the pod startup path (but is probably fine), or we could do it late by the controller.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we can lookup the node corresponding to pod and get that information. Below is the information from node. However, controller needs permissions to get node info. We can follow up this work in upcoming versions.

labels:
    alpha.eksctl.io/cluster-name: cluster-20190821
    alpha.eksctl.io/instance-id: i-08dd903c3cbe5db66
    alpha.eksctl.io/nodegroup-name: m5-l-1
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m5.large
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: us-west-2
    failure-domain.beta.kubernetes.io/zone: us-west-2a
    kubernetes.io/hostname: ip-192-168-118-179.us-west-2.compute.internal
    role: workers

pkg/controller/pod.go Outdated Show resolved Hide resolved
pkg/controller/pod.go Outdated Show resolved Hide resolved
var meshName string
var virtualNodeName string

//TODO: Remove this hack and always expect proper annotations to be injected into the pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this comment as a reminder for one of us to create an issue in inject.

c.handleServiceDiscovery(ctx, vnode, copyForUpdate)
}

//TODO: delete unused cloudmap services
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving comment as a reminder to discuss this/create issue.


//It is okay to call Create multiple times for same service-name.
//It is also cheaper than calling get and then figuring out to create.
cloudmapService, err := c.cloud.CloudMapCreateService(ctx, cloudmapConfig, c.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle the case where the cloudmap namespace doesn't exist? I guess its probably evident with the error returned from the cloudmap create service call, which is already logged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably should but creating namespace on the fly seems bad. CloudMap namespace should be created by admin, same can be said of mesh though. For now I will add a TODO to improve the experience.

Copy link
Contributor

@nckturner nckturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a situation where things don't reconcile properly, or we are ignoring my namespace for some reason. I had example items created before updating to a controller built from this branch. I updated to the new controller, modified a virtual node to specify service discovery == cloudmap. The controller prints cloudmap.go:273] No namespace found with name color-mesh, and then even after creating the cloudmap namespace, and redeploying the controller, I'm seeing the same error.

Update: this was just me not having the correct cloudmap permissions on my nodes, which I realized after far too much debugging.

pkg/aws/appmesh.go Outdated Show resolved Hide resolved
DnsConfig: &servicediscovery.DnsConfig{
NamespaceId: awssdk.String(namespaceSummary.NamespaceID),
RoutingPolicy: awssdk.String(servicediscovery.RoutingPolicyMultivalue),
DnsRecords: []*servicediscovery.DnsRecord{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our meeting with Cloud Map, it was recommended that if we are creating namespaces for users we should default to HTTP only and allow the user to request DNS. Although we aren't creating namespaces for users, if the namespace that the user provides is not DNS enabled, is this field ignored?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add this conditionally, I am getting this error:

E0903 15:44:25.848882       1 controller.go:443] error syncing 'appmesh-demo/colorteller-red': Error handling cloudmap service discovery for virtual node colorteller-red-appmesh-demo: InvalidInput: When you create a service using a namespace that has a type of HTTP, you can't include a DnsConfig element.
        status code: 400, request id: 8c76f6af-1c79-4aa7-a24a-a2b2c3de0e5b

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP namespace will not work with AppMesh because those service-names cannot be resolved by VPC DNS. In the current setup, application container still resolves the name using DNS resolver which gives an endpoint that is ignored by Envoy and further resolved. envoyproxy/envoy#6748 will help in avoiding DNS altogether in future. Until then we should be using PrivateDnsNamespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok we can call this out as its a little confusing. Arguably, though, as long as the application gets some DNS response, then does everything work fine? Whether its kube dns, route53, or some custom DNS solution the customer is running.

Copy link
Contributor

@lavignes lavignes Sep 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nckturner

as long as the application gets some DNS response, then does everything work fine

Yeah that is case right now. We're also looking into adding some work-around we can give customers since some people may not be allowed to create a PrivateDnsNamespace.

Copy link
Collaborator Author

@kiranmeduri kiranmeduri Sep 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR with better error message when provided namespace is non DNS_PRIVATE. Marked it with a TODO to support HTTP namespaces. That should unblock this PR.

E0905 19:34:36.794398       1 controller.go:443] error syncing 'howto-k8s-cloudmap/colorapp-red': Error handling cloudmap service discovery for virtual node colorapp-red-howto-k8s-cloudmap: Cannot create service under namespace howto-k8s-cloudmap.pvt.aws.local with type HTTP, only namespaces with type DNS_PRIVATE are supported

@kiranmeduri
Copy link
Collaborator Author

I think there is a situation where things don't reconcile properly, or we are ignoring my namespace for some reason. I had example items created before updating to a controller built from this branch. I updated to the new controller, modified a virtual node to specify service discovery == cloudmap. The controller prints cloudmap.go:273] No namespace found with name color-mesh, and then even after creating the cloudmap namespace, and redeploying the controller, I'm seeing the same error.

Update: this was just me not having the correct cloudmap permissions on my nodes, which I realized after far too much debugging.

I will fix the error handling. We should also update documentation so that users do not trip on this. Also worth to look into https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants