Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/loadbalancing] Adding AWS Cloud Map for Load Balancer Exporter discovery #27588

Merged

Conversation

andretong
Copy link
Contributor

Description:
Implementation of 27241

Adding AWS Service discovery system (CloudMap) support to resolve the Collector's Backend.
This would allow users to use this exporter when there is the case of using ECS over EKS in an AWS infrastructure.

Link to tracking Issue:

Testing:

Setting the following configuration with the proper values, we can start the Load Balancer which will connect to the AWS CloudMap service and start sending traces to the available collectors.

exporters:
  loadbalancing:
    protocol:
      otlp:
        timeout: 3s
    resolver:
      awsCloudMap:
        namespace: <aws-namespace>
        serviceName: <aws-otel-col-service-name>
        interval: <interval-in-seconds>

Documentation:

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 10, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions bot requested a review from jpkrohling October 10, 2023 12:55
@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch from 436e611 to 56a9501 Compare October 10, 2023 12:59
@andretong andretong marked this pull request as ready for review October 10, 2023 13:03
@andretong andretong requested a review from a team October 10, 2023 13:03
@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch 2 times, most recently from 10bedf5 to 00aa631 Compare October 10, 2023 15:29
@@ -53,7 +53,7 @@ This also supports service name based exporting for traces. If you have two or m
Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using the processor.

* The `otlp` property configures the template used for building the OTLP exporter. Refer to the OTLP Exporter documentation for information on which options are available. Note that the `endpoint` property should not be set and will be overridden by this exporter with the backend endpoint.
* The `resolver` accepts a `static` node, a `dns` or a `k8s` service. If all three are specified, `k8s` takes precedence.
* The `resolver` accepts a `static` node, a `dns`, a `k8s` service or `awsCloudMap`. If all four are specified, `k8s` takes precedence.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO:
Define what to do in this case since currently when static and dns are defined, then throws a errMultipleResolversProvided

if oCfg.Resolver.DNS != nil && oCfg.Resolver.Static != nil {
return nil, errMultipleResolversProvided
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to return an error, and users should have this expectation as well. Can you please change this line in the doc to reflect what the code does?

@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch 2 times, most recently from 9575682 to caa3636 Compare October 14, 2023 17:54
Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thank you for your PR. @Aneurysm9, @bryan-aguilar, would someone from AWS be available to review this PR?

@@ -53,7 +53,7 @@ This also supports service name based exporting for traces. If you have two or m
Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using the processor.

* The `otlp` property configures the template used for building the OTLP exporter. Refer to the OTLP Exporter documentation for information on which options are available. Note that the `endpoint` property should not be set and will be overridden by this exporter with the backend endpoint.
* The `resolver` accepts a `static` node, a `dns` or a `k8s` service. If all three are specified, `k8s` takes precedence.
* The `resolver` accepts a `static` node, a `dns`, a `k8s` service or `awsCloudMap`. If all four are specified, `k8s` takes precedence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to return an error, and users should have this expectation as well. Can you please change this line in the doc to reflect what the code does?

* `namespace` The CloudMap namespace where the service is register, e.g. `cloudmap`. If no `namespace` is specified, this will fail to start the Load Balancer exporter.
* `serviceName` The name of the service that you specified when you registered the instance, e.g. `otelcollectors`. If no `serviceName` is specified, this will fail to start the Load Balancer exporter.
* `interval` resolver interval in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `30s` will be used.
* `timeout` resolver timeout in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1s` will be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 1s a reasonable value here? If the default interval is 30s, the timeout can certainly be at around 5s, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes all the sense, I've updated this property and documentation

@@ -63,6 +63,12 @@ Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using th
* The `k8s` node accepts the following optional properties:
* `service` Kubernetes service to resolve, e.g. `lb-svc.lb-ns`. If no namespace is specified, an attempt will be made to infer the namespace for this collector, and if this fails it will fall back to the `default` namespace.
* `ports` port to be used for exporting the traces to the addresses resolved from `service`. If `ports` is not specified, the default port 4317 is used. When multiple ports are specified, two backends are added to the load balancer as if they were at different pods.
* The `awsCloudMap` node accepts the following properties:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know nothing about AWS Cloud Map, is there anything users have to be aware before configuring it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only relevant topic I can think of is that usage of the SDK hits the API every polling interval, so users have to take into account the AWS API Rate limits in their configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this comment be added to the readme? Or is it so obvious that it doesn't have to?

* `serviceName` The name of the service that you specified when you registered the instance, e.g. `otelcollectors`. If no `serviceName` is specified, this will fail to start the Load Balancer exporter.
* `interval` resolver interval in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `30s` will be used.
* `timeout` resolver timeout in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1s` will be used.
* `port` port to be used for exporting the traces to the addresses resolved from `service`. By default, the port is set in Cloud Map, but can be be overridden with a static value in this config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a healthStatus property as well. Should it be documented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this property, in this initial iteration, I'd rather to leave this config as default since we filter only for Healthy instances

Reference: https://docs.aws.amazon.com/cloud-map/latest/api/API_GetInstancesHealthStatus.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but the value the user is providing is indeed being used, and therefore, deserves to be documented.

@@ -52,7 +52,20 @@ type loadBalancerImp struct {
func newLoadBalancer(params exporter.CreateSettings, cfg component.Config, factory componentFactory) (*loadBalancerImp, error) {
oCfg := cfg.(*Config)

if oCfg.Resolver.DNS != nil && oCfg.Resolver.Static != nil {
var count = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, and I like that it provides a short-circuit feature to the function. But I propose two alternatives:

  1. externalize this into a func checkMultipleResolversProvided(Config) error, and call this function here. It should make your code easier to test and leave the existing function more readable.
  2. alternatively, you can create a boolean configured, and set it to true whenever a resolver is not nil. On the next resolver, you check if configured is true and return the errMultipleResolversProvided error. You won't have the fail-fast behavior, though.

exporter/loadbalancingexporter/resolver_aws_cloudmap.go Outdated Show resolved Hide resolved

func (r *cloudMapResolver) start(ctx context.Context) error {
if _, err := r.resolve(ctx); err != nil {
r.logger.Warn("Failed initial resolve", zap.Error(err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r.logger.Warn("Failed initial resolve", zap.Error(err))
r.logger.Warn("failed initial resolve", zap.Error(err))


_ = stats.RecordWithTags(ctx, awsResolverSuccessTrueMutators, mNumResolutions.M(1))

r.logger.Debug("DiscoverInstance",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this deserves a better message.

} else {
endpoint = fmt.Sprintf("%s:%d", ipAddr, *r.port)
}
r.logger.Debug("Resolved instance",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r.logger.Debug("Resolved instance",
r.logger.Debug("resolved instance",

@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch 3 times, most recently from 62747e7 to e7f13f2 Compare November 7, 2023 09:15
@github-actions github-actions bot added the cmd/configschema configschema command label Nov 7, 2023
@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch from e7f13f2 to 573268f Compare November 7, 2023 09:17
@0x006EA1E5
Copy link

Hey, this looks great. Can we expect this to be available in the next release?


func createDiscoveryFunction(client *servicediscovery.Client) func(params *servicediscovery.DiscoverInstancesInput) (*servicediscovery.DiscoverInstancesOutput, error) {
return func(params *servicediscovery.DiscoverInstancesInput) (*servicediscovery.DiscoverInstancesOutput, error) {
return client.DiscoverInstances(context.TODO(), params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was brought to my attention that this returns "only" the first 100 hosts, and that the results to this API are paginated. Can you handle this in this PR? If you prefer to handle that in a future PR, please open an issue and leave a reference to it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this specific case, I'll rather add this pagination as a new iteration.
Will raise the issue as requested.

Thanks for the review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the issue raised described to handle this limitation. #29771

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to know whether there are more than 100 hosts available? If so, perhaps it could warn a message in the logs?

@@ -198,3 +208,27 @@ func (lb *loadBalancerImp) Exporter(endpoint string) (component.Component, error

return exp, nil
}

func checkMultipleResolversProvided(cfg component.Config) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely comfortable with this change in this PR. I think it has to be on its own PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll raise a new PR once this is merge with the increment.

@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch 2 times, most recently from a065ed4 to 64f1eaf Compare November 30, 2023 14:48
@jpkrohling
Copy link
Member

@andretong, once this is ready for another review, let me know but note that I'm unable to do so this year. I'm back on Jan 15, please ping me again on Jan 18 if I haven't reviewed this here by then.

@andretong
Copy link
Contributor Author

@andretong, once this is ready for another review, let me know but note that I'm unable to do so this year. I'm back on Jan 15, please ping me again on Jan 18 if I haven't reviewed this here by then.

@jpkrohling thanks for the notice.
All the suggestions were applied and waiting for a review, looking forward to get back with you regarding this feature.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch 3 times, most recently from bee7b81 to 840f2ef Compare March 12, 2024 09:08
@andretong
Copy link
Contributor Author

Another failure :-/

Error: exporter/awskinesisexporter/exporter.go:10:2: missing go.sum entry for module providing package github.com/aws/aws-sdk-go-v2/aws (imported by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awskinesisexporter); to add:
	go get github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]

@jpkrohling I think I've fixed the errors, can you run the pipeline please?

I've check locally and the distribution is built when running make otelcontribcol

@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch 4 times, most recently from 51cb8ef to c9776bf Compare March 13, 2024 07:34
@andretong
Copy link
Contributor Author

image
@jpkrohling I've uploaded the missing artifact in the last commit.

@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch 2 times, most recently from a2fa3b8 to 70eddfa Compare March 13, 2024 13:17
@andretong
Copy link
Contributor Author

@jpkrohling seems is OK but a need rebase was needed :(
Screenshot 2024-03-13 at 14 16 50

@andretong
Copy link
Contributor Author

@jpkrohling all checks have pass :D

@andretong andretong force-pushed the feature/loadbalancingexporter_aws_support branch from 70eddfa to 0ac5581 Compare March 13, 2024 18:24
@mx-psi
Copy link
Member

mx-psi commented Mar 13, 2024

Merging based on #27588 (comment) cc @jpkrohling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmd/configschema configschema command cmd/otelcontribcol otelcontribcol command exporter/loadbalancing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants