[exporter/datadog] source provider loading takes too long and times out lambda initialization #16442
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Thanks for reporting @RangelReale! A workaround for this is to set the […]
To expand on this: in the general case we don't know what your pipeline looks like or how your data is transformed before reaching the Datadog exporter. Since a missing source identifier (be it a hostname or a task ID) results in incomplete data and a bad experience, we want to have a fallback value, so that if there is no resource processor, or it is misconfigured, we can still add some ID to your metrics/traces/logs. There is some work we can do here to improve speed, but we almost always need to run it at some point, no matter what your pipeline looks like.
There should probably be a configuration option for which source providers to run, or a client-side resource detector that adds these fields, with the exporter then checking whether the fields are already available.
That is a reasonable suggestion, but we first need to figure out how to provide this flexibility in a way that does not have lots of papercuts. For now, setting the […]
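The truncated replies above most likely point at an exporter configuration workaround. Below is a minimal sketch of what such a setup could look like, assuming the `resourcedetection` processor and the Datadog exporter's top-level `hostname` option behave as documented around the collector versions discussed here; the detector list, the API key reference, and the `my-lambda-service` hostname are placeholders to adapt to your environment.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  # Client-side resource detection, so source attributes are attached to the
  # data itself rather than looked up by the exporter at startup.
  resourcedetection:
    detectors: [env, system]   # placeholder list; pick detectors for your platform
    timeout: 2s

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}   # assumption: API key supplied via an environment variable
    # Assumption: an explicit fallback hostname means the exporter does not have
    # to block on probing cloud-provider metadata endpoints during startup.
    hostname: my-lambda-service  # placeholder value

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [resourcedetection]
      exporters: [datadog]
    traces:
      receivers: [otlp]
      processors: [resourcedetection]
      exporters: [datadog]
```

Whether an explicit hostname fully bypasses the source providers can depend on the exporter version, so treat this as a starting point rather than the exact setting the maintainers were referring to.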
Does it make sense to log those at debug level?
Also, it may be interesting to be able to force a specific way of getting the host metadata (for example, I would like to force it to go through the Kubernetes API), but it doesn't look like this is possible right now.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping the code owners.
Make Datadog exporter source providers run in parallel to reduce start times. With the new `Chain` implementation, we start checking all sources in parallel instead of waiting for the previous one to fail. This makes the Datadog exporter call all cloud provider endpoints in all cloud providers, so it may increase spurious logs such as those reported in #24072.

**Link to tracking Issue:** Updates #16442 (at least it should substantially improve start time in some environments)

Co-authored-by: Yang Song <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
Thanks @mackjmr, this PR should indeed improve the situation. We will keep an eye on whether the start time is reasonable after the change in all environments.
We have had user reports stating that #24234 significantly improved start times. I will therefore go ahead and close this; if you run into this issue again, feel free to comment so that we can reopen it.
Component(s)
exporter/datadog
What happened?
Description
When using the Datadog exporter in a Lambda environment, I get these messages during both the trace and metrics exporter initialization:
Then the Lambda initialization times out and never starts successfully.
If I disable one of the exporters, for example the traces one, leaving only the metrics one enabled, then initialization takes less time and manages to finish.
I already use a resource detector, so why does this exporter need to detect things by itself?
Collector version
v0.64.1
Environment information
Environment
OS: Debian Bullseye
Compiler: go 1.17
OpenTelemetry Collector configuration
No response
Log output
Additional context
No response