
Proposal: Remove waiting for backend response of receiving traces/metrics from critical path of lambda returning to user #812

Closed
cgilling opened this issue Jul 21, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@cgilling

cgilling commented Jul 21, 2023

Is your feature request related to a problem? Please describe.

Currently, according to the design docs (as referenced in this comment), the lambda execution must block on the force flush call while the collector pushes all of the data to its backends and waits for their responses. Depending on the backends and the execution time of the lambda, this can significantly increase the response time seen by the caller of the lambda.
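
For illustration, the blocking point described above sits at the end of the handler, roughly like this minimal sketch using the OpenTelemetry Go SDK and aws-lambda-go (the handler and provider setup are illustrative, not code from this repo):

```go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/lambda"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

var tp *sdktrace.TracerProvider

func handler(ctx context.Context) (string, error) {
	// ... business logic, recording spans against tp ...

	// ForceFlush hands the spans to the collector extension. Per the issue
	// above, the collector then waits for its backends before this call
	// returns, so the lambda's caller waits on that round trip as well.
	if err := tp.ForceFlush(ctx); err != nil {
		return "", err
	}
	return "done", nil
}

func main() {
	// In the real layer this provider would be configured with an OTLP
	// exporter pointing at the collector extension; left at defaults here.
	tp = sdktrace.NewTracerProvider()
	lambda.Start(handler)
}
```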

Describe the solution you'd like

Lambda Extensions are allowed to keep running after the main runtime process has finished executing and returned its response to the lambda system (see this diagram). It seems like it would be ideal if the force flush from the main lambda process could kick off the process of sending the data, while the wait for the responses from all the backends happened after the main lambda code has finished executing. This would look something like adding some code after this statement in the main event loop that blocks until those responses are received. The loop would then go around to asking the Extensions API for the next event, indicating that the extension has finished executing.

To be fair, I have not investigated further into the collector code to see how easy/hard this would be to actually implement.
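
To make the shape of that concrete, here is a rough sketch of how the extension's main event loop could be arranged. This is not the actual opentelemetry-lambda code; ExtensionClient, Event, and waitForBackendResponses are placeholder names:

```go
package sketch

import "context"

// Event and ExtensionClient are placeholders standing in for the Lambda
// Extensions API client, not types from this repository.
type Event struct{ EventType string }

type ExtensionClient interface {
	// NextEvent tells the Extensions API that this extension is done with the
	// previous invocation and blocks until the next INVOKE or SHUTDOWN event.
	NextEvent(ctx context.Context) (*Event, error)
}

func run(ctx context.Context, client ExtensionClient, waitForBackendResponses func(context.Context)) error {
	for {
		event, err := client.NextEvent(ctx)
		if err != nil {
			return err
		}
		if event.EventType == "SHUTDOWN" {
			return nil
		}

		// During the INVOKE event the runtime does its work; under this
		// proposal its force flush only hands telemetry to the collector and
		// returns, so the function's response goes back to the caller without
		// waiting on the backends.

		// Proposed addition: before looping around to NextEvent (which marks
		// this invocation as finished), block until the collector's exporters
		// have heard back from all backends. That wait is now off the
		// caller's critical path.
		waitForBackendResponses(ctx)
	}
}
```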

@cgilling cgilling added the enhancement New feature or request label Jul 21, 2023
@RangelReale

Would this also work for metrics or only spans? Metrics don't seem to have an easy way to flush on every call.

@cgilling
Author

@RangelReale sorry for the delayed response. My assumption is that it would cover both. My only experience is with the prometheusremotewrite metrics exporter, which has a way to turn off queueing. Based on my understanding of the design doc, the intention is that writing out traces and metrics is synchronous from the lambda, to the collector, to the destinations. Hence my hesitancy, as this could add a good amount of latency to the response times of the lambdas.

@lukep-coxauto

@cgilling we've been watching for something similar, and this PR just came through: #959. It sounds like it may do a lot of what you want. Are there nuances to what you want that aren't solved by that PR?

@lukep-coxauto

@adcharre does your PR address this? One difference I see here is that @cgilling mentioned the need to do it as a Lambda Extension (as opposed to a Lambda Layer), which is what I think this otel collector is. Was it not necessary in the end to do it as an Extension?

@cgilling
Author

cgilling commented Nov 6, 2023

Thanks for bringing that to my attention, I'll have to take a look at the PR. My understanding is that the ADOT layer is a packaging of the otel collector that registers itself as an extension with the lambda system, so I think it's basically one and the same.

@adcharre
Contributor

adcharre commented Nov 7, 2023

@adcharre does your PR address this? One difference I see here is that @cgilling mentioned the need to do it as a Lambda Extension (as opposed to a Lambda Layer), which is what I think this otel collector is. Was it not necessary in the end to do it as an Extension?

Yes, PR #959 covers this by adding a new processor, decouple. When used, it allows the lambda to flush any pending traces/metrics/logs and return, while the otel collector (which runs as a lambda extension) carries on running to send the remaining data before allowing the lambda environment to process the next request or be frozen.
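
For anyone applying this later: enabling it is a matter of adding the processor to the collector configuration embedded in the layer. A rough sketch of such a config follows; the exporter and endpoint are placeholders, and the decouple processor's exact options may differ:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  decouple: # buffers data so the function can return before export completes

exporters:
  otlphttp:
    endpoint: https://example.com/otlp # placeholder backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [decouple]
      exporters: [otlphttp]
```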

@cgilling
Author

cgilling commented May 1, 2024

Thanks @adcharre for implementing this. I'm just getting back into investigating performance now and looked at your PR, and it does indeed look like what I was hoping for as a resolution to this issue 🎉, so I'll close this issue.

@cgilling cgilling closed this as completed May 1, 2024