
Proposal: Remove waiting for backend response of receiving traces/metrics from critical path of lambda returning to user #812

Closed
cgilling opened this issue Jul 21, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@cgilling

cgilling commented Jul 21, 2023

Is your feature request related to a problem? Please describe.

Currently, according to the design docs (as referenced in this comment), the lambda execution must block on the force flush call while the collector pushes all of the data to its backends and waits for their responses. Depending on the backends and the execution time of the lambda, this can significantly increase the response time seen by the caller of the lambda.
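
For illustration, the blocking point described above sits at the end of the handler, roughly like this minimal sketch using the OpenTelemetry Go SDK and aws-lambda-go (the handler and provider setup are illustrative, not code from this repo):

```go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/lambda"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

var tp *sdktrace.TracerProvider

func handler(ctx context.Context) (string, error) {
	// ... business logic, recording spans against tp ...

	// ForceFlush hands the spans to the collector extension. Per the issue
	// above, the collector then waits for its backends before this call
	// returns, so the lambda's caller waits on that round trip as well.
	if err := tp.ForceFlush(ctx); err != nil {
		return "", err
	}
	return "done", nil
}

func main() {
	// In the real layer this provider would be configured with an OTLP
	// exporter pointing at the collector extension; left at defaults here.
	tp = sdktrace.NewTracerProvider()
	lambda.Start(handler)
}
```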

Describe the solution you'd like

Lambda Extensions are allowed to keep running after the main runtime process has finished executing and returned its response to the lambda system (see this diagram). It seems like it would be ideal if the force flush from the main lambda process could kick off the process of sending the data, while the wait for the responses from all the backends happened after the main lambda code has finished executing. This would look something like adding some code after this statement in the main event loop that blocks until those responses are received. The loop would then go around to asking the Extensions API for the next event, indicating that the extension has finished executing.

To be fair, I have not investigated further into the collector code to see how easy/hard this would be to actually implement.
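
To make the shape of that concrete, here is a rough sketch of how the extension's main event loop could be arranged. This is not the actual opentelemetry-lambda code; ExtensionClient, Event, and waitForBackendResponses are placeholder names:

```go
package sketch

import "context"

// Event and ExtensionClient are placeholders standing in for the Lambda
// Extensions API client, not types from this repository.
type Event struct{ EventType string }

type ExtensionClient interface {
	// NextEvent tells the Extensions API that this extension is done with the
	// previous invocation and blocks until the next INVOKE or SHUTDOWN event.
	NextEvent(ctx context.Context) (*Event, error)
}

func run(ctx context.Context, client ExtensionClient, waitForBackendResponses func(context.Context)) error {
	for {
		event, err := client.NextEvent(ctx)
		if err != nil {
			return err
		}
		if event.EventType == "SHUTDOWN" {
			return nil
		}

		// During the INVOKE event the runtime does its work; under this
		// proposal its force flush only hands telemetry to the collector and
		// returns, so the function's response goes back to the caller without
		// waiting on the backends.

		// Proposed addition: before looping around to NextEvent (which marks
		// this invocation as finished), block until the collector's exporters
		// have heard back from all backends. That wait is now off the
		// caller's critical path.
		waitForBackendResponses(ctx)
	}
}
```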

@cgilling cgilling added the enhancement New feature or request label Jul 21, 2023
@RangelReale

Would this also work for metrics or only spans? Metrics don't seem to have an easy way to flush on every call.

@cgilling
Author

@RangelReale sorry for the delayed response. My assumption is that it would cover both. My only experience is with the prometheusremotewrite metrics exporter, which has a way to turn off queueing. Based on my understanding of the design doc, the intention is that writing out traces and metrics is synchronous from the lambda, to the collector, to the destinations. Hence my hesitancy, as this could add a good amount of latency to the response times of the lambdas.

@lukep-coxauto

@cgilling we've been watching for something similar, and this PR just came through: #959. It sounds like it may do a lot of what you want. Are there nuances to what you want that aren't solved by that PR?

@lukep-coxauto

@adcharre does your PR address this? One difference I see here is that @cgilling mentioned the need to do it as a Lambda Extension (as opposed to a Lambda Layer), which is what I think this otel collector is. Was it not necessary in the end to do it as an Extension?

@cgilling
Author

cgilling commented Nov 6, 2023

Thanks for bringing that to my attention, I'll have to take a look at the PR. My understanding is that the ADOT layer is a packaging of the otel collector that registers itself as an extension with the lambda system, so I think it's basically one and the same.

@adcharre
Contributor

adcharre commented Nov 7, 2023

@adcharre does your PR address this? One difference I see here is that @cgilling mentioned the need to do it as a Lambda Extension (as opposed to a Lambda Layer), which is what I think this otel collector is. Was it not necessary in the end to do it as an Extension?

Yes, PR #959 covers this by adding a new processor, decouple. When used, it allows the lambda to flush any pending traces/metrics/logs and return, while the otel collector (which runs as a lambda extension) carries on running to send the remaining data before allowing the lambda environment to process the next request or be frozen.
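
For anyone applying this later: enabling it is a matter of adding the processor to the collector configuration embedded in the layer. A rough sketch of such a config follows; the exporter and endpoint are placeholders, and the decouple processor's exact options may differ:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  decouple: # buffers data so the function can return before export completes

exporters:
  otlphttp:
    endpoint: https://example.com/otlp # placeholder backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [decouple]
      exporters: [otlphttp]
```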

@cgilling
Author

cgilling commented May 1, 2024

Thanks @adcharre for implementing this. I'm just getting back into investigating performance now and looked at your PR, and it does indeed look like what I was hoping for as a resolution to this issue 🎉, so I'll close this issue.

@cgilling cgilling closed this as completed May 1, 2024