-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Remove waiting for backend response of receiving traces/metrics from critical path of lambda returning to user #812
Comments
Would this also work for metrics or only spans? Metrics don't seem to have an easy way to flush on every call. |
@RangelReale sorry for the delayed response. My assumption is that it would be both. My only experience is with the prometheusremotewrite metrics stuff which has a way to turn off queueing. My assumption based on my understanding of the design doc is that the intention is that writing out traces and metrics is supposed to be synchronous from the lambda, to the collector, to the destinations. Hence my hesitancy as this could add a good amount of latency to the response times of the lambdas |
Thanks for bringing that to my attention, I'll have to take a look at the PR. My understanding is that the ADOT layer is a packaging of the otel collector that registers itself as an extention with the lambda system. So I think its basically one and the same |
Yes PR #959 covers this by adding a new processor, decouple. When used this allows the lambda to send flush any pending trace/metrics/logs and return while the otel collector (which runs as lambda extension) carries on running to send the remaining data before allowing the lambda environment to process the next request or be frozen. |
@adcharre for imlementing this. I'm just getting back into investigating performance now and looked at you PR and it does indeed look like what I was hoping for as a resolution to this issue 🎉 , so I'll close this issue |
Is your feature request related to a problem? Please describe.
Currently the way the collector works according to the design docs (as referenced in this comment), the lambda execution must block on the force flush call while the collector pushes all the data to its backends and waits for responses. Depending on the backends and the execution time of the lambda this could significantly increase the response time of the lambda to the caller of said lambda.
Describe the solution you'd like
Lambda Extensions are allowed to keep running after the main runtime process has finished executing and returned its response to the lambda system (see this diagram. It seems like it would be ideal if the force flush from the main lambda process could kick off the process of sending the data, but we could wait for the response from all the backends after the main lambda code has finished executing. This would look something like having some additional code after this statement in the main event loop that would be able to block until those responses are received. The loop would then go around to asking for the next event from the extensions API indicating that the extension has finished executing.
To be fair, I have not investigated further into the collector code to see how easy/hard this would be to actually implement.
The text was updated successfully, but these errors were encountered: