Missing spans from lambda #201
Related issue: aws-observability/aws-otel-lambda#203 (comment)
Hi @Ancient-Dragon, thanks for raising this issue. I would start by asking what your system's environment is before and after Lambda, i.e. what invokes the function and what consumes its output.

Next, when you use the NodeJS layer, do you use any custom Collector configuration, or configure OpenTelemetry in any way other than the auto instrumentation method we provide with the layer? OpenTelemetry JS should by default be using the parent-based sampler.

Finally, does your Lambda have active tracing enabled or disabled? This decides the value of the sampled flag in the trace header.

I'm curious if this is related to open-telemetry/opentelemetry-python-contrib#649 (comment)? In that issue a user found their traces were not being propagated. This was because the trace header (which carries the sampling decision) marked them as not sampled.

Please let me know if this makes sense and if you know the answer to my questions above!
Hi @NathanielRN. Before the lambda we have an AWS SQS queue and nothing after it (the point of this lambda is to read the messages and create a span with an event in OpenTelemetry). The full flow is DynamoDB -> lambda -> SQS -> lambda (this last one is the one failing). I linked the issue we posted in the AWS repository, which has a zip of our failing code; it also contains a collector.yaml file which looks like:
As far as I know we are using the parentbased_alwayson sampler. We haven't done anything specific to change it ourselves, and we haven't done any configuration to the lambda layer ourselves (it is the one provided by AWS). The only config is what we have above, so I would assume that active tracing is enabled, especially given that 950 out of 1000 invocations have it enabled; it is just the odd invocation of the lambda that doesn't. Currently none of our traces are joined up (we're not entirely sure why, but they don't propagate through SNS/SQS, so we are in the process of implementing a workaround), and we haven't got to either this function or the function that puts messages into the SQS queue before it. So I wouldn't expect it to be sending any headers that might cause something like the issue you posted. Unfortunately I can't find a way to find out what the current sampler is; could you point me in the right direction if you still think this is the cause of the issue?
Thanks for your reply! I reviewed the PDF you provided and it's super clear about what the problem is.

Thought Process
It would be a good idea to check this is set up in your Lambda function. Enabling active tracing is one of the steps we call out specifically in the ADOT Documentation for the NodeJS Layer in Step 5. This is a very important step, because by default it is turned off. With active tracing "off", the trace header Lambda provides marks the invocation as not sampled. I recognize that you said most of your spans do arrive.
So if they are arriving at HoneyComb (as per the linked issue in the aws-otel-lambda repository), tracing is working most of the time. However, I know you said the occasional invocation produces spans that are never exported.
It would also be good to ask whether there is any chance that the Lambda is being destroyed and created again through Terraform or CloudFormation templates, which could cause active tracing to be turned off? But I assume not.
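One quick way to confirm what the Lambda runtime hands to the propagator is to log the `_X_AMZN_TRACE_ID` environment variable and look at its `Sampled` field. A minimal sketch, assuming the standard X-Ray header format; the parsing helper and the fallback header value below are illustrative, not part of the layer:

```javascript
// Hypothetical helper: parse the X-Ray trace header that Lambda puts in
// the _X_AMZN_TRACE_ID environment variable, e.g.
// "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
function parseXrayHeader(header) {
  const fields = {};
  for (const part of header.split(';')) {
    const [key, value] = part.split('=');
    fields[key] = value;
  }
  return fields;
}

// Illustrative fallback value for running outside Lambda.
const header = process.env._X_AMZN_TRACE_ID ||
  'Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=0';
const fields = parseXrayHeader(header);
// Sampled=0 means the invocation was marked "not sampled" by X-Ray.
console.log(`Sampled flag: ${fields.Sampled}`);
```

If the occasional invocation logs `Sampled: 0`, that would line up with the missing spans being grouped by invocation.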
If you aren't trying to change it then you probably have the default sampler. You can check by putting

```
const tracer = api.trace.getTracer('example-basic-tracer-node');
console.log(`Tracer sampler information: ${tracer['_sampler'].toString()}`);
```

in your Lambda function, which outputs the sampler's description.
When I did this I got some collector errors, but you should be able to ignore them; they are probably from trying to serialize this message here. Actually this was very helpful for my own investigation: I noticed that the message above shows some defaults of the ParentBasedSampler. There are two samplers that the ParentBasedSampler can delegate to when the parent context is remote:

```
if (parentContext.isRemote) {
  if (parentContext.traceFlags & TraceFlags.SAMPLED) { // FOCUS ON THIS
    return this._remoteParentSampled.shouldSample(
      context,
      traceId,
      spanName,
      spanKind,
      attributes,
      links
    );
  }
  return this._remoteParentNotSampled.shouldSample(
    context,
    traceId,
    spanName,
    spanKind,
    attributes,
    links
  );
}
```

Based on your described system, you are using our Lambda Layers. Our Lambda Layers have the AWS Propagator. The AWS Propagator follows the OpenTelemetry Specification on determining the parent span context in an AWS Lambda environment. If you look at the AWS Propagator code, it will always create a new context with isRemote set to true.

Based on the code path above, since we know the trace flags are 0 when active tracing is off, the _remoteParentNotSampled sampler is chosen, and by default that is an AlwaysOffSampler, so the span is never recorded.

So... if you are confident your Lambda's Active Tracing setting is set to ON, then the other solution is to change away from the parent-based sampler. All you need to do is set your environment variable OTEL_TRACES_SAMPLER=always_on.
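Assuming the SDK defaults, the effect of that remote-parent branch can be sketched with plain objects; the function below is an illustrative model, not part of the OpenTelemetry SDK:

```javascript
// Illustrative model of the ParentBasedSampler's remote-parent branch.
// With the SDK defaults, remoteParentSampled behaves like AlwaysOn and
// remoteParentNotSampled behaves like AlwaysOff.
const SAMPLED = 0x1; // TraceFlags.SAMPLED

function remoteParentDecision(parentContext) {
  if (parentContext.isRemote) {
    return (parentContext.traceFlags & SAMPLED)
      ? 'RECORD_AND_SAMPLED' // remoteParentSampled (AlwaysOn by default)
      : 'NOT_RECORD';        // remoteParentNotSampled (AlwaysOff by default)
  }
  return 'RECORD_AND_SAMPLED'; // local-parent branches omitted for brevity
}

// Active tracing off -> X-Ray header has Sampled=0 -> traceFlags = 0:
console.log(remoteParentDecision({ isRemote: true, traceFlags: 0 })); // NOT_RECORD
// Active tracing on -> Sampled=1 -> traceFlags = 1:
console.log(remoteParentDecision({ isRemote: true, traceFlags: 1 })); // RECORD_AND_SAMPLED
```

This matches the reported symptom: spans created with traceFlags 0 are never recorded, and because the decision is per invocation, all spans from that invocation's trace are lost together.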
By referencing an issue with SQS -> Lambda you might be describing this issue aws/aws-xray-sdk-node#208. Unfortunately this is a very complicated problem in the tracing community which doesn't have a concrete answer due to scalability questions. It's difficult to match several SQS spans to 1 Lambda span. OpenTelemetry is actively trying to solve this problem as a community and their progress is mentioned in the OpenTelemetry Specification for messaging.
Just to confirm, can you or @simonmarshall post a copy of what the Lambda event looks like? The one that it receives when it polls from SQS. I'm looking for any information there that could change the tracing header. From the other issue...
This is precisely what I want to check by asking you to double check Lambda's active tracing settings. As I just discovered above, in this Lambda layer, having active tracing off means the remote parent is treated as not sampled and the spans are dropped.

Conclusion

TL;DR: Can you try setting OTEL_TRACES_SAMPLER=always_on? I believe the problem is in your Lambda having active tracing turned off for those invocations.
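For reference, one way to set that variable without redeploying is via the AWS CLI; the function name below is a placeholder:

```shell
# Hypothetical function name; sets the OTel sampler env var on the function.
aws lambda update-function-configuration \
  --function-name my-sqs-consumer \
  --environment "Variables={OTEL_TRACES_SAMPLER=always_on}"
```

Note that `update-function-configuration` replaces the whole `Variables` map, so include any existing environment variables in the same call.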
hi @NathanielRN, fantastic, thanks. it looks like that has fixed it.
@Ancient-Dragon @simonmarshall I just brought your findings up to my team today! They pointed me to the paragraph in the 'Using X-Ray with AWS Lambda' documentation explaining that, by default, X-Ray samples the first request each second and five percent of any additional requests.
Given that you were sending 1000 requests, I feel like you could have easily hit that limit and then been sampled at 5 percent for additional requests. But hopefully the always_on sampler avoids this entirely.
Hey @simonmarshall, anything left to follow up on for this issue or aws-observability/aws-otel-lambda#203?
Hi @NathanielRN, all is good, so I think both can be closed, thanks!
Hi there,
We're facing an issue when using the lambda layer for Node.js (this can be replicated across multiple versions). For about every 1000 spans created, we're losing about 20-50 of them. The lost spans always seem to be grouped by invocation of the lambda and therefore are all under the same trace id. Our team has noticed that when the spans are created, isRecording is set to false instead of true (which we assume is why they're not being sent to the collector). The other thing is that traceFlags is set to 0 instead of 1 (although changing this has no impact). Has anyone seen this before / does anyone know how to resolve this issue?