Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add instrumentation of AWS Lambda handlers to generate FaaS spans #399

Merged
merged 17 commits into from
Apr 12, 2021

Conversation

anuraaga
Copy link
Contributor

@anuraaga anuraaga commented Mar 30, 2021

Which problem is this PR solving?

  • Allow AWS Lambda users to have spans generated for function invocations

Short description of the changes

This instrumentation patches the AWS Lambda handler (found via the _HANDLER env variable which is always set by Lambda) to generate spans around invocations. It is the first in a series of PRs implementing the spec defined here

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/instrumentation/aws-lambda.md

This currently only decorates the invocations with spans following FaaS conventions - future PRs will add reading of HTTP for proxy requests and messaging for SQS requests with appropriate parenting.

I actually am not sure if this approach works I will try it out with an actual lambda function, sending out for an initial look at the code and maybe suggestions on how to work around this problem. I noticed that the patching infrastructure, via require-in-middle can only patch core modules or modules in node_modules - it literally checks the path of a required file to see if it is in node_modules and bails if it isn't.

https://github.com/watson/module-details-from-path/blob/master/index.js#L8

I have worked around it in testing here by copying handlers into node_modules - I need to check whether Lambda actually houses a user handler function inside a folder called node_modules. There's a reasonable chance it doesn't - if anyone can provide some tips on whether it's possible to work around this, that would be great.

@anuraaga anuraaga requested a review from a team March 30, 2021 07:44
@anuraaga
Copy link
Contributor Author

@willarmiros @wangzlei

@anuraaga
Copy link
Contributor Author

anuraaga commented Apr 1, 2021

I found how lambda requires modules (absolute path) and the way to get it to work with the patching infrastructure open-telemetry/opentelemetry-js#2069

Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dyladan @willarmiros Updated, now the tests pass even with the HEAD opentelemetry-js code.


return [
new InstrumentationNodeModuleDefinition(
// NB: The patching infrastructure seems to match names backwards, this must be the filename, while
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this strange pattern that did allow patching the lambda handler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean that we pass the filename to a class called InstrumentationNodeModuleDefinition, but pass a module name to a class called InstrumentationNodeModuleFile. I'd expect file to refer to a filename :)

Copy link
Member

@vmarchaud vmarchaud Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah thats weird, i'm not sure if it's because of the resolving done by lamda. With other instrumentations we do the opposite (module is a name and file is an relative path to the module): https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/plugins/node/opentelemetry-instrumentation-mongodb/src/mongodb.ts#L59

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this indeed looks strange as it should be a module name if user requires for example http then the module name is http so in this case it should be aws-lambda not a file name. If you need to patch some additional files then this is what InstrumentationNodeModuleFile is for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what I figured. Well I'd like to leave this as is for now since it seems to be the pattern that works here :)

@codecov
Copy link

codecov bot commented Apr 6, 2021

Codecov Report

Merging #399 (14f0f8a) into main (c7fb34b) will not change coverage.
The diff coverage is n/a.

❗ Current head 14f0f8a differs from pull request most recent head c42d413. Consider uploading reports for the commit c42d413 to get more accurate results

@@           Coverage Diff           @@
##             main     #399   +/-   ##
=======================================
  Coverage   94.21%   94.21%           
=======================================
  Files          12       12           
  Lines         432      432           
  Branches       48       48           
=======================================
  Hits          407      407           
  Misses         25       25           

@anuraaga
Copy link
Contributor Author

anuraaga commented Apr 7, 2021

By the way I could confirm this instrumentation works to instrumentent lambda invocations by building into a lambda layer in open-telemetry/opentelemetry-lambda#25

Copy link
Member

@vmarchaud vmarchaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@obecny obecny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change names everywhere from awslambda to aws-lambda including the folder name

@@ -0,0 +1,71 @@
{
"name": "@opentelemetry/instrumentation-awslambda",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name should be

Suggested change
"name": "@opentelemetry/instrumentation-awslambda",
"name": "@opentelemetry/instrumentation-aws-lambda",

you are patching aws-lambda and those are 2 words anyway

() => original.apply(this, [event, context, wrappedCallback]),
error => {
if (error != null) {
span.setStatus({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use also span.recordException to save the whole stack of error

errMessage = err.message;
}
if (errMessage) {
span.setStatus({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • span.recordException

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recordException I don't think sets status does it? You may need to do both if that is the behavior you want.


return [
new InstrumentationNodeModuleDefinition(
// NB: The patching infrastructure seems to match names backwards, this must be the filename, while
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this indeed looks strange as it should be a module name if user requires for example http then the module name is http so in this case it should be aws-lambda not a file name. If you need to patch some additional files then this is what InstrumentationNodeModuleFile is for

attributes: {
[FaasAttribute.FAAS_EXECUTION]: context.awsRequestId,
[FaasAttribute.FAAS_ID]: context.invokedFunctionArn,
// TODO(anuraaga): Add cloud.account.id when there is a JS accessor for it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please create a ticket for that so it is not lost ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this not be extracted from the ARN?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to getting account from ARN

return function wrappedCallback(this: {}, err, res) {
diag.debug('executing wrapped lookup callback function');

let errMessage;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part of code is the same as in line 135. Please create a private method for that something like
this._addError(span)
please also use recordException if you have an object error.


import { Handler } from 'aws-lambda';

export interface LambdaModule {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export interface LambdaModule {
export type LambdaModule = Record<string, Handler>;

@@ -0,0 +1,5 @@
{
"name": "lambdatest",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"name": "lambdatest",
"name": "lambda-test",

});
});

context('sync success handler', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add block describe('sync success handler') and similar block to async

Copy link
Member

@dyladan dyladan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you handling flushing? Are you just hoping that the lambda runtime isn't paused at an inconvenient time?

@vmarchaud
Copy link
Member

Please change names everywhere from awslambda to aws-lambda including the folder name

Is this a requirement that i missed for vendor components or just a better naming suggestion ?

@obecny
Copy link
Member

obecny commented Apr 7, 2021

aws-lambda

better naming and consistency, aws-lambda is the official name of the package so I think we should keep the exact name as we do with other things. It is also 2 words so one more reason for having a dash like we do with everything else with regards to naming.

Copy link
Contributor

@willarmiros willarmiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general

attributes: {
[FaasAttribute.FAAS_EXECUTION]: context.awsRequestId,
[FaasAttribute.FAAS_ID]: context.invokedFunctionArn,
// TODO(anuraaga): Add cloud.account.id when there is a JS accessor for it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to getting account from ARN

});

// Lambda seems to pass a callback even if handler is of Promise form, so we wrap all the time before calling
// the handler and see if the result is a Promise or not. In such a case, the callback is usually ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such a case, the callback is usually ignored.

What if the callback isn't ignored? Would the span be ended twice/would that matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah anything after the first end is ignored as per otel spec

Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all applied cleanups


return [
new InstrumentationNodeModuleDefinition(
// NB: The patching infrastructure seems to match names backwards, this must be the filename, while
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what I figured. Well I'd like to leave this as is for now since it seems to be the pattern that works here :)

});

// Lambda seems to pass a callback even if handler is of Promise form, so we wrap all the time before calling
// the handler and see if the result is a Promise or not. In such a case, the callback is usually ignored.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah anything after the first end is ignored as per otel spec

@anuraaga
Copy link
Contributor Author

anuraaga commented Apr 8, 2021

How are you handling flushing? Are you just hoping that the lambda runtime isn't paused at an inconvenient time?

Thanks a lot for the reminder, added a forceFlush at the end of the function invocation.

Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Flarna - also, do you happen to know how to ignore a test on nodejs 8 in CI? If it's not easy I'm thinking of just going with then(() => callback(), () => callback())

@Flarna
Copy link
Member

Flarna commented Apr 8, 2021

do you happen to know how to ignore a test on nodejs 8 in CI?

Sorry, don't know. One simple solution would be to call this.skip() in the test suite for old node versions. But maybe there is a better solution.

@anuraaga
Copy link
Contributor Author

anuraaga commented Apr 8, 2021

@Flarna Thanks! Skipping sounds good

Co-authored-by: Bartlomiej Obecny <[email protected]>
@anuraaga
Copy link
Contributor Author

anuraaga commented Apr 8, 2021

@obecny >< Thanks for the help on cleanup!

@anuraaga
Copy link
Contributor Author

@obecny I ended up switching from finally to .then(repeat, repeat) because skipping the test caused code coverage to not be computed correctly. Hopefully it's not a big deal.

I have a couple of important followups planned, including implementing the parenting mechanism from the spec and HTTP attributes for proxy events. If this looks OK appreciate a merge :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants