- Stage: 2 (candidate)
- Date: 2021-09-14
Using APM agents in serverless environments (e.g. AWS Lambda, Azure Functions) makes it possible to capture function-as-a-service (FaaS) specific context that can be of great value to end users and provides correlation points with other sources of data.
Extending ECS with a dedicated field group, or embedding these fields into the existing cloud fields, would make it possible to capture this data in a meaningful, semantically aligned way and to correlate it across different use cases (e.g. correlating AWS Lambda traces with the corresponding Lambda metrics and logs).
The existing specification in OpenTelemetry can serve as a good orientation: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/faas.md#example
Discussing the initial proposal with Andrew Wilkins, we came up with an adapted proposal (compared to the proposal for stage 0) that reuses as many existing ECS fields as possible:
Field | Type | Example | Description | Use case |
---|---|---|---|---|
faas.id | keyword | arn:aws:lambda:us-west-2:123456789012:function:my-function | The unique identifier of a serverless function. For AWS Lambda it's the function ARN (Amazon Resource Name) without a version or alias suffix. | Correlation of traces, logs and metrics for a specific serverless function. |
faas.name | keyword | my-function | The name of a serverless function. | Display name of a serverless function. |
faas.version | keyword | 123 | The version of a serverless function. | Group / differentiate data by the version of a serverless function. |
faas.coldstart | boolean | true | Boolean value indicating a cold start of a function. | Can be used in the UI to denote function cold starts. |
faas.execution | keyword | "af9d5aa4-a685-4c5f-a22b-444f80b3cc28" | The execution ID of the current function execution. | Allows correlation with CloudWatch logs and metrics. |
faas.trigger.type | keyword | "http" | One of http, pubsub, datasource, timer, other. | Allows differentiating between function types. |
faas.trigger.request_id | keyword | 123456789 | The ID of the trigger request, message, event, etc. | Correlation of metrics and logs with the corresponding trigger request. |
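
The `faas.id` description asks for the function ARN without a version or alias suffix. A minimal sketch of how that normalization could look, assuming the usual colon-separated Lambda ARN layout; the helper name `normalize_lambda_arn` is hypothetical and not part of any agent API:

```python
def normalize_lambda_arn(arn: str) -> str:
    """Return a Lambda function ARN without a trailing version or alias qualifier.

    Hypothetical helper for illustration. An unqualified function ARN has
    seven colon-separated parts:
    arn:<partition>:lambda:<region>:<account>:function:<name>
    A qualified ARN carries an eighth part (version number or alias).
    """
    parts = arn.split(":")
    if len(parts) >= 7 and parts[2] == "lambda" and parts[5] == "function":
        # Keep only the unqualified portion of the ARN.
        return ":".join(parts[:7])
    # Anything that does not look like a Lambda function ARN is left untouched.
    return arn


unqualified = normalize_lambda_arn(
    "arn:aws:lambda:us-west-2:123456789012:function:my-function:42"
)
```

Inputs that are not shaped like a Lambda function ARN are returned unchanged rather than rejected, which keeps the sketch safe to apply to identifiers from other providers.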
For the initially proposed fields `faas.name`, `faas.id`, `faas.version` and `faas.instance` we decided to reuse the existing fields `service.name`, `service.id`, `service.version` and `service.node.name`.
We identified a big overlap between the initially proposed `faas.trigger.*` fields and the already existing `cloud.*` and `service.*` fields.
Allowing cloud and service fields to be self-nested under `cloud.origin.*` / `cloud.target.*` and `service.origin.*` / `service.target.*`, respectively, would cover most of the `faas.trigger.*` fields.
Moreover, the proposal for nesting cloud fields would resolve other use cases as well (e.g. #1282).
Initially proposed | New proposed nested cloud or service field |
---|---|
faas.trigger.name | service.origin.name |
faas.trigger.id | service.origin.id |
faas.trigger.version | service.origin.version |
faas.trigger.account.name | cloud.origin.account.name |
faas.trigger.account.id | cloud.origin.account.id |
faas.trigger.region | cloud.origin.region |
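
To make the self-nesting concrete, here is a minimal sketch of how the dotted field names expand into a document in which `origin` sits inside the `service` and `cloud` objects. The `nest` helper and all values are illustrative assumptions, not part of the proposal:

```python
def nest(flat: dict) -> dict:
    """Expand dotted field names into a nested document (hypothetical helper)."""
    doc: dict = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = doc
        for key in parents:
            # Create intermediate objects on demand, e.g. service -> origin.
            node = node.setdefault(key, {})
        node[leaf] = value
    return doc


# Illustrative values only: the function itself is described by service.*,
# while the triggering service is described by service.origin.* / cloud.origin.*.
example = nest({
    "service.name": "my-function",
    "service.origin.name": "GET /fetch_all/dev",
    "service.origin.id": "02plqthge2",
    "cloud.origin.account.id": "571481734049",
})
```

In the resulting document, `example["service"]["name"]` refers to the function, and `example["service"]["origin"]["name"]` refers to the service that triggered it.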
Done.
The fields `faas.id`, `faas.name` and `faas.version` allow correlating traces, logs and metrics for individual serverless functions and versions. `faas.name` will be used as the display name of serverless functions in the UI.
`faas.coldstart` will be used in the APM UI to mark function invocations that resulted from a cold start. This is useful information for end users to differentiate cold-start behaviour from warm-start function invocations.
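
One way an agent could determine the cold-start flag is to rely on module-level state, which survives across warm invocations of the same function instance. This is a minimal sketch under that assumption; the name `consume_cold_start` is hypothetical:

```python
# Module-level state is initialized once per function instance, so it is
# True only until the first invocation has been observed.
_cold_start = True


def consume_cold_start() -> bool:
    """Return True for the first invocation of this instance, False afterwards."""
    global _cold_start
    was_cold = _cold_start
    _cold_start = False
    return was_cold


first = consume_cold_start()   # first invocation of this instance
second = consume_cold_start()  # any later (warm) invocation
```

The returned value would then be reported as `faas.coldstart` on the corresponding transaction.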
The execution ID (`faas.execution`) and the trigger request ID (`faas.trigger.request_id`) will be used to correlate APM data (traces / transactions), logs and metrics of the FaaS function (e.g. from CloudWatch), as well as logs and metrics from the corresponding trigger, for individual invocations.
`faas.trigger.type` indicates the type of the function trigger. It allows grouping functions by trigger type.
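
The trigger type has no single source field and has to be inferred from the shape of the incoming event. The following is a rough sketch of such a heuristic for a few common AWS event shapes; the function name `detect_trigger_type` and the selection of checks are assumptions for illustration, not the agents' actual logic:

```python
def detect_trigger_type(event) -> str:
    """Guess the faas.trigger.type value from the shape of a Lambda event."""
    if not isinstance(event, dict):
        return "other"

    request_context = event.get("requestContext")
    if isinstance(request_context, dict):
        # API Gateway REST (v1) events carry a top-level httpMethod,
        # HTTP API (v2) events carry requestContext.http.
        if "httpMethod" in event or "http" in request_context:
            return "http"

    records = event.get("Records")
    if isinstance(records, list) and records and isinstance(records[0], dict):
        # SNS spells the key "EventSource", SQS and S3 use "eventSource".
        source = records[0].get("eventSource") or records[0].get("EventSource")
        if source in ("aws:sqs", "aws:sns"):
            return "pubsub"
        if source == "aws:s3":
            return "datasource"

    # Scheduled (cron / rate) invocations are delivered as EventBridge events.
    if event.get("source") == "aws.events" and event.get("detail-type") == "Scheduled Event":
        return "timer"

    return "other"
```

Anything that does not match one of the known shapes falls back to `other`, which matches the catch-all value in the field description.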
The `service.origin.*` and `cloud.origin.*` fields provide meta information on the origin service that triggered the FaaS function. End users can use this information to better understand context, dependencies and causalities when analyzing and troubleshooting FaaS-related observability scenarios. For example, it could help answer questions such as: "Do function invocations that are triggered from cloud region us-east-1 behave similarly to invocations from region eu-west-1?"
FaaS functions provide meta-information in their execution environment. APM agents use instrumentation techniques to read this information. For instance, AWS Lambda provides an `event` and a `context` object with each function invocation: https://docs.aws.amazon.com/lambda/latest/dg/python-context.html
The above fields will be derived by the APM agents from the AWS Lambda `context` object and the `event` object that are passed with each invocation of a Lambda function. Below is an example of the context and event objects.
The mapping to the proposed fields for this example is laid out in the following table:
target ECS field | source field |
---|---|
faas.id | context.invokedFunctionArn |
faas.name | context.functionName |
faas.version | context.functionVersion |
faas.coldstart | No source field. Determined by the APM agent on the first Lambda function invocation. |
faas.execution | context.awsRequestId |
faas.trigger.type | No source field. Determined by the APM agent based on the event object type. Would be http in this example. |
faas.trigger.request_id | event.requestContext.requestId |
service.origin.name | ${event.requestContext.httpMethod} ${event.requestContext.resourcePath}/${event.requestContext.stage} -> GET /fetch_all/dev |
service.origin.id | event.requestContext.apiId |
service.origin.version | No source field. Determined by the APM agent based on the event object type (API version 1.0 or 2.0). |
cloud.origin.service.name | api gateway |
cloud.origin.account.id | event.requestContext.accountId |
Description available here.
context:
```json
{
  "callbackWaitsForEmptyEventLoop": true,
  "functionVersion": "$LATEST",
  "functionName": "the-function-name",
  "memoryLimitInMB": "128",
  "logGroupName": "/aws/lambda/the-function-name",
  "logStreamName": "2021/08/13/[$LATEST]08834acf4e4f463b95b7b99aa8b34aff",
  "invokedFunctionArn": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:the-function-name",
  "awsRequestId": "649bf7d0-c6ae-432d-899d-da44ccd7ee95"
}
```
Description available here.
event:
```json
{
  "resource": "/fetch_all",
  "path": "/fetch_all",
  "httpMethod": "GET",
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.5",
    "CloudFront-Forwarded-Proto": "https",
    "CloudFront-Is-Desktop-Viewer": "true",
    "CloudFront-Is-Mobile-Viewer": "false",
    "CloudFront-Is-SmartTV-Viewer": "false",
    "CloudFront-Is-Tablet-Viewer": "false",
    "CloudFront-Viewer-Country": "US",
    "Host": "02plqthge2.execute-api.us-east-1.amazonaws.com",
    "upgrade-insecure-requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0",
    "Via": "2.0 969f35f01b6eddd92239a3e818fc1e0d.cloudfront.net (CloudFront)",
    "X-Amz-Cf-Id": "eDbpfDwO-CRYymEFLkW6CBCsU_H_PS8R93_us53QWvXWLS45v3NvQw==",
    "X-Amzn-Trace-Id": "Root=1-5e502af4-fd0c1c6fdc164e1d6361183b",
    "X-Forwarded-For": "76.76.241.57, 52.46.47.139",
    "X-Forwarded-Port": "443",
    "X-Forwarded-Proto": "https"
  },
  "multiValueHeaders": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br"
    ],
    "Accept-Language": [
      "en-US,en;q=0.5"
    ],
    "CloudFront-Forwarded-Proto": [
      "https"
    ],
    "CloudFront-Is-Desktop-Viewer": [
      "true"
    ],
    "CloudFront-Is-Mobile-Viewer": [
      "false"
    ],
    "CloudFront-Is-SmartTV-Viewer": [
      "false"
    ],
    "CloudFront-Is-Tablet-Viewer": [
      "false"
    ],
    "CloudFront-Viewer-Country": [
      "US"
    ],
    "Host": [
      "02plqthge2.execute-api.us-east-1.amazonaws.com"
    ],
    "upgrade-insecure-requests": [
      "1"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0"
    ],
    "Via": [
      "2.0 969f35f01b6eddd92239a3e818fc1e0d.cloudfront.net (CloudFront)"
    ],
    "X-Amz-Cf-Id": [
      "eDbpfDwO-CRYymEFLkW6CBCsU_H_PS8R93_us53QWvXWLS45v3NvQw=="
    ],
    "X-Amzn-Trace-Id": [
      "Root=1-5e502af4-fd0c1c6fdc164e1d6361183b"
    ],
    "X-Forwarded-For": [
      "76.76.241.57, 52.46.47.139"
    ],
    "X-Forwarded-Port": [
      "443"
    ],
    "X-Forwarded-Proto": [
      "https"
    ]
  },
  "queryStringParameters": null,
  "multiValueQueryStringParameters": null,
  "pathParameters": null,
  "stageVariables": null,
  "requestContext": {
    "resourceId": "y3tkf7",
    "resourcePath": "/fetch_all",
    "httpMethod": "GET",
    "extendedRequestId": "IQumRELJIAMF6fQ=",
    "requestTime": "21/Feb/2020:19:09:40 +0000",
    "path": "/dev/fetch_all",
    "accountId": "571481734049",
    "protocol": "HTTP/1.1",
    "stage": "dev",
    "domainPrefix": "02plqthge2",
    "requestTimeEpoch": 1582312180890,
    "requestId": "6f3dffca-46f8-4c8b-800b-6bc1ea2554ec",
    "identity": {
      "cognitoIdentityPoolId": null,
      "accountId": null,
      "cognitoIdentityId": null,
      "caller": null,
      "sourceIp": "76.76.241.57",
      "principalOrgId": null,
      "accessKey": null,
      "cognitoAuthenticationType": null,
      "cognitoAuthenticationProvider": null,
      "userArn": null,
      "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0",
      "user": null
    },
    "domainName": "02plqthge2.execute-api.us-east-1.amazonaws.com",
    "apiId": "02plqthge2"
  },
  "body": null,
  "isBase64Encoded": false
}
```
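
The mapping table for this example can be sketched as a small function. This is an illustration only: it assumes `context` and `event` are plain dictionaries shaped like the JSON examples in this section (real runtimes expose the context as a runtime-specific object), it hardcodes the API Gateway case, and it omits `service.origin.version`. The name `map_lambda_invocation` is hypothetical.

```python
def map_lambda_invocation(context: dict, event: dict, cold_start: bool) -> dict:
    """Map an API Gateway triggered Lambda invocation to the proposed fields."""
    request_context = event.get("requestContext") or {}
    # service.origin.name follows the "<method> <resourcePath>/<stage>" pattern
    # from the mapping table.
    origin_name = "{} {}/{}".format(
        request_context.get("httpMethod"),
        request_context.get("resourcePath"),
        request_context.get("stage"),
    )
    return {
        "faas.id": context["invokedFunctionArn"],
        "faas.name": context["functionName"],
        "faas.version": context["functionVersion"],
        "faas.coldstart": cold_start,  # determined by the agent, no source field
        "faas.execution": context["awsRequestId"],
        "faas.trigger.type": "http",   # assumption: API Gateway event
        "faas.trigger.request_id": request_context.get("requestId"),
        "service.origin.name": origin_name,
        "service.origin.id": request_context.get("apiId"),
        "cloud.origin.service.name": "api gateway",
        "cloud.origin.account.id": request_context.get("accountId"),
    }


# Trimmed versions of the example objects from this section.
context = {
    "functionVersion": "$LATEST",
    "functionName": "the-function-name",
    "invokedFunctionArn": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:the-function-name",
    "awsRequestId": "649bf7d0-c6ae-432d-899d-da44ccd7ee95",
}
event = {
    "requestContext": {
        "resourcePath": "/fetch_all",
        "httpMethod": "GET",
        "accountId": "571481734049",
        "stage": "dev",
        "requestId": "6f3dffca-46f8-4c8b-800b-6bc1ea2554ec",
        "apiId": "02plqthge2",
    }
}
fields = map_lambda_invocation(context, event, cold_start=True)
```

For the example objects, `fields["service.origin.name"]` evaluates to `GET /fetch_all/dev`, matching the mapping table.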
- Ingestion mechanisms:
  - APM server will extend the intake V2 API to accept the new fields and store them with the transaction documents
  - APM server will extend the OpenTelemetry field mapping to account for these new fields
- Usage mechanisms:
  - APM UI may utilize the new fields to provide Lambda / serverless specific visualizations (e.g. indicating cold starts on transactions in the waterfall view, showing meta information on Lambda service views)
- ECS project
  - The concept of self-nesting service and cloud fields under origin and target needs clear documentation that avoids confusion around when to use which of the fields. This PR tries to address this with the descriptions in the schema for those fields.
During stage 1 review @ebeahan identified the potential confusion over an established ECS pattern where the root entity defines the doer and `*.target.*` the affected entity.
This proposal extends this pattern as there are 3 active parties involved. This puts the onus on ECS documentation being extremely clear on which field a user needs to query to get their intended results.
- Extended description / footnote for service and cloud fields in this PR to avoid confusion about origin and target nesting of service and cloud fields
The following are the people that consulted on the contents of this RFC.
- @AlexanderWert | author, sponsor
- @axw | subject matter expert
- @Mpdreamz | subject matter expert