-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Decouple and Batch Processors to Collector #959
Changes from 6 commits
a3656b8
656d11b
e51c41d
828b649
2a8812c
fa15461
5bf3d08
8802fbd
34bb381
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
include ../Makefile.Common |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
module github.com/open-telemetry/opentelemetry-lambda/collector/lambdalifecycle | ||
|
||
go 1.20 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package lambdalifecycle | ||
|
||
// Listener interface used to notify objects of Lambda lifecycle events. | ||
type Listener interface { | ||
// FunctionInvoked is called after the extension receives a "Next" notification. | ||
FunctionInvoked() | ||
// FunctionFinished is called after the extension is notified that the function has completed, but before the environment is frozen. | ||
// The environment is only frozen once all listeners have returned. | ||
FunctionFinished() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably this is where some additional comments are required, I'll look at adding them tomorrow. |
||
// EnvironmentShutdown is called when the extension is notified that the environment is about to shut down. | ||
// Shutting down of the collector components only happens after all listeners have returned. | ||
EnvironmentShutdown() | ||
} | ||
|
||
type Notifier interface { | ||
AddListener(listener Listener) | ||
} | ||
|
||
var ( | ||
notifier Notifier | ||
) | ||
|
||
func SetNotifier(n Notifier) { | ||
notifier = n | ||
} | ||
|
||
func GetNotifier() Notifier { | ||
return notifier | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
include ../../Makefile.Common |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Decouple Processor | ||
|
||
| Status | | | ||
| ------------------------ |-----------------------| | ||
| Stability | [in development] | | ||
| Supported pipeline types | traces, metrics, logs | | ||
| Distributions | [extension] | | ||
|
||
This processor decouples the receiver and exporter ends of the pipeline. This allows the lambda function to finish before traces/metrics/logs have been exported by the collector. The processor is aware of the Lambda [lifecycle] and will prevent the environment from being frozen or shutdown until any pending traces/metrics/logs have been exported. | ||
In this way the response times of the Lambda function is not impacted by the need to export data, however the billed duration will include the time taken to export data as well as runtime of the lambda function. | ||
|
||
The decouple processor should always be the last processor in the list to ensure that there are no issues with data being sent while the environment is about to be frozen, which could result in lost data. | ||
|
||
When combined with the batch processor, the number of exports required can be significantly reduced and therefore the cost of running the lambda. This is with the trade-off that the data will not be available at your chosen endpoint until some time after the invocation, up to a maximum of 5 minutes (the timeout that the environment is shutdown when no further invocations are received). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if the Another question that comes to mind is whether the batch processor alone would work here since it would in theory be able to send all the data left in any existing batches through the normal shutdown process. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Actually I was going to update the README to indicate that the batch processor must come before the decouple processor otherwise you run into problems (as mentioned below). It would be possible to rewrite this as you say, however the simplest approach was to do one thing and one thing well and re-use the existing batch processor in front of the decouple processor.
Not on it's on no, I did also consider it. The problem is that the collector is only shutdown when the environment is about to be destroyed. However it's frozen after each function invocation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @adcharre I'm not very familiar with extensions (or the collector)... could you highlight for me how the code you added handles the frozen scenario? (How does it prevent from being frozen.) Perhaps some additional comments in your code would be helpful here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added additional information to the Listener interface here https://github.com/adcharre/opentelemetry-lambda/blob/2a8812c4aa78cbe7a27149532ec78ce6081a7234/collector/lambdalifecycle/notifier.go#L21 |
||
|
||
## Processor Configuration | ||
|
||
```yaml | ||
processors: | ||
decouple: | ||
# max_queue_size allows you to control how many spans etc. are accepted before the pipeline blocks | ||
# until an export has been completed. Default value is 200. | ||
max_queue_size: 20 | ||
``` | ||
|
||
[in development]: https://github.com/open-telemetry/opentelemetry-collector#development | ||
[extension]: https://github.com/open-telemetry/opentelemetry-lambda/collector | ||
[lifecycle]: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html#runtimes-extensions-api-lifecycle |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package decoupleprocessor // import "github.com/open-telemetry/opentelemetry-lambda/collector/processor/decoupleprocessor" | ||
import "errors" | ||
|
||
// Config defines the configuration for the various elements of the processor. | ||
type Config struct { | ||
MaxQueueSize uint32 `mapstructure:"max_queue_size"` | ||
} | ||
|
||
var invalidMaxQueueSizeError = errors.New("max_queue_size must be greater than 0") | ||
|
||
// Validate validates the configuration by checking for missing or invalid fields | ||
func (cfg *Config) Validate() error { | ||
if cfg.MaxQueueSize == 0 { | ||
return invalidMaxQueueSizeError | ||
} | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerbenson - here's a comment to explain that we notifier other components that the function has finished and therefore need to be ready to freeze.
The new code I've added hooks into the existing lifecycle code,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good understanding of the extension lifecycle. If the lambda function returns and starts waiting for the next event, are you sure it waits for events to finish processing before freezing? Is this documented somewhere? How did you verify this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lifecycle docs say it waits until the extension also calls
Next
.But does the lambda function get the next event only after all extensions have called
Next
? If that were the case then wouldn't it still be waiting until the extension finishes, unless the extension callsNext
before doing the export, but then it might be frozen before actually exporting.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerbenson according to the lifecycle documentation (https://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html#runtimes-extensions-api-lifecycle)
@tsloughter the lambda runtime will only get the next invocation once all extensions have also signalled they are ready.
The sequence of events is:
Next
eventAlso see the "Invoke phase" in the same link:
and
As to the comment about verifying this behaviour - from Cloudwatch logs to confirm what's documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, cool, this is great.