Add asynchronous ACK handling to S3 and SQS inputs #40699

Merged: 41 commits, Oct 16, 2024

Commits
e9ccebe
working on S3 ack tracking
faec May 21, 2024
b17f7b5
handle S3 acks asynchronously
faec May 22, 2024
d023027
fix loop variable scope
faec May 22, 2024
aedb5a2
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec May 31, 2024
ed4953a
remove leftover print
faec May 31, 2024
12a39b4
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec Jun 6, 2024
7e70ea6
cleanup
faec Jun 6, 2024
ff4f57c
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec Jun 20, 2024
2d18b28
restore struct field
faec Jun 24, 2024
6bc5e42
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec Jul 11, 2024
b4cc30e
updates in progress
faec Aug 1, 2024
d4ecc1f
leftover file
faec Aug 1, 2024
64a96d4
use a common ack helper in s3 and sqs
faec Aug 6, 2024
7a20250
in progress
faec Aug 7, 2024
977e4c5
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec Aug 23, 2024
5282671
test updates
faec Aug 30, 2024
1548ac2
fix/update error SQS error handling logic
faec Sep 5, 2024
2358241
close ack handler in s3 worker
faec Sep 5, 2024
8c447bc
Pass through all processing parameters
faec Oct 1, 2024
2deefa7
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec Oct 1, 2024
8cde326
regenerate mock interfaces
faec Oct 3, 2024
c2131e1
update mock calls
faec Oct 3, 2024
1a07561
clean up dead code
faec Oct 3, 2024
c99b94d
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec Oct 3, 2024
da1c5ed
Merge branch 'main' into awss3-ack-handling
pierrehilbert Oct 4, 2024
70821fa
Merge branch 'main' into awss3-ack-handling
pierrehilbert Oct 4, 2024
5977072
finish config/doc updates, address review comments
faec Oct 14, 2024
6fe1183
remove helper that was replaced with shared lib
faec Oct 14, 2024
ba6ecfe
make check
faec Oct 14, 2024
f1732f8
re-correct doc yml
faec Oct 14, 2024
83ba548
make check again
faec Oct 14, 2024
577759c
update changelog
faec Oct 14, 2024
7ba1374
Merge branch 'awss3-ack-handling' of github.com:faec/beats into awss3…
faec Oct 14, 2024
cf90022
Merge branch 'main' of github.com:elastic/beats into awss3-ack-handling
faec Oct 14, 2024
a743c08
make update
faec Oct 14, 2024
48025b5
linter / CI fixes
faec Oct 15, 2024
10ecfd3
Fix benchmark test ack counting
faec Oct 15, 2024
7a38f55
edit changelog
faec Oct 15, 2024
84cb422
more mock test fixes
faec Oct 15, 2024
b088ca7
fix more mock details
faec Oct 15, 2024
bbbd1f9
fix race condition in mock sequencing
faec Oct 16, 2024
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -46,6 +46,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Added `container.image.name` to `journald` Filebeat input's Docker-specific translated fields. {pull}40450[40450]
- Change log.file.path field in awscloudwatch input to nested object. {pull}41099[41099]
- Remove deprecated awscloudwatch field from Filebeat. {pull}41089[41089]
- The performance of ingesting SQS data with the S3 input has improved by up to 60x for queues with many small events. The `max_number_of_messages` config for SQS mode is now ignored, as the new design no longer needs a manual cap on messages. Instead, use `number_of_workers` to scale the ingestion rate in both S3 and SQS modes (see the configuration sketch below). The increased efficiency may increase network bandwidth consumption, which can be throttled by lowering `number_of_workers`. It may also increase the number of events stored in memory, which can be throttled by lowering the configured size of the internal queue. {pull}40699[40699]
- System module events now contain `input.type: systemlogs` instead of `input.type: log` when harvesting log files. {pull}41061[41061]


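An illustrative configuration sketch of the tuning model described in the changelog entry above; the queue URL and values are placeholders, not taken from this PR:

filebeat.inputs:
  - type: aws-s3
    # SQS mode: max_number_of_messages is now ignored; scale with workers instead.
    queue_url: "https://sqs.us-east-1.amazonaws.com/1234/example-queue"
    number_of_workers: 5

# Memory use can be bounded by shrinking the internal queue if needed.
queue.mem:
  events: 2048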
32 changes: 32 additions & 0 deletions NOTICE.txt
@@ -23112,6 +23112,38 @@ Contents of probable licence file $GOMODCACHE/github.com/xdg-go/scram@v1.1.2/LIC
of your accepting any such warranty or additional liability.


--------------------------------------------------------------------------------
Dependency : github.com/zyedidia/generic
Version: v1.2.1
Licence type (autodetected): MIT
--------------------------------------------------------------------------------

Contents of probable licence file $GOMODCACHE/github.com/zyedidia/generic@v1.2.1/LICENSE:

MIT License

Copyright (c) 2021: Zachary Yedidia.

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


--------------------------------------------------------------------------------
Dependency : go.elastic.co/apm/module/apmelasticsearch/v2
Version: v2.6.0
1 change: 1 addition & 0 deletions go.mod
@@ -214,6 +214,7 @@ require (
github.com/shirou/gopsutil/v3 v3.22.10
github.com/tklauser/go-sysconf v0.3.10
github.com/xdg-go/scram v1.1.2
github.com/zyedidia/generic v1.2.1
go.elastic.co/apm/module/apmelasticsearch/v2 v2.6.0
go.elastic.co/apm/module/apmhttp/v2 v2.6.0
go.elastic.co/apm/v2 v2.6.0
2 changes: 2 additions & 0 deletions go.sum
@@ -941,6 +941,8 @@ github.com/zeebo/assert v1.3.0 h1:g7C04CbJuIDKNPFHmsk4hwZDO5O+kntRxzaUoNXj+IQ=
github.com/zeebo/assert v1.3.0/go.mod h1:Pq9JiuJQpG8JLJdtkwrJESF0Foym2/D9XMU5ciN/wJ0=
github.com/zeebo/xxh3 v1.0.2 h1:xZmwmqxHZA8AI603jOQ0tMqmBr9lPeFwGg6d+xy9DC0=
github.com/zeebo/xxh3 v1.0.2/go.mod h1:5NWz9Sef7zIDm2JHfFlcQvNekmcEl9ekUZQQKCYaDcA=
github.com/zyedidia/generic v1.2.1 h1:Zv5KS/N2m0XZZiuLS82qheRG4X1o5gsWreGb0hR7XDc=
github.com/zyedidia/generic v1.2.1/go.mod h1:ly2RBz4mnz1yeuVbQA/VFwGjK3mnHGRj1JuoG336Bis=
go.einride.tech/aip v0.67.1 h1:d/4TW92OxXBngkSOwWS2CH5rez869KpKMaN44mdxkFI=
go.einride.tech/aip v0.67.1/go.mod h1:ZGX4/zKw8dcgzdLsrvpOOGxfxI2QSk12SlP7d6c0/XI=
go.elastic.co/apm/module/apmelasticsearch/v2 v2.6.0 h1:ukMcwyMaDXsS1dRK2qRYXT2AsfwaUy74TOOYCqkWJow=
@@ -79,8 +79,8 @@
# SQS queue URL to receive messages from (required).
#queue_url: "https://sqs.us-east-1.amazonaws.com/1234/test-aws-s3-logs-queue"

# Maximum number of SQS messages that can be inflight at any time.
#max_number_of_messages: 5
# Number of workers on S3 bucket or SQS queue
#number_of_workers: 5

# Maximum duration of an AWS API call (excluding S3 GetObject calls).
#api_timeout: 120s
14 changes: 1 addition & 13 deletions x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc
@@ -307,18 +307,6 @@ The maximum number of bytes that a single log message can have. All bytes after
multiline log messages, which can get large. This only applies to non-JSON logs.
The default is `10 MiB`.

[float]
==== `max_number_of_messages`

The maximum number of SQS messages that can be inflight at any time. Defaults
to 5. Setting this parameter too high can overload Elastic Agent and cause
ingest failures in situations where the SQS messages contain many S3 objects
or the S3 objects themselves contain large numbers of messages.
We recommend to keep the default value 5 and use the `Balanced` or `Optimized for
Throughput` setting in the
{fleet-guide}/es-output-settings.html#es-output-settings-performance-tuning-settings[preset]
options to tune your Elastic Agent performance.

[id="input-{type}-parsers"]
[float]
==== `parsers`
@@ -504,7 +492,7 @@ Prefix to apply for the list request to the S3 bucket. Default empty.
[float]
==== `number_of_workers`

Number of workers that will process the S3 objects listed. (Required when `bucket_arn` is set).
Number of workers that will process the listed S3 objects or received SQS messages. Required when `bucket_arn` is set; otherwise (in SQS mode) it defaults to 5.


[float]
34 changes: 8 additions & 26 deletions x-pack/filebeat/filebeat.reference.yml
@@ -139,7 +139,7 @@ filebeat.modules:
# Bucket list interval on S3 bucket
#var.bucket_list_interval: 300s

# Number of workers on S3 bucket
# Number of workers on S3 bucket or SQS queue
#var.number_of_workers: 5

# Process CloudTrail logs
@@ -188,9 +188,6 @@ filebeat.modules:
# Enabling this option changes the service name from `s3` to `s3-fips` for connecting to the correct service endpoint.
#var.fips_enabled: false

# The maximum number of messages to return from SQS. Valid values: 1 to 10.
#var.max_number_of_messages: 5

# URL to proxy AWS API calls
#var.proxy_url: http://proxy:3128

@@ -212,7 +209,7 @@
# Bucket list interval on S3 bucket
#var.bucket_list_interval: 300s

# Number of workers on S3 bucket
# Number of workers on S3 bucket or SQS queue
#var.number_of_workers: 5

# Filename of AWS credential file
@@ -249,9 +246,6 @@ filebeat.modules:
# Enabling this option changes the service name from `s3` to `s3-fips` for connecting to the correct service endpoint.
#var.fips_enabled: false

# The maximum number of messages to return from SQS. Valid values: 1 to 10.
#var.max_number_of_messages: 5

# URL to proxy AWS API calls
#var.proxy_url: http://proxy:3128

@@ -273,7 +267,7 @@
# Bucket list interval on S3 bucket
#var.bucket_list_interval: 300s

# Number of workers on S3 bucket
# Number of workers on S3 bucket or SQS queue
#var.number_of_workers: 5

# Filename of AWS credential file
@@ -310,9 +304,6 @@ filebeat.modules:
# Enabling this option changes the service name from `s3` to `s3-fips` for connecting to the correct service endpoint.
#var.fips_enabled: false

# The maximum number of messages to return from SQS. Valid values: 1 to 10.
#var.max_number_of_messages: 5

# URL to proxy AWS API calls
#var.proxy_url: http://proxy:3128

@@ -334,7 +325,7 @@
# Bucket list interval on S3 bucket
#var.bucket_list_interval: 300s

# Number of workers on S3 bucket
# Number of workers on S3 bucket or SQS queue
#var.number_of_workers: 5

# Filename of AWS credential file
@@ -371,9 +362,6 @@ filebeat.modules:
# Enabling this option changes the service name from `s3` to `s3-fips` for connecting to the correct service endpoint.
#var.fips_enabled: false

# The maximum number of messages to return from SQS. Valid values: 1 to 10.
#var.max_number_of_messages: 5

# URL to proxy AWS API calls
#var.proxy_url: http://proxy:3128

@@ -395,7 +383,7 @@
# Bucket list interval on S3 bucket
#var.bucket_list_interval: 300s

# Number of workers on S3 bucket
# Number of workers on S3 bucket or SQS queue
#var.number_of_workers: 5

# Filename of AWS credential file
@@ -432,9 +420,6 @@ filebeat.modules:
# Enabling this option changes the service name from `s3` to `s3-fips` for connecting to the correct service endpoint.
#var.fips_enabled: false

# The maximum number of messages to return from SQS. Valid values: 1 to 10.
#var.max_number_of_messages: 5

# URL to proxy AWS API calls
#var.proxy_url: http://proxy:3128

@@ -456,7 +441,7 @@
# Bucket list interval on S3 bucket
#var.bucket_list_interval: 300s

# Number of workers on S3 bucket
# Number of workers on S3 bucket or SQS queue
#var.number_of_workers: 5

# Filename of AWS credential file
@@ -493,9 +478,6 @@ filebeat.modules:
# Enabling this option changes the service name from `s3` to `s3-fips` for connecting to the correct service endpoint.
#var.fips_enabled: false

# The maximum number of messages to return from SQS. Valid values: 1 to 10.
#var.max_number_of_messages: 5

# URL to proxy AWS API calls
#var.proxy_url: http://proxy:3128

@@ -3013,8 +2995,8 @@ filebeat.inputs:
# SQS queue URL to receive messages from (required).
#queue_url: "https://sqs.us-east-1.amazonaws.com/1234/test-aws-s3-logs-queue"

# Maximum number of SQS messages that can be inflight at any time.
#max_number_of_messages: 5
# Number of workers on S3 bucket or SQS queue
#number_of_workers: 5

# Maximum duration of an AWS API call (excluding S3 GetObject calls).
#api_timeout: 120s
106 changes: 106 additions & 0 deletions x-pack/filebeat/input/awss3/acks.go
@@ -0,0 +1,106 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License;
// you may not use this file except in compliance with the Elastic License.

package awss3

import (
	"github.com/zyedidia/generic/queue"

	"github.com/elastic/beats/v7/libbeat/beat"
	"github.com/elastic/beats/v7/libbeat/common/acker"
)

type awsACKHandler struct {
	pending    *queue.Queue[pendingACK]
	ackedCount int

	pendingChan chan pendingACK
	ackChan     chan int
}

type pendingACK struct {
	eventCount  int
	ackCallback func()
}

func newAWSACKHandler() *awsACKHandler {
	handler := &awsACKHandler{
		pending: queue.New[pendingACK](),

		// Channel buffer sizes are somewhat arbitrary: synchronous channels
		// would be safe, but buffers slightly reduce scheduler overhead since
		// the ack loop goroutine doesn't need to wake up as often.
		//
		// pendingChan receives one message each time an S3/SQS worker goroutine
		// finishes processing an object. If it is full, workers will not be able
		// to advance to the next object until the ack loop wakes up.
		//
		// ackChan receives approximately one message every time an acknowledged
		// batch of events contains at least one event from this input. (Sometimes
		// fewer if messages can be coalesced.) If it is full, acknowledgement
		// notifications for inputs/queue will stall until the ack loop wakes up.
		// (This is a much worse consequence than pendingChan, but ackChan also
		// receives fewer messages than pendingChan by a factor of ~thousands,
		// so in practice it's still low-impact.)
		pendingChan: make(chan pendingACK, 10),
		ackChan:     make(chan int, 10),
Comment on lines +46 to +47

Contributor: At a minimum I think we need a comment on why 10. But I've got a couple of questions. Do these always need to be in sync? Would we ever want to change these?

Contributor: Yeah, documenting why we picked 10, and whether they need to stay in sync, would be useful. Do we ever need to make this configurable?

	}
	go handler.run()
	return handler
}

func (ah *awsACKHandler) Add(eventCount int, ackCallback func()) {
	ah.pendingChan <- pendingACK{
		eventCount:  eventCount,
		ackCallback: ackCallback,
	}
}

// Close is called when a worker is closing, to indicate to the ack handler that it
// should shut down as soon as the current pending list is acknowledged.
func (ah *awsACKHandler) Close() {
	close(ah.pendingChan)
}

func (ah *awsACKHandler) pipelineEventListener() beat.EventListener {
	return acker.TrackingCounter(func(_ int, total int) {
		// Notify the ack handler goroutine
		ah.ackChan <- total
	})
}

// run is the ack handler's main loop. It handles both incoming object
// metadata from workers and ACK confirmations from the pipeline.
func (ah *awsACKHandler) run() {
	for {
		select {
		case result, ok := <-ah.pendingChan:
			if ok {
				ah.pending.Enqueue(result)
			} else {
				// Channel is closed, reset so we don't receive any more values
				ah.pendingChan = nil
			}
		case count := <-ah.ackChan:
			ah.ackedCount += count
		}

		// Finalize any objects that are now completed
		for !ah.pending.Empty() && ah.ackedCount >= ah.pending.Peek().eventCount {
			result := ah.pending.Dequeue()
			ah.ackedCount -= result.eventCount
			// Run finalization asynchronously so we don't block the SQS worker
			// or the queue by ignoring the ack handler's input channels. Ordering
			// is no longer important at this point.
			if result.ackCallback != nil {
				go result.ackCallback()
			}
		}

		// If the input is closed and all acks are completed, we're done
		if ah.pending.Empty() && ah.pendingChan == nil {
			return
		}
	}
}
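A minimal usage sketch of how an input worker might drive this handler. This is hypothetical and not part of the PR: exampleWorkerLoop, its objects parameter, and the finalize callbacks are illustrative assumptions; only awsACKHandler, Add, Close, and pipelineEventListener come from the code above.

// Hypothetical sketch (not in this PR): driving awsACKHandler from a worker.
func exampleWorkerLoop(pipeline beat.Pipeline, objects []func() ([]beat.Event, func())) error {
	acks := newAWSACKHandler()
	// Closing the handler tells it to exit once every pending callback has run.
	defer acks.Close()

	// Attach the handler's event listener so pipeline acknowledgements
	// feed the handler's ackChan.
	client, err := pipeline.ConnectWith(beat.ClientConfig{
		EventListener: acks.pipelineEventListener(),
	})
	if err != nil {
		return err
	}
	defer client.Close()

	for _, readObject := range objects {
		events, finalize := readObject()
		for _, evt := range events {
			client.Publish(evt)
		}
		// Register the object's event count and its completion callback
		// (e.g. deleting the SQS message); it runs after all events are ACKed.
		acks.Add(len(events), finalize)
	}
	return nil
}

The property this illustrates: because Add is called in object order and the handler dequeues in FIFO order, each object's callback fires only after all of its events, and all earlier objects' events, have been acknowledged.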