Add max bytes in line limit #19552

aleksmaus · 2020-07-01T13:34:18Z

What does this PR do?

Handles OOM on long lines gracefully, writes the error into log.

Addresses Filebeat OOMs on very long lines #19500

Why is it important?

Handle bad data gracefully without OOM.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have added tests that prove my fix is effective or that my feature works

Author's Checklist

I just recently started looking at the beats code so might not have enough knowledge of any possible side-effects of this change. Please review with that in mind.

How to test this PR locally

Refer to #19500 for test case.
The filebeat doesn't get killed due to OOM with this change and the error is logged (see the screenshot attached).

Related issues

Addresses Filebeat OOMs on very long lines #19500

Screenshots

Logs

2020-07-01T13:15:17.630Z	INFO	[crawler]	beater/crawler.go:108	Loading and starting Inputs completed. Enabled inputs: 1
2020-07-01T13:15:17.633Z	INFO	log/harvester.go:297	Harvester started for file: /test/readme.log
2020-07-01T13:15:17.635Z	ERROR	log/harvester.go:330	Read line error: exceeded max bytes in line limit; File: /test/readme.log

botelastic · 2020-07-01T13:34:23Z

This pull request doesn't have a Team:<team> label.

elasticmachine · 2020-07-01T13:34:48Z

Pinging @elastic/integrations-services (Team:Services)

aleksmaus · 2020-07-01T13:47:06Z

hmm, seems like a problem with CI build

[2020-07-01T13:40:28.732Z] + mage package

[2020-07-01T13:40:28.732Z] /var/lib/jenkins/workspace/Beats_packaging_PR-19552/src/github.com/elastic/beats/auditbeat@tmp/durable-fb3d78f4/script.sh: 1: /var/lib/jenkins/workspace/Beats_packaging_PR-19552/src/github.com/elastic/beats/auditbeat@tmp/durable-fb3d78f4/script.sh: mage: not found

script returned exit code 127

elasticmachine · 2020-07-01T15:09:52Z

💔 Build Failed

Expand to view the summary

Build stats

Build Cause: [Pull request #19552 updated]
Start Time: 2020-07-20T12:10:57.393+0000
Duration: 90 min 27 sec

Test stats 🧪

Test	Results
Failed	0
Passed	15721
Skipped	1890
Total	17611

Steps errors

Expand to view the steps failures

Name: Report to Codecov
- Description: curl -sSLo codecov https://codecov.io/bash for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat do FILE="${i}/build/coverage/full.cov" if [ -f "${FILE}" ]; then bash codecov -f "${FILE}" fi done
- Duration: 1 min 27 sec
- Start Time: 2020-07-20T12:54:11.019+0000
- log
Name: Mage build test
- Description: mage build test
- Duration: 7 min 13 sec
- Start Time: 2020-07-20T12:44:17.854+0000
- log
Name: Report to Codecov
- Description: curl -sSLo codecov https://codecov.io/bash for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat do FILE="${i}/build/coverage/full.cov" if [ -f "${FILE}" ]; then bash codecov -f "${FILE}" fi done
- Duration: 2 min 22 sec
- Start Time: 2020-07-20T12:45:48.055+0000
- log
Name: Report to Codecov
- Description: curl -sSLo codecov https://codecov.io/bash for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat do FILE="${i}/build/coverage/full.cov" if [ -f "${FILE}" ]; then bash codecov -f "${FILE}" fi done
- Duration: 2 min 22 sec
- Start Time: 2020-07-20T12:49:47.502+0000
- log
Name: Report to Codecov
- Description: curl -sSLo codecov https://codecov.io/bash for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat do FILE="${i}/build/coverage/full.cov" if [ -f "${FILE}" ]; then bash codecov -f "${FILE}" fi done
- Duration: 2 min 22 sec
- Start Time: 2020-07-20T12:48:03.100+0000
- log
Name: Report to Codecov
- Description: curl -sSLo codecov https://codecov.io/bash for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat do FILE="${i}/build/coverage/full.cov" if [ -f "${FILE}" ]; then bash codecov -f "${FILE}" fi done
- Duration: 2 min 22 sec
- Start Time: 2020-07-20T13:19:53.442+0000
- log

Log output

Expand to view the last 100 lines of log output

[2020-07-20T13:40:55.293Z] + FILE=winlogbeat/build/coverage/full.cov
[2020-07-20T13:40:55.293Z] + [ -f winlogbeat/build/coverage/full.cov ]
[2020-07-20T13:40:55.293Z] + FILE=journalbeat/build/coverage/full.cov
[2020-07-20T13:40:55.293Z] + [ -f journalbeat/build/coverage/full.cov ]
[2020-07-20T13:40:55.846Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats
[2020-07-20T13:40:56.162Z] + find . -type f -name TEST*.xml -path */build/* -delete
[2020-07-20T13:40:56.174Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Lint
[2020-07-20T13:40:56.265Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Elastic-Agent-x-pack
[2020-07-20T13:40:56.354Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Winlogbeat-oss
[2020-07-20T13:40:56.442Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Journalbeat-oss
[2020-07-20T13:40:56.532Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Dockerlogbeat
[2020-07-20T13:40:56.613Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Auditbeat-crosscompile
[2020-07-20T13:40:56.693Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Functionbeat-x-pack
[2020-07-20T13:40:56.844Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Generators-Metricbeat-Linux
[2020-07-20T13:40:56.949Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Packetbeat-Linux
[2020-07-20T13:40:57.070Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Elastic-Agent-Mac-OS-X
[2020-07-20T13:40:57.176Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Elastic-Agent-x-pack-Windows
[2020-07-20T13:40:57.291Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-OSS-Unit-tests
[2020-07-20T13:40:57.405Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Libbeat-oss
[2020-07-20T13:40:57.515Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Heartbeat-oss
[2020-07-20T13:40:57.606Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Auditbeat-x-pack
[2020-07-20T13:40:57.704Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Auditbeat-oss-Mac-OS-X
[2020-07-20T13:40:57.816Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Filebeat-x-pack-Mac-OS-X
[2020-07-20T13:40:57.909Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Winlogbeat-Windows-x-pack
[2020-07-20T13:40:57.990Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Auditbeat-x-pack-Mac-OS-X
[2020-07-20T13:40:58.086Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Auditbeat-x-pack-Windows
[2020-07-20T13:40:58.173Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Auditbeat-oss-Linux
[2020-07-20T13:40:58.262Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Libbeat-x-pack
[2020-07-20T13:40:58.351Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Auditbeat-oss-Windows
[2020-07-20T13:40:58.433Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Filebeat-Mac-OS-X
[2020-07-20T13:40:58.545Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-crosscompile
[2020-07-20T13:40:58.645Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Filebeat-x-pack-Windows
[2020-07-20T13:40:58.743Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Filebeat-Windows
[2020-07-20T13:40:58.844Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Winlogbeat-Windows
[2020-07-20T13:40:58.935Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack-Windows
[2020-07-20T13:40:59.035Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack-Mac-OS-X
[2020-07-20T13:40:59.139Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-Mac-OS-X
[2020-07-20T13:40:59.228Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-Windows
[2020-07-20T13:40:59.332Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Functionbeat-Mac-OS-X-x-pack
[2020-07-20T13:40:59.424Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Heartbeat-Mac-OS-X
[2020-07-20T13:40:59.509Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Generators-Beat-Linux
[2020-07-20T13:40:59.602Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Packetbeat-Mac-OS-X
[2020-07-20T13:40:59.708Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Filebeat-x-pack
[2020-07-20T13:40:59.807Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Filebeat-oss
[2020-07-20T13:40:59.904Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Functionbeat-Windows
[2020-07-20T13:41:00.011Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Heartbeat-Windows
[2020-07-20T13:41:00.104Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-OSS-Python-Integration-tests
[2020-07-20T13:41:00.215Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-OSS-Go-Integration-tests
[2020-07-20T13:41:00.322Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Packetbeat-Windows
[2020-07-20T13:41:00.412Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Generators-Metricbeat-Mac-OS-X
[2020-07-20T13:41:00.505Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Generators-Beat-Mac-OS-X
[2020-07-20T13:41:00.607Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack
[2020-07-20T13:41:01.146Z] + cat
[2020-07-20T13:41:01.146Z] + /usr/local/bin/runbld ./runbld-script
[2020-07-20T13:41:01.146Z] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
[2020-07-20T13:41:07.740Z] runbld>>> runbld started
[2020-07-20T13:41:07.740Z] runbld>>> 1.6.12/f45d832f2ba0aa2722ab4ec1fda8ad140f027f8b
[2020-07-20T13:41:09.651Z] runbld>>> The following profiles matched the job 'Beats/beats/PR-19552' in order of occurrence in the config (last value wins).
[2020-07-20T13:41:10.591Z] runbld>>> Debug logging enabled.
[2020-07-20T13:41:10.591Z] runbld>>> Storing result
[2020-07-20T13:41:10.852Z] runbld>>> Store result: created {:total 2, :successful 2, :failed 0} 1
[2020-07-20T13:41:10.852Z] runbld>>> BUILD: https://c150076387b5421f9154dfbf536e5c60.us-west1.gcp.cloud.es.io:9243/build-1587637540455/t/20200720134110-FABA21D5
[2020-07-20T13:41:10.852Z] runbld>>> Adding system facts.
[2020-07-20T13:41:11.867Z] runbld>>> Adding vcs info for the latest commit:  73e861c5982716a5163be7f75aa216cf40a59167
[2020-07-20T13:41:11.867Z] runbld>>> >>>>>>>>>>>> SCRIPT EXECUTION BEGIN >>>>>>>>>>>>
[2020-07-20T13:41:11.867Z] runbld>>> Adding /usr/lib/jvm/java-8-openjdk-amd64/bin to the path.
[2020-07-20T13:41:11.867Z] Processing JUnit reports with runbld...
[2020-07-20T13:41:11.867Z] + echo 'Processing JUnit reports with runbld...'
[2020-07-20T13:41:12.438Z] runbld>>> <<<<<<<<<<<< SCRIPT EXECUTION END <<<<<<<<<<<<
[2020-07-20T13:41:12.438Z] runbld>>> DURATION: 30ms
[2020-07-20T13:41:12.438Z] runbld>>> STDOUT: 40 bytes
[2020-07-20T13:41:12.438Z] runbld>>> STDERR: 49 bytes
[2020-07-20T13:41:12.438Z] runbld>>> WRAPPED PROCESS: SUCCESS (0)
[2020-07-20T13:41:12.438Z] runbld>>> Searching for build metadata in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats
[2020-07-20T13:41:13.379Z] runbld>>> Storing build metadata: 
[2020-07-20T13:41:13.379Z] runbld>>> Adding test report.
[2020-07-20T13:41:13.379Z] runbld>>> Searching for junit test output files with the pattern: TEST-.*\.xml$ in: /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats
[2020-07-20T13:41:14.320Z] runbld>>> Found 134 test output files
[2020-07-20T13:41:15.704Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-istio.xml
[2020-07-20T13:41:15.704Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-openmetrics.xml
[2020-07-20T13:41:15.704Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-tomcat.xml
[2020-07-20T13:41:15.704Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-activemq.xml
[2020-07-20T13:41:15.704Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-iis.xml
[2020-07-20T13:41:15.965Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-OSS-Go-Integration-tests/metricbeat/build/TEST-go-integration-graphite.xml
[2020-07-20T13:41:15.965Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-19552/src/github.com/elastic/beats/Metricbeat-OSS-Go-Integration-tests/metricbeat/build/TEST-go-integration-windows.xml
[2020-07-20T13:41:17.346Z] runbld>>> Test output logs contained: Errors: 0 Failures: 0 Tests: 17461 Skipped: 1629
[2020-07-20T13:41:17.346Z] runbld>>> Storing result
[2020-07-20T13:41:17.346Z] runbld>>> FAILURES: 0
[2020-07-20T13:41:17.346Z] runbld>>> Store result: updated {:total 2, :successful 2, :failed 0} 2
[2020-07-20T13:41:17.346Z] runbld>>> BUILD: https://c150076387b5421f9154dfbf536e5c60.us-west1.gcp.cloud.es.io:9243/build-1587637540455/t/20200720134110-FABA21D5
[2020-07-20T13:41:17.606Z] runbld>>> Email notification disabled by environment variable.
[2020-07-20T13:41:17.606Z] runbld>>> Slack notification disabled by environment variable.
[2020-07-20T13:41:23.331Z] Running on Jenkins in /var/lib/jenkins/workspace/Beats_beats_PR-19552
[2020-07-20T13:41:23.498Z] [INFO] getVaultSecret: Getting secrets
[2020-07-20T13:41:23.583Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-07-20T13:41:24.322Z] + chmod 755 generate-build-data.sh
[2020-07-20T13:41:24.322Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-19552/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-19552/runs/4 FAILURE 5426670
[2020-07-20T13:41:24.323Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-19552/runs/4/steps/?limit=10000 -o steps-info.json
[2020-07-20T13:41:26.184Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-19552/runs/4/tests/?status=FAILED -o tests-errors.json
[2020-07-20T13:41:26.735Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-19552/runs/4/log/ -o pipeline-log.txt

aleksmaus · 2020-07-01T15:53:30Z

The today's comment on the original issue was a suggestion to skip the long lines instead of returning the error. Will update PR later today.

aleksmaus · 2020-07-02T16:04:31Z

Updated PR to skip the long lines, logging the warning and continuing processing.

New log screenshot, running on the 100M suggested in the test case with couple of lines appended.
Shows skipping the long line. Added more unit test coverage for skipping the long lines.

Using the same MaxSize from the config that is used for the limiter at the end of the readers pipeline, multiplied by 4 since the checks and skipping happen before decoding the data.

benbuzbee

This is awesome! Thanks so much for the quick and best fix. Looking forward to deploying this.

benbuzbee · 2020-07-01T15:46:37Z

libbeat/reader/readfile/line.go

+
+		// If newLine is not found and exceeded max bytes limit
+		if idx == -1 && r.maxBytes != 0 && r.inBuffer.Len() > r.maxBytes {
+			return ErrExceededMaxBytesLimit


Will this abort reading the file forever? If other words, if max_bytes is 1024 and line 50 is 3000 bytes, will line 51+ never be read?

If so, would be it possible to, in lieu of returning an error, continue reading until the new-line but do not add what we read to r.inBuffer, effectively just ignoring the line. So in my example 50 would be skipped, but 51 could be read

I think you commented on the first commit. This code was changed by the second commit in this PR.

benbuzbee · 2020-07-02T16:41:43Z

filebeat/input/log/harvester.go

+	// for the worst case scenario where incoming UTF32 charchers are decoded to the single byte UTF-8 characters.
+	// This limit serves primarily to avoid memory bload or potential OOM with expectedly long lines in the file.
+	// The further size limiting is performed by LimitReader at the end of the readers pipeline as needed.
+	encReaderMaxBytes := h.config.MaxBytes * 4


may want to add a comment in docs for max_bytes that it can use up to 4x the number worth of memory for buffering

Sure can do that. Which docs? Before this change it was taking all available memory :-)

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html

As a user I would want that information here

The maximum number of bytes that a single log message can have. All bytes after max_bytes are discarded and not sent., but up to 4x max_bytes will be buffered in-memory .This setting is especially useful for multiline log messages, which can get large. The default is 10MB (10485760).

On the second thought I'm not sure that this part of internal implementation should be documented.
Before this change the code was allocating as much as it was available and cutting the final message to the max byte limit before the output from the pipeline.
This change adds a protection from unlimited allocation.

I'll let the regular beats contributors that are going to review this PR to decide if this needs to be documented.

Fair enough. I will make a quick pitch and leave it to you and reviewers to decide: without that kind of information, it is hard to plan. For example, how can I determine how much memory I will be safe giving filebeat? I may think that if I tell it to read only 1 file at a time, and tell it to use max_bytes of 25MB, that I am very safe giving it 50MB of memory, only to be sad when it OOMs because it actually used 100MB of memory for 1 line that I thought was limited to 25.

Personally I find it already very hard to figure out how much memory it needs because i have to do match like + max_events * num_harvesters * max_size - and im scared of more hidden values (like * 4)

aleksmaus · 2020-07-06T12:57:36Z

Can reduce the memory footprint by not caching the line data before decoding and decoding every read. Need to measure how it affects the performance.
This would require a bit more changes in the exiting code.
The current change is less invasive.
Will leave up to the reviewers feedback.

kvch

Have you taken a look at the functions of streambuf? Is AppendWithCapLimits not usable here? (https://github.com/elastic/beats/blob/master/libbeat/common/streambuf/streambuf.go#L176)

aleksmaus · 2020-07-06T20:52:35Z

@kvch if I'm reading the implementation correctly

beats/libbeat/common/streambuf/streambuf.go

Line 129 in efa5eba

func (b *Buffer) doAppend(data []byte, retainable bool, newCap int) error {

it caps the the number of byte in append operation, while potentially doing huge (10MB default) allocations for backing array:

beats/libbeat/common/streambuf/streambuf.go

Line 145 in efa5eba

b.data = make([]byte, len(data), newCap)

and not really limiting the buffer size growth

beats/libbeat/common/streambuf/streambuf.go

Line 158 in efa5eba

b.data = append(b.data, data...)

It looks like the purpose of AppendWithCapLimits implementation was to reduce the number of reallocations for appends when you know the final possible buffer size.

Please explain, if I misunderstood the way you are proposing to use this.

kvch · 2020-07-09T12:14:53Z

libbeat/reader/readfile/line.go

@@ -28,12 +28,15 @@ import (
 	"github.com/elastic/beats/v7/libbeat/logp"
 )

+const Unlimited = 0


Please do not export this variable if you are not using it outside the readfile package.

This constant potentially could be used for the public NewLineReader config param.
Let me know if still want to hide it.

Personally, I am not a huge fan of exposing something we would potentially use. In general, it is easier to export something unexported, then unexport something exported.

But I do not block this PR because of a philosophical question about a small constant.

Sounds good, will update to not export.

kvch · 2020-07-09T12:23:18Z

libbeat/reader/readfile/line_test.go

@@ -31,10 +35,12 @@ import (
 )

 // Sample texts are from http://www.columbia.edu/~kermit/utf8.html
-var tests = []struct {
+type LineTestSpec struct {


Please do not export this struct as you are not using it from outside the package. Also, rather name it listTestCase. The name you have given it is not idiomatic to this repo.

kvch · 2020-07-09T12:44:50Z

libbeat/reader/readfile/line_test.go

+}
+
+func TestMaxBytesLimit(t *testing.T) {
+	const (


Could you please extract the setup part of the test out into a separate function? The way the test is structured now is hard to read and understand.

kvch

The code itself look good, I have a few minor comments. Please refactor the test you have provided.

kvch · 2020-07-09T12:48:00Z

@aleksmaus You are right. I was just wondering if we could use it.

aleksmaus · 2020-07-13T12:56:50Z

Addressed all of the comments. Let me know if anything else needs to be changed.

kvch · 2020-07-16T14:18:27Z

jenkins run tests

kvch · 2020-07-16T14:25:05Z

@aleksmaus Sorry for the long wait. The code in general looks good to me. However, when a line is truncated Filebeat flags that message by adding "truncated" to the list in log.flags. This current implementation hides this information from the user. Could you please address this problem?

kvch · 2020-07-20T08:22:19Z

@aleksmaus Indeed. Could you please rebase the PR and add a changelog entry to CHANGELOG.next.asciidoc?

ppf2 · 2020-07-20T16:50:57Z

we completely drop the line that is too long (logging it)

Have we also considered truncating the line before logging? We have seen ~1Gb of junk causing filebeat to OOM. It might be helpful to truncate first before logging. Perhaps only log the first 1024 characters or something for the dropped line?

aleksmaus · 2020-07-20T18:56:28Z

we completely drop the line that is too long (logging it)

Have we also considered truncating the line before logging? We have seen ~1Gb of junk causing filebeat to OOM. It might be helpful to truncate first before logging. Perhaps only log the first 1024 characters or something for the dropped line?

the line drop happens before decoding. would have to try to decode for that. the number of characters will be known only after decoding. also the junk will not be necessarily decodable.
thoughts?

aleksmaus · 2020-07-20T19:00:27Z

we completely drop the line that is too long (logging it)

Have we also considered truncating the line before logging? We have seen ~1Gb of junk causing filebeat to OOM. It might be helpful to truncate first before logging. Perhaps only log the first 1024 characters or something for the dropped line?

Ahh, not logging the line content itself. Just the fact that the line is dropped and the length in bytes.

kvch · 2020-07-21T08:34:07Z

@aleksmaus Thank you for the contribution!

(cherry picked from commit 0e049f0)

…-basis-when * upstream/master: [CI] run everything for branches/tags (elastic#20057) Limit the number of bytes read by LineReader in Filebeat (elastic#19552) Prefer testify.Error/NoError over NotNil/Nil (elastic#20044)

…0081) (cherry picked from commit 0e049f0) Co-authored-by: Aleksandr Maus <[email protected]>

(cherry picked from commit 0e049f0)

…0087) (cherry picked from commit 0e049f0) Co-authored-by: Aleksandr Maus <[email protected]>

chery pick from elastic#19552

* bugfix: Add max bytes in line limit elastic#19552 chery pick from elastic#19552 * feature: add file reuse harvester * bugfix: harvester cleanup & state update

Add max bytes in line limit

d76cf49

aleksmaus requested review from andresrc and kvch July 1, 2020 13:34

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 1, 2020

aleksmaus added the Team:Services (Deprecated) Label for the former Integrations-Services team label Jul 1, 2020

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jul 1, 2020

andresrc added [zube]: Inbox [zube]: In Review and removed [zube]: Inbox labels Jul 2, 2020

Skip over lines exceeding maxSize*4

f694f6c

benbuzbee approved these changes Jul 2, 2020

View reviewed changes

kvch reviewed Jul 6, 2020

View reviewed changes

kvch reviewed Jul 9, 2020

View reviewed changes

kvch suggested changes Jul 9, 2020

View reviewed changes

aleksmaus added 2 commits July 10, 2020 12:48

Addressed code review feedback

6b3ce9c

Make Unlimited constant private

abe3487

kvch approved these changes Jul 20, 2020

View reviewed changes

kvch added the needs_backport PR is waiting to be backported to other branches. label Jul 20, 2020

aleksmaus added 2 commits July 20, 2020 08:02

Merge branch 'master' into fix/oom

2951447

Update CHANGELOG.next.asciidoc

73e861c

kvch merged commit 0e049f0 into elastic:master Jul 21, 2020

zube bot added [zube]: Done and removed [zube]: In Review labels Jul 21, 2020

kvch mentioned this pull request Jul 21, 2020

Cherry-pick #19552 to 7.x: Add max bytes in line limit #20081

Merged

4 tasks

kvch added v7.10.0 and removed needs_backport PR is waiting to be backported to other branches. labels Jul 21, 2020

kvch pushed a commit to kvch/beats that referenced this pull request Jul 21, 2020

Limit the number of bytes read by LineReader in Filebeat (elastic#19552)

74ce896

(cherry picked from commit 0e049f0)

kvch mentioned this pull request Jul 21, 2020

Cherry-pick #19552 to 6.8: Add max bytes in line limit #20087

Merged

4 tasks

kvch added the v6.8.11 label Jul 21, 2020

kvch added a commit that referenced this pull request Jul 21, 2020

Limit the number of bytes read by LineReader in Filebeat (#19552) (#2…

2716901

…0081) (cherry picked from commit 0e049f0) Co-authored-by: Aleksandr Maus <[email protected]>

kvch pushed a commit to kvch/beats that referenced this pull request Jul 22, 2020

Limit the number of bytes read by LineReader in Filebeat (elastic#19552)

25b3ebf

(cherry picked from commit 0e049f0)

kvch added a commit that referenced this pull request Jul 23, 2020

Limit the number of bytes read by LineReader in Filebeat (#19552) (#2…

f431ae6

…0087) (cherry picked from commit 0e049f0) Co-authored-by: Aleksandr Maus <[email protected]>

melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020

Limit the number of bytes read by LineReader in Filebeat (elastic#19552)

926d53e

zube bot removed the [zube]: Done label Oct 19, 2020

benero added a commit to benero/beats that referenced this pull request Jul 20, 2021

bugfix: Add max bytes in line limit elastic#19552

9386dbf

chery pick from elastic#19552

benero mentioned this pull request Jul 21, 2021

MaxBytes & ReuseHarvester TencentBlueKing/beats#6

Merged

Add max bytes in line limit #19552

Add max bytes in line limit #19552

Conversation

aleksmaus commented Jul 1, 2020 • edited Loading

What does this PR do?

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

Screenshots

Logs

botelastic bot commented Jul 1, 2020

elasticmachine commented Jul 1, 2020

aleksmaus commented Jul 1, 2020 • edited Loading

elasticmachine commented Jul 1, 2020 • edited by jenkins-beats-ci bot Loading

💔 Build Failed

Build stats

Test stats 🧪

Steps errors

Log output

aleksmaus commented Jul 1, 2020

aleksmaus commented Jul 2, 2020

benbuzbee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aleksmaus commented Jul 6, 2020

kvch left a comment

Choose a reason for hiding this comment

aleksmaus commented Jul 6, 2020 • edited Loading

Choose a reason for hiding this comment

aleksmaus Jul 10, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kvch left a comment

Choose a reason for hiding this comment

kvch commented Jul 9, 2020

aleksmaus commented Jul 13, 2020

kvch commented Jul 16, 2020

kvch commented Jul 16, 2020 • edited Loading

kvch commented Jul 20, 2020

ppf2 commented Jul 20, 2020

aleksmaus commented Jul 20, 2020

aleksmaus commented Jul 20, 2020

kvch commented Jul 21, 2020

aleksmaus commented Jul 1, 2020 •

edited

Loading

aleksmaus commented Jul 1, 2020 •

edited

Loading

elasticmachine commented Jul 1, 2020 •

edited by jenkins-beats-ci bot

Loading

aleksmaus commented Jul 6, 2020 •

edited

Loading

aleksmaus Jul 10, 2020 •

edited

Loading

kvch commented Jul 16, 2020 •

edited

Loading