Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix problem where parser errors would result in duplicate calls to the next operator #330

Merged
merged 3 commits into from
Jun 16, 2021

Conversation

djaglowski
Copy link
Member

Description of Changes

Transformer.HandleEntryError is responsible for the following:

  1. Log that an error occurred
  2. Send or Drop the entry which encountered the error
  3. Return the error encountered

In theory, the responsibility of this method to return the error which it was given is a little odd. However, in practice it is a helpful way for callers to bail out. (return t.HandleEntryError(..., err))

The method was incorrectly returning nil when on_error=send, which ultimately allowed some operators to continue execution on the entry and pass it along to the next operator. (even though it had already been passed along by HandleEntryError.

This PR clarifies the expectation that on_error does not affect whether or not an error is returned when encountered. on_error only affects whether or not the entry that caused the error is passed along or not.

Aside from preventing this bug, and reworking the associated expectations in test cases, there should be no effect from this change. The error is dropped when returned from operator.Process.

Please check that the PR fulfills these requirements

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Add a changelog entry (for non-trivial bug fixes / features)
  • CI passes

@codecov
Copy link

codecov bot commented Jun 15, 2021

Codecov Report

Merging #330 (fde5475) into master (2087e88) will increase coverage by 0.20%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #330      +/-   ##
==========================================
+ Coverage   69.97%   70.17%   +0.20%     
==========================================
  Files         121      121              
  Lines        6371     6370       -1     
==========================================
+ Hits         4458     4470      +12     
+ Misses       1437     1427      -10     
+ Partials      476      473       -3     
Impacted Files Coverage Δ
operator/helper/transformer.go 100.00% <ø> (ø)
operator/builtin/input/stdin/stdin.go 68.42% <0.00%> (+2.63%) ⬆️
operator/builtin/input/tcp/tcp.go 73.68% <0.00%> (+4.39%) ⬆️
operator/builtin/input/file/file.go 78.62% <0.00%> (+4.83%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2087e88...fde5475. Read the comment docs.

@djaglowski
Copy link
Member Author

Log Files Logs / Second CPU Avg (%) CPU Avg Δ (%) Memory Avg (MB) Memory Avg Δ (MB)
1 1000 1.3275849 -0.068962455 127.01832 -0.76104736
1 5000 4.845045 -0.08617067 137.9344 +1.5787964
1 10000 10.138268 +0.17272758 143.35223 +1.3659668
1 50000 48.017826 -1.5173798 175.66986 +1.5067444
1 100000 102.126884 +12.504196 247.08083 +25.917572
10 100 1.9310788 +0.017296553 134.79796 -0.5320587
10 500 5.9310465 +0.58607674 137.1052 -2.4935455
10 1000 11.982971 +0.13784218 147.24475 +1.9311676
10 5000 50.357838 -0.2632637 174.2442 -3.2372131
10 10000 98.83853 -0.8279724 211.03476 -13.059265

@djaglowski djaglowski requested a review from jsirianni June 15, 2021 20:21
@djaglowski
Copy link
Member Author

Log Files Logs / Second CPU Avg (%) CPU Avg Δ (%) Memory Avg (MB) Memory Avg Δ (MB)
1 1000 1.5344905 +0.13794315 125.603584 -2.1757812
1 5000 5.1898084 +0.2585926 137.2927 +0.937088
1 10000 10.241672 +0.27613068 145.20502 +3.21875
1 50000 50.502903 +0.96769714 175.41136 +1.2482452
1 100000 96.230995 +6.608307 237.83527 +16.672012
10 100 1.8620838 -0.051698446 130.72374 -4.6062775
10 500 5.9483886 +0.6034188 139.50363 -0.09510803
10 1000 12.827988 +0.98285866 143.03946 -2.2741241
10 5000 55.58686 +4.9657593 179.12958 +1.6481628
10 10000 109.365555 +9.699051 231.49919 +7.4051666

@jsirianni
Copy link
Member

I have tested with two configs:

pipeline:
- type: file_input
  include: ["./time.json"]
  start_at: beginning
- type: json_parser
  parse_from: $record
  timestamp:
    layout: "2006-01-02 15:04:05.999"
    layout_type: gotime
    location: America/New_York
    parse_from: Time
- type: file_output
  path: time_duplicate.out
pipeline:
- type: file_input
  include: ["./sev.json"]
  start_at: beginning
- type: json_parser
  parse_from: $record
  severity:
    parse_from: Sev
- type: file_output
  path: sev_duplicate.out

Both input files contain exactly two lines of json. One is valid for the pipeline, and the other is missing the parse_from field for timestamp and severity respectively.

Output from both using master branch (with the bug). The output has an incorrect amount of of output entries.

{"timestamp":"2021-06-15T10:43:07.402-04:00","severity":0,"labels":{"file_name":"time.json"},"record":{"key":"value"}}
{"timestamp":"2021-06-16T11:15:55.502806-04:00","severity":0,"labels":{"file_name":"time.json"},"record":{"key":"value"}}
{"timestamp":"2021-06-16T11:15:55.502806-04:00","severity":0,"labels":{"file_name":"time.json"},"record":{"key":"value"}}

==> sev_duplicate.out <==
{"timestamp":"2021-06-16T11:16:13.51545-04:00","severity":30,"severity_text":"INFO","labels":{"file_name":"sev.json"},"record":{"key":"value"}}
{"timestamp":"2021-06-16T11:16:13.515793-04:00","severity":0,"labels":{"file_name":"sev.json"},"record":{"key":"value"}}
{"timestamp":"2021-06-16T11:16:13.515793-04:00","severity":0,"labels":{"file_name":"sev.json"},"record":{"key":"value"}}

Output from this branch (no bug). The output has the correct amount of output entries.

{"timestamp":"2021-06-16T11:16:23.94746-04:00","severity":30,"severity_text":"INFO","labels":{"file_name":"sev.json"},"record":{"key":"value"}}
{"timestamp":"2021-06-16T11:16:23.947751-04:00","severity":0,"labels":{"file_name":"sev.json"},"record":{"key":"value"}}

==> time_duplicate.out <==
{"timestamp":"2021-06-15T10:43:07.402-04:00","severity":0,"labels":{"file_name":"time.json"},"record":{"key":"value"}}
{"timestamp":"2021-06-16T11:16:28.206724-04:00","severity":0,"labels":{"file_name":"time.json"},"record":{"key":"value"}}

Both situations log the same (correct) error.

jsirianni
jsirianni previously approved these changes Jun 16, 2021
@djaglowski
Copy link
Member Author

Log Files Logs / Second CPU Avg (%) CPU Avg Δ (%) Memory Avg (MB) Memory Avg Δ (MB)
1 1000 1.327577 -0.06897032 127.05105 -0.72831726
1 5000 5.1381407 +0.20692492 137.4961 +1.1404877
1 10000 9.862302 -0.10323906 142.4181 +0.431839
1 50000 45.10472 -4.430485 173.30025 -0.86286926
1 100000 90.77794 +1.1552505 223.58109 +2.4178314
10 100 1.8276194 -0.086162806 131.1188 -4.211212
10 500 5.9138694 +0.56889963 140.80482 +1.2060852
10 1000 11.655283 -0.18984604 146.17093 +0.8573456
10 5000 56.222263 +5.601162 180.58527 +3.1038513
10 10000 96.88391 -2.7825928 220.45918 -3.634842

@jsirianni jsirianni self-requested a review June 16, 2021 16:27
@jsirianni jsirianni merged commit 64a2552 into master Jun 16, 2021
@jsirianni jsirianni deleted the fix-duplicate-log-on-parse-error branch June 16, 2021 16:27
@jsirianni jsirianni mentioned this pull request Jun 16, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants