[pkg/stanza] Fix issue where syslog octet parsing could truncate token (open-telemetry#27294)

This fixes a bug detected while attempting to migrate tests to the new
`splittest` framework.

Generally speaking, the responsibility of a `bufio.SplitFunc` is to
parse a token from a given buffer (`[]byte`). However, the split func
does not have control over the size of the buffer, so it must be able to
ask for more data. The mechanism for asking for more data is to return
`0, nil, nil`.
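
As a minimal sketch of that mechanism (not code from this change), the hypothetical `splitOnSemicolon` below returns `0, nil, nil` whenever the buffer does not yet contain a full token. `iotest.OneByteReader` is used only to force the scanner to hand over partial buffers.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
	"testing/iotest"
)

// splitOnSemicolon is a hypothetical split func, used only for illustration.
// It emits each run of bytes that is terminated by ';'.
func splitOnSemicolon(data []byte, atEOF bool) (int, []byte, error) {
	if i := bytes.IndexByte(data, ';'); i >= 0 {
		// A complete token is in the buffer: consume it and its delimiter.
		return i + 1, data[:i], nil
	}
	// No delimiter yet. Returning (0, nil, nil) asks the Scanner to read
	// more data and call this function again with a larger buffer.
	return 0, nil, nil
}

// scanSemicolons feeds input to the Scanner one byte at a time, so the
// split func frequently sees a buffer that holds only part of a token.
func scanSemicolons(input string) []string {
	s := bufio.NewScanner(iotest.OneByteReader(strings.NewReader(input)))
	s.Split(splitOnSemicolon)
	var tokens []string
	for s.Scan() {
		tokens = append(tokens, s.Text())
	}
	return tokens
}

func main() {
	fmt.Printf("%q\n", scanSemicolons("alpha;beta;"))
}
```

Even though the data arrives one byte at a time, both tokens come out whole because the split func never emits a token before it has seen the delimiter.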

A split func is also told whether there is any more data to read. This
allows it to choose whether to "give up" and return a truncated token, or
to insist that it will wait until there is more data (which may never
happen).
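
To illustrate that choice, here is a small sketch with a hypothetical fixed-width splitter (the names `newFixedWidthSplitFunc` and `scanFixedWidth` are made up for this example). The `flushAtEOF` parameter decides what happens to a short trailing record once `atEOF` is true.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// newFixedWidthSplitFunc is a hypothetical builder, used only for illustration.
// The returned split func emits tokens of exactly width bytes.
func newFixedWidthSplitFunc(width int, flushAtEOF bool) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		if len(data) >= width {
			return width, data[:width], nil
		}
		if atEOF && flushAtEOF && len(data) > 0 {
			// "Give up": no more data will arrive, so emit the short remainder.
			return len(data), data, nil
		}
		// Keep waiting. If atEOF is already true, bufio.Scanner stops here
		// and the remainder is never emitted.
		return 0, nil, nil
	}
}

// scanFixedWidth collects every token the split func produces for input.
func scanFixedWidth(input string, width int, flushAtEOF bool) []string {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(newFixedWidthSplitFunc(width, flushAtEOF))
	var tokens []string
	for s.Scan() {
		tokens = append(tokens, s.Text())
	}
	return tokens
}

func main() {
	fmt.Printf("%q\n", scanFixedWidth("abcdefgh", 3, true))  // trailing "gh" is flushed
	fmt.Printf("%q\n", scanFixedWidth("abcdefgh", 3, false)) // trailing "gh" is dropped
}
```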

This particular function is parsing tokens based on a simple numerical
prefix which indicates how long the token will be.
e.g. `54 This is the actual token and it is 54 characters long.`
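
A rough sketch of how such a prefix can be read, using the same regular expression that appears in this function. The helper name `parseFrameHeader` and its return values are assumptions made for this example, not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// frameRegex matches a decimal length without leading zeros, followed by one
// whitespace character, at the very start of the buffer.
var frameRegex = regexp.MustCompile(`^[1-9]\d*\s`)

// parseFrameHeader is a hypothetical helper, used only for illustration.
// It returns the size of the prefix (digits plus delimiter) and the declared
// length of the body that follows it.
func parseFrameHeader(data []byte) (headerLen, bodyLen int, ok bool) {
	loc := frameRegex.FindIndex(data)
	if loc == nil {
		return 0, 0, false
	}
	// Drop the trailing delimiter before converting the digits.
	n, err := strconv.Atoi(string(data[:loc[1]-1]))
	if err != nil {
		return 0, 0, false
	}
	return loc[1], n, true
}

func main() {
	frame := []byte("54 This is the actual token and it is 54 characters long.")
	headerLen, bodyLen, ok := parseFrameHeader(frame)
	// A complete frame occupies headerLen+bodyLen bytes of the buffer.
	fmt.Println(headerLen, bodyLen, ok, len(frame) == headerLen+bodyLen)
}
```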

The problem is that the function would give up prematurely and return a
truncated token. The proper behavior is to ask for more data _unless_
the function is specifically told that there is no more data to receive.

This fixes the behavior so that whenever the function is able to parse an
expected length but finds there is not enough data in the buffer to
fulfill the expectation, it asks for more data. It only returns a
truncated token when there is no more data to ask for.
djaglowski authored and jmsnll committed Nov 12, 2023
1 parent 3592b5c commit 80ea4ce
Showing 3 changed files with 41 additions and 17 deletions.
27 changes: 27 additions & 0 deletions .chloggen/pkg-stanza-syslog-octen-split.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: syslogreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Fix issue where long tokens would be truncated prematurely

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [27294]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
28 changes: 13 additions & 15 deletions pkg/stanza/operator/input/syslog/syslog.go
@@ -150,34 +150,32 @@ func OctetSplitFuncBuilder(_ encoding.Encoding) (bufio.SplitFunc, error) {

 func newOctetFrameSplitFunc(flushAtEOF bool) bufio.SplitFunc {
 	frameRegex := regexp.MustCompile(`^[1-9]\d*\s`)
-	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
+	return func(data []byte, atEOF bool) (int, []byte, error) {
 		frameLoc := frameRegex.FindIndex(data)
 		if frameLoc == nil {
 			// Flush if no more data is expected
 			if len(data) != 0 && atEOF && flushAtEOF {
-				token = data
-				advance = len(data)
-				return
+				return len(data), data, nil
 			}
 			return 0, nil, nil
 		}

 		frameMaxIndex := frameLoc[1]
-		// delimit space between length and log
+		// Remove the delimiter (space) between length and log, and parse the length
 		frameLenValue, err := strconv.Atoi(string(data[:frameMaxIndex-1]))
 		if err != nil {
-			return 0, nil, err // read more data and try again.
+			// This should not be possible because the regex matched.
+			// However, return an error just in case.
+			return 0, nil, err
 		}

-		advance = frameMaxIndex + frameLenValue
-		// the limitation here is that we can only line split within a single buffer
-		// the context of buffer length cannot be pass onto the next scan
-		capacity := cap(data)
-		if advance > capacity {
-			return capacity, data, nil
+		advance := frameMaxIndex + frameLenValue
+		if advance > len(data) {
+			if atEOF && flushAtEOF {
+				return len(data), data, nil
+			}
+			return 0, nil, nil
 		}
-		token = data[:advance]
-		err = nil
-		return
+		return advance, data[:advance], nil
 	}
 }
3 changes: 1 addition & 2 deletions pkg/stanza/operator/input/syslog/syslog_test.go
@@ -243,8 +243,7 @@ func TestOctetFramingSplitFunc(t *testing.T) {
 				return newRaw
 			}(),
 			ExpectedTokens: []string{
-				`5000 ` + string(splittest.GenerateBytes(4091)),
-				`j`,
+				`5000 ` + string(splittest.GenerateBytes(4092)),
 			},
 		},
 	}