Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Stanza Key Value Parser incorrectly parsing pairs with custom pair delimiter #31034

Closed
dpaasman00 opened this issue Feb 5, 2024 · 3 comments · Fixed by #31048
Closed
Assignees
Labels
bug Something isn't working pkg/stanza

Comments

@dpaasman00
Copy link
Contributor

dpaasman00 commented Feb 5, 2024

Component(s)

pkg/stanza

What happened?

Description

The Stanza key value parser does not respect quotes while doing pair parsing with a custom pair delimiter.

Steps to Reproduce

Perhaps easiest to reproduce by adding the following unit test to the TestParser unit test in this test file:

{
	"custom pair delimiter in quoted value",
	func(kv *Config) {
		kv.PairDelimiter = "_"
	},
	&entry.Entry{
		Body: `a=b_c="d_e"`,
	},
	&entry.Entry{
		Attributes: map[string]any{
			"a": "b",
			"c": "d_e",
		},
		Body: `a=b_c="d_e"`,
	},
	false,
	false,
},

Expected Result

In the unit test above, we set a custom pair delimiter of _ and provide an input string that has a quoted value which contains the pair delimiter (c="d_e"). I would expect the parser to respect the quotes and split the input string into 2 key value pairs (a=b, c="d_e"). Then the parser would split the pairs into their keys and values resulting in the Attributes map defined in the unit test.

Actual Result

The actual result is during pair parsing the parser does not respect quotes and splits on the quoted _. This means that pair parsing results in these values: a=b, c="d, e". When the parser then tries to split the keys and values in each pair, it fails on e" because there is no delimiter to split on. The resulting output of the test case is

Error:      	Received unexpected error: expected 'e"' to split by '=' into two items, got 1

Collector version

main (present in current commit 292f291)

Environment information

Environment

Don't believe this is necessary

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

I found this bug while adopting the parser over for a new OTTL function discussed in this issue: #30998. This bug occurs when a custom pair delimiter is used becausestrings.Split() gets used to split up the pairs, which does not respect quotes. This bug doesn't occur when using the default pair delimiter (white space) because a custom function passed to strings.FieldsFunc is used which does respect quotes.

@dpaasman00 dpaasman00 added bug Something isn't working needs triage New item requiring triage labels Feb 5, 2024
Copy link
Contributor

github-actions bot commented Feb 5, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski djaglowski removed the needs triage New item requiring triage label Feb 5, 2024
@djaglowski
Copy link
Member

Thanks @dpaasman00. Do you plan to work on this issue? LMK and I'll assign it to you.

@dpaasman00
Copy link
Contributor Author

@djaglowski Yep I plan to work on the fix.

djaglowski added a commit that referenced this issue Feb 8, 2024
…31048)

**Description:** <Describe what has changed.>
Fixes a bug that would occur where quotes were not respected while pair
parsing with a custom pair delimiter. This occurred because in the case
of a custom pair delimiter `strings.Split()` was used which does not
respect quotes. This bug was only an issue with custom pair delimiters
because the default white space delimiter used `strings.FieldsFunc`with
a function that did respect quotes.

This fix adds the function `splitPairs(input, pairDelimiter string)
([]string, error)` which will split the given input on the given
pairDelimiter.

Alternative solutions like `strings.FieldsFunc` and `csv.NewReader` were
considered but not used because they require the delimiter be a rune.
This would have worked in many cases except for when the delimiter is
more than 1 unique character long. Since the delimiter shouldn't be
limited to a single rune, those solutions were rejected.

A consequence of this new function respecting quotes is it will fail and
return an error if any quoted value is ever malformed. For example,
given the input `k1=v1 k2='v2"` the function will return an error
because the single quote used for `k2`'s value is never closed. It will
also fail on the input `k1='v1 k2=v2 k3=v3` for the same reason that the
quote is never closed. I believe this is the behavior we want so that we
don't return incorrect pairs for further parsing.

**Link to tracking Issue:** <Issue number if applicable>
Closes #31034 

**Testing:** <Describe what testing was performed and which tests were
added.>
Added in additional unit tests to verify correct behaviors.

**Documentation:** <Describe the documentation added.>
I don't believe documentation is required as the user shouldn't see any
changes to intended behavior.

---------

Co-authored-by: Daniel Jaglowski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pkg/stanza
Projects
None yet
2 participants