Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parse_ints config in json parser to support parsing int or float properly #33699

Merged
merged 11 commits into from
Jul 3, 2024

Conversation

newly12
Copy link
Contributor

@newly12 newly12 commented Jun 21, 2024

Description:

expose json iterator config in json parser

Link to tracking Issue:

Fixes #33696

Testing:

added.

Documentation:

updated

@newly12 newly12 requested a review from djaglowski as a code owner June 21, 2024 08:45
@newly12 newly12 requested a review from a team June 21, 2024 08:45
}

type JsoniterConfig struct {
IndentionStep int `mapstructure:"indention_step"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to expose all these options to the users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, my second thought on this is at least at this moment we should only expose options we needed, to not bind to the json iterator so tight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently json_parser uses jsoniter.ConfigFastest which has 2 more options enabled

var ConfigFastest = [Config](https://pkg.go.dev/github.com/json-iterator/go#Config){
	EscapeHTML:                    [false](https://pkg.go.dev/builtin#false),
	MarshalFloatWith6Digits:       [true](https://pkg.go.dev/builtin#true),
	ObjectFieldMustBeSimpleString: [true](https://pkg.go.dev/builtin#true),
}.Froze()

to add the use_number config, it looks like we have to create a new jsoniter.Config object probably to have the same config so users are not surprised, I think it makes sense to allow user to be able to turn off MarshalFloatWith6Digits as well. any thoughts or comments on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should only expose what is needed, and we should consider that perhaps some day we need to switch json parsing libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong about MarshalFloatWith6Digits, it doesn't affect unmarshal, only marshal, which is not related to the parser. so for now I only added config for UseNumber.

@@ -36,5 +38,47 @@ func (p *Parser) parse(value any) (any, error) {
default:
return nil, fmt.Errorf("type %T cannot be parsed as JSON", value)
}

if p.useNumber {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing the conversion explicitly here makes me wonder what is the actual reason for defining the UseNumber setting in the json object's config at first place 🤔 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure either. Looking at the encoding/json it looks like not as these many options as json iterator, which makes more sense to me to only expose use_number option for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the point of enabling the UseNumber of the jsoniter.Config? Is this required for some reason? If so I suggest we document this at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33699/files#diff-158189e84f05b177451492225bc83c2c23fa140cffdd4c5f9bf7db6ada8edc3aR54.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment, please also check this test that when UseNumber is false, regardless the data is int or float, they will be parsed as float64. https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33699/files#diff-f811622a59595f3ded728a9213b5e08d0c5fbe91c4cfff23b25d037f84e1f953R142-R155

pkg/stanza/docs/operators/json_parser.md Outdated Show resolved Hide resolved
@newly12 newly12 force-pushed the json_parser_float branch from 49d68f6 to e3b3b43 Compare June 25, 2024 07:40
@newly12 newly12 changed the title expose json iterator config in json parser Add use_number config in json parser to support decode into int or float properly Jun 25, 2024
@newly12
Copy link
Contributor Author

newly12 commented Jun 28, 2024

@djaglowski @ChrsMark @crobert-1 could you please review again?

ChrsMark

This comment was marked as duplicate.

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we are switching the json library already: #33785

I suggest we wait until the switch is resolved and see what the possible implementation would look like.

newly12 added 2 commits July 1, 2024 10:15
# Conflicts:
#	pkg/stanza/operator/parser/json/config.go
#	pkg/stanza/operator/parser/json/parser.go
@newly12
Copy link
Contributor Author

newly12 commented Jul 1, 2024

updated code for the new json library, since the overhead of enabling UseNumber is high, I added the processing in a if else block, while the decode function when it is disabled remains the same as previous.

BenchmarkProcess
BenchmarkProcess-10             	   99046	     12432 ns/op	    9875 B/op	     114 allocs/op
BenchmarkProcessUseNumber
BenchmarkProcessUseNumber-10    	   74754	     15924 ns/op	   16730 B/op	     158 allocs/op

@ChrsMark
Copy link
Member

ChrsMark commented Jul 1, 2024

Thank's @newly12.

since the overhead of enabling UseNumber is high

Would that make sense to mention this in the docs so as users know that enabling this option comes with a price?

Copy link
Contributor

@tiffany76 tiffany76 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small documentation suggestion. Thanks.

| `if` | | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
| `timestamp` | `nil` | An optional [timestamp](../types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. |
| `severity` | `nil` | An optional [severity](../types/severity.md) block which will parse a severity field before passing the entry to the output operator. |
| `use_number` | `false` | Numbers like `int` and `float` are parsed as `float64` by default, when `use_number` is enabled, numbers are parsed as `json.Number` and then converted to `int64` or `float64` based on the value. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `use_number` | `false` | Numbers like `int` and `float` are parsed as `float64` by default, when `use_number` is enabled, numbers are parsed as `json.Number` and then converted to `int64` or `float64` based on the value. |
| `use_number` | `false` | Numbers like `int` and `float` are parsed as `float64` by default. When `use_number` is enabled, numbers are parsed as `json.Number` and then converted to `int64` or `float64` based on the value. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated.

@newly12
Copy link
Contributor Author

newly12 commented Jul 2, 2024

Thank's @newly12.

since the overhead of enabling UseNumber is high

Would that make sense to mention this in the docs so as users know that enabling this option comes with a price?

Done, PTAL.

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank's! Not sure if we could avoid this performance overhead, but since that's behind the specific setting I think we can land it and evaluate its adoption.

@djaglowski please take a look as well.

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, with one open question:

Should we use a different name for the parameter than use_number? To me the name seems to communicate an implementation detail, rather than the effect that it has. I would think something like parse_ints might be more informative. That said, I think the documentation is very clear so I don't think this is required.

@newly12
Copy link
Contributor Author

newly12 commented Jul 3, 2024

Looks good, thank's! Not sure if we could avoid this performance overhead, but since that's behind the specific setting I think we can land it and evaluate its adoption.

@djaglowski please take a look as well.

I checked fastjson code a little bit but looks like not much can be done apart from current approach, given limited options/interface exposed by the library.

@newly12
Copy link
Contributor Author

newly12 commented Jul 3, 2024

This looks good to me, with one open question:

Should we use a different name for the parameter than use_number? To me the name seems to communicate an implementation detail, rather than the effect that it has. I would think something like parse_ints might be more informative. That said, I think the documentation is very clear so I don't think this is required.

make sense to me, it is more informative, I've changed the config name to parse_ints, please take another look.

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just make sure to update the PR's title as well ;)

@newly12 newly12 changed the title Add use_number config in json parser to support decode into int or float properly Add parse_ints config in json parser to support parsing int or float properly Jul 3, 2024
@djaglowski
Copy link
Member

Thanks @newly12

@djaglowski djaglowski merged commit 62ea24b into open-telemetry:main Jul 3, 2024
154 checks passed
@github-actions github-actions bot added this to the next release milestone Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

json_parser: number is always converted to float64
6 participants