Add parse_ints config in json parser to support parsing int or float properly #33699

Merged (11 commits) on Jul 3, 2024
27 changes: 27 additions & 0 deletions .chloggen/json_parser_number_data_type.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: pkg/stanza

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add `use_number` config to the json parser to support decoding numbers into int or float properly

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [33696]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
21 changes: 11 additions & 10 deletions pkg/stanza/docs/operators/json_parser.md
@@ -4,16 +4,17 @@ The `json_parser` operator parses the string-type field selected by `parse_from`

### Configuration Fields

| Field | Default | Description |
| --- | --- | --- |
| `id` | `json_parser` | A unique identifier for the operator. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `parse_from` | `body` | The [field](../types/field.md) from which the value will be parsed. |
| `parse_to` | `attributes` | The [field](../types/field.md) to which the value will be parsed. |
| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `if` | | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
| `timestamp` | `nil` | An optional [timestamp](../types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. |
| `severity` | `nil` | An optional [severity](../types/severity.md) block which will parse a severity field before passing the entry to the output operator. |
| `use_number` | `false` | Numbers like `int` and `float` are parsed as `float64` by default. When `use_number` is enabled, numbers are parsed as `json.Number` and then converted to `int64` or `float64` based on the value. |

Contributor:

Suggested change
- | `use_number` | `false` | Numbers like `int` and `float` are parsed as `float64` by default, when `use_number` is enabled, numbers are parsed as `json.Number` and then converted to `int64` or `float64` based on the value. |
+ | `use_number` | `false` | Numbers like `int` and `float` are parsed as `float64` by default. When `use_number` is enabled, numbers are parsed as `json.Number` and then converted to `int64` or `float64` based on the value. |

Contributor Author:

Thanks, updated.


### Embedded Operations

18 changes: 17 additions & 1 deletion pkg/stanza/operator/parser/json/config.go
@@ -32,6 +32,8 @@ func NewConfigWithID(operatorID string) *Config {
// Config is the configuration of a JSON parser operator.
type Config struct {
	helper.ParserConfig `mapstructure:",squash"`

	UseNumber bool `mapstructure:"use_number"`
}

// Build will build a JSON parser operator.
@@ -41,8 +43,22 @@ func (c Config) Build(set component.TelemetrySettings) (operator.Operator, error
		return nil, err
	}

	// jsonConfig uses the same settings as jsoniter.ConfigFastest for backward compatibility.
	var jsonConfig = jsoniter.Config{
		EscapeHTML:                    false,
		MarshalFloatWith6Digits:       true,
		ObjectFieldMustBeSimpleString: true,
	}

	// Override the default values with the values from the config.
	// When UseNumber is disabled, `int` and `float` are parsed as `float64`.
	// When it is enabled, they are parsed as `json.Number`, and the parser later
	// converts them to `int64` or `float64` depending on the value.
	jsonConfig.UseNumber = c.UseNumber

	return &Parser{
		ParserOperator: parserOperator,
		json:           jsonConfig.Froze(),
		useNumber:      c.UseNumber,
	}, nil
}
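One consequence of setting `UseNumber` on the decoder itself, rather than only converting afterwards, is that the original number text is still available when the conversion runs. Once a decoder has produced a `float64`, both the int/float distinction and any precision beyond 2^53 are gone. A sketch, again using `encoding/json` in place of jsoniter and with hypothetical helpers `idAsFloat` and `idAsNumber`:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// idAsFloat decodes body with default settings (numbers become float64)
// and returns the "id" field converted back to int64.
func idAsFloat(body string) (int64, error) {
	var m map[string]any
	if err := json.Unmarshal([]byte(body), &m); err != nil {
		return 0, err
	}
	f, ok := m["id"].(float64)
	if !ok {
		return 0, fmt.Errorf("id is %T, want float64", m["id"])
	}
	return int64(f), nil
}

// idAsNumber decodes body with UseNumber enabled and converts the "id"
// field with json.Number.Int64, which parses the original text.
func idAsNumber(body string) (int64, error) {
	dec := json.NewDecoder(strings.NewReader(body))
	dec.UseNumber()
	var m map[string]any
	if err := dec.Decode(&m); err != nil {
		return 0, err
	}
	n, ok := m["id"].(json.Number)
	if !ok {
		return 0, fmt.Errorf("id is %T, want json.Number", m["id"])
	}
	return n.Int64()
}

func main() {
	// 2^53 + 1 is the smallest positive integer float64 cannot represent exactly.
	body := `{"id":9007199254740993}`

	rounded, err := idAsFloat(body)
	if err != nil {
		panic(err)
	}
	exact, err := idAsNumber(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(rounded, exact)
}
```

The `float64` path rounds the value to 9007199254740992, while the `json.Number` path returns it unchanged.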
8 changes: 8 additions & 0 deletions pkg/stanza/operator/parser/json/config_test.go
@@ -110,6 +110,14 @@ func TestConfig(t *testing.T) {
					return p
				}(),
			},
			{
				Name: "use_number",
				Expect: func() *Config {
					p := NewConfig()
					p.UseNumber = true
					return p
				}(),
			},
		},
	}.Run(t)
}
46 changes: 45 additions & 1 deletion pkg/stanza/operator/parser/json/parser.go
@@ -5,6 +5,7 @@ package json // import "github.com/open-telemetry/opentelemetry-collector-contri

import (
	"context"
	"encoding/json"
	"fmt"

	jsoniter "github.com/json-iterator/go"
@@ -16,7 +17,8 @@ import (
// Parser is an operator that parses JSON.
type Parser struct {
	helper.ParserOperator
	json      jsoniter.API
	useNumber bool
}

// Process will parse an entry for JSON.
@@ -36,5 +38,47 @@ func (p *Parser) parse(value any) (any, error) {
	default:
		return nil, fmt.Errorf("type %T cannot be parsed as JSON", value)
	}

	if p.useNumber {

Member:

Doing the conversion explicitly here makes me wonder what the actual reason is for setting UseNumber in the json object's config in the first place 🤔?

Contributor Author:

I am not sure either. Looking at encoding/json, it doesn't have as many options as json-iterator, so it makes more sense to me to only expose the use_number option for now.

Member:

So what's the point of enabling the UseNumber of the jsoniter.Config? Is this required for some reason? If so, I suggest we document this at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33699/files#diff-158189e84f05b177451492225bc83c2c23fa140cffdd4c5f9bf7db6ada8edc3aR54.

Contributor Author:

Added a comment. Please also check this test, which shows that when UseNumber is false, both int and float values are parsed as float64: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33699/files#diff-f811622a59595f3ded728a9213b5e08d0c5fbe91c4cfff23b25d037f84e1f953R142-R155

		p.convertNumbers(parsedValue)
	}
	return parsedValue, nil
}

func (p *Parser) convertNumbers(parsedValue map[string]any) {
	for k, v := range parsedValue {
		switch t := v.(type) {
		case json.Number:
			parsedValue[k] = p.convertNumber(t)
		case map[string]any:
			p.convertNumbers(t)
		case []any:
			p.convertNumbersArray(t)
		}
	}
}

func (p *Parser) convertNumbersArray(arr []any) {
	for i, v := range arr {
		switch t := v.(type) {
		case json.Number:
			arr[i] = p.convertNumber(t)
		case map[string]any:
			p.convertNumbers(t)
		case []any:
			p.convertNumbersArray(t)
		}
	}
}

func (p *Parser) convertNumber(value json.Number) any {
	i64, err := value.Int64()
	if err == nil {
		return i64
	}
	f64, err := value.Float64()
	if err == nil {
		return f64
	}
	return value.String()
}
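For experimenting with the conversion outside the collector, here is a standalone sketch of the same recursive walk. It substitutes the standard library's `json.Decoder.UseNumber` for the jsoniter config built in `config.go` (an assumption so the example has no dependencies) and turns the methods into free functions; `parse` here is a simplified, hypothetical stand-in for `Parser.parse` that only accepts a string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parse decodes s with UseNumber enabled, then replaces every json.Number,
// including ones nested in objects and arrays, with int64 or float64.
func parse(s string) (map[string]any, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	dec.UseNumber()
	var m map[string]any
	if err := dec.Decode(&m); err != nil {
		return nil, err
	}
	convertNumbers(m)
	return m, nil
}

// convertNumbers rewrites json.Number values in place (assigning to existing
// keys during range is allowed) and recurses into nested maps and slices.
func convertNumbers(m map[string]any) {
	for k, v := range m {
		switch t := v.(type) {
		case json.Number:
			m[k] = convertNumber(t)
		case map[string]any:
			convertNumbers(t)
		case []any:
			convertNumbersArray(t)
		}
	}
}

// convertNumbersArray does the same for slice elements.
func convertNumbersArray(arr []any) {
	for i, v := range arr {
		switch t := v.(type) {
		case json.Number:
			arr[i] = convertNumber(t)
		case map[string]any:
			convertNumbers(t)
		case []any:
			convertNumbersArray(t)
		}
	}
}

// convertNumber prefers int64, falls back to float64, and finally keeps
// the original text if neither conversion succeeds.
func convertNumber(n json.Number) any {
	if i, err := n.Int64(); err == nil {
		return i
	}
	if f, err := n.Float64(); err == nil {
		return f
	}
	return n.String()
}

func main() {
	m, err := parse(`{"int":1,"float":1.0,"mixed_array":[1,1.5,2]}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %T %v\n", m["int"], m["float"], m["mixed_array"])
}
```

Because the walk mutates the decoded map and slices in place, no second copy of the structure is allocated.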
110 changes: 110 additions & 0 deletions pkg/stanza/operator/parser/json/parser_test.go
@@ -139,6 +139,116 @@ func TestParser(t *testing.T) {
				ScopeName: "logger",
			},
		},
		{
			"use_number_disabled",
			func(_ *Config) {},
			&entry.Entry{
				Body: `{"int":1,"float":1.0}`,
			},
			&entry.Entry{
				Attributes: map[string]any{
					"int":   float64(1),
					"float": float64(1),
				},
				Body: `{"int":1,"float":1.0}`,
			},
		},
		{
			"use_number_simple",
			func(p *Config) {
				p.UseNumber = true
			},
			&entry.Entry{
				Body: `{"int":1,"float":1.0}`,
			},
			&entry.Entry{
				Attributes: map[string]any{
					"int":   int64(1),
					"float": float64(1),
				},
				Body: `{"int":1,"float":1.0}`,
			},
		},
		{
			"use_number_nested",
			func(p *Config) {
				p.UseNumber = true
			},
			&entry.Entry{
				Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0}}`,
			},
			&entry.Entry{
				Attributes: map[string]any{
					"int":   int64(1),
					"float": float64(1),
					"nested": map[string]any{
						"int":   int64(2),
						"float": float64(2),
					},
				},
				Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0}}`,
			},
		},
		{
			"use_number_arrays",
			func(p *Config) {
				p.UseNumber = true
			},
			&entry.Entry{
				Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0},"array":[1,2]}`,
			},
			&entry.Entry{
				Attributes: map[string]any{
					"int":   int64(1),
					"float": float64(1),
					"nested": map[string]any{
						"int":   int64(2),
						"float": float64(2),
					},
					"array": []any{int64(1), int64(2)},
				},
				Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0},"array":[1,2]}`,
			},
		},
		{
			"use_number_mixed_arrays",
			func(p *Config) {
				p.UseNumber = true
			},
			&entry.Entry{
				Body: `{"int":1,"float":1.0,"mixed_array":[1,1.5,2]}`,
			},
			&entry.Entry{
				Attributes: map[string]any{
					"int":         int64(1),
					"float":       float64(1),
					"mixed_array": []any{int64(1), float64(1.5), int64(2)},
				},
				Body: `{"int":1,"float":1.0,"mixed_array":[1,1.5,2]}`,
			},
		},
		{
			"use_number_nested_arrays",
			func(p *Config) {
				p.UseNumber = true
			},
			&entry.Entry{
				Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0,"array":[1,2]},"array":[3,4]}`,
			},
			&entry.Entry{
				Attributes: map[string]any{
					"int":   int64(1),
					"float": float64(1),
					"nested": map[string]any{
						"int":   int64(2),
						"float": float64(2),
						"array": []any{int64(1), int64(2)},
					},
					"array": []any{int64(3), int64(4)},
				},
				Body: `{"int":1,"float":1.0,"nested":{"int":2,"float":2.0,"array":[1,2]},"array":[3,4]}`,
			},
		},
	}

	for _, tc := range cases {
3 changes: 3 additions & 0 deletions pkg/stanza/operator/parser/json/testdata/config.yaml
@@ -37,3 +37,6 @@ timestamp:
    parse_from: body.timestamp_field
    layout_type: strptime
    layout: '%Y-%m-%d'
use_number:
  type: json_parser
  use_number: true