Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] remove all usage of Multiline from the fileconsumer package #14947

Closed
11 changes: 11 additions & 0 deletions .chloggen/stanza-remove-mutiline-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: pkg/stanza

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "remove all usage of Multiline from the `fileconsumer` package"

# One or more tracking issues related to the change
issues: [14766]
22 changes: 16 additions & 6 deletions pkg/stanza/fileconsumer/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,8 @@ func NewConfig() *Config {
IncludeFileNameResolved: false,
IncludeFilePathResolved: false,
PollInterval: 200 * time.Millisecond,
Splitter: helper.NewSplitterConfig(),
EncodingConfig: helper.NewEncodingConfig(),
Flusher: helper.NewFlusherConfig(),
StartAt: "end",
FingerprintSize: DefaultFingerprintSize,
MaxLogSize: defaultMaxLogSize,
Expand All @@ -57,11 +58,12 @@ type Config struct {
FingerprintSize helper.ByteSize `mapstructure:"fingerprint_size,omitempty"`
MaxLogSize helper.ByteSize `mapstructure:"max_log_size,omitempty"`
MaxConcurrentFiles int `mapstructure:"max_concurrent_files,omitempty"`
Splitter helper.SplitterConfig `mapstructure:",squash,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also take MultilineConfig out of SplitterConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You suggested removing it from the fileconsumer package earlier in #14605, so I removed it here~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm just trying to make sure we end up in the correct end state.

EncodingConfig helper.EncodingConfig `mapstructure:",squash,omitempty"`
Flusher helper.FlusherConfig `mapstructure:",squash,omitempty"`
}

// Build will build a file input operator from the supplied configuration
func (c Config) Build(logger *zap.SugaredLogger, emit EmitFunc) (*Manager, error) {
func (c Config) Build(logger *zap.SugaredLogger, emit EmitFunc, opts ...FactoryOption) (*Manager, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the delayed review - I'm finally having time to look at this again.

This new parameter type is a bit confusing because the FactoryOption is really an option for the splitter factory, but this isn't necessarily clear. I had to look back at your original proposal here to remember why we are making these changes. The original goal was to be able to pass in a splitFunc, which I think makes a lot of sense. However, we have realized there are some parts that are needed outside of the splitFunc as well, such as encoding.

Rather than pass in options, I think we should take a step back and clarify the problems with our original plan. Then we can try to solve those problems.

What specific problems do we have if we pass in splitFunc directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to review this code.
In fact, the main purpose of this code is to remove the parameter MultilineConfig from the package fileconsumer as you mentioned.
Of course, we can also directly pass in the splitFunc parameter. We also need to consider what to do when the user uses the MultilineConfig parameter and passes in a custom splitFunc at the same time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, the main purpose of this code is to remove the parameter MultilineConfig from the package fileconsumer as you mentioned.

Sure, I just want to make sure we are solving the problem you initially proposed in #14593.

We also need to consider what to do when the user uses the MultilineConfig parameter and passes in a custom splitFunc at the same time.

I don't know that we need to worry about this case. I think if it is desired to user MultilineConfig, then that is used to generate the SplitFunc in some way. Then passing in the SplitFunc works regardless of whether or not Multiline is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the user passes in the SplitFunc, whenever the MultilineConfig is set, the SplitFunc passed in by the user is used. Will this confuse users? Is it possible to consider throwing an error?

Regarding whether to remove from the fileconsumer package, we can open an issue for discussion later. And all we need to do now is to go back to the original purpose and let fileconsumer support custom SplitFunc.

What do you think about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the user passes in the SplitFunc, whenever the MultilineConfig is set, the SplitFunc passed in by the user is used. Will this confuse users? Is it possible to consider throwing an error?

Sorry, but this I don't think it should work this way, even temporarily. This would be confusing. We should use MultilineConfig to generate the splitFunc, then pass it in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, it will confuse users.
I think we should throw an error when user not only customize splitFunc but also set MultilineConfig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case here is that we did not remove the multilineConfig from the fileconsumer, while allowing the user to pass in a custom SplitFunc when calling the Build method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reconsider your opinion, the Build method passes in an Options on the spliterFactory that the responsibility is not clear.
If we add a new method BuildWithSplitter, this method can directly pass in a user-defined splitFunc. What do you think of this way? @djaglowski

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes more sense to me. I think we should try it. Thanks for finding another solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also not affect the existing interface. I will open a new PR to implement this.

if emit == nil {
return nil, fmt.Errorf("must provide emit function")
}
Expand Down Expand Up @@ -100,9 +102,16 @@ func (c Config) Build(logger *zap.SugaredLogger, emit EmitFunc) (*Manager, error
return nil, fmt.Errorf("`fingerprint_size` must be at least %d bytes", MinFingerprintSize)
}

// Ensure that encoding is buildable
enc, err := c.EncodingConfig.Build()
if err != nil {
return nil, err
}

flusher := c.Flusher.Build()
// Ensure that splitter is buildable
factory := newMultilineSplitterFactory(c.Splitter.EncodingConfig, c.Splitter.Flusher, c.Splitter.Multiline)
_, err := factory.Build(int(c.MaxLogSize))
factory := NewFactory(opts...)
_, err = factory.Build(int(c.MaxLogSize), enc.Encoding, flusher)
if err != nil {
return nil, err
}
Expand All @@ -129,7 +138,8 @@ func (c Config) Build(logger *zap.SugaredLogger, emit EmitFunc) (*Manager, error
},
fromBeginning: startAtBeginning,
splitterFactory: factory,
encodingConfig: c.Splitter.EncodingConfig,
encodingConfig: c.EncodingConfig,
flusherConfig: c.Flusher,
},
finder: c.Finder,
roller: newRoller(),
Expand Down
129 changes: 2 additions & 127 deletions pkg/stanza/fileconsumer/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -248,46 +248,6 @@ func TestUnmarshal(t *testing.T) {
return newMockOperatorConfig(cfg)
}(),
},
{
Name: "multiline_line_start_string",
Expect: func() *mockOperatorConfig {
cfg := NewConfig()
newSplit := helper.NewSplitterConfig()
newSplit.Multiline.LineStartPattern = "Start"
cfg.Splitter = newSplit
return newMockOperatorConfig(cfg)
}(),
},
{
Name: "multiline_line_start_special",
Expect: func() *mockOperatorConfig {
cfg := NewConfig()
newSplit := helper.NewSplitterConfig()
newSplit.Multiline.LineStartPattern = "%"
cfg.Splitter = newSplit
return newMockOperatorConfig(cfg)
}(),
},
{
Name: "multiline_line_end_string",
Expect: func() *mockOperatorConfig {
cfg := NewConfig()
newSplit := helper.NewSplitterConfig()
newSplit.Multiline.LineEndPattern = "Start"
cfg.Splitter = newSplit
return newMockOperatorConfig(cfg)
}(),
},
{
Name: "multiline_line_end_special",
Expect: func() *mockOperatorConfig {
cfg := NewConfig()
newSplit := helper.NewSplitterConfig()
newSplit.Multiline.LineEndPattern = "%"
cfg.Splitter = newSplit
return newMockOperatorConfig(cfg)
}(),
},
{
Name: "start_at_string",
Expect: func() *mockOperatorConfig {
Expand Down Expand Up @@ -340,15 +300,15 @@ func TestUnmarshal(t *testing.T) {
Name: "encoding_lower",
Expect: func() *mockOperatorConfig {
cfg := NewConfig()
cfg.Splitter.EncodingConfig = helper.EncodingConfig{Encoding: "utf-16le"}
cfg.EncodingConfig = helper.EncodingConfig{Encoding: "utf-16le"}
return newMockOperatorConfig(cfg)
}(),
},
{
Name: "encoding_upper",
Expect: func() *mockOperatorConfig {
cfg := NewConfig()
cfg.Splitter.EncodingConfig = helper.EncodingConfig{Encoding: "UTF-16lE"}
cfg.EncodingConfig = helper.EncodingConfig{Encoding: "UTF-16lE"}
return newMockOperatorConfig(cfg)
}(),
},
Expand Down Expand Up @@ -398,91 +358,6 @@ func TestBuild(t *testing.T) {
require.Error,
nil,
},
{
"MultilineConfiguredStartAndEndPatterns",
func(f *Config) {
f.Splitter = helper.NewSplitterConfig()
f.Splitter.Multiline = helper.MultilineConfig{
LineEndPattern: "Exists",
LineStartPattern: "Exists",
}
},
require.Error,
nil,
},
{
"MultilineConfiguredStartPattern",
func(f *Config) {
f.Splitter = helper.NewSplitterConfig()
f.Splitter.Multiline = helper.MultilineConfig{
LineStartPattern: "START.*",
}
},
require.NoError,
func(t *testing.T, f *Manager) {},
},
{
"MultilineConfiguredEndPattern",
func(f *Config) {
f.Splitter = helper.NewSplitterConfig()
f.Splitter.Multiline = helper.MultilineConfig{
LineEndPattern: "END.*",
}
},
require.NoError,
func(t *testing.T, f *Manager) {},
},
{
"InvalidEncoding",
func(f *Config) {
f.Splitter.EncodingConfig = helper.EncodingConfig{Encoding: "UTF-3233"}
},
require.Error,
nil,
},
{
"LineStartAndEnd",
func(f *Config) {
f.Splitter = helper.NewSplitterConfig()
f.Splitter.Multiline = helper.MultilineConfig{
LineStartPattern: ".*",
LineEndPattern: ".*",
}
},
require.Error,
nil,
},
{
"NoLineStartOrEnd",
func(f *Config) {
f.Splitter = helper.NewSplitterConfig()
f.Splitter.Multiline = helper.MultilineConfig{}
},
require.NoError,
func(t *testing.T, f *Manager) {},
},
{
"InvalidLineStartRegex",
func(f *Config) {
f.Splitter = helper.NewSplitterConfig()
f.Splitter.Multiline = helper.MultilineConfig{
LineStartPattern: "(",
}
},
require.Error,
nil,
},
{
"InvalidLineEndRegex",
func(f *Config) {
f.Splitter = helper.NewSplitterConfig()
f.Splitter.Multiline = helper.MultilineConfig{
LineEndPattern: "(",
}
},
require.Error,
nil,
},
}

for _, tc := range cases {
Expand Down
10 changes: 5 additions & 5 deletions pkg/stanza/fileconsumer/file_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -284,7 +284,7 @@ func TestReadUsingNopEncoding(t *testing.T) {
cfg := NewConfig().includeDir(tempDir)
cfg.StartAt = "beginning"
cfg.MaxLogSize = 8
cfg.Splitter.EncodingConfig.Encoding = "nop"
cfg.EncodingConfig.Encoding = "nop"
operator, emitCalls := buildTestManager(t, cfg)

// Create a file, then start
Expand Down Expand Up @@ -368,7 +368,7 @@ func TestNopEncodingDifferentLogSizes(t *testing.T) {
cfg := NewConfig().includeDir(tempDir)
cfg.StartAt = "beginning"
cfg.MaxLogSize = tc.maxLogSize
cfg.Splitter.EncodingConfig.Encoding = "nop"
cfg.EncodingConfig.Encoding = "nop"
operator, emitCalls := buildTestManager(t, cfg)

// Create a file, then start
Expand Down Expand Up @@ -491,8 +491,8 @@ func TestNoNewline(t *testing.T) {
tempDir := t.TempDir()
cfg := NewConfig().includeDir(tempDir)
cfg.StartAt = "beginning"
cfg.Splitter = helper.NewSplitterConfig()
cfg.Splitter.Flusher.Period = time.Nanosecond
cfg.Flusher = helper.NewFlusherConfig()
cfg.Flusher.Period = time.Nanosecond
operator, emitCalls := buildTestManager(t, cfg)

temp := openTemp(t, tempDir)
Expand Down Expand Up @@ -1163,7 +1163,7 @@ func TestEncodings(t *testing.T) {
tempDir := t.TempDir()
cfg := NewConfig().includeDir(tempDir)
cfg.StartAt = "beginning"
cfg.Splitter.EncodingConfig = helper.EncodingConfig{Encoding: tc.encoding}
cfg.EncodingConfig = helper.EncodingConfig{Encoding: tc.encoding}
operator, emitCalls := buildTestManager(t, cfg)

// Populate the file
Expand Down
16 changes: 9 additions & 7 deletions pkg/stanza/fileconsumer/reader_factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ type readerFactory struct {
fromBeginning bool
splitterFactory splitterFactory
encodingConfig helper.EncodingConfig
flusherConfig helper.FlusherConfig
}

func (f *readerFactory) newReader(file *os.File, fp *Fingerprint) (*Reader, error) {
Expand Down Expand Up @@ -94,21 +95,22 @@ func (b *readerBuilder) build() (r *Reader, err error) {
Offset: b.offset,
}

enc, err := b.encodingConfig.Build()
if err != nil {
return
}
r.encoding = enc

force := b.flusherConfig.Build()
if b.splitFunc != nil {
r.splitFunc = b.splitFunc
} else {
r.splitFunc, err = b.splitterFactory.Build(b.readerConfig.maxLogSize)
r.splitFunc, err = b.splitterFactory.Build(b.readerConfig.maxLogSize, enc.Encoding, force)
if err != nil {
return
}
}

enc, err := b.encodingConfig.Build()
if err != nil {
return
}
r.encoding = enc

if b.file != nil {
r.file = b.file
r.SugaredLogger = b.SugaredLogger.With("path", b.file.Name())
Expand Down
7 changes: 3 additions & 4 deletions pkg/stanza/fileconsumer/reader_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -109,10 +109,9 @@ func testReaderFactory(t *testing.T) (*readerFactory, chan *emitParams) {
emitChan <- &emitParams{attrs, token}
},
},
fromBeginning: true,
splitterFactory: newMultilineSplitterFactory(
helper.NewEncodingConfig(), helper.NewFlusherConfig(), helper.NewMultilineConfig()),
encodingConfig: helper.NewEncodingConfig(),
fromBeginning: true,
splitterFactory: newMultilineSplitterFactory(helper.NewMultilineConfig()),
encodingConfig: helper.NewEncodingConfig(),
}, emitChan
}

Expand Down
Loading