Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Support to Customize bufio.SplitFunc #14605

Conversation

atingchen
Copy link
Contributor

Description:
Support to Customize bufio.SplitFunc

Link to tracking Issue:
#14593

Testing:
Unit test

Documentation:
Update README.md

@atingchen atingchen requested a review from a team September 30, 2022 11:11
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hesitant about this implementation. It looks like it will work but it's hard to accept that we have to ignore or override configuration parameters.

I have another idea that I think we need to explore first. Can we remove all usage of Multiline from the fileconsumer package?

We cannot remove Multiline from the receivers that already depend on it. However, we may be able to have their config look like:

type FileLogConfig {
      fileconsumer.Config `mapstructure:",squash"`
      helper.Multiline    `mapstructure:",squash"`
}

Then what we need to do is identify the necessary interface that allow fileconsumer to work the same as it did before. Essentially, we would make fileconsumer always use "custom" split behavior. What do you think, is it possible?

@atingchen
Copy link
Contributor Author

I am hesitant about this implementation. It looks like it will work but it's hard to accept that we have to ignore or override configuration parameters.

I have another idea that I think we need to explore first. Can we remove all usage of Multiline from the fileconsumer package?

We cannot remove Multiline from the receivers that already depend on it. However, we may be able to have their config look like:

type FileLogConfig {
      fileconsumer.Config `mapstructure:",squash"`
      helper.Multiline    `mapstructure:",squash"`
}

Then what we need to do is identify the necessary interface that allow fileconsumer to work the same as it did before. Essentially, we would make fileconsumer always use "custom" split behavior. What do you think, is it possible?

Sounds like a challenge, I want to try.

@atingchen
Copy link
Contributor Author

we would make fileconsumer always use "custom" split behavior.

I think we should let the user choose whether to set a "custom" splitFunc.

If the user does not choose to use "custom" splitFunc, we provide the user with a default spliFunc, such as splitNone.

What do you think?

@atingchen atingchen force-pushed the feature/lookchen-customize-splitfunc branch from 9685715 to 21323e0 Compare October 1, 2022 09:21
@atingchen
Copy link
Contributor Author

atingchen commented Oct 1, 2022

I tried removing Multiline from the filecomsume package:

  • filecomsume.Config remove SplitterConfig and add EncodingConfig ,FlusherConfig
  • filecomsume.Build allows user to set custom splitFunc.
  • Add splitterFactory which is responsible for building helper.Splitter
  • readerFactory builds helper.Splitter using splitterFactory instead of SplitterConfig

TODO:
Add test unit and update README.

@atingchen
Copy link
Contributor Author

I tried removing Multiline from the filecomsume package:

  • filecomsume.Config remove SplitterConfig and add EncodingConfig ,FlusherConfig
  • filecomsume.Build allows user to set custom splitFunc.
  • Add splitterFactory which is responsible for building helper.Splitter
  • readerFactory builds helper.Splitter using splitterFactory instead of SplitterConfig

TODO: Add test unit and update README.

What do you think of this implement? @djaglowski

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atingchen, thanks for doing this. I think this looks a lot like what I was expected, but I'm having a hard time feeling confident in the changes. This is in part because this areas of the codebase is a bit tangled up.

We may have to iterate on this one slowly and carefully. Another thing that may help, if you have ideas, is to make smaller changes that make sense on their own and work towards this design.


input.fileConsumer, err = c.Config.Build(logger, input.emit)
input.fileConsumer, err = c.Config.Build(logger, input.emit,
fileconsumer.WithMultilineSplitter(c.EncodingConfig, c.Flusher, c.MultilineConfig))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.EncodingConfig and c.Flusher are fields that already belong to the fileconsumer.Config. It seems like we should be able to ask the config for a splitter based only on the one parameter that it does not have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atingchen, thanks for doing this. I think this looks a lot like what I was expected, but I'm having a hard time feeling confident in the changes. This is in part because this areas of the codebase is a bit tangled up.

We may have to iterate on this one slowly and carefully. Another thing that may help, if you have ideas, is to make smaller changes that make sense on their own and work towards this design.

Thank you for your CR. I will try to split it into several small PRs to gradually implement this feature.

Comment on lines +21 to +26
// Splitter consolidates Flusher and dependent splitFunc
type Splitter struct {
Encoding Encoding
SplitFunc bufio.SplitFunc
Flusher *Flusher
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we really need this struct. The encoding and flusher are not used in this file. Should splitter factory just return a split func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the splitter factory should only return a split func, so that the responsibilities of the splitter factory are more single.
But if we remove Splitter, here we need to modify readFactory and Reader. Do we need to do this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants