Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the pattern we use to configure topology units (sources/transforms/sinks) #1895

Closed
MOZGIII opened this issue Feb 22, 2020 · 1 comment
Labels
domain: config Anything related to configuring Vector needs: approval Needs review & approval before work can begin. source: file Anything `file` source related type: tech debt A code change that does not add user value.

Comments

@MOZGIII
Copy link
Contributor

MOZGIII commented Feb 22, 2020

I was working on a change to file transform and noticed the way it's configured is inefficient: there are places where the data is passed in an unsound way - first, validated at config time, and then assumed valid at runtime. This works, but we can do better.

Here are some practical examples from what I've touched:

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L201-L203

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L269-L272

It ended up like that for two reasons:

  • FileConfig is serializable and deserializable.
  • the file source directly uses &FileConfig:

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L217-L221

Now, this is a problem, cause when you need a file implementation to access a non-deserializable field - you can't just put it at a FileConfig (it has to be deserializable) - and you have to either parse the value at config time and pass it to the file impl some way around the FileConfig, or you have to parse it once at config time, and the parse it again at the "run" time - like we've seen in the example above.

This can trivially be solved for most typical cases if we add a layer in between.

Here's an example: a line_agg::LineAgg, which has "real" line_agg::Config, and an accompanying MultilineConfig, that can be converted via try_into to the "real" config:

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L88-L122

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/line_agg.rs#L41-L53

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/line_agg.rs#L71-L77

With this, the topology config would use the same FileConfig, but the file would use RealFileConfig - a similar type, but with an actual Regex instead of String. The names are just for illustrating the point - don't pay a lot of attention to them here.

I think, to the very least, we should change the whole file source to use the same pattern.

We can also consider using it for other parts. Maybe even introduce a trait to enforce it formally.

This is a follow up from #1852 (comment).

@binarylogic binarylogic added source: file Anything `file` source related type: tech debt A code change that does not add user value. needs: approval Needs review & approval before work can begin. labels Feb 22, 2020
@binarylogic binarylogic added the domain: config Anything related to configuring Vector label Aug 7, 2020
@jszwedko
Copy link
Member

jszwedko commented Aug 1, 2022

Closing since this hasn't seemed to be an issue in practice.

@jszwedko jszwedko closed this as completed Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: config Anything related to configuring Vector needs: approval Needs review & approval before work can begin. source: file Anything `file` source related type: tech debt A code change that does not add user value.
Projects
None yet
Development

No branches or pull requests

3 participants