Improve the pattern we use to configure topology units (sources/transforms/sinks) #1895
Labels
domain: config
Anything related to configuring Vector
needs: approval
Needs review & approval before work can begin.
source: file
Anything `file` source related
type: tech debt
A code change that does not add user value.
I was working on a change to the `file` source and noticed that the way it's configured is inefficient: there are places where data is passed in an unsound way. It is first validated at config time and then assumed valid at runtime. This works, but we can do better.

Here are some practical examples from what I've touched:
https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L201-L203
https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L269-L272
It ended up like that for two reasons:

- `FileConfig` is serializable and deserializable.
- The `file` source directly uses `&FileConfig`:

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L217-L221
Now, this is a problem, because when the `file` implementation needs to access a non-deserializable field, you can't just put it in `FileConfig` (it has to be deserializable). You have to either parse the value at config time and pass it to the `file` impl some way around `FileConfig`, or parse it once at config time and then parse it again at "run" time, as in the example above.

This can trivially be solved for most typical cases if we add a layer in between.
Here's an example: `line_agg::LineAgg`, which has a "real" `line_agg::Config` and an accompanying `MultilineConfig` that can be converted via `try_into` to the "real" config:

https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/mod.rs#L88-L122
https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/line_agg.rs#L41-L53
https://github.com/timberio/vector/blob/89026d0a9a0dc99022ab116f71ef561995b78c69/src/sources/file/line_agg.rs#L71-L77
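To make the layering concrete, here is a minimal, dependency-free sketch of the same idea. The type and field names (`RawAggConfig`, `AggConfig`, `mode`, `timeout_ms`) are assumptions chosen for illustration and are not copied from `line_agg`; a pattern field would presumably be compiled in the same `try_from` via `regex::Regex::new`.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Hypothetical "raw" layer: plain data only, so it can be (de)serialized.
/// In real code this would derive serde's traits; omitted to stay std-only.
#[derive(Debug, Clone)]
pub struct RawAggConfig {
    pub mode: String,
    pub timeout_ms: u64,
}

/// Hypothetical aggregation modes, already parsed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    ContinueThrough,
    HaltBefore,
}

/// Hypothetical "real" layer: holds parsed values only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AggConfig {
    pub mode: Mode,
    pub timeout: Duration,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    UnknownMode(String),
    ZeroTimeout,
}

impl TryFrom<&RawAggConfig> for AggConfig {
    type Error = ConfigError;

    /// All validation and parsing happens here, exactly once.
    fn try_from(raw: &RawAggConfig) -> Result<Self, Self::Error> {
        let mode = match raw.mode.as_str() {
            "continue_through" => Mode::ContinueThrough,
            "halt_before" => Mode::HaltBefore,
            other => return Err(ConfigError::UnknownMode(other.to_owned())),
        };
        if raw.timeout_ms == 0 {
            return Err(ConfigError::ZeroTimeout);
        }
        Ok(AggConfig {
            mode,
            timeout: Duration::from_millis(raw.timeout_ms),
        })
    }
}
```

Code that performs the aggregation would then accept only `AggConfig`, so unvalidated values cannot reach it and nothing needs to be parsed a second time.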
With this, the topology config would use the same `FileConfig`, but the `file` source would use `RealFileConfig`: a similar type, but with an actual `Regex` instead of `String`. The names are just for illustrating the point; don't pay a lot of attention to them here.

I think, at the very least, we should change the whole `file` source to use the same pattern.

We can also consider using it for other parts. Maybe even introduce a trait to enforce it formally.
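As a rough sketch of what such a trait could look like: the trait `TopologyConfig`, its `build` method, the `start_pattern` field, and the helper `is_start_of_message` are all hypothetical names, not existing Vector APIs. `LinePattern` is a std-only stand-in for `regex::Regex`, used here only so that the example has no dependencies.

```rust
/// Std-only stand-in for `regex::Regex` (an assumption for this sketch):
/// "compiling" only rejects empty patterns, and matching is a substring search.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LinePattern(String);

impl LinePattern {
    pub fn new(source: &str) -> Result<Self, String> {
        if source.is_empty() {
            Err("pattern must not be empty".to_owned())
        } else {
            Ok(LinePattern(source.to_owned()))
        }
    }

    pub fn is_match(&self, line: &str) -> bool {
        line.contains(self.0.as_str())
    }
}

/// Hypothetical trait: a deserializable config that knows how to build
/// the validated form its component uses at runtime.
pub trait TopologyConfig {
    type Runtime;
    type Error;

    fn build(&self) -> Result<Self::Runtime, Self::Error>;
}

/// Deserializable layer: the pattern is still a plain string.
#[derive(Debug, Clone)]
pub struct FileConfig {
    pub start_pattern: String,
}

/// Runtime layer: the pattern has already been compiled.
#[derive(Debug, Clone)]
pub struct RealFileConfig {
    pub start_pattern: LinePattern,
}

impl TopologyConfig for FileConfig {
    type Runtime = RealFileConfig;
    type Error = String;

    fn build(&self) -> Result<RealFileConfig, String> {
        Ok(RealFileConfig {
            start_pattern: LinePattern::new(&self.start_pattern)?,
        })
    }
}

/// Runtime code takes only the validated type, so it never re-parses.
pub fn is_start_of_message(config: &RealFileConfig, line: &str) -> bool {
    config.start_pattern.is_match(line)
}
```

With associated types, each source, transform, or sink could pick its own runtime config and error type while the topology code calls `build` uniformly at config time.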
This is a follow up from #1852 (comment).