fix: Run processors in config order #12113
I think the code looks good. Should there also be ordering for aggregators?
Hmm, won't using the line number now randomly mix the order if you use multiple files? Before, the order was at least kept within each file.
86e5b6d to 5ba356b
Order needs to be sorted across files
Non-ordered processors take precedence
@srebhan good catch! I hadn't considered what would happen when using multiple configuration files. So before all processors were added to
To make sure I got my logic right, so that means:
I've updated the code so that the above properties are maintained and that the processors are only sorted based on line for each file by associating each processor with a unique ID. So the order:
Does this seem like expected behavior?
@srebhan did you want to take another look before we merge it?
Great improvement!
I think this will make the order config less useful, as you can now get the same guarantee just by reordering the config file?
Never mind, it still makes sense to have a processor in a second config file that you want to be executed between other processors from a previous config file.
Hey guys, I didn't have time for a final review. The tests aren't specific enough, as the processors have the same name; they should also have an alias or something uniquely identifying in the
Looking at the code, it does not seem that this is (still) true; I do not see any code that enforces that order starts at 1. Also, I would suggest not requiring this at all, since config files are easier to maintain if you can use a (BASIC-style) order of 10, 20, 30, etc. That way you can easily insert processors into the chain later without renumbering everything, or you can organize your processors by putting everything that needs to run early in e.g. 100-200, later ones in 200-300, etc.
Sorry, I didn't find the time to take another look yet. My proposal would have been to initialize
That was also my impression from looking at the code - that sorting is not stable, though I believe it does always result in an order that satisfies the ordering requirements (e.g. all unordered before ordered etc). However, I do also think that ordering could become simpler and stable by implementing something like:
Step 2 could be problematic if file ids are not predictable (but I guess they could be made predictable by loading files in lexical order). Omitting step 2 would prevent stable sorting (you might get ties in the line number comparison), though you could maybe swap step 2 and 3 so line number decides the order and file id is only used to break ties.
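To make the tie-breaking idea concrete, here is a minimal sketch of a comparator that yields one unambiguous order regardless of how the entries arrive. It assumes the keys are the explicit order, then a file id, then the line number; the `entry` type, its field names, and the file ids are hypothetical and not taken from the Telegraf code. Swapping the last two comparisons gives the variant where the line number decides and the file id only breaks ties.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical record of where a processor was declared.
type entry struct {
	Name   string
	Order  int64 // explicit order; 0 if unset (assumption)
	FileID int   // hypothetical: index of the config file in load order
	Line   int   // line of the plugin table within that file
}

// less compares by Order, then FileID, then Line. As long as no two
// entries share all three keys, this is a total order without ties.
func less(a, b entry) bool {
	if a.Order != b.Order {
		return a.Order < b.Order
	}
	if a.FileID != b.FileID {
		return a.FileID < b.FileID
	}
	return a.Line < b.Line
}

// sortEntries returns a sorted copy; the input slice is left untouched.
func sortEntries(es []entry) []entry {
	out := append([]entry(nil), es...)
	sort.Slice(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	es := []entry{
		{"b.conf:strings", 0, 1, 3},
		{"a.conf:enum", 5, 0, 2},
		{"a.conf:rename", 0, 0, 9},
		{"b.conf:converter", 5, 1, 1},
	}
	for _, e := range sortEntries(es) {
		fmt.Println(e.Name)
	}
}
```

Because every entry has a distinct (order, file id, line) triple, the result does not depend on the input order, so a non-stable sort is sufficient here.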
In my idea the loading of config files should be in this order:
Thus the resulting config and its ordering would be as if you had one conf file with all these files concatenated in the order defined above.
I really don't get the reason for all this "order by line"... First of all, if you need a guaranteed order, use And using the line number is nonsense IMO, as the TOML parser will give the plugins in the order specified in the file, so a stable sort will preserve this order. The only question is whether unordered processors should be executed before or after the ordered ones. Before this PR, the unordered ones were executed before the ordered ones (their order was initialized to zero). With this PR, the ordering is enormously complex:
I think you can get more easily understandable orderings with less code by just using the "old" way and a stable sort, under the assumption that the TOML parser will provide the plugin sections in order. Just my two cents...
For me, having to explicitly specify order is cumbersome and prone to mistakes. Simply writing the processors in the order they need to run is, to me, a lot more intuitive and makes the config more readable by providing visual structure. But I can see this can be largely a matter of taste. One reason to want to require explicit ordering could be if, lacking an explicit order, you'd want to support running processors in parallel, but I don't think the current config structure allows for this at all. So if you have to pick some order, might as well use the file order.
That's a good point, but it also seems this is solvable. Alternatively, you could ignore this and only do order-by-file-order within each file, with no guarantees about how files are ordered relative to each other (which I think is how I'd be using file ordering anyway, though maybe others have other expectations).
That sounds like a reasonable approach, yeah. I guess that (at least that behavior) is what I sort of aimed for in my previous comment, except I misunderstood the meaning of stable sort (thinking it referred to a sorting criterion that would result in an unambiguous/stable ordering, but I see now that it is a sort algorithm that preserves the relative ordering of equal keys).
Seems this was merged too quickly, sorry about that! Thanks everyone for the input, but I am not entirely following what the problem is quite yet, so a couple of questions.
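For reference, a small sketch of what "preserves relative ordering of equal keys" means in Go, using `sort.SliceStable` from the standard library. The `processor` type and the convention that an order of 0 means "no explicit order" are assumptions for illustration only, not the Telegraf implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// processor is a hypothetical stand-in for a configured processor plugin.
type processor struct {
	Name  string
	Order int64 // 0 means no explicit order was set (assumption)
}

// sortByOrder stable-sorts a copy of ps by Order only. Entries with
// equal Order keep the relative position they had in the input.
func sortByOrder(ps []processor) []processor {
	out := append([]processor(nil), ps...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Order < out[j].Order
	})
	return out
}

func main() {
	in := []processor{
		{"rename", 0},
		{"enum", 2},
		{"strings", 0},
		{"converter", 1},
	}
	for _, p := range sortByOrder(in) {
		fmt.Println(p.Name, p.Order)
	}
}
```

Here `rename` stays ahead of `strings` because both have order 0 and `rename` came first in the input. This only gives file order if the input slice is already in file order, which is exactly the assumption under discussion.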
@srebhan I was under the impression this isn't the case, the TOML parser returns the plugins stored in a map (https://github.com/influxdata/telegraf/blob/master/config/config.go#L455) so it isn't guaranteed the order returned matches the file order. Is there another place the parser does maintain the order that we can use?
I intended for the multiple configuration file sorting behavior to stay the same as before, where processor plugins with a defined "order" will be sorted based on all plugins across all files. And the only grouping will be for the plugins without a defined order, which should stay the same because isn't
From the discussion in the issue this was what I understood as the expected behavior; the only exception would be maintaining previous behavior when an "order" is defined.
@sspaink could you already fix the current tests so there is no ambiguity in the expected plugin order list?
@sspaink this one isn't problematic as it will only return the category ( Btw. the ID you introduced will give me great headaches in #12166... :-)
After discussion I think we are clear on the next steps to help reduce the complexity of the sort and ensure that multiple configuration files are handled as expected.
I'll work on creating a new pull request to implement this and I will look into updating the tests as well so there is no ambiguity.
Sounds correct, let us know here the PR number 😉
resolves #8016
This PR updates the sorting logic for processors so that the line number in the TOML config is considered. I made the design decision that if a user-defined order is provided, it will take precedence over the defined configuration order. So even if only one processor is defined with an order, and its order is high (e.g. "order=100"), it will be run first. While it wasn't documented explicitly before, the examples made it seem order is required to start at 1, and for this change it makes sense to enforce it so that we can detect whether an order is defined (the default value will of course be 0). I updated the documentation accordingly; please advise if it isn't clear enough.
To make sure this solution works, I ran the test 10,000 times locally to ensure the new sorting works as expected.
go test -run TestConfig_MultipleProcessorsOrder -count 10000 ./config
If you use the current sorting logic instead, and run the same test multiple times it will fail.
The reason the processor order is random without sorting based on line number is that the TOML AST table provides a map containing all the plugins, and Go doesn't guarantee the iteration order for maps. Duplicate plugins are actually stored under the same map key, and each duplicate plugin's unique configuration is stored in a slice, so these are sorted correctly. The random behavior is therefore only noticeable when you define multiple unique processor plugins. For that reason, the test data I added defines multiple different processors in the test config, so that the problem shows up quickly when you run the test multiple times.
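As a rough illustration of the map issue described above, the sketch below collects values out of a map and then sorts them by a recorded line number to recover file order. The `table` type and its fields are hypothetical stand-ins and not the actual TOML AST types used by Telegraf.

```go
package main

import (
	"fmt"
	"sort"
)

// table is a hypothetical stand-in for a parsed plugin table that
// remembers the line it started on in the config file.
type table struct {
	Name string
	Line int
}

// inFileOrder collects the map values and sorts them by line number.
// Ranging over a Go map visits keys in an unspecified order, so the
// slice is not in file order until after the sort.
func inFileOrder(plugins map[string]table) []table {
	out := make([]table, 0, len(plugins))
	for _, t := range plugins {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Line < out[j].Line })
	return out
}

func main() {
	plugins := map[string]table{
		"rename":    {"rename", 3},
		"enum":      {"enum", 12},
		"converter": {"converter", 7},
	}
	for _, t := range inFileOrder(plugins) {
		fmt.Println(t.Line, t.Name)
	}
}
```

Line numbers are unique within a single file, so the sort has no ties here. Across multiple files the same line number can appear more than once, which is where the multi-file questions in this thread come from.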