[APPSEC-8287] Add AppSec::Processor::RuleMerger #2686
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##           master    #2686     +/-  ##
========================================
  Coverage   98.09%   98.09%
========================================
  Files        1168     1170       +2
  Lines       64221    64332     +111
  Branches     2857     2875      +18
========================================
+ Hits        62998    63108     +110
- Misses       1223     1224       +1
```

... and 4 files with indirect coverage changes
LGTM, but I think it needs a few adjustments.

Also, I feel it would make sense to rebase this PR onto the chain of PRs; the latest one, add-remote-config-component, would be a good candidate as a rebase target and base branch for the PR. This would make integration of the merger usable immediately.
```ruby
if override['rules_override']
  override['rules_override'].each do |rule_override|
    rules_override << rule_override
  end
elsif override['exclusions']
  override['exclusions'].each do |exclusion|
    exclusions << exclusion
  end
end
```
I seem to recall you mentioned it's probably best if `rules_override` and `exclusions` processing is split between `combine_overrides` and `combine_exclusions`. Do we expect these to ever realistically be mixed?
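A minimal sketch of the suggested split, assuming the key names from the snippet above (this is illustrative, not the PR's actual code):

```ruby
# Hypothetical split: each helper handles exactly one key,
# so overrides and exclusions are never mixed in one loop.
def combine_overrides(overrides)
  overrides.flat_map { |override| override['rules_override'] || [] }
end

def combine_exclusions(overrides)
  overrides.flat_map { |override| override['exclusions'] || [] }
end
```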
```ruby
rules_data = combine_data(data) if data
rules_overrides_and_exclusions = combine_overrides(overrides) if overrides

rules_dup.merge!(rules_data) if rules_data
```
If I'm not mistaken, `combine_data` only ever returns a hash with the `'rules_data'` key? Maybe if `combine_data` returned `result`, then `result` could be assigned directly to `rules_dup['rules_data']`. This would also save us a hash allocation.
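A sketch of that suggestion; the input shape here is an assumption made for illustration, not the PR's actual data format:

```ruby
# Hypothetical: combine_data returns the merged array directly
# instead of wrapping it in { 'rules_data' => ... }, so the
# caller assigns it without an intermediate hash allocation.
def combine_data(data)
  data.flat_map { |d| d['data'] || [] } # assumed input shape
end

data = [{ 'data' => [{ 'id' => 'blk-001' }] }, { 'data' => [{ 'id' => 'blk-002' }] }]
rules_dup = {}
rules_dup['rules_data'] = combine_data(data) if data
```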
```ruby
rules_overrides_and_exclusions = combine_overrides(overrides) if overrides

rules_dup.merge!(rules_data) if rules_data
rules_dup.merge!(rules_overrides_and_exclusions) if rules_overrides_and_exclusions
```
If I'm not mistaken, `combine_overrides` only ever returns a hash with the `'rules_override'` and `'exclusions'` keys? Maybe if `combine_overrides` was split and returned `result`, then `result` could be assigned directly to `rules_dup['rules_override']` and `rules_dup['exclusions']`. This would also save us a hash allocation.
```ruby
data_exists = result.select { |x| x['id'] == value['id'] }

if data_exists.any?
  existing_data = data_exists.first

  if existing_data['type'] == value['type']
```
This seems to be equivalent to:

```ruby
if (existing_data = result.find { |x| x['id'] == value['id'] })
  if existing_data['type'] == value['type']
    # ...
```

So:

```ruby
if (existing_data = result.find { |x| x['id'] == value['id'] }) && existing_data['type'] == value['type']
  # ...
```

And a single `else` clause.
```ruby
# There could be cases where there is no expiration value.
# To signal that there is no expiration we use the default value 0.
```
If it was omitted at the source, should we omit it instead of using a default?
```ruby
result = data1.each_with_object({}) do |value, acc|
  acc[value['value']] = value['expiration']
end

data2.each do |data|
  if result.key?(data['value'])
    # The value is duplicated so we need to keep
    # the one with the highest expiration value.
    # We replace it if the expiration is higher than the current one
    # or if there is no expiration.
    expiration = result[data['value']]
    result[data['value']] = data['expiration'] if data['expiration'].nil? || data['expiration'] > expiration
```
Could that last `expiration` be `nil`?
Good point 😄
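One way the `nil` case could be handled (a sketch, not necessarily the PR's final fix): treat a stored `nil` expiration as "never expires" so it is never replaced, and only compare numerically when both values are present.

```ruby
# Hypothetical nil-safe replacement rule:
# - stored nil   => the existing entry never expires, keep it
# - incoming nil => the incoming entry never expires, it wins
# - both present => keep the higher expiration
def keep_incoming?(stored, incoming)
  return false if stored.nil?

  incoming.nil? || incoming > stored
end
```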
```ruby
module RuleMerger
  class << self
    def merge(rules:, data: nil, overrides: nil)
      rules_dup = rules.dup
```
Looking at the logic below, it's a bit hard to track whether any nested mutation happens. WDYT of adding some deep `freeze` to the `let`s in the specs so that we ensure we never mutate the originals?
```ruby
      end
    end

    return unless result.any?
```
I guess this is because empty rulesets are disallowed by libddwaf
This is not about an empty ruleset but about an empty `rules_data` key. We will not add it to the final rules hash.
```ruby
class Processor
  # RuleMerger merges different sources of information
  # into the rules payload
  module RuleMerger
```
This is not merging rules but collections of rules, expirations, and overrides. I'm not happy with the name, but I can't think of a better one right now, so let's go with it.
Consider maybe doing it the other way around: merging this into master (as long as it has no impact on existing users) and then merging master into the working branch. That way incremental changes land in master, rather than arriving in a big "land-everything" PR that is very hard to review.
I'm confused: there is no "big PR". There is a chain of (currently) seven PRs, each with a specific and progressive subset of changes that should be easier to review in order and in isolation than one big PR, and each allowing incremental changes to land. Whether this PR is merged incrementally now or downstream of the sequence of PRs is immaterial.

Each PR has its own branch based on the previous one, so merging this to master and then merging master into the (currently) last branch would effectively freeze all the branches, since they would no longer be rebase-able†. Any change on a specific "top" branch would then mean merging that branch down, once into each downstream branch, which would make the history a terrible mess. The alternative would be to merge only on the last one, effectively turning the chained but independent branches into one big PR behind the scenes.

† I can see some ways to work around that, like merge-preserving rebasing only the last one, then
In AppSec, some transformation has to happen when the remote configuration client merges existing information with the information provided by the agent. The agent can provide multiple pieces of information: new rules data, rules overrides, or an entirely new rules set.

Before being able to instantiate a new `AppSec::Processor` we need to make sure the rules payload has the correct format. The correct format is:

There is also some merging logic we need to account for when merging `rules_data`: when combining rules data entries with the same id, type, and value, we need to keep the one with the highest expiration, treating a missing expiration as the highest possible value (the entry never expires).
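The `rules_data` merging rule described above can be sketched as follows; the entry shapes and the method name are assumptions drawn from the review discussion, not the PR's exact code:

```ruby
# Illustrative sketch: merge two lists of
# { 'value' => ..., 'expiration' => ... } entries, keeping, for
# duplicate values, the entry that lives longest. A nil expiration
# means "never expires" and always wins.
def merge_data(data1, data2)
  result = data1.each_with_object({}) do |entry, acc|
    acc[entry['value']] = entry['expiration']
  end

  data2.each do |entry|
    value = entry['value']
    if result.key?(value)
      current = result[value]
      next if current.nil? # existing entry never expires; keep it

      result[value] = entry['expiration'] if entry['expiration'].nil? || entry['expiration'] > current
    else
      result[value] = entry['expiration']
    end
  end

  result.map { |v, exp| { 'value' => v, 'expiration' => exp } }
end
```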