[APPSEC-8287] Add AppSec::Processor::RuleMerger #2686
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##           master    #2686     +/-  ##
========================================
  Coverage   98.09%   98.09%
========================================
  Files        1168     1170       +2
  Lines       64221    64332     +111
  Branches     2857     2875      +18
========================================
+ Hits        62998    63108     +110
- Misses       1223     1224       +1
```

... and 4 files with indirect coverage changes
LGTM, but I think it needs a few adjustments.

Also, I feel it would make sense to rebase this PR onto the chain of PRs; the latest one, add-remote-config-component, would be a good candidate as a rebase target and base branch for the PR. This would make integration of the merger usable immediately.
```ruby
if override['rules_override']
  override['rules_override'].each do |rule_override|
    rules_override << rule_override
  end
elsif override['exclusions']
  override['exclusions'].each do |exclusion|
    exclusions << exclusion
  end
end
```
I seem to recall you mentioned it's probably best if `rules_override` and `exclusions` processing is split between `combine_overrides` and `combine_exclusions`. Do we expect these to ever realistically be mixed?
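A minimal sketch of the suggested split, assuming the key names from the snippet above (this is illustrative, not the PR's actual code):

```ruby
# Hypothetical split: each helper handles exactly one key,
# so overrides and exclusions are never mixed in one loop.
def combine_overrides(overrides)
  overrides.flat_map { |override| override['rules_override'] || [] }
end

def combine_exclusions(overrides)
  overrides.flat_map { |override| override['exclusions'] || [] }
end
```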
```ruby
rules_data = combine_data(data) if data
rules_overrides_and_exclusions = combine_overrides(overrides) if overrides

rules_dup.merge!(rules_data) if rules_data
```
If I'm not mistaken, `combine_data` only ever returns a hash with the `'rules_data'` key? Maybe if `combine_data` returned `result`, then `result` could be assigned directly to `rules_dup['rules_data']`. This would also save us a hash allocation.
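A sketch of that suggestion; the input shape here is an assumption made for illustration, not the PR's actual data format:

```ruby
# Hypothetical: combine_data returns the merged array directly
# instead of wrapping it in { 'rules_data' => ... }, so the
# caller assigns it without an intermediate hash allocation.
def combine_data(data)
  data.flat_map { |d| d['data'] || [] } # assumed input shape
end

data = [{ 'data' => [{ 'id' => 'blk-001' }] }, { 'data' => [{ 'id' => 'blk-002' }] }]
rules_dup = {}
rules_dup['rules_data'] = combine_data(data) if data
```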
```ruby
rules_overrides_and_exclusions = combine_overrides(overrides) if overrides

rules_dup.merge!(rules_data) if rules_data
rules_dup.merge!(rules_overrides_and_exclusions) if rules_overrides_and_exclusions
```
If I'm not mistaken, `combine_overrides` only ever returns a hash with the `'rules_override'` and `'exclusions'` keys? Maybe if `combine_overrides` was split and returned `result`, then `result` could be assigned directly to `rules_dup['rules_override']` and `rules_dup['exclusions']`. This would also save us a hash allocation.
```ruby
data_exists = result.select { |x| x['id'] == value['id'] }

if data_exists.any?
  existing_data = data_exists.first

  if existing_data['type'] == value['type']
```
This seems to be equivalent to:

```ruby
if (existing_data = result.find { |x| x['id'] == value['id'] })
  if existing_data['type'] == value['type']
    # ...
```

So:

```ruby
if (existing_data = result.find { |x| x['id'] == value['id'] }) && existing_data['type'] == value['type']
  # ...
```

And a single `else` clause.
```ruby
# There could be cases where there is no expiration value.
# To signal that there is no expiration we use the default value 0.
```
If it was omitted at the source, should we omit it instead of using a default?
```ruby
result = data1.each_with_object({}) do |value, acc|
  acc[value['value']] = value['expiration']
end

data2.each do |data|
  if result.key?(data['value'])
    # The value is duplicated so we need to keep
    # the one with the highest expiration value.
    # We replace it if the expiration is higher than the current one
    # or if there is no expiration.
    expiration = result[data['value']]
    result[data['value']] = data['expiration'] if data['expiration'].nil? || data['expiration'] > expiration
```
Could that last `expiration` be `nil`?
Good point 😄
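One way the `nil` case could be handled (a sketch, not necessarily the PR's final fix): treat a stored `nil` expiration as "never expires" so it is never replaced, and only compare numerically when both values are present.

```ruby
# Hypothetical nil-safe replacement rule:
# - stored nil   => the existing entry never expires, keep it
# - incoming nil => the incoming entry never expires, it wins
# - both present => keep the higher expiration
def keep_incoming?(stored, incoming)
  return false if stored.nil?

  incoming.nil? || incoming > stored
end
```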
```ruby
module RuleMerger
  class << self
    def merge(rules:, data: nil, overrides: nil)
      rules_dup = rules.dup
```
Looking at the logic below, it's a bit hard to track whether any nested mutation happens. WDYT of adding some deep `freeze` to the `let`s in the specs so that we ensure we never mutate the originals?
```ruby
      end
    end

    return unless result.any?
```
I guess this is because empty rulesets are disallowed by libddwaf
This is not about an empty ruleset but about an empty `rules_data` key. We will not add it to the final rules hash.
```ruby
class Processor
  # RuleMerger merges different sources of information
  # into the rules payload
  module RuleMerger
```
This is not merging rules but collections of rules, expirations, and overrides. I'm not happy with the name, but I can't think of a better one right now, so let's go with it.
Consider maybe doing it the other way around: merging this into master (as long as it has no impact on existing users) and then merging master into the working branch. That way incremental changes land in master, rather than arriving in a big "land-everything" PR that is very hard to review.
I'm confused: there is no "big PR". There is a chain of (currently) seven PRs, each with a specific and progressive subset of changes that should be easier to review in order and in isolation than one big PR, and each allowing incremental changes to land. Whether this PR is merged incrementally now or downstream of the sequence of PRs is immaterial.

Each PR has its own branch based on the previous one, so merging this to master and then merging master into the (currently) last branch would effectively freeze all the branches, since they would no longer be rebase-able†. Any change on a specific "top" branch would then mean merging that branch down, once into each downstream branch, which would make the history a terrible mess. The alternative would be to merge only on the last one, effectively turning the chained but independent branches into one big PR behind the scenes.

† I can see some ways to work around that, like merge-preserving rebasing only the last one, then
In AppSec, some transformation has to happen when the remote configuration client merges existing information with the information provided by the agent. The agent can provide multiple pieces of information: new rules data, rules overrides, or an entirely new rules set.

Before being able to instantiate a new `AppSec::Processor` we need to make sure the rules payload has the correct format. The correct format is:

There is also some merging logic we need to account for when merging `rules_data`: when combining rules data entries with the same id, type, and value, we need to keep the one with the highest expiration, treating a missing expiration as the highest possible value (the entry never expires).
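The `rules_data` merging rule described above can be sketched as follows; the entry shapes and the method name are assumptions drawn from the review discussion, not the PR's exact code:

```ruby
# Illustrative sketch: merge two lists of
# { 'value' => ..., 'expiration' => ... } entries, keeping, for
# duplicate values, the entry that lives longest. A nil expiration
# means "never expires" and always wins.
def merge_data(data1, data2)
  result = data1.each_with_object({}) do |entry, acc|
    acc[entry['value']] = entry['expiration']
  end

  data2.each do |entry|
    value = entry['value']
    if result.key?(value)
      current = result[value]
      next if current.nil? # existing entry never expires; keep it

      result[value] = entry['expiration'] if entry['expiration'].nil? || entry['expiration'] > current
    else
      result[value] = entry['expiration']
    end
  end

  result.map { |v, exp| { 'value' => v, 'expiration' => exp } }
end
```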