Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APPSEC-8287] Add AppSec::Processor::RuleMerger #2686

Merged
merged 3 commits into from
Mar 21, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
107 changes: 107 additions & 0 deletions lib/datadog/appsec/processor/rule_merger.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
# frozen_string_literal: true

module Datadog
module AppSec
class Processor
# RuleMerger merge different sources of information
# into the rules payload
module RuleMerger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not merging rules but collections of rules, expirations, and overrides. I'm not happy with the name but I can't figure another one right now so let's go with that.

class << self
def merge(rules:, data: nil, overrides: nil)
rules_dup = rules.dup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the logic below it's a bit hard to track whether no nested mutation happens.

WDYT of adding some deep freeze to the lets in specs so that we ensure we never mutate the originals?


rules_data = combine_data(data) if data
rules_overrides_and_exclusions = combine_overrides(overrides) if overrides

rules_dup.merge!(rules_data) if rules_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken combine_data only ever returns a hash with the 'rules_data' key?

Maybe if combine_data returned result, then result could be assigned directly to rules_dup['rules_data'].

This would also save us a hash allocation.

rules_dup.merge!(rules_overrides_and_exclusions) if rules_overrides_and_exclusions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken combine_overrides only ever returns a hash with the 'rules_override' and 'exclusions' key?

Maybe if combine_overrides was split and returned result, then result could be assigned directly to rules_dup['rules_override'] and rules_dup['exclusions']

This would also save us a hash allocation.

rules_dup
end

private

def combine_data(data)
result = []

data.each do |data_entry|
data_entry['rules_data'].each do |value|
data_exists = result.select { |x| x['id'] == value['id'] }

if data_exists.any?
existing_data = data_exists.first

if existing_data['type'] == value['type']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be equivalent to:

if (existing_data = result.find { |x| x['id'] == value['id'] }) && 
  if existing_data['type'] == value['type']
    # ...

So:

if (existing_data = result.find { |x| x['id'] == value['id'] }) && existing_data['type'] == value['type']
  # ...

And a single else clause

# Duplicate entry base on type and id
# We need to merge the existing data with the new one
# and make sure to remove duplicates
merged_data = merge_data_base_on_expiration(existing_data['data'], value['data'])
existing_data['data'] = merged_data
else
result << value
end
else
# First entry for that id
result << value
end
end
end

return unless result.any?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is because empty rulesets are disallowed by libddwaf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not be any empty ruleset but an empty rules_data key. We will not add it to the final rules hash.


{ 'rules_data' => result }
end

def merge_data_base_on_expiration(data1, data2)
result = data1.each_with_object({}) do |value, acc|
acc[value['value']] = value['expiration']
end

data2.each do |data|
if result.key?(data['value'])
# The value is duplicated so we need to keep
# the one with the highest expiration value
# We replace it if the expiration is higher than the current one
# or if no experiration
expiration = result[data['value']]
result[data['value']] = data['expiration'] if data['expiration'].nil? || data['expiration'] > expiration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could that last expiration be nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 😄

else
result[data['value']] = data['expiration']
end
end

result.each_with_object([]) do |entry, acc|
# There could be cases that there is no experitaion value.
# To singal that there is no expiration we use the default value 0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was omitted at the source, should we omit it instead of using a default?

acc << { 'value' => entry[0], 'expiration' => entry[1] || 0 }
end
end

def combine_overrides(overrides)
result = {}
exclusions = []
rules_override = []

overrides.each do |override|
if override['rules_override']
override['rules_override'].each do |rule_override|
rules_override << rule_override
end
elsif override['exclusions']
override['exclusions'].each do |exclusion|
exclusions << exclusion
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to recall you mentioned it's probably best if rules_override and exclusions processing is split between combine_overrides and combine_exclusions.

Do we expect these to be ever realistically mixed?

end

result['exclusions'] = exclusions if exclusions.any?
result['rules_override'] = rules_override if rules_override.any?

return if result.empty?

result
end
end
end
end
end
end
21 changes: 21 additions & 0 deletions sig/datadog/appsec/processor/rule_merger.rbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
module Datadog
module AppSec
class Processor
module RuleMerger
type rules = ::Hash[::String, untyped]
type data = ::Hash[::String, untyped]
type overrides = ::Hash[::String, untyped]

def self.merge: (rules: untyped, ?data: ::Array[data]?, ?overrides: ::Array[overrides]?) -> rules

private

def self.combine_data: (::Array[data] data) -> (nil | { rules_data: untyped })

def self.merge_data_base_on_expiration: (::Array[data] data1, ::Array[data] data2) -> ::Array[data]

def self.combine_overrides: (::Array[overrides] overrides) -> overrides?
end
end
end
end
Loading