feat(DD_DEDUPLICATION_ALGORITHM_PER_PARSER + DD_HASHCODE_FIELDS_PER_SCANNER): Add checker of values #11244

Merged
merged 1 commit into from
Nov 15, 2024
8 changes: 4 additions & 4 deletions docs/content/en/usage/features.md
@@ -244,7 +244,7 @@ The environment variable will override the settings in `settings.dist.py`, repla

The available algorithms are:

-DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL
+DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL (value for `DD_DEDUPLICATION_ALGORITHM_PER_PARSER`: `unique_id_from_tool`)
: The deduplication occurs based on
finding.unique_id_from_tool which is a unique technical
id existing in the source tool. Few scanners populate this
@@ -266,12 +266,12 @@ DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL
able to recognise that findings found in previous
scans are actually the same as the new findings.

-DEDUPE_ALGO_HASH_CODE
+DEDUPE_ALGO_HASH_CODE (value for `DD_DEDUPLICATION_ALGORITHM_PER_PARSER`: `hash_code`)
: The deduplication occurs based on finding.hash_code. The
hash_code itself is configurable for each scanner in
parameter `HASHCODE_FIELDS_PER_SCANNER`.

-DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE
+DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE (value for `DD_DEDUPLICATION_ALGORITHM_PER_PARSER`: `unique_id_from_tool_or_hash_code`)
: A finding is a duplicate with another if they have the same
unique_id_from_tool OR the same hash_code.

@@ -284,7 +284,7 @@ DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE
cross-parser deduplication


-DEDUPE_ALGO_LEGACY
+DEDUPE_ALGO_LEGACY (value for `DD_DEDUPLICATION_ALGORITHM_PER_PARSER`: `legacy`)
: This is algorithm that was in place before the configuration
per parser was made possible, and also the default one for
backward compatibility reasons.
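
As a rough illustration of the values documented in this hunk, the sketch below builds a JSON string that could be assigned to `DD_DEDUPLICATION_ALGORITHM_PER_PARSER`. The parser names are hypothetical placeholders and do not come from this PR; only the four algorithm values are taken from the documentation change.

```python
import json

# Hypothetical parser names; substitute the scan types you actually use.
algorithm_per_parser = {
    "Example Scanner A": "unique_id_from_tool",
    "Example Scanner B": "hash_code",
    "Example Scanner C": "unique_id_from_tool_or_hash_code",
    "Example Scanner D": "legacy",
}

# The settings file parses the variable with json.loads, so the value must be valid JSON.
env_value = json.dumps(algorithm_per_parser)
print(f"DD_DEDUPLICATION_ALGORITHM_PER_PARSER='{env_value}'")
```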
2 changes: 1 addition & 1 deletion dojo/settings/.settings.dist.py.sha256sum
@@ -1 +1 @@
-09169f6d20ebf2f37347156111c3670a5b207c3530583a53ed9ac59ae4221188
+f09caa2d4e41f44b7cd6ecf2f1400817d4776e703bd039c8d857f1356382e1f3
16 changes: 16 additions & 0 deletions dojo/settings/settings.dist.py
@@ -1296,6 +1296,12 @@ def saml2_attrib_map_format(dict):
 if len(env("DD_HASHCODE_FIELDS_PER_SCANNER")) > 0:
     env_hashcode_fields_per_scanner = json.loads(env("DD_HASHCODE_FIELDS_PER_SCANNER"))
     for key, value in env_hashcode_fields_per_scanner.items():
+        if not isinstance(value, list):
+            msg = f"Fields definition '{value}' for hashcode calculation of '{key}' is not valid. It needs to be list of strings but it is {type(value)}."
+            raise TypeError(msg)
+        if not all(isinstance(field, str) for field in value):
+            msg = f"Fields for hashcode calculation for {key} are not valid. It needs to be list of strings. Some of fields are not string."
+            raise AttributeError(msg)
         if key in HASHCODE_FIELDS_PER_SCANNER:
             logger.info(f"Replacing {key} with value {value} (previously set to {HASHCODE_FIELDS_PER_SCANNER[key]}) from env var DD_HASHCODE_FIELDS_PER_SCANNER")
         HASHCODE_FIELDS_PER_SCANNER[key] = value
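
The checks added in this hunk can be tried in isolation with a sketch like the following. The function name `validate_hashcode_fields` and the scanner name are hypothetical; the PR performs the checks inline in `settings.dist.py` rather than in a helper. The exception types mirror the diff (`TypeError` for a non-list value, `AttributeError` for a non-string field).

```python
import json


def validate_hashcode_fields(raw_json: str) -> dict:
    """Parse a DD_HASHCODE_FIELDS_PER_SCANNER-style JSON string and validate it.

    Hypothetical helper written for illustration only.
    """
    parsed = json.loads(raw_json)
    for key, value in parsed.items():
        # Each scanner must map to a list ...
        if not isinstance(value, list):
            msg = f"Fields definition '{value}' for '{key}' must be a list of strings, got {type(value)}."
            raise TypeError(msg)
        # ... and every element of that list must be a string.
        if not all(isinstance(field, str) for field in value):
            msg = f"Fields for '{key}' must all be strings."
            raise AttributeError(msg)
    return parsed


# A well-formed value passes through unchanged.
fields = validate_hashcode_fields('{"Example Scanner": ["title", "severity"]}')
print(fields)
```

With this shape, a value such as `'{"Example Scanner": "title"}'` fails at startup instead of producing an unexpected hash code later.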
@@ -1377,6 +1383,13 @@ def saml2_attrib_map_format(dict):
# Makes it possible to deduplicate on a technical id (same parser) and also on some functional fields (cross-parsers deduplication)
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE = "unique_id_from_tool_or_hash_code"

+DEDUPE_ALGOS = [
+    DEDUPE_ALGO_LEGACY,
+    DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL,
+    DEDUPE_ALGO_HASH_CODE,
+    DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE,
+]

# Allows to deduplicate with endpoints if endpoints is not included in the hashcode.
# Possible values are: scheme, host, port, path, query, fragment, userinfo, and user. For a details description see https://hyperlink.readthedocs.io/en/latest/api.html#attributes.
# Example:
@@ -1526,6 +1539,9 @@ def saml2_attrib_map_format(dict):
 if len(env("DD_DEDUPLICATION_ALGORITHM_PER_PARSER")) > 0:
     env_dedup_algorithm_per_parser = json.loads(env("DD_DEDUPLICATION_ALGORITHM_PER_PARSER"))
     for key, value in env_dedup_algorithm_per_parser.items():
+        if value not in DEDUPE_ALGOS:
+            msg = f"DEDUP algorithm '{value}' for '{key}' is not valid. Use one of following values: {', '.join(DEDUPE_ALGOS)}"
+            raise AttributeError(msg)
         if key in DEDUPLICATION_ALGORITHM_PER_PARSER:
             logger.info(f"Replacing {key} with value {value} (previously set to {DEDUPLICATION_ALGORITHM_PER_PARSER[key]}) from env var DD_DEDUPLICATION_ALGORITHM_PER_PARSER")
         DEDUPLICATION_ALGORITHM_PER_PARSER[key] = value
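
The algorithm check above can be sketched as a standalone function to see how a misspelled value is rejected. The helper name `validate_dedupe_algorithms` and the parser name are hypothetical; the four algorithm strings are the values listed in the documentation change in this PR, and `AttributeError` mirrors the exception raised in the diff.

```python
import json

# The four values documented for DD_DEDUPLICATION_ALGORITHM_PER_PARSER.
DEDUPE_ALGOS = [
    "legacy",
    "unique_id_from_tool",
    "hash_code",
    "unique_id_from_tool_or_hash_code",
]


def validate_dedupe_algorithms(raw_json: str) -> dict:
    """Parse a DD_DEDUPLICATION_ALGORITHM_PER_PARSER-style JSON string and validate it.

    Hypothetical helper written for illustration only.
    """
    parsed = json.loads(raw_json)
    for key, value in parsed.items():
        if value not in DEDUPE_ALGOS:
            msg = (
                f"DEDUP algorithm '{value}' for '{key}' is not valid. "
                f"Use one of following values: {', '.join(DEDUPE_ALGOS)}"
            )
            raise AttributeError(msg)
    return parsed


# A valid mapping is returned as-is; a typo such as "hashcode" raises AttributeError.
algos = validate_dedupe_algorithms('{"Example Scanner": "hash_code"}')
print(algos)
```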