Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue that we cannot modify apm_config.obfuscation.* through env vars #32318

Merged
merged 7 commits into from
Dec 24, 2024

Conversation

keisku
Copy link
Contributor

@keisku keisku commented Dec 18, 2024

What does this PR do?

Fix the issue that we cannot modify apm_config.obfuscation.* through env vars.

Set default values based on this file.

# obfuscation:
# credit_cards:
## @param DD_APM_OBFUSCATION_CREDIT_CARDS_ENABLED - boolean - optional
## Enables obfuscation rules for credit cards. Enabled by default.
# enabled: true
## @param DD_APM_OBFUSCATION_CREDIT_CARDS_LUHN - boolean - optional
## Enables a Luhn checksum check in order to eliminate false negatives. Disabled by default.
# luhn: false
#
# elasticsearch:
## @param DD_APM_OBFUSCATION_ELASTICSEARCH_ENABLED - boolean - optional
## Enables obfuscation rules for spans of type "elasticsearch". Enabled by default.
# enabled: true
## @param DD_APM_OBFUSCATION_ELASTICSEARCH_KEEP_VALUES - object - optional
## List of keys that should not be obfuscated.
# keep_values:
# - client_id
## @param DD_APM_OBFUSCATION_ELASTICSEARCH_OBFUSCATE_SQL_VALUES - boolean - optional
## The set of keys for which their values will be passed through SQL obfuscation
# obfuscate_sql_values:
# - val1
#
# opensearch:
## @param DD_APM_OBFUSCATION_OPENSEARCH_ENABLED - boolean - optional
## Enables obfuscation rules for spans of type "opensearch". Enabled by default.
# enabled: true
## @param DD_APM_OBFUSCATION_OPENSEARCH_KEEP_VALUES - object - optional
## List of keys that should not be obfuscated.
# keep_values:
# - client_id
## @param DD_APM_OBFUSCATION_OPENSEARCH_OBFUSCATE_SQL_VALUES - boolean - optional
## The set of keys for which their values will be passed through SQL obfuscation
# obfuscate_sql_values:
# - val1
#
# http:
## @param DD_APM_OBFUSCATION_HTTP_REMOVE_QUERY_STRING - boolean - optional
## Enables obfuscation of query strings in URLs
# remove_query_string: false
## @param DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS - boolean - optional
## If enabled, path segments in URLs containing digits are replaced by "?"
# remove_path_with_digits: false
#
# memcached:
## @param DD_APM_OBFUSCATION_MEMCACHED_ENABLED - boolean - optional
## Enables obfuscation rules for spans of type "memcached". Enabled by default.
# enabled: true
## @param DD_APM_OBFUSCATION_MEMCACHED_KEEP_COMMAND - boolean - optional
## If enabled, the full command for the query will be kept, including any lookup
## keys that could be present. The value for storage commands will still be
## redacted if Memcached obfuscation is enabled.
# keep_command: false
#
# mongodb:
## @param DD_APM_OBFUSCATION_MONGODB_ENABLED - boolean - optional
## Enables obfuscation rules for spans of type "mongodb". Enabled by default.
# enabled: true
## @param DD_APM_OBFUSCATION_MONGODB_KEEP_VALUES - object - optional
## List of keys that should not be obfuscated.
# keep_values:
# - document_id
## @param DD_APM_OBFUSCATION_MONGODB_OBFUSCATE_SQL_VALUES - object - optional
## The set of keys for which their values will be passed through SQL obfuscation
# obfuscate_sql_values:
# - val1
#
# redis:
## @param DD_APM_OBFUSCATION_REDIS_ENABLED - boolean - optional
## Enables obfuscation rules for spans of type "redis". Enabled by default.
# enabled: true
## @param DD_APM_OBFUSCATION_REDIS_REMOVE_ALL_ARGS - boolean - optional
## When true, replaces all arguments of a redis command with a single "?". Disabled by default.
# remove_all_args: false
#
## @param DD_APM_OBFUSCATION_REMOVE_STACK_TRACES - boolean - optional
## Enables removing stack traces to replace them with "?". Disabled by default.
# remove_stack_traces: false
#
# sql_exec_plan:
## @param DD_APM_SQL_EXEC_PLAN_ENABLED - boolean - optional
## Enables obfuscation rules for JSON query execution plans. Disabled by default.
# enabled: false
## @param DD_APM_SQL_EXEC_PLAN_KEEP_VALUES - object - optional
## List of keys that should not be obfuscated.
# keep_values:
# - id1
## @param DD_APM_SQL_EXEC_PLAN_OBFUSCATE_SQL_VALUES - boolean - optional
## The set of keys for which their values will be passed through SQL obfuscation
# obfuscate_sql_values:
# - val1
#
# sql_exec_plan_normalize:
## @param DD_APM_SQL_EXEC_PLAN_NORMALIZE_ENABLED - boolean - optional
## Enables obfuscation rules for JSON query execution plans, including cost and row estimates.
## Produces a normalized execution plan. Disabled by default.
# enabled: false
## @param DD_APM_SQL_EXEC_PLAN_NORMALIZE_KEEP_VALUES - object - optional
## List of keys that should not be obfuscated.
# keep_values:
# - id1
## @param DD_APM_SQL_EXEC_PLAN_NORMALIZE_OBFUSCATE_SQL_VALUES - boolean - optional
## The set of keys for which their values will be passed through SQL obfuscation
# obfuscate_sql_values:
# - val1
# cache:
## @param DD_APM_OBFUSCATION_CACHE_ENABLED - boolean - optional
## Enables caching obfuscated statements. Currently supported for SQL and MongoDB queries.
## Enabled by default.
# enabled: true

Motivation

There’s no special meaning behind choosing apm_config.obfuscatio.memcached.enabled and apm_config.obfuscatio.http.remove_paths_with_digits. They’re just examples.

For example, when we try to change apm_config.obfuscatio.memcached.enabled and apm_config.obfuscatio.http.remove_paths_with_digits in container environment, we would try to set env vars like this.

docker run --name datadog-agent -d --cgroupns host --pid host \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
-e DD_APM_OBFUSCATION_MEMCACHED_ENABLED=false \
-e DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS=false \
datadog/agent:7.60.0

docker exec -it datadog-agent agent config
apm_config:
  # ...skip...
  obfuscation.memcached.enabled: "false"
  obfuscation.http.remove_paths_with_digits: "false"

The following outputs are not expected. They should be hierarchical structure.

  • obfuscation.memcached.enabled: "false"
  • obfuscation.http.remove_paths_with_digits: "false"

Workaround

Change configs through a static file.

docker run --name datadog-agent -d --cgroupns host --pid host \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
$(docker build -t datadog-agent --quiet -<<-EOD
FROM datadog/agent:7.60.0
COPY --chmod=755 <<"CONFIGEND" /etc/datadog-agent/datadog.yaml
apm_config:
  obfuscation:
    memcached:
      enabled: false
    http:
      remove_paths_with_digits: false
CONFIGEND
EOD
)

docker exec -it datadog-agent agent config
apm_config:
  # ...skip...
  obfuscation:
    memcached:
      enabled: false
    http:
      remove_paths_with_digits: false

Describe how you validated your changes

docker run --name datadog-agent -d --cgroupns host --pid host \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
datadog/agent-dev:keisku-apm-config-obfuscation-py3

docker exec -it datadog-agent agent config
apm_config:
  # The following configs are default.
  obfuscation:
    cache:
      enabled: true
    elasticsearch:
      enabled: true
      keep_values: []
      obfuscate_sql_values: []
    http:
      remove_paths_with_digits: false
      remove_query_string: false
    memcached:
      enabled: true
      keep_command: false
    mongodb:
      enabled: true
      keep_values: []
      obfuscate_sql_values: []
    opensearch:
      enabled: true
      keep_values: []
      obfuscate_sql_values: []
    redis:
      enabled: true
      remove_all_args: true
    remove_stack_traces: false
    sql_exec_plan:
      enabled: false
      keep_values: []
      obfuscate_sql_values: []
    sql_exec_plan_normalize:
      enabled: false
      keep_values: []
      obfuscate_sql_values: []

docker run --name datadog-agent -d --cgroupns host --pid host \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
-e DD_APM_OBFUSCATION_MEMCACHED_ENABLED=false \
-e DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS=false \
-e DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_KEEP_VALUES='value1 value2 value3' \
datadog/agent-dev:keisku-apm-config-obfuscation-py3

docker exec -it datadog-agent agent config
apm_config:
  # ...skip....
  obfuscation:
    cache:
      enabled: true
    elasticsearch:
      enabled: true
      keep_values: []
      obfuscate_sql_values: []
    http:
      remove_paths_with_digits: false
      remove_query_string: false
    memcached:
      # MODIFIED
      enabled: false
      keep_command: false
    mongodb:
      enabled: true
      keep_values: []
      obfuscate_sql_values: []
    opensearch:
      enabled: true
      keep_values: []
      obfuscate_sql_values: []
    redis:
      enabled: true
      remove_all_args: true
    remove_stack_traces: false
    sql_exec_plan:
      enabled: false
      keep_values: []
      obfuscate_sql_values: []
    sql_exec_plan_normalize:
      enabled: false
      # MODIFIED
      keep_values:
      - value1
      - value2
      - value3
      obfuscate_sql_values: []

Possible Drawbacks / Trade-offs

Additional Notes

@keisku keisku added changelog/no-changelog qa/done QA done before merge and regressions are covered by tests labels Dec 18, 2024
@keisku keisku added this to the 7.61.0 milestone Dec 18, 2024
@github-actions github-actions bot added the short review PR is simple enough to be reviewed quickly label Dec 18, 2024
@keisku keisku force-pushed the keisku/apm-config-obfuscation branch from 8dcb3a2 to 88bebf8 Compare December 18, 2024 06:12
@agent-platform-auto-pr
Copy link
Contributor

agent-platform-auto-pr bot commented Dec 18, 2024

Test changes on VM

Use this command from test-infra-definitions to manually test this PR changes on a VM:

inv aws.create-vm --pipeline-id=51758475 --os-family=ubuntu

Note: This applies to commit 83c4fc04

@agent-platform-auto-pr
Copy link
Contributor

agent-platform-auto-pr bot commented Dec 18, 2024

Uncompressed package size comparison

Comparison with ancestor a6860d5349a50d8c9df8dc8fbaf1eeaf58edb9f3

Diff per package
package diff status size ancestor threshold
datadog-agent-x86_64-rpm 0.02MB ⚠️ 1198.29MB 1198.26MB 140.00MB
datadog-agent-x86_64-suse 0.02MB ⚠️ 1198.29MB 1198.26MB 140.00MB
datadog-agent-amd64-deb 0.02MB ⚠️ 1189.02MB 1189.00MB 140.00MB
datadog-agent-aarch64-rpm 0.01MB ⚠️ 944.17MB 944.16MB 140.00MB
datadog-agent-arm64-deb 0.01MB ⚠️ 934.93MB 934.92MB 140.00MB
datadog-dogstatsd-arm64-deb 0.00MB 55.77MB 55.77MB 10.00MB
datadog-iot-agent-x86_64-rpm 0.00MB 113.42MB 113.42MB 10.00MB
datadog-iot-agent-x86_64-suse 0.00MB 113.41MB 113.41MB 10.00MB
datadog-dogstatsd-x86_64-rpm 0.00MB 78.65MB 78.65MB 10.00MB
datadog-dogstatsd-x86_64-suse 0.00MB 78.65MB 78.65MB 10.00MB
datadog-dogstatsd-amd64-deb 0.00MB 78.57MB 78.57MB 10.00MB
datadog-heroku-agent-amd64-deb 0.00MB 505.09MB 505.09MB 70.00MB
datadog-iot-agent-amd64-deb 0.00MB 113.35MB 113.35MB 10.00MB
datadog-iot-agent-arm64-deb 0.00MB 108.81MB 108.81MB 10.00MB
datadog-iot-agent-aarch64-rpm -0.00MB 108.88MB 108.88MB 10.00MB

Decision

⚠️ Warning

@keisku keisku force-pushed the keisku/apm-config-obfuscation branch 2 times, most recently from b92db61 to 136f97b Compare December 18, 2024 06:57
@keisku keisku force-pushed the keisku/apm-config-obfuscation branch from 136f97b to 6caa631 Compare December 18, 2024 07:09
@keisku keisku marked this pull request as ready for review December 18, 2024 07:48
@keisku keisku requested review from a team as code owners December 18, 2024 07:48
@keisku keisku requested a review from pgimalac December 18, 2024 07:48
Copy link
Member

@pgimalac pgimalac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what the issue was or how your changes fix it, can you add some details in the PR description ?

pkg/config/setup/apm.go Show resolved Hide resolved
pkg/config/setup/apm.go Show resolved Hide resolved
@github-actions github-actions bot added medium review PR review might take time and removed short review PR is simple enough to be reviewed quickly labels Dec 18, 2024
@keisku
Copy link
Contributor Author

keisku commented Dec 19, 2024

@pgimalac Thanks for reviewing this PR. I put the detail here.

I don't think I understand what the issue was or how your changes fix it, can you add some details in the PR description ?

The goal is to solve the issue that we cannot change apm_config.obfuscation.* through env vars.

For example, when we try to change apm_config.obfuscatio.memcached.enabled and apm_config.obfuscatio.http.remove_paths_with_digits in container environment, we would try to set env vars like this.
There’s no special meaning behind choosing apm_config.obfuscatio.memcached.enabled and apm_config.obfuscatio.http.remove_paths_with_digits. They’re just examples.

docker run --name datadog-agent -d --cgroupns host --pid host \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
-e DD_APM_OBFUSCATION_MEMCACHED_ENABLED=false \
-e DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS=false \
datadog/agent:7.60.0

Then, we would check if configs are reflected by env vars like docker exec -it datadog-agent agent config.

We can expect output like this.

apm_config:
  obfuscation:
    memcached:
      enabled: false
    http:
      remove_paths_with_digits: false

BUT, the actual output is like this.

apm_config:
  obfuscation.memcached.enabled: "false"
  obfuscation.http.remove_paths_with_digits: "false"

obfuscation.memcached.enabled and obfuscation.http.remove_paths_with_digits are expected to have a hierarchical structure, but it does not. Besides, the value is string, not boolean.

To solve this issue, I opened this PR.

@keisku keisku requested a review from pgimalac December 19, 2024 09:27
@pgimalac
Copy link
Member

Thanks for the PR, we'll look into the bug in the config

@keisku keisku requested review from a team as code owners December 20, 2024 05:33
@keisku keisku force-pushed the keisku/apm-config-obfuscation branch from 5b3a84e to b547803 Compare December 20, 2024 05:44
Copy link

cit-pr-commenter bot commented Dec 20, 2024

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: ebea54ff-6df7-4e95-a4ee-7c169ec834c4

Baseline: a6860d5
Comparison: 83c4fc0
Diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI trials links
uds_dogstatsd_to_api_cpu % cpu utilization +1.62 [+0.93, +2.31] 1 Logs
tcp_syslog_to_blackhole ingress throughput +0.32 [+0.26, +0.39] 1 Logs
file_to_blackhole_0ms_latency_http2 egress throughput +0.20 [-0.71, +1.11] 1 Logs
file_to_blackhole_100ms_latency egress throughput +0.19 [-0.52, +0.90] 1 Logs
file_to_blackhole_500ms_latency egress throughput +0.16 [-0.61, +0.93] 1 Logs
file_tree memory utilization +0.13 [+0.01, +0.25] 1 Logs
quality_gate_idle_all_features memory utilization +0.11 [+0.03, +0.19] 1 Logs bounds checks dashboard
file_to_blackhole_1000ms_latency_linear_load egress throughput +0.10 [-0.37, +0.57] 1 Logs
file_to_blackhole_0ms_latency egress throughput +0.10 [-0.80, +1.00] 1 Logs
tcp_dd_logs_filter_exclude ingress throughput -0.00 [-0.02, +0.01] 1 Logs
uds_dogstatsd_to_api ingress throughput -0.01 [-0.12, +0.10] 1 Logs
file_to_blackhole_0ms_latency_http1 egress throughput -0.02 [-0.87, +0.82] 1 Logs
file_to_blackhole_300ms_latency egress throughput -0.04 [-0.68, +0.61] 1 Logs
file_to_blackhole_1000ms_latency egress throughput -0.21 [-0.99, +0.58] 1 Logs
quality_gate_idle memory utilization -0.87 [-0.90, -0.84] 1 Logs bounds checks dashboard
quality_gate_logs % cpu utilization -1.30 [-4.48, +1.89] 1 Logs

Bounds Checks: ✅ Passed

perf experiment bounds_check_name replicates_passed links
file_to_blackhole_0ms_latency lost_bytes 10/10
file_to_blackhole_0ms_latency memory_usage 10/10
file_to_blackhole_0ms_latency_http1 lost_bytes 10/10
file_to_blackhole_0ms_latency_http1 memory_usage 10/10
file_to_blackhole_0ms_latency_http2 lost_bytes 10/10
file_to_blackhole_0ms_latency_http2 memory_usage 10/10
file_to_blackhole_1000ms_latency memory_usage 10/10
file_to_blackhole_1000ms_latency_linear_load memory_usage 10/10
file_to_blackhole_100ms_latency lost_bytes 10/10
file_to_blackhole_100ms_latency memory_usage 10/10
file_to_blackhole_300ms_latency lost_bytes 10/10
file_to_blackhole_300ms_latency memory_usage 10/10
file_to_blackhole_500ms_latency lost_bytes 10/10
file_to_blackhole_500ms_latency memory_usage 10/10
quality_gate_idle memory_usage 10/10 bounds checks dashboard
quality_gate_idle_all_features memory_usage 10/10 bounds checks dashboard
quality_gate_logs lost_bytes 10/10
quality_gate_logs memory_usage 10/10

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.

@ajgajg1134
Copy link
Contributor

Overall this change looks fine to me, if I understand the issue correctly is this env var configuration not working an issue affecting traces such that they are still being obfuscated when you don't expect them to be? This is surprising to me since it looks like the old code worked alright / I'm fairly sure I've used env vars to configure these in the past 🤔
In any case can you provide a test that demonstrates this fix (and ensures we don't break it in the future)?

@keisku
Copy link
Contributor Author

keisku commented Dec 21, 2024

@ajgajg1134

Thank you for reviewing this PR! I am documenting the testing method and results here. I will add the Unit Test later.

Background:
Let me explain the background of this PR more.
#31336 introduced obfuscation caching to address the CPU usage issue. I wanted to test the cache feature but noticed that it isn’t enabled for the core-agent, and I can’t enable it through an environment variable.

For investigation, I added debug logs that dump obfuscation config to the code before this PR (= main branch).

TL;DR: We can modify obfuscation config through env vars for trace-agent, but not core-agent.

git diff
git diff HEAD~1 | cat -
diff --git a/comp/trace/config/setup.go b/comp/trace/config/setup.go
index 06a157dc64..0c26271855 100644
--- a/comp/trace/config/setup.go
+++ b/comp/trace/config/setup.go
@@ -508,6 +508,7 @@ func applyDatadogConfig(c *config.AgentConfig, core corecompcfg.Component) error
 		if pkgconfigsetup.Datadog().IsSet("apm_config.obfuscation.cache.enabled") {
 			c.Obfuscation.Cache.Enabled = pkgconfigsetup.Datadog().GetBool("apm_config.obfuscation.cache.enabled")
 		}
+		log.Debugf("obfuscation config: %#v", c.Obfuscation)
 	}
 
 	if core.IsSet("apm_config.filter_tags.require") {
diff --git a/pkg/collector/python/datadog_agent.go b/pkg/collector/python/datadog_agent.go
index 95a0b2f69f..af02378d79 100644
--- a/pkg/collector/python/datadog_agent.go
+++ b/pkg/collector/python/datadog_agent.go
@@ -258,6 +258,7 @@ func lazyInitObfuscator() *obfuscate.Obfuscator {
 			log.Errorf("Failed to unmarshal apm_config.obfuscation: %s", err.Error())
 			cfg = obfuscate.Config{}
 		}
+		log.Debugf("obfuscation config: %#v", cfg)
 		if !cfg.SQLExecPlan.Enabled {
 			cfg.SQLExecPlan = defaultSQLPlanObfuscateSettings
 		}
diff --git a/pkg/trace/agent/obfuscate.go b/pkg/trace/agent/obfuscate.go
index 25042e3fd6..228c2433f3 100644
--- a/pkg/trace/agent/obfuscate.go
+++ b/pkg/trace/agent/obfuscate.go
@@ -138,6 +138,7 @@ func (a *Agent) lazyInitObfuscator() *obfuscate.Obfuscator {
 
 	if a.obfuscator == nil {
 		if a.obfuscatorConf != nil {
+			log.Debugf("obfuscation config: %#v", *a.obfuscatorConf)
 			a.obfuscator = obfuscate.NewObfuscator(*a.obfuscatorConf)
 		} else {
 			a.obfuscator = obfuscate.NewObfuscator(obfuscate.Config{})

Then, I checked debug logs with the following containers.

docker-compose.yaml
services:
  agent:
    container_name: datadog-agent
    image: datadog/agent-dev:keisku-apm-config-obfuscation-investigation-py3
    volumes:
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
      - /proc/:/host/proc/:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /var/lib/cloud/data/instance-id:/var/lib/cloud/data/instance-id:ro
    environment:
      DD_LOG_LEVEL: debug
      DD_HOSTNAME_FILE: /var/lib/cloud/data/instance-id
      DD_APM_OBFUSCATION_MEMCACHED_ENABLED: false
      DD_APM_OBFUSCATION_CACHE_ENABLED: true
      DD_APM_OBFUSCATION_SQL_EXEC_PLAN_ENABLED: true
      DD_APM_OBFUSCATION_SQL_EXEC_PLAN_KEEP_VALUES: value1 value2 value3
    env_file:
      - ~/sandbox.docker.env
  mysql:
    container_name: mysql
    image: mysql:8
    restart: always
    labels:
      com.datadoghq.ad.checks: |
        {
          "mysql": {
            "instances": [
              {
                "host": "%%host%%",
                "port": "3306",
                "username": "datadog",
                "password": "datadog",
                "dbm": "true"
              }
            ]
          }
        }
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_USER: datadog
      MYSQL_PASSWORD: datadog
    volumes:
    - ./init.sh:/docker-entrypoint-initdb.d/init.sh

We can modify obfuscation config through env vars for trace-agent, but not core-agent.

The result of this test case, I am sure env vars don't change obfuscation config for CORE.

  • DD_APM_OBFUSCATION_MEMCACHED_ENABLED: false is reflected for TRACE and CORE both.
  • DD_APM_OBFUSCATION_CACHE_ENABLED: true is reflected for TRACE, but not CORE.
    • TRACE | Cache:obfuscate.CacheConfig{Enabled:true}
    • CORE | Cache:obfuscate.CacheConfig{Enabled:false}, it should be true.
  • DD_APM_OBFUSCATION_SQL_EXEC_PLAN_ENABLED: true and DD_APM_OBFUSCATION_SQL_EXEC_PLAN_KEEP_VALUES: value1 value2 value3 are reflected for TRACE, but not CORE.
    • TRACE | SQLExecPlan:obfuscate.JSONConfig{Enabled:true, KeepValues:[]string{"value1", "value2", "value3"}
    • CORE | SQLExecPlan:obfuscate.JSONConfig{Enabled:false, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}
docker logs datadog-agent | grep 'obfuscation\ config\:'

2024-12-21 09:49:38 UTC | TRACE | DEBUG | (comp/trace/config/setup.go:511 in applyDatadogConfig) | obfuscation config: &config.ObfuscationConfig{ES:obfuscate.JSONConfig{Enabled:true, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, OpenSearch:obfuscate.JSONConfig{Enabled:true, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, Mongo:obfuscate.JSONConfig{Enabled:true, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, SQLExecPlan:obfuscate.JSONConfig{Enabled:true, KeepValues:[]string{"value1", "value2", "value3"}, ObfuscateSQLValues:[]string(nil)}, SQLExecPlanNormalize:obfuscate.JSONConfig{Enabled:false, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, HTTP:obfuscate.HTTPConfig{RemoveQueryString:false, RemovePathDigits:false}, RemoveStackTraces:false, Redis:obfuscate.RedisConfig{Enabled:true, RemoveAllArgs:false}, Memcached:obfuscate.MemcachedConfig{Enabled:false, KeepCommand:false}, CreditCards:obfuscate.CreditCardsConfig{Enabled:true, Luhn:false, KeepValues:[]string(nil)}, Cache:obfuscate.CacheConfig{Enabled:true}}

2024-12-21 09:49:42 UTC | CORE | DEBUG | (pkg/collector/python/datadog_agent.go:261 in func1) | obfuscation config: obfuscate.Config{SQL:obfuscate.SQLConfig{DBMS:"", TableNames:false, CollectCommands:false, CollectComments:false, CollectProcedures:false, ReplaceDigits:false, KeepSQLAlias:false, DollarQuotedFunc:false, ObfuscationMode:"", RemoveSpaceBetweenParentheses:false, KeepNull:false, KeepBoolean:false, KeepPositionalParameter:false, KeepTrailingSemicolon:false, KeepIdentifierQuotation:false, KeepJSONPath:false, Cache:false}, ES:obfuscate.JSONConfig{Enabled:false, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, OpenSearch:obfuscate.JSONConfig{Enabled:false, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, Mongo:obfuscate.JSONConfig{Enabled:false, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, SQLExecPlan:obfuscate.JSONConfig{Enabled:false, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, SQLExecPlanNormalize:obfuscate.JSONConfig{Enabled:false, KeepValues:[]string(nil), ObfuscateSQLValues:[]string(nil)}, HTTP:obfuscate.HTTPConfig{RemoveQueryString:false, RemovePathDigits:false}, Redis:obfuscate.RedisConfig{Enabled:false, RemoveAllArgs:false}, Memcached:obfuscate.MemcachedConfig{Enabled:false, KeepCommand:false}, CreditCard:obfuscate.CreditCardsConfig{Enabled:false, Luhn:false, KeepValues:[]string(nil)}, Statsd:obfuscate.StatsClient(nil), Logger:obfuscate.Logger(nil), Cache:obfuscate.CacheConfig{Enabled:false}}

@keisku keisku force-pushed the keisku/apm-config-obfuscation branch 8 times, most recently from bf4e79e to f03a4de Compare December 23, 2024 13:02
@keisku keisku force-pushed the keisku/apm-config-obfuscation branch from f03a4de to 1fa31ff Compare December 23, 2024 13:03
Copy link
Contributor

@ajgajg1134 ajgajg1134 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks for the added details and tests here!

@keisku
Copy link
Contributor Author

keisku commented Dec 23, 2024

ERROR: Merge-base of this branch is too old for SMP. Please update your branch by merging an up-to-date main branch into your branch or by rebasing it on an up-to-date main branch.
https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/748647629

I am going to update this branch.

Comment on lines +267 to +268
if len(obfuscaterConfig.Mongo.KeepValues) == 0 {
obfuscaterConfig.Mongo.KeepValues = defaultMongoObfuscateSettings.KeepValues
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, cfg here is not reflected by default values because DD_APM_OBFUSCATION_* are not bound to default values. This is the issue that I want to solve with this PR.

After this PR, cfg.Mongo.Enabled is true since DD_APM_OBFUSCATION_MONGODB_ENABLED is bound to true. This follows this documentation.

# mongodb:
## @param DD_APM_OBFUSCATION_MONGODB_ENABLED - boolean - optional
## Enables obfuscation rules for spans of type "mongodb". Enabled by default.
# enabled: true

So, we need this change to keep backward compatibility.

cfg.SQLExecPlan.Enabled and cfg.SQLExecPlanNormalize.Enabled are false by default because DD_APM_OBFUSCATION_SQL_EXEC_PLAN_ENABLED and DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_ENABLED are bound to false. Therefore, we can keep backward compatibility without code change here.

@keisku
Copy link
Contributor Author

keisku commented Dec 24, 2024

/merge

@dd-devflow
Copy link

dd-devflow bot commented Dec 24, 2024

Devflow running: /merge

View all feedbacks in Devflow UI.


2024-12-24 12:45:12 UTC ℹ️ MergeQueue: pull request added to the queue

The median merge time in main is 34m.


2024-12-24 13:19:18 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit 45b38f9 into main Dec 24, 2024
228 checks passed
@dd-mergequeue dd-mergequeue bot deleted the keisku/apm-config-obfuscation branch December 24, 2024 13:19
@github-actions github-actions bot modified the milestones: 7.61.0, 7.62.0 Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog medium review PR review might take time qa/done QA done before merge and regressions are covered by tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants