Fix duplicate tags in TCP/UDP logs #29780

Merged on Oct 4, 2024 (11 commits)

Conversation

@andrewqian2001datadog (Contributor) commented Oct 3, 2024

What does this PR do?

Fixes a bug where duplicate tags appear in UDP/TCP logs.

Motivation

Issue

Describe how to test/QA your changes

Add the following to dev/dist/conf.d/test.d/conf.yaml (create the file if needed):

logs:
  - type: udp
    port: 10518
    service: "test_app"
    source: "test_app_src"
    tags:
      - "name:integrationtag"

In datadog.yaml, enable logs and set host tags as well:

logs_enabled: true
tags:
  - "name:hosttag"

Run the agent:
./bin/agent/agent run -c bin/agent/dist/datadog.yaml
In a second terminal, stream the logs from the agent:
./bin/agent/agent stream-logs -c bin/agent/dist/datadog.yaml
In a third terminal, send a log to the agent:
echo -n "this is my log" | nc -u -w 1 127.0.0.1 10518
Confirm that no tag appears more than once in the stream-logs output.

Alternative QA steps are also provided in the ticket if needed.

pr-commenter bot commented Oct 3, 2024

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=45874840 --os-family=ubuntu

Note: This applies to commit 1bb1a6a

@andrewqian2001datadog self-assigned this Oct 3, 2024

cit-pr-commenter bot commented Oct 3, 2024

Regression Detector

Regression Detector Results

Run ID: 211f477f-8fdf-475d-9ed3-83d4bba15ede Metrics dashboard Target profiles

Baseline: 8d63f09
Comparison: 1bb1a6a

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|------|------------|------|----------|-------------|--------|-------|
|  | basic_py_check | % cpu utilization | +2.58 | [-0.17, +5.34] | 1 | Logs |
|  | uds_dogstatsd_to_api_cpu | % cpu utilization | +1.34 | [+0.60, +2.07] | 1 | Logs |
|  | otel_to_otel_logs | ingress throughput | +0.81 | [-0.00, +1.62] | 1 | Logs |
|  | pycheck_lots_of_tags | % cpu utilization | +0.60 | [-1.91, +3.11] | 1 | Logs |
|  | idle_all_features | memory utilization | +0.46 | [+0.37, +0.55] | 1 | Logs |
|  | file_tree | memory utilization | +0.31 | [+0.20, +0.41] | 1 | Logs |
|  | idle | memory utilization | +0.06 | [+0.02, +0.11] | 1 | Logs |
|  | tcp_syslog_to_blackhole | ingress throughput | +0.03 | [-0.03, +0.09] | 1 | Logs |
|  | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.01, +0.01] | 1 | Logs |
|  | uds_dogstatsd_to_api | ingress throughput | -0.01 | [-0.07, +0.06] | 1 | Logs |

Bounds Checks

| perf | experiment | bounds_check_name | replicates_passed |
|------|------------|-------------------|-------------------|
|  | idle | memory_usage | 10/10 |

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
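
The three criteria above can be read as a single boolean test. The following Go sketch is illustrative only: the experiment type, its field names, and isRegression are hypothetical and are not the Regression Detector's actual code. The erratic flag is assumed to be false for the sample rows, since the report does not state it.

```go
package main

import (
	"fmt"
	"math"
)

// experiment mirrors one row of the results table.
// Hypothetical type for illustration; not the detector's real data model.
type experiment struct {
	name         string
	deltaMeanPct float64 // estimated Δ mean %
	ciLow        float64 // lower bound of the 90% CI
	ciHigh       float64 // upper bound of the 90% CI
	erratic      bool    // assumed false below; not stated in the report
}

// effectSizeTolerance is the 5.00% threshold quoted in the report.
const effectSizeTolerance = 5.0

// isRegression applies the three criteria: large enough effect,
// confidence interval excluding zero, and not marked erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= effectSizeTolerance
	ciExcludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && ciExcludesZero && !e.erratic
}

func main() {
	rows := []experiment{
		{"basic_py_check", 2.58, -0.17, 5.34, false},
		{"uds_dogstatsd_to_api_cpu", 1.34, 0.60, 2.07, false},
	}
	for _, e := range rows {
		fmt.Printf("%s: regression=%v\n", e.name, isRegression(e))
	}
}
```

Note that uds_dogstatsd_to_api_cpu has an interval that excludes zero but still is not flagged, because its effect size is below the 5.00% tolerance.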

@andrewqian2001datadog marked this pull request as ready for review October 3, 2024 21:37
@andrewqian2001datadog requested review from a team as code owners October 3, 2024 21:37
@gh123man (Member) left a comment

Code change and tests LGTM 👍
Let's update the title to something like: Fix duplicate tags in TCP/UDP logs (just so it's clearer when looking at the commit history).

It would also be good to describe the problem and solution in the PR description. Links to JIRA are helpful, but when digging through git history it's convenient to have the additional context 😄

@andrewqian2001datadog changed the title from "fix duplicate tags" to "Fix duplicate tags in TCP/UDP logs" Oct 4, 2024
@carlosroman modified the milestones: 7.59.0, 7.58.0 Oct 4, 2024
@carlosroman added the qa/no-code-change (No code change in Agent code requiring validation) label Oct 4, 2024
@@ -168,6 +168,32 @@ func CheckLogsExpected(t *testing.T, fakeIntake *components.FakeIntake, service,
}, 2*time.Minute, 10*time.Second)
}

// CheckNoDuplicateTags verifies that there are no duplicate tags
func CheckNoDuplicateTags(t *testing.T, fakeIntake *components.FakeIntake, service, content string) {
Member

Would it make sense to instead update the above CheckLogsExpected to check for duplicate tags?
That way we get dupe tag coverage for all test cases instead of just one. I don't think there is a good reason why we couldn't perform this check on all test cases.
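
One way to make such a check reusable across test cases is to factor the duplicate detection into a small pure function that any assertion helper can call. The sketch below is an assumption about how that could look, not the code merged in this PR; findDuplicateTags is a hypothetical name, and the sample tags come from the QA configuration in the description.

```go
package main

import "fmt"

// findDuplicateTags returns every tag that occurs more than once,
// each reported once, in the order its first repeat is seen.
// Hypothetical helper; not the implementation from this PR.
func findDuplicateTags(tags []string) []string {
	seen := make(map[string]int, len(tags))
	var dups []string
	for _, tag := range tags {
		seen[tag]++
		if seen[tag] == 2 { // record a duplicated tag only once
			dups = append(dups, tag)
		}
	}
	return dups
}

func main() {
	tags := []string{"name:hosttag", "name:integrationtag", "name:integrationtag"}
	fmt.Println(findDuplicateTags(tags)) // prints [name:integrationtag]
}
```

The comparison is on the full tag string, so name:hosttag and name:integrationtag count as distinct tags even though they share the key name.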

@andrewqian2001datadog
Contributor Author

/merge

dd-devflow bot commented Oct 4, 2024

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 23m.

Use /merge -c to cancel this operation!

@dd-mergequeue bot merged commit 7991059 into main Oct 4, 2024
228 checks passed
Labels: qa/no-code-change (No code change in Agent code requiring validation), team/agent-metrics-logs