[otel-agent] extension: require authentication token #29069

Merged on Sep 27, 2024 (14 commits)

truthbk
Member

@truthbk truthbk commented Sep 5, 2024

What does this PR do?

This PR ensures all requests to the flare/FA extension contain a valid authtoken to serve said requests.

Motivation

Prevent snooping by unauthorized guests.

Additional Notes

Possible Drawbacks / Trade-offs

This work requires the modularization of the comp/api/authtoken component, which adds to the proliferation of modules. :(

otel-agent must have access to the auth_token. While most of our deployments already account for this, it's important for the otel-agent to have access to the same auth token created by the core agent at startup.

Describe how to test/QA your changes

@truthbk truthbk added this to the 7.58.0 milestone Sep 5, 2024
@truthbk truthbk requested review from a team as code owners September 5, 2024 00:17
@truthbk truthbk requested a review from ankitpatel96 September 5, 2024 00:17
@truthbk truthbk changed the title from "Jaime/authtoken" to "[otel-agent] extension: require authentication token to serve requests" Sep 5, 2024
@truthbk truthbk changed the title from "[otel-agent] extension: require authentication token to serve requests" to "[otel-agent] extension: require authentication token" Sep 5, 2024
@jackgopack4
Contributor

I'm seeing the errors in the pipeline; the instructions in https://github.com/DataDog/datadog-agent/blob/main/docs/dev/modules.md should help (I've been dealing with the same thing on my ticket).

@github-actions github-actions bot added the team/opentelemetry OpenTelemetry team label Sep 5, 2024
@pr-commenter

pr-commenter bot commented Sep 5, 2024

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=45290071 --os-family=ubuntu

Note: This applies to commit e9c2531

@pr-commenter

pr-commenter bot commented Sep 5, 2024

Regression Detector

Regression Detector Results

Run ID: a56a71c7-71b5-443e-b21e-1d455cc10d53

Baseline: e779286
Comparison: e9c2531

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|------|------------|------|----------|-------------|--------|-------|
|  | uds_dogstatsd_to_api_cpu | % cpu utilization | +0.95 | [+0.21, +1.69] | 1 | Logs |
|  | otel_to_otel_logs | ingress throughput | +0.71 | [-0.11, +1.52] | 1 | Logs |
|  | idle | memory utilization | +0.39 | [+0.33, +0.45] | 1 | Logs |
|  | idle_all_features | memory utilization | +0.19 | [+0.11, +0.28] | 1 | Logs |
|  | tcp_syslog_to_blackhole | ingress throughput | +0.14 | [+0.09, +0.19] | 1 | Logs |
|  | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.01, +0.01] | 1 | Logs |
|  | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.09, +0.09] | 1 | Logs |
|  | basic_py_check | % cpu utilization | -0.41 | [-3.11, +2.28] | 1 | Logs |
|  | file_tree | memory utilization | -1.74 | [-1.84, -1.64] | 1 | Logs |
|  | pycheck_lots_of_tags | % cpu utilization | -2.30 | [-4.92, +0.33] | 1 | Logs |

Bounds Checks

| perf | experiment | bounds_check_name | replicates_passed |
|------|------------|-------------------|-------------------|
|  | idle | memory_usage | 10/10 |

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we classify a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
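
The three criteria above can be sketched as a single predicate. This is only an illustration of the stated rule, not the Regression Detector's implementation: the `experiment` type, its field names, and `isRegression` are hypothetical, and the `erratic` value for `file_tree` is assumed to be false because the table does not show it. The numbers come from the `file_tree` row of the results table.

```go
package main

import (
	"fmt"
	"math"
)

// experiment holds the per-experiment values reported in the results table.
// The type and field names are hypothetical, chosen for this sketch.
type experiment struct {
	name         string
	deltaMeanPct float64 // estimated Δ mean %
	ciLow        float64 // lower bound of the Δ mean % CI
	ciHigh       float64 // upper bound of the Δ mean % CI
	erratic      bool    // whether the experiment's configuration marks it "erratic"
}

// isRegression applies the three criteria: the effect is at least as large as
// the tolerance, the confidence interval excludes zero, and the experiment is
// not marked erratic.
func isRegression(e experiment, tolerancePct float64) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= tolerancePct
	ciExcludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && ciExcludesZero && !e.erratic
}

func main() {
	// Values from the file_tree row; erratic=false is an assumption.
	fileTree := experiment{
		name:         "file_tree",
		deltaMeanPct: -1.74,
		ciLow:        -1.84,
		ciHigh:       -1.64,
		erratic:      false,
	}
	fmt.Println(isRegression(fileTree, 5.0)) // prints: false
}
```

The `file_tree` interval excludes zero, so the second criterion holds, but |-1.74| is below the 5.00% tolerance, so the first criterion fails and the change is not flagged.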

@@ -1,6 +1,6 @@
module github.com/DataDog/datadog-agent

-go 1.22.0
+go 1.22.5
Contributor

Suggested change
-go 1.22.5
+go 1.22.0

Contributor

@jackgopack4 jackgopack4 left a comment

Looks good to me, but please run inv -e tidy again after making sure you're using Go version 1.22.0, so that the version in the root go.mod doesn't change, and make sure the modules are listed as used_by_otel=True in modules.py.

Contributor

@ogaca-dd ogaca-dd left a comment

LGTM for files owned by ASC (after fixing jackgopack4 comment)

Contributor

@jackgopack4 jackgopack4 left a comment

Looks like line 214 of modules.py needs a used_by_otel=True; I just pushed a change to do that.

@GustavoCaso
Member

@truthbk wondering if we need to increase the granularity with the module creation. Currently there are two implementations for authtoken.

Should we create a separate module out of each implementation? I believe that was the intended use of the different implementation folder layout.

@truthbk
Member Author

truthbk commented Sep 9, 2024

@GustavoCaso I'm not sure the question of having alternate implementations and alternate modules is the same one. I'm definitely OK with changing the granularity of this module (which is something I was already not thrilled about because, well, more modules) as you are requesting.

However, the compiler is perfectly capable of only introducing a single implementation if these are well decoupled, and if we want to avoid excessive module proliferation, which I think we do, we don't want to end up with a module for the component definition, a module for implementation A, a module for implementation B, and a module for the fx stuff.

At any rate, I hear you. As we've discussed over the past months, the modularization of code we're seeing across the codebase is definitely not ideal.

@GustavoCaso
Member

@truthbk I hear your concerns about the proliferation of Go modules.

I just find that there is a discrepancy between the components documentation and the approach taken in this PR.

In the component documentation we state that we should create different Go modules for the interface, implementations, and mocks: https://datadoghq.dev/datadog-agent/components/creating-components/#go-module

@dustmop
Contributor

dustmop commented Sep 9, 2024

@GustavoCaso One important thing to note is that the authtoken module in question is still using the "classic" style of definition, not the new layout with def, impl, fx folders. Converting it to the new format could be done, but I'm not sure there's an immediate need to do so. As long as it's using the classic style, there's not much benefit to increasing the module granularity as you're describing, since the definition of the component is in the root folder. Anything depending on it will also be pulling in both child folders with the implementations.

@GustavoCaso
Member

Thanks for the clarification @dustmop

@truthbk I was confused about the file hierarchy of the authtoken component 🤦. Please discard my comment

@truthbk
Member Author

truthbk commented Sep 27, 2024

/merge

@dd-devflow

dd-devflow bot commented Sep 27, 2024

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 24m.

Use /merge -c to cancel this operation!

@dd-mergequeue dd-mergequeue bot merged commit 2588d20 into main Sep 27, 2024
229 checks passed
@dd-mergequeue dd-mergequeue bot deleted the jaime/authtoken branch September 27, 2024 12:47
@github-actions github-actions bot modified the milestones: 7.58.0 (removed), 7.59.0 (added) Sep 27, 2024
wdhif pushed a commit that referenced this pull request Sep 30, 2024
wdhif pushed a commit that referenced this pull request Oct 2, 2024
grantseltzer pushed a commit that referenced this pull request Oct 2, 2024
grantseltzer pushed a commit that referenced this pull request Oct 4, 2024