Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock when sending telemetry logs #3886

Merged
merged 4 commits into from
Sep 6, 2024

Conversation

TonyCTHsu
Copy link
Contributor

@TonyCTHsu TonyCTHsu commented Sep 4, 2024

Motivation:

This PR fix a bug introduced in: #3873

To reproduce: DD_APPSEC_RULES='/weblog/wtf.json' bundle exec rails s

This runs a rails server with an invalid appsec rules and deadlock occurs when trying to report telemetry.

E, [2024-09-06T09:33:59.052611 #3892] ERROR -- datadog: [datadog] (/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/appsec/processor.rb:121:in `rescue in create_waf_handle') libddwaf failed to initialize, error: #<Datadog::AppSec::WAF::LibDDWAF::Error: Datadog::AppSec::WAF::LibDDWAF::Error>
E, [2024-09-06T09:33:59.052695 #3892] ERROR -- datadog: [datadog] (/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/core/configuration.rb:237:in `rescue in block in safely_synchronize') Detected deadlock during datadog initialization. Please report this at https://github.com/DataDog/dd-trace-rb/blob/master/CONTRIBUTING.md#found-a-bug
	Source:
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/core/configuration.rb:233:in `synchronize'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/core/configuration.rb:233:in `safely_synchronize'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/core/configuration.rb:196:in `components'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/core/telemetry/logging.rb:74:in `dispatch'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/core/telemetry/logging.rb:62:in `report'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/appsec/processor.rb:124:in `rescue in create_waf_handle'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/appsec/processor.rb:106:in `create_waf_handle'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/appsec/processor.rb:83:in `initialize'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/appsec/component.rb:52:in `new'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/appsec/component.rb:52:in `create_processor'
	/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/appsec/component.rb:16:in `build_appsec_component'
Exiting
/Users/tony.hsu/datadog/dd-trace-rb/lib/datadog/tracing/contrib/extensions.rb:100:in `configure': undefined method `telemetry' for nil:NilClass (NoMethodError)

              components.telemetry.integrations_change! if configuration.integrations_pending_activation

Although we rescue the deadlock, it is still a non-recoverable error as it left the application in a broken state which leads to unexpected error further downstream.

What does this PR do?

The error is mitigated by implementing dependency injection of the telemetry component instance instead of referencing the global telemetry component instance via mutex(from safely_synchronize)

The not so good part of it: Lots of code during components lifecycle are pure function/static, instead of object oriented. This means the telemetry component instance must be injected (because the function depends on it) to all the downstream functions. It leads to increased verbosity and shotgun surgery.

In order to make dependency injection work, I also changed to following:

  • Provide default :error for level on report method to decrease verbosity for callers.
  • Mixin Telemetry::Logging for Telemetry::Component, so that the component instance is able to report logs directly. By injecting the component instance, it could be used to repot other kinds of telemetry data when needed.
  • Prevent invoking on Telemetry::Logging directly, replace those by Telemetry::Logger which delegates to the global telemetry component. There is still risk causing deadlock with this module.
  • Refactor AppSec::Processor class methods to private instance methods in order to shared the telemetry component instance.

@github-actions github-actions bot added appsec Application Security monitoring product core Involves Datadog core libraries tracing labels Sep 4, 2024
@TonyCTHsu TonyCTHsu force-pushed the tonycthsu/avoid-telemetry-deadlock branch from 06f3c73 to 934d4c2 Compare September 4, 2024 13:27
@pr-commenter
Copy link

pr-commenter bot commented Sep 4, 2024

Benchmarks

Benchmark execution time: 2024-09-06 08:52:28

Comparing candidate commit 9868ed2 in PR branch tonycthsu/avoid-telemetry-deadlock with baseline commit ae50107 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 96.66667% with 11 lines in your changes missing coverage. Please review.

Project coverage is 97.85%. Comparing base (ae50107) to head (9868ed2).
Report is 75 commits behind head on master.

Files with missing lines Patch % Lines
lib/datadog/appsec/processor.rb 77.27% 5 Missing ⚠️
spec/datadog/appsec/processor_spec.rb 96.11% 4 Missing ⚠️
lib/datadog/appsec.rb 50.00% 1 Missing ⚠️
spec/datadog/core/telemetry/logging_spec.rb 92.85% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3886   +/-   ##
=======================================
  Coverage   97.85%   97.85%           
=======================================
  Files        1277     1279    +2     
  Lines       76392    76489   +97     
  Branches     3744     3745    +1     
=======================================
+ Hits        74757    74852   +95     
- Misses       1635     1637    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TonyCTHsu TonyCTHsu force-pushed the tonycthsu/avoid-telemetry-deadlock branch from 934d4c2 to a4d99a1 Compare September 5, 2024 13:43
@github-actions github-actions bot added the profiling Involves Datadog profiling label Sep 5, 2024
@TonyCTHsu TonyCTHsu force-pushed the tonycthsu/avoid-telemetry-deadlock branch from 7b61550 to 80356eb Compare September 5, 2024 21:29
@TonyCTHsu TonyCTHsu changed the title Inject telemetry component to avoid deadlock Fix deadlock when sending telemetry logs Sep 6, 2024
@DataDog DataDog deleted a comment from datadog-datadog-prod-us1 bot Sep 6, 2024
@DataDog DataDog deleted a comment from datadog-datadog-prod-us1 bot Sep 6, 2024
@DataDog DataDog deleted a comment from datadog-datadog-prod-us1 bot Sep 6, 2024
@DataDog DataDog deleted a comment from datadog-datadog-prod-us1 bot Sep 6, 2024
@DataDog DataDog deleted a comment from datadog-datadog-prod-us1 bot Sep 6, 2024
@DataDog DataDog deleted a comment from datadog-datadog-prod-us1 bot Sep 6, 2024
@TonyCTHsu TonyCTHsu marked this pull request as ready for review September 6, 2024 08:15
@TonyCTHsu TonyCTHsu requested review from a team as code owners September 6, 2024 08:15
@TonyCTHsu TonyCTHsu added the dev/internal Other internal work that does not need to be included in the changelog label Sep 6, 2024
end

def required_addresses
[:required_addresses]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Suggested change
[:required_addresses]
%i[required_addresses]
Consider using the %i syntax instead (...read more)

The rule "Prefer %i to the literal array syntax" is a guideline that encourages the use of the %i syntax for arrays of symbols. This is a part of the Ruby style guide that aims to promote conciseness and readability.

Symbols are immutable, reusable objects often used in Ruby instead of strings when the value does not need to be changed. When declaring an array of symbols, using the %i syntax can make your code cleaner and easier to read.

To adhere to this rule, instead of declaring an array of symbols using the literal array syntax like [:foo, :bar, :baz], use the %i syntax like %i[foo bar baz]. It's a good practice to consistently use %i for arrays of symbols as it enhances code readability and maintainability.

View in Datadog  Leave us feedback  Documentation


expect(processor).to be_ready
expect(processor.diagnostics).to eq(:handle_diagnostics)
expect(processor.addresses).to eq([:required_addresses])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Suggested change
expect(processor.addresses).to eq([:required_addresses])
expect(processor.addresses).to eq(%i[required_addresses])
Consider using the %i syntax instead (...read more)

The rule "Prefer %i to the literal array syntax" is a guideline that encourages the use of the %i syntax for arrays of symbols. This is a part of the Ruby style guide that aims to promote conciseness and readability.

Symbols are immutable, reusable objects often used in Ruby instead of strings when the value does not need to be changed. When declaring an array of symbols, using the %i syntax can make your code cleaner and easier to read.

To adhere to this rule, instead of declaring an array of symbols using the literal array syntax like [:foo, :bar, :baz], use the %i syntax like %i[foo bar baz]. It's a good practice to consistently use %i for arrays of symbols as it enhances code readability and maintainability.

View in Datadog  Leave us feedback  Documentation

@TonyCTHsu TonyCTHsu merged commit e9dd206 into master Sep 6, 2024
190 of 191 checks passed
@TonyCTHsu TonyCTHsu deleted the tonycthsu/avoid-telemetry-deadlock branch September 6, 2024 13:46
@anmarchenko anmarchenko added this to the 2.4.0 milestone Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appsec Application Security monitoring product core Involves Datadog core libraries dev/internal Other internal work that does not need to be included in the changelog profiling Involves Datadog profiling tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants