Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for http.client_ip tag for Rack-based frameworks #2248

Merged
merged 15 commits into from
Sep 8, 2022

Conversation

barweiss
Copy link
Contributor

What does this PR do?

Add support for the http.client_ip tag for Rack-based API frameworks.

Motivation

The http.client_ip tag will help distinguish whether the request came from a public or a private address, allowing Datadog to distinguish external from internal services and API endpoints.

How to test the change?

The change is validated by RSpec behavior tests.

To test integration:

Set up and start a demo Rails server and enable auto-instrumentation.
Then use the following commands:

curl localhost:3000 # direct IP
curl -H 'X-Forwarded-For: 1.1.1.1' localhost:3000 # forwarded IP
curl -H 'X-Forwarded-For: 1.1.1.1' -H 'X-Forwarded-For: 1.1.1.1' localhost:3000 # two values for the header = no client ip
curl -H 'X-Forwarded-For: 1.1.1.1' -H 'X-Real-Ip: 1.2.3.4' localhost:3000 # two "ip" headers set = no client ip

Review the logs from the server. The emitted traces should reflect the expected results.

@barweiss barweiss requested a review from a team August 30, 2022 12:44
Copy link
Member

@lloeki lloeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes needed but overall LGTM.

It seems like CircleCI ought to catch some linting issues but it did not run, or Rubocop is set to ignore them.

I'd rather have a second review from the Tracing team as well :)

valid_ipv4?(ip) || valid_ipv6?(ip)
end

# --- Section vendored from the ipaddress gem --- #
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a rationale for not using IPAddr.new(addr).ipv4? and IPAddr.new(addr).ipv6??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to vendor third party code, we should find a way to slip the corresponding license, at least in LICENSE-3rd_party.csv.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoided IPAddr initially because it has the netmask handling which I didn't want but I guess it's solvable with some verification and it spares vendoring from ipaddress.
I'll give IPAddr a shot

end

def self.strip_zone_specifier(ipv6)
if /\A(.*?)%.*/ =~ ipv6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I had to add this as well at Sqreen.

This could be achieved with ipv6.gsub(/%.*$/, '') instead of relying on intermediate match objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Fixed

o.default { env_to_bool(Tracing::Configuration::Ext::ClientIp::ENV_DISABLED, false) }
o.lazy
end
# An optional name of a custom header to resolve the client IP from.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# An optional name of a custom header to resolve the client IP from.
# An optional name of a custom header to resolve the client IP from.

Linter should have caught this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops.
Rubocop seems to pass but I shouldv'e considered the formatting in the other options.
Fixed

TAG_MULTIPLE_IP_HEADERS = '_dd.multiple-ip-headers'.freeze

# A header collection implementation that looks up headers in a Hash.
class HashHeaderCollection < HeaderCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could go to the Core namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, fixed

].freeze

# A collection of headers.
class HeaderCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could go to the Core namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return
end

unless valid_ip?(ip)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this nesting of unless is confusing.

Is this necessary since extract_ip_from_full_address seems to return the address when its internal criteria don't match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely refactored that part, let me know if it makes more sense now :)

return {} unless headers

{}.tap do |result|
DEFAULT_IP_HEADERS_NAMES.each do |name|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a job for DEFAULT_IP_HEADERS_NAMES.each_with_object({}) do |name, result|

Copy link
Contributor Author

@barweiss barweiss Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with reduce (each_with_object returns an array)

EDIT: No it does not, I misread the documentation. Changed

return headers.get(configuration.header_name) if configuration.header_name && headers

ip_values_from_headers = ip_headers(headers).values
case ip_values_from_headers.size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case ip_values_from_headers.size
case ip_values_from_headers.size

Method return value of this method comes from the case return value, so I'd suggest to space this to conform with the Ruby style guide where implicit returns live on their own "line".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done

@@ -219,13 +226,13 @@ def configuration
Datadog.configuration.tracing[:rack]
end

def parse_request_headers(env)
def parse_request_headers(headers)
{}.tap do |result|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to above, looks like a job for each_with_object

Copy link
Contributor Author

@barweiss barweiss Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return if configuration.disabled

ip = client_address_from_request(headers, remote_ip)
if !configuration.header_name && ip.nil?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is only done when there's no IP reported? This feels off somehow (like, when would this happen), so I must be missing something.

Copy link
Contributor Author

@barweiss barweiss Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this is still confusing after the refactor

@barweiss barweiss requested a review from lloeki August 31, 2022 07:05
# @param [Span] span The span that's associated with the request.
# @param [HeaderCollection, #get, nil] headers A collection with the request headers.
# @param [String, nil] remote_ip The remote IP the request associated with the span is sent to.
def self.set_client_ip_tag(span, headers, remote_ip)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noob question: should this have ! in the name?

Copy link
Contributor

@TonyCTHsu TonyCTHsu Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! is a Ruby convention for hinting side-effect. For example, Array#map returns a new array while Array#map! returns a mutated self. Another common pattern is whether a method raise an exception, such as find_by vs find_by!.

I don't think this method deserve !, but this one seems to be a better candidate.


Since headers, remote_ip are optional. It is awkward to see

client_ip.set_client_ip_tag(span, nil, '1.1.1.1').

Consider making those keyword argument, so the caller makes the invokation a bit more explicit like,

client_ip.set_client_ip_tag(span, remote_ip: '1.1.1.1') or
client_ip.set_client_ip_tag(span, headers: headers).


Control flow using exception handling is slow in Ruby. This method need to be changed for

  1. Reconsider whether if the scenario is truly an exception
  2. Streamline the exceptional handling cases to avoid nested rescue/raise

Copy link
Contributor Author

@barweiss barweiss Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks!

As for using exceptions, I was indeed wondering whether this would have a performance impact. I'll change it so it doesn't use any exceptions.

#
# @default `DD_TRACE_CLIENT_IP_HEADER_DISABLED` environment variable, otherwise `false`.
# @return [Boolean]
option :disabled do |o|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, prefer enabled to disabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually agree but since the environment variable is DISABLED I figured I'd keep it disabled.
Do you still think it's worth the setting name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also rename the environment variable to DD_TRACE_CLIENT_IP_HEADER_ENABLED to be consistent with other environment variable?

# @param [Span] span The span that's associated with the request.
# @param [HeaderCollection, #get, nil] headers A collection with the request headers.
# @param [String, nil] remote_ip The remote IP the request associated with the span is sent to.
def self.set_client_ip_tag(span, headers, remote_ip)
Copy link
Contributor

@TonyCTHsu TonyCTHsu Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! is a Ruby convention for hinting side-effect. For example, Array#map returns a new array while Array#map! returns a mutated self. Another common pattern is whether a method raise an exception, such as find_by vs find_by!.

I don't think this method deserve !, but this one seems to be a better candidate.


Since headers, remote_ip are optional. It is awkward to see

client_ip.set_client_ip_tag(span, nil, '1.1.1.1').

Consider making those keyword argument, so the caller makes the invokation a bit more explicit like,

client_ip.set_client_ip_tag(span, remote_ip: '1.1.1.1') or
client_ip.set_client_ip_tag(span, headers: headers).


Control flow using exception handling is slow in Ruby. This method need to be changed for

  1. Reconsider whether if the scenario is truly an exception
  2. Streamline the exceptional handling cases to avoid nested rescue/raise


describe 'ip validation' do
context 'when given valid ip addresses' do
subject do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simplified, without custom matcher

describe ".validate_ip" do
  ips = [
    ...
  ]

  ips.each do |ip|
    context 'when given #{ip}' do
      let(:normalised) { described_class.strip_decorations(ip) }
      subject { described_class.validate_ip(normalised) }

      it { is_expected.to be true }
    end
  end
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

let(:tagging_enabled) { false }

it 'does nothing' do
expect(span).to_not receive(:set_tag).with(any_args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with(any_args) is redundant and assertion expect(span).to_not receive(:set_tag) should be strong enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, fixed

@barweiss barweiss requested a review from TonyCTHsu September 1, 2022 07:55
@barweiss
Copy link
Contributor Author

barweiss commented Sep 5, 2022

One of the remaining issues in this PR is choosing to use the builtin IPAddr class, or vendoring the ipaddress gem.

In order to check whether a string is a valid IP with IPAddr, we need to see that IPAddr.new does not raise an exception. However, IPAddress.valid? allows checking for invalid IP addresses without raising exceptions.

One of my concerns was whether raising an exceptions will have a significant performance impact, considering that in cases of misconfiguration, an invalid IP can be present in every traced request.

After a discussion with @TonyCTHsu, and following his advice, I ran some benchmarks that make it clear there's no significant performance hit:

Warming up --------------------------------------
              ipaddr    13.742k i/100ms
           ipaddress    14.983k i/100ms
Calculating -------------------------------------
              ipaddr    141.715k (± 1.5%) i/s -    714.584k in   5.043502s
           ipaddress    148.502k (± 1.6%) i/s -    749.150k in   5.046105s

Comparison:
           ipaddress:   148501.6 i/s
              ipaddr:   141714.9 i/s - 1.05x  (± 0.00) slower

@lloeki
Copy link
Member

lloeki commented Sep 7, 2022

Approved, provided the integration CI failures are investigated if they still fail after a re-run.

class HashHeaderCollection < HeaderCollection
def initialize(hash)
super()
@hash = hash.transform_keys(&:downcase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash#transform_keys is only available in Ruby 2.5 or later:

puts({}.transform_keys(&:downcase))
#  == 2.4.10 ==
# -e:1:in `<main>': undefined method `transform_keys' for {}:Hash (NoMethodError)
# Did you mean?  transform_values
#
#  == 2.5.6 ==
# {}

Is this method being hit during our test runs? It should show up as a failure in 2.1-2.4 test cases, if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in spec/datadog/tracing/client_ip_spec, odd. I'll try to see why it isn't called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcotc marcotc merged commit d2ba9c9 into master Sep 8, 2022
@marcotc marcotc deleted the bar.weiss/client-ip-rack-support branch September 8, 2022 18:58
@marcotc marcotc added the integrations Involves tracing integrations label Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants