Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for http.client_ip tag for Rack-based frameworks #2248

Merged
merged 15 commits into from
Sep 8, 2022
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions lib/datadog/core/configuration/settings.rb
Original file line number Diff line number Diff line change
Expand Up @@ -618,6 +618,28 @@ def initialize(*_)
# @default `{}`
# @return [Hash,nil]
option :writer_options, default: ->(_i) { {} }, lazy: true

# Client IP configuration
# @public_api
settings :client_ip do
# Whether client IP collection is disabled. This disables client IPs from HTTP requests to be reported in traces.
#
# @default `DD_TRACE_CLIENT_IP_HEADER_DISABLED` environment variable, otherwise `false`.
# @return [Boolean]
option :disabled do |o|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, prefer enabled to disabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually agree but since the environment variable is DISABLED I figured I'd keep it disabled.
Do you still think it's worth the setting name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also rename the environment variable to DD_TRACE_CLIENT_IP_HEADER_ENABLED to be consistent with other environment variable?

o.default { env_to_bool(Tracing::Configuration::Ext::ClientIp::ENV_DISABLED, false) }
o.lazy
end

# An optional name of a custom header to resolve the client IP from.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# An optional name of a custom header to resolve the client IP from.
# An optional name of a custom header to resolve the client IP from.

Linter should have caught this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops.
Rubocop seems to pass but I shouldv'e considered the formatting in the other options.
Fixed

#
# @default `DD_TRACE_CLIENT_IP_HEADER` environment variable, otherwise `nil`.
# @return [String,nil]
option :header_name do |o|
o.default { ENV.fetch(Tracing::Configuration::Ext::ClientIp::ENV_HEADER_NAME, nil) }
o.lazy
end
end
end

# The `version` tag in Datadog. Use it to enable [Deployment Tracking](https://docs.datadoghq.com/tracing/deployment_tracking/).
Expand Down
39 changes: 39 additions & 0 deletions lib/datadog/core/header_collection.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
module Datadog
module Core
# A some-what abstract class representing a collection of headers.
#
# Use the `HeaderCollection.from_hash` function to create a header collection from a `Hash`.
# Another option is to use `HashHeaderCollection` directly.
class HeaderCollection
# Gets a single value of the header with the given name, case insensitive.
#
# @param [String] header_name Name of the header to get the value of.
# @returns [String, nil] A single value of the header, or nil if the header with
# the given name is missing from the collection.
def get(header_name)
nil
end

# Create a header collection that retrieves headers from the given Hash.
#
# This can be useful for testing or other trivial use cases.
#
# @param [Hash] hash Hash with the headers.
def self.from_hash(hash)
HashHeaderCollection.new(hash)
end
end

# A header collection implementation that looks up headers in a Hash.
class HashHeaderCollection < HeaderCollection
def initialize(hash)
super()
@hash = hash.transform_keys(&:downcase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash#transform_keys is only available in Ruby 2.5 or later:

puts({}.transform_keys(&:downcase))
#  == 2.4.10 ==
# -e:1:in `<main>': undefined method `transform_keys' for {}:Hash (NoMethodError)
# Did you mean?  transform_values
#
#  == 2.5.6 ==
# {}

Is this method being hit during our test runs? It should show up as a failure in 2.1-2.4 test cases, if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in spec/datadog/tracing/client_ip_spec, odd. I'll try to see why it isn't called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end

def get(header_name)
@hash[header_name.downcase]
end
end
end
end
167 changes: 167 additions & 0 deletions lib/datadog/tracing/client_ip.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
# typed: true

require 'ipaddr'

require_relative '../core/configuration'
require_relative 'metadata/ext'
require_relative 'span'

module Datadog
module Tracing
# Common functions for supporting the `http.client_ip` span attribute.
module ClientIp
DEFAULT_IP_HEADERS_NAMES = %w[
x-forwarded-for
x-real-ip
x-client-ip
x-forwarded
x-cluster-client-ip
forwarded-for
forwarded
via
true-client-ip
].freeze

TAG_MULTIPLE_IP_HEADERS = '_dd.multiple-ip-headers'.freeze

# Sets the `http.client_ip` tag on the given span.
#
# This function respects the user's settings: if they disable the client IP tagging,
# or provide a different IP header name.
#
# If multiple IP headers are present in the request, this function will instead set
# the `_dd.multiple-ip-headers` tag with the names of the present headers,
# and **NOT** set the `http.client_ip` tag.
#
# @param [Span] span The span that's associated with the request.
# @param [HeaderCollection, #get, nil] headers A collection with the request headers.
# @param [String, nil] remote_ip The remote IP the request associated with the span is sent to.
def self.set_client_ip_tag(span, headers, remote_ip)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noob question: should this have ! in the name?

Copy link
Contributor

@TonyCTHsu TonyCTHsu Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! is a Ruby convention for hinting side-effect. For example, Array#map returns a new array while Array#map! returns a mutated self. Another common pattern is whether a method raise an exception, such as find_by vs find_by!.

I don't think this method deserve !, but this one seems to be a better candidate.


Since headers, remote_ip are optional. It is awkward to see

client_ip.set_client_ip_tag(span, nil, '1.1.1.1').

Consider making those keyword argument, so the caller makes the invokation a bit more explicit like,

client_ip.set_client_ip_tag(span, remote_ip: '1.1.1.1') or
client_ip.set_client_ip_tag(span, headers: headers).


Control flow using exception handling is slow in Ruby. This method need to be changed for

  1. Reconsider whether if the scenario is truly an exception
  2. Streamline the exceptional handling cases to avoid nested rescue/raise

Copy link
Contributor Author

@barweiss barweiss Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks!

As for using exceptions, I was indeed wondering whether this would have a performance impact. I'll change it so it doesn't use any exceptions.

return if configuration.disabled

begin
address = raw_ip_from_request(headers, remote_ip)
if address.nil?
# `address` can be `nil` if a custom header is configured but not present in the request.
# In that case, assume misconfiguration and avoid setting the tag.
return
end

ip = strip_decorations(address)

validate_ip(ip)

span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_CLIENT_IP, ip)
rescue InvalidIpError
# Do nothing, assuming logs will spam here.
rescue MultipleIpHeadersError => e
span.set_tag(TAG_MULTIPLE_IP_HEADERS, e.header_names.join(','))
end
end

# Returns the value of an IP-related header or the request's remote IP.
#
# The client IP is looked up by the following logic:
# * If the user has configured a header name, return that header's value.
# * If exactly one of the known IP headers is present, return that header's value.
# * If none of the known IP headers are present, return the remote IP from the request.
#
# Raises a [MultipleIpHeadersError] if multiple IP-related headers are present.
#
# @param [Datadog::Core::HeaderCollection, #get, nil] headers The request headers
# @param [String] remote_ip The remote IP of the request.
# @return [String] An unprocessed value retrieved from an
# IP header or the remote IP of the request.
def self.raw_ip_from_request(headers, remote_ip)
return headers && headers.get(configuration.header_name) if configuration.header_name

headers_present = ip_headers(headers)

case headers_present.size
when 0
remote_ip
when 1
headers_present.values.first
else
raise MultipleIpHeadersError, headers_present.keys
end
end

# Removes any port notations or zone specifiers from the IP address without
# verifying its validity.
def self.strip_decorations(address)
return strip_ipv4_port(address) if likely_ipv4?(address)

address = strip_ipv6_port(address)

strip_zone_specifier(address)
end

def self.strip_zone_specifier(ipv6)
ipv6.gsub(/%.*/, '')
end

def self.strip_ipv4_port(ip)
ip.gsub(/:\d+\z/, '')
end

def self.strip_ipv6_port(ip)
if /\[(.*)\](?::\d+)?/ =~ ip
Regexp.last_match(1)
else
ip
end
end

# Returns whether the given value is more likely to be an IPv4 than an IPv6 address.
#
# This is done by checking if a dot (`'.'`) character appears before a colon (`':'`) in the value.
# The rationale is that in valid IPv6 addresses, colons will always preced dots,
# and in valid IPv4 addresses dots will always preced colons.
def self.likely_ipv4?(value)
dot_index = value.index('.') || value.size
colon_index = value.index(':') || value.size

dot_index < colon_index
end

def self.validate_ip(ip)
# IPs with netmasks are invalid.
raise InvalidIpError if ip.include?('/')

begin
IPAddr.new(ip)
rescue IPAddr::Error
raise InvalidIpError
end
end

def self.ip_headers(headers)
return {} unless headers

DEFAULT_IP_HEADERS_NAMES.each_with_object({}) do |name, result|
value = headers.get(name)
result[name] = value unless value.nil?
end
end

def self.configuration
Datadog.configuration.tracing.client_ip
end

class InvalidIpError < RuntimeError
end

# An error that represents that multiple IP headers were present in a request,
# thus a singular IP value could not be determined.
class MultipleIpHeadersError < RuntimeError
attr_reader :header_names

def initialize(header_names)
super
@header_names = header_names
end
end
end
end
end
6 changes: 6 additions & 0 deletions lib/datadog/tracing/configuration/ext.rb
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,12 @@ module Transport
ENV_DEFAULT_PORT = 'DD_TRACE_AGENT_PORT'.freeze
ENV_DEFAULT_URL = 'DD_TRACE_AGENT_URL'.freeze
end

# @public_api
module ClientIp
ENV_DISABLED = 'DD_TRACE_CLIENT_IP_HEADER_DISABLED'.freeze
ENV_HEADER_NAME = 'DD_TRACE_CLIENT_IP_HEADER'.freeze
end
end
end
end
Expand Down
35 changes: 35 additions & 0 deletions lib/datadog/tracing/contrib/rack/header_collection.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
require_relative '../../../core/header_collection'

module Datadog
module Tracing
module Contrib
module Rack
# Classes and utilities for handling headers in Rack.
module Header
# An implementation of a header collection that looks up headers from a Rack environment.
class RequestHeaderCollection < Datadog::Core::HeaderCollection
# Creates a header collection from a rack environment.
def initialize(env)
super()
@env = env
end

# Gets the value of the header with the given name.
def get(header_name)
@env[Header.to_rack_header(header_name)]
end

# Tests whether a header with the given name exists in the environment.
def key?(header_name)
@env.key?(Header.to_rack_header(header_name))
end
end

def self.to_rack_header(name)
"HTTP_#{name.to_s.upcase.gsub(/[-\s]/, '_')}"
end
end
end
end
end
end
36 changes: 19 additions & 17 deletions lib/datadog/tracing/contrib/rack/middlewares.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,14 @@
require 'date'

require_relative '../../../core/environment/variable_helpers'
require_relative '../../client_ip'
require_relative '../../metadata/ext'
require_relative '../../propagation/http'
require_relative '../analytics'
require_relative '../utils/quantization/http'
require_relative 'ext'
require_relative 'header_collection'
require_relative 'request_queue'
require_relative '../utils/quantization/http'

module Datadog
module Tracing
Expand Down Expand Up @@ -133,8 +135,9 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
# So when its not available, we want the original, unmutated PATH_INFO, which
# is just the relative path without query strings.
url = env['REQUEST_URI'] || original_env['PATH_INFO']
request_headers = parse_request_headers(env)
response_headers = parse_response_headers(headers || {})
request_header_collection = Header::RequestHeaderCollection.new(env)
request_headers_tags = parse_request_headers(request_header_collection)
response_headers_tags = parse_response_headers(headers || {})

# The priority
# 1. User overrides span.resource
Expand Down Expand Up @@ -177,6 +180,10 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_CLIENT_IP).nil?
Tracing::ClientIp.set_client_ip_tag(request_span, request_header_collection, env['REMOTE_ADDR'])
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
request_obj = ::Rack::Request.new(env)

Expand All @@ -195,12 +202,12 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
end

# Request headers
request_headers.each do |name, value|
request_headers_tags.each do |name, value|
request_span.set_tag(name, value) if request_span.get_tag(name).nil?
end

# Response headers
response_headers.each do |name, value|
response_headers_tags.each do |name, value|
request_span.set_tag(name, value) if request_span.get_tag(name).nil?
end

Expand All @@ -219,14 +226,13 @@ def configuration
Datadog.configuration.tracing[:rack]
end

def parse_request_headers(env)
{}.tap do |result|
whitelist = configuration[:headers][:request] || []
whitelist.each do |header|
rack_header = header_to_rack_header(header)
if env.key?(rack_header)
result[Tracing::Metadata::Ext::HTTP::RequestHeaders.to_tag(header)] = env[rack_header]
end
def parse_request_headers(headers)
whitelist = configuration[:headers][:request] || []
whitelist.each_with_object({}) do |header, result|
header_value = headers.get(header)
unless header_value.nil?
header_tag = Tracing::Metadata::Ext::HTTP::RequestHeaders.to_tag(header)
result[header_tag] = header_value
end
end
end
Expand All @@ -248,10 +254,6 @@ def parse_response_headers(headers)
end
end
end

def header_to_rack_header(name)
"HTTP_#{name.to_s.upcase.gsub(/[-\s]/, '_')}"
end
end
end
end
Expand Down
1 change: 1 addition & 0 deletions lib/datadog/tracing/metadata/ext.rb
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ module HTTP
TYPE_OUTBOUND = 'http'
TYPE_PROXY = 'proxy'
TYPE_TEMPLATE = 'template'
TAG_CLIENT_IP = 'http.client_ip'

# General header functionality
module Headers
Expand Down
Loading