XRay not working #832

Closed
scar-lovevery opened this issue Jun 22, 2021 · 10 comments

Comments

@scar-lovevery
Contributor

scar-lovevery commented Jun 22, 2021

Not sure if it's related to:
open-telemetry/opentelemetry-collector#3405
or
aws-observability/aws-otel-collector#537
Seems like both of these indicate that the trace ID wasn't correctly generated, which I'm guessing is also the issue I'm facing.

I'm getting a similar behavior to the above issues.

When sending data to the aws-otel-collector I'm getting the "Permanent error: SerializationException" much like they are. I have tried many different configurations.

I know that the aws-otel-collector is configured correctly: if I run a Java example, I get traces forwarded to XRay:

# create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: aws-otel-eks
  labels:
    name: aws-otel-eks
---
# create deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aws-otel-eks
  namespace: aws-otel-eks
  labels:
    name: aws-otel-eks
spec:
  replicas: 1
  selector:
    matchLabels:
      name: aws-otel-eks
  template:
    metadata:
      labels:
        name: aws-otel-eks
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
        - name: aws-otel-emitter
          image: "aottestbed/aws-otel-collector-java-sample-app:0.9.0"
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://aws-otel-collector.example.svc.cluster.local:4317"
            - name: OTEL_RESOURCE
              value: ClusterName=example
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "service.namespace=AWSObservability,service.name=CloudWatchEKSService"
            - name: S3_REGION
              value: eu-central-1
            - name: LISTEN_ADDRESS
              value: "0.0.0.0:4567"
          imagePullPolicy: Always
        - name: curl
          imagePullPolicy: Always
          image: ellerbrock/alpine-bash-curl-ssl:latest
          command: [ "/bin/bash", "-c", "sleep 10; while :; do curl localhost:4567/outgoing-http-call > /dev/null 2>&1; sleep 2; curl localhost:4567/aws-sdk-call > /dev/null 2>&1; sleep 5; done" ]

I'm attempting to use the XRay propagator (0.18.0).
Tried the following values for the OTEL_PROPAGATORS environment variable:
value: xray,tracecontext,baggage
&
value: tracecontext,baggage,xray
& (this one is very angry)
value: xray

Is that all that needs to be configured to use that?

The Ruby app is configured correctly to export using OTLP. With the collector I'm able to print the trace data if I just use the logging exporter; it just doesn't work with XRay.

Wondering if the XRay propagator works? Or more likely I'm using it incorrectly? I'd love some help or ideas to try.

Judging by the collector logs, it seems like 'X-Amzn-Trace-Id' isn't being sent. Is there an example somewhere of how to configure the Ruby SDK with XRay correctly?

@arielvalentin
Contributor

@SamirHafez @schanjr do you have any insights that you could share here?

@scar-lovevery
Contributor Author

More information! After playing with the example (https://github.com/open-telemetry/opentelemetry-ruby/tree/main/examples/http) and adding some instrumentation (logging, haha), it seems the environment variable should be OTEL_PROPAGATORS: 'xray'. At least in the simpler example, this works with the Faraday-instrumented client.

(I can fork and push this example if that'd help.)

However, with the Rack (auto) instrumentation installed, I'm getting this stack trace:

2021-06-23 15:51:55 +0000: Rack app error handling request { GET /api/v1/health }
#<NoMethodError: undefined method `match' for nil:NilClass>
/workspace/vendor/bundle/ruby/2.6.0/gems/opentelemetry-propagator-xray-0.18.0/lib/opentelemetry/propagator/xray/text_map_propagator.rb:91:in `parse_header'
/workspace/vendor/bundle/ruby/2.6.0/gems/opentelemetry-propagator-xray-0.18.0/lib/opentelemetry/propagator/xray/text_map_propagator.rb:41:in `extract'
/workspace/vendor/bundle/ruby/2.6.0/gems/opentelemetry-instrumentation-rack-0.18.0/lib/opentelemetry/instrumentation/rack/middlewares/tracer_middleware.rb:62:in `call'
/workspace/vendor/bundle/ruby/2.6.0/gems/rack-cors-1.1.1/lib/rack/cors.rb:100:in `call'
/workspace/vendor/bundle/ruby/2.6.0/gems/railties-6.0.3.7/lib/rails/engine.rb:527:in `call'
/workspace/vendor/bundle/ruby/2.6.0/gems/puma-4.3.8/lib/puma/configuration.rb:228:in `call'
/workspace/vendor/bundle/ruby/2.6.0/gems/puma-4.3.8/lib/puma/server.rb:718:in `handle_request'
/workspace/vendor/bundle/ruby/2.6.0/gems/puma-4.3.8/lib/puma/server.rb:472:in `process_client'
/workspace/vendor/bundle/ruby/2.6.0/gems/puma-4.3.8/lib/puma/server.rb:328:in `block in run'
/workspace/vendor/bundle/ruby/2.6.0/gems/puma-4.3.8/lib/puma/thread_pool.rb:134:in `block in spawn_thread'

This is happening for all health checks from the Kubernetes readiness/liveness checks. So in the case of a client that isn't instrumented, is there a different method that I should be using?

@scar-lovevery
Contributor Author

scar-lovevery commented Jun 23, 2021

Even more info!

Modified the server example to be a little simpler and include XRay:

#!/usr/bin/env ruby
# frozen_string_literal: true

# Copyright The OpenTelemetry Authors
#
# SPDX-License-Identifier: Apache-2.0

require 'rubygems'
require 'bundler/setup'
require 'sinatra/base'

require 'faraday'


# Require otel-ruby
require 'opentelemetry/sdk'
require 'opentelemetry/propagator/xray'
require 'opentelemetry/instrumentation/sinatra'
require 'opentelemetry/instrumentation/faraday'

# Export traces to console by default
ENV['OTEL_TRACES_EXPORTER'] ||= 'console'
ENV['OTEL_PROPAGATORS'] ||= 'xray'
ENV['OTEL_LOG_LEVEL'] ||= 'debug'

host = ENV.fetch('HTTP_EXAMPLE_HOST', '0.0.0.0')

OpenTelemetry::SDK.configure do |c|
  c.use 'OpenTelemetry::Instrumentation::Sinatra'
  c.use 'OpenTelemetry::Instrumentation::Faraday'
end

class App < Sinatra::Base
  set :bind, '0.0.0.0'
  #use OpenTelemetryMiddleware

  get '/hello' do
    connection = Faraday.new("http://#{host}:4567")
    url = '/world'
    response = connection.get(url)
    "Hello! #{response}"
  end
  get '/world' do
    'World!'
  end

  run! if app_file == $0
end

When curling:

curl 127.0.0.1:4567/hello
NoMethodError: undefined method `match' for nil:NilClass
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/propagator/xray/lib/opentelemetry/propagator/xray/text_map_propagator.rb:103:in `parse_header'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/propagator/xray/lib/opentelemetry/propagator/xray/text_map_propagator.rb:43:in `extract'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/instrumentation/sinatra/lib/opentelemetry/instrumentation/sinatra/middlewares/tracer_middleware.rb:18:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-protection-2.0.7/lib/rack/protection/xss_header.rb:18:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-protection-2.0.7/lib/rack/protection/path_traversal.rb:16:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-protection-2.0.7/lib/rack/protection/json_csrf.rb:26:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-protection-2.0.7/lib/rack/protection/base.rb:50:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-protection-2.0.7/lib/rack/protection/base.rb:50:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-protection-2.0.7/lib/rack/protection/frame_options.rb:31:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-2.2.3/lib/rack/null_logger.rb:11:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-2.2.3/lib/rack/head.rb:12:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/sinatra-2.0.7/lib/sinatra/show_exceptions.rb:22:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/sinatra-2.0.7/lib/sinatra/base.rb:194:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/sinatra-2.0.7/lib/sinatra/base.rb:1950:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/sinatra-2.0.7/lib/sinatra/base.rb:1502:in `block in call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/sinatra-2.0.7/lib/sinatra/base.rb:1729:in `synchronize'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/sinatra-2.0.7/lib/sinatra/base.rb:1502:in `call'
	/Users/scar/workspace/repos/3rdparty/opentelemetry-ruby/examples/http/vendor/bundle/ruby/2.6.0/gems/rack-2.2.3/lib/rack/handler/webrick.rb:95:in `service'
	/System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/webrick/httpserver.rb:140:in `service'
	/System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/webrick/httpserver.rb:96:in `run'
	/System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/webrick/server.rb:307:in `block in start_thread'

Seems like the same error.

If I add a "return context unless header" between lines 40 and 41 of https://github.com/open-telemetry/opentelemetry-ruby/blob/main/propagator/xray/lib/opentelemetry/propagator/xray/text_map_propagator.rb#L41, it seems to fix this.
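
The guard described above can be illustrated without the gem. What follows is a minimal standalone sketch, assuming the documented `X-Amzn-Trace-Id` layout (`Root=1-<8 hex>-<24 hex>;Parent=<16 hex>;Sampled=<0|1>`). The name `parse_xray_header` and the returned hash keys are hypothetical and are not the propagator's actual code.

```ruby
# Hypothetical helper, not the gem's implementation: parse an X-Amzn-Trace-Id
# header value and return nil instead of raising when the header is absent.
def parse_xray_header(header)
  # Health checks and other uninstrumented clients send no header at all.
  return nil if header.nil? || header.empty?

  # Split "Key=Value;Key=Value" into a hash, ignoring malformed pairs.
  fields = header.split(';')
                 .map { |pair| pair.strip.split('=', 2) }
                 .select { |kv| kv.length == 2 }
                 .to_h

  # Root looks like 1-<8 hex epoch seconds>-<24 hex unique part>.
  root = /\A1-(\h{8})-(\h{24})\z/.match(fields['Root'].to_s)
  return nil unless root

  {
    trace_id: root[1] + root[2],
    parent_id: fields['Parent'],
    sampled: fields['Sampled'] == '1'
  }
end

p parse_xray_header(nil)
p parse_xray_header('Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1')
```

Returning early on a missing header is what lets a request from an uninstrumented client (such as a Kubernetes probe) fall through and start a fresh trace instead of raising NoMethodError.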

I wonder if I'm allowed to PR this?

@arielvalentin
Contributor

Yes! We would appreciate any contributions. Please open PR!

🥳

@scar-lovevery
Contributor Author

scar-lovevery commented Jun 23, 2021

Tried to push a branch for the PR, but it won't let me create a branch. Read the manual, submitted PR. (shrug) It's a one-line fix, I think; I haven't tested it in an actual environment that has IAM for XRay.

@scar-lovevery
Contributor Author

Unfortunately it's still angry.

Collector logs (just a subset; lots of the same thing over and over):

2021-06-23T20:23:53.019Z	debug	[email protected]/awsxray.go:65	Error translating span.	{"kind": "exporter", "name": "awsxray", "error": "invalid xray traceid: 1f3a071fbdad1df1007e1a47060a7845"}
2021-06-23T20:23:53.020Z	DEBUG	loggingexporter/logging_exporter.go:48	ResourceSpans #0
Resource labels:
     -> service.name: STRING(app-name)
     -> process.pid: INT(1)
     -> process.command: STRING(/workspace/vendor/bundle/ruby/2.6.0/bin/puma)
     -> process.runtime.name: STRING(ruby)
     -> process.runtime.version: STRING(2.6.5)
     -> process.runtime.description: STRING(ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-linux-musl])
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.language: STRING(ruby)
     -> telemetry.sdk.version: STRING(1.0.0.rc1)
     -> service.namespace: STRING(namespace)
InstrumentationLibrarySpans #0
InstrumentationLibrary OpenTelemetry::Instrumentation::PG 0.18.0
Span #0
    Trace ID       : 1f3a071fbdad1df1007e1a47060a7845
    Parent ID      :
    ID             : ec6d8d5702615071
    Name           : SELECT db_name
    Kind           : SPAN_KIND_CLIENT
    Start time     : 2021-06-23 20:23:49.015204897 +0000 UTC
    End time       : 2021-06-23 20:23:49.016361863 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message :
Attributes:
     -> db.system: STRING(postgresql)
     -> db.user: STRING(XXXXX)
     -> db.name: STRING(XXXXX)
     -> net.peer.name: STRING(XXXXX.eu-central-1.rds.amazonaws.com)
     -> net.transport: STRING(IP.TCP)
     -> net.peer.port: STRING(5432)
     -> db.operation: STRING(SELECT)
     -> db.statement: STRING(SELECT 1)

Wondering if the contrib package isn't picking up the correct ID? Hmm. So that PR definitely fixes an issue, but the fundamental problem still exists. (Traces aren't making it to XRay)
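
The ID rejected in the log above can be inspected by hand. An XRay trace ID has the form `1-<8 hex digits of epoch seconds>-<24 hex digits>`, so the first 8 hex digits of the 32-digit OpenTelemetry ID are read as a timestamp. The sketch below is only an illustration: the helper names are hypothetical, and the 30-day default window is an assumption, not a value taken from the collector source.

```ruby
# Hypothetical helpers for inspecting a 32-hex-digit trace ID by hand.

# Rewrite the OpenTelemetry hex form into the XRay textual form.
def to_xray_trace_id(hex_trace_id)
  "1-#{hex_trace_id[0, 8]}-#{hex_trace_id[8, 24]}"
end

# Interpret the first 8 hex digits as seconds since the Unix epoch.
def embedded_time(hex_trace_id)
  Time.at(hex_trace_id[0, 8].to_i(16)).utc
end

# max_age_seconds is an assumed window, not the collector's actual limit.
# Timestamps in the future are treated as implausible in this sketch.
def plausible_xray_epoch?(hex_trace_id, now: Time.now, max_age_seconds: 30 * 24 * 60 * 60)
  age = now.to_i - hex_trace_id[0, 8].to_i(16)
  age >= 0 && age <= max_age_seconds
end

id = '1f3a071fbdad1df1007e1a47060a7845'
puts to_xray_trace_id(id)       # 1-1f3a071f-bdad1df1007e1a47060a7845
puts embedded_time(id).year     # 1986
puts plausible_xray_epoch?(id)  # false
```

A randomly generated ID almost never carries a recent timestamp in its first four bytes, which would explain why the exporter rejects it even though the logging exporter prints the same span without complaint.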

@fbogsany
Contributor

AWS XRay has specific requirements around trace ID generation - see https://docs.aws.amazon.com/xray/latest/api/API_PutTraceSegments.html for details. The default trace ID generation in the SDK is incompatible with AWS XRay requirements, but a valid XRay trace ID can be encoded as a valid OpenTelemetry trace ID, so you just need to replace the generator.

I apologize for the complete lack of documentation (at least, that I could find), but you can plug in an alternative ID generator in your OpenTelemetry::SDK.configure block with c.id_generator = .... Your generator needs to implement generate_trace_id and generate_span_id; you can look at the OpenTelemetry::Trace module for details of those methods.
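
A generator along those lines can be sketched with the standard library alone. This is only an illustration under stated assumptions: it assumes the SDK expects IDs as binary strings (16 bytes for a trace ID, 8 bytes for a span ID), and the module name `XRayStyleIDGenerator` is hypothetical. It is not the code from any PR or gem.

```ruby
require 'securerandom'

# Hypothetical module; a sketch, not the gem's generator.
module XRayStyleIDGenerator
  # An all-zero span ID is treated as invalid, so it is never returned.
  INVALID_SPAN_ID = ("\0" * 8).b

  module_function

  # 4 bytes of big-endian epoch seconds followed by 12 random bytes,
  # so the first 8 hex digits of the ID are the current Unix time.
  def generate_trace_id
    [Time.now.to_i].pack('N') + SecureRandom.bytes(12)
  end

  # 8 random bytes, retried in the unlikely all-zero case.
  def generate_span_id
    loop do
      id = SecureRandom.bytes(8)
      return id unless id == INVALID_SPAN_ID
    end
  end
end

# Print one generated trace ID as 32 hex digits.
puts XRayStyleIDGenerator.generate_trace_id.unpack1('H*')
```

Such a module would presumably be assigned with c.id_generator = XRayStyleIDGenerator inside the configure block; only the timestamp prefix differs from a fully random ID, so the result remains a valid OpenTelemetry trace ID.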

@scar-lovevery
Contributor Author

Threw together a PR that seems to work. I'm a Ruby noob, so any guidance is appreciated. I have questions in the PR (#840); hopefully someone can point me in the right direction.

@schanjr
Contributor

schanjr commented Jun 28, 2021

@scar-lovevery late to the conversation. I was able to create correct trace IDs, but was not able to push them to a real AWS environment in the JRuby world. My case might be slightly different from yours; I had issues with the collector.

My last update on pushing the traces to the collector was like this:

OTLP receiver --> Xray Exporter (broken in JRuby. The protobuf gem is not working, protocolbuffers/protobuf#7923)

Jaeger receiver --> Xray Exporter (broken at the ADOT collector. It does not transform trace id format to Xray format. aws-observability/aws-otel-collector#562)

I got around this and created correct traces with some manual instrumentation at the Rack level, mainly by calling the XRay propagator's extract and inject methods manually.

# a middleware for rack
# Trace::GlobalPropagator is using OpenTelemetry::Propagator::XRay::TextMapPropagator.new

def call(env)
  context = nil
  trace_id_provided = false
  # if trace id is given at the header level, extract it and use it
  unless env[Trace::Constants::RACK_CONTEXT_KEY].nil?
    context = Trace::GlobalPropagator.extract(env)
    trace_id_provided = true
  end

  status = 200
  headers = {}
  response_body = ''
  # For attribute naming, see
  # https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/data-semantic-conventions.md#http-server

  # Span kind MUST be `:server` for a HTTP server span
  span = @tracer.start_span(
    env['PATH_INFO'],
    attributes: {
      'component' => 'http',
      'http.method' => env['REQUEST_METHOD'],
      'http.route' => env['PATH_INFO'],
      'http.url' => env['REQUEST_URI']
    },
    kind: :server,
    with_parent: context
  )
  OpenTelemetry::Trace.with_span(span) do |child_span|
    # if trace id was not given, generate one using the inject method, and use it
    unless trace_id_provided
      id = {}
      Trace::GlobalPropagator.inject(id)
      env[Trace::Constants::RACK_CONTEXT_KEY] = id[Trace::Constants::OUTGOING_TRACE_ID_HEADER_KEY]
    end
    # Run application stack.
    status, headers, response_body = @app.call(env)
    child_span.set_attribute('http.status_code', status)
  end

  [status, headers, response_body]
ensure
  span&.finish
end

# the Trace::GlobalPropagator wrapper used by the middleware above
require 'opentelemetry-propagator-xray'

module Trace
  class GlobalPropagator
    @propagator ||= OpenTelemetry::Propagator::XRay::TextMapPropagator.new

    class << self
      def extract(carrier,
                  context: OpenTelemetry::Context.current,
                  getter: OpenTelemetry::Context::Propagation.rack_env_getter)
        @propagator.extract(carrier, context: context, getter: getter)
      end

      def inject(carrier,
                 context: OpenTelemetry::Context.current,
                 setter: OpenTelemetry::Context::Propagation.text_map_setter)
        @propagator.inject(carrier, context: context, setter: setter)
      end

    end
  end
end

@scar-lovevery
Contributor Author

@schanjr
Thanks for the info. For me it was ultimately an ID generation issue. I PR'd a fix #840

Using:

require 'opentelemetry/propagator/xray'

OpenTelemetry::SDK.configure do |c|
  c.id_generator = OpenTelemetry::Propagator::XRay::IDGenerator
end

Fixes this issue for me. I'm able to get traces from my instrumented app through the aws-otel-collector into XRay. I wonder if I'm having a similar issue to you and haven't noticed? It's possible.

Since the root of my issue is fixed, I'm going to close this.
