Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/tracecontext integration test #228

Conversation

toumorokoshi
Copy link
Member

This introduces the w3c tracing validation service as an integration test for opentelemetry's tracecontext implementation, enabling us to validate our tracecontext implementation without authoring another complete suite.

Currently the integration test fails due to w3c/trace-context#341

This includes fixes to the tracecontexthttptextformatter to adhere to the specification.

@toumorokoshi
Copy link
Member Author

there are two outstanding issues (in addition to the ows failing test case that I believed needs to be fixed upstream):

  1. wsgi integration tests are failing (due to tracecontext now being the default implementation, and it's specification to return new spans in the case where it's unable to parse or find headers)

  2. black code formatting considers the ./target/ directory, which contains cloned code and thus should not be considered.

I am working toward fixing these issues.

@@ -44,60 +44,6 @@ def test_no_traceparent_header(self):
span_context = FORMAT.extract(get_as_list, output)
self.assertTrue(isinstance(span_context, trace.SpanContext))

def test_from_headers_tracestate_entry_limit(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests were invalid behaviors in reference to the spec. I removed them as the integration test will validate it is working as intended.

@@ -304,6 +305,31 @@ def format_span_id(span_id: int) -> str:
return "0x{:016x}".format(span_id)


def generate_span_id() -> int:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the w3c tracecontext spec calls for generating valid span contexts in the case where one cannot parse tracecontext headers. As such it was necessary to lift this code into the API.

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes because I think we need to think the change with generate_spancontext through. I think we should rather fix the handling of INVALID_SPANCONTEXT in the SDK. Moving generate_spancontext to the API would break open-telemetry/oteps#58 (even though I'm not sure if that will be merged anytime soon, I'd rather not make it impossible to implement). Note that we encountered this problem already in #226 which uses a different solution for the same problem.

@mauriciovasquezbernal
Copy link
Member

2. black code formatting considers the ./target/ directory, which contains cloned code and thus should not be considered.

I had the same problem while implementing the Jaeger exporter, I had to add exceptions to the different checkers to solve that. Maybe you can take some inspiration from it: #174.

Copy link
Member

@mauriciovasquezbernal mauriciovasquezbernal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my understanding there are three different places where to handle the case when the incoming request doesn't contain valid trace information:

  1. The propagator (proposed in this PR).
  2. The integration (proposed in ext/wsgi: use current span when extracting fails #226).
  3. SDK (what @Oberon00 is proposing).

I'm not sure at this point what is the best place to do that, the only thing I can say is that we have to document it so people implementing it understand it's their responsibility to take care of it.

)
return "hello"


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to move this to something under the tests, it could confuse people that is looking at the example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move this, just wanted to re-use existing examples. But I see the argument to not pollute short and sweet ones.

@toumorokoshi
Copy link
Member Author

@Oberon00 I'm fine with moving that to the SDK as long as the API alone does not need to propagate valid tracecontext headers. I believe I was told that was a requirement.

I've started a ticket around how we should deal with invalid spancontext from formatters. Let's discuss that aspect there:

#233

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, great to see that we're using the W3C tests.

You might want to rebase to pick up #229 and make sure this still works as expected.

I agree that verify_tracecontext should move into its own test file, even if it means duplicating code with the example. I also think we should find a way of getting the W3C test code that doesn't involve cloning it again with every run, but I don't know that a git submodule is the best solution.

Are you planning to fix the other test errors before merging?

# clone w3c tracecontext tests
mkdir -p target
rm -rf ./target/trace-context
git clone https://github.com/w3c/trace-context ./target/trace-context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: why do it this way instead of adding as a submodule?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think submodules work fine here, but generally track a branch vs a commit. I'll look through and may re-add as a followup, or fix it here depending on approvers.

@@ -0,0 +1,26 @@
#!/bin/bash
# set -e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be commented out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, will re-add.

@@ -0,0 +1,26 @@
#!/bin/bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#!/bin/bash
#!/usr/bin/env bash

I think /bin/bash is usually safe, but better to use env in case the user wants a different bash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK only /bin/sh is safe (sh is guaranteed by POSIX to exist)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to take lowest common denominator to reduced potential issues. I'll go with /bin/sh.

# send a sigint, to ensure
# it is caught as a KeyboardInterrupt in the
# example service.
kill -2 $EXAMPLE_SERVER_PID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to kill the development server?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative way is to run the server, spawn a separate unit test, fetch the test result and shutdown the server. I've explored this here and it seems to work well.
With this approach we don't need to wait for indefinite time and later kill the server process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was trying to find a quick way to do this in bash. I can send a standard kill signal which will kill the server, but at the cost of not calling shutdown.

In general I'll try to vet this a little more. Thanks @reyang for the example, I'll switch to a python script if I can't get the bash one fixed up more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c24t for clarification, I've resolved the issue that kept the dev server from being killed (just switched it back to a sigkill). Was there something else you think needed to be improved aside from that?

The approach Reiley linked will not print out the traces as the unit tests enact them, which I found was critical to debugging the issues. The current output intermingles them as shown in https://travis-ci.org/open-telemetry/opentelemetry-python/jobs/604220488#L1083

tox.ini Outdated
basepython: python3.7
deps =
# needed for tracecontext
aiohttp~=3.6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
aiohttp~=3.6
aiohttp~=3.6

tox.ini Outdated
pip install -e {toxinidir}/ext/opentelemetry-ext-http-requests
pip install -e {toxinidir}/ext/opentelemetry-ext-wsgi

commands =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
commands =
commands =

@c24t
Copy link
Member

c24t commented Oct 22, 2019

I'm still seeing these errors:

======================================================================
ERROR: test_tracestate_ows_handling (__main__.TraceContextTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 640, in test_tracestate_ows_handling
    ['tracestate', 'foo=1 '],
  File "test.py", line 113, in make_single_request_and_get_tracecontext
    return (self.get_traceparent(headers), self.get_tracestate(headers))
  File "test.py", line 108, in get_tracestate
    tracestate.from_string(value)
  File "/Users/libc/src/opentelemetry-python/target/trace-context/test/tracecontext/tracestate.py", line 54, in from_string
    raise ValueError('illegal key-value format {!r}'.format(member))
ValueError: illegal key-value format 'foo=1 '

======================================================================
FAIL: test_multiple_requests_with_illegal_traceparent (__main__.AdvancedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 847, in test_multiple_requests_with_illegal_traceparent
    self.assertEqual(len(parent_ids), 3)
AssertionError: 1 != 3

======================================================================
FAIL: test_multiple_requests_with_valid_traceparent (__main__.AdvancedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 818, in test_multiple_requests_with_valid_traceparent
    self.assertEqual(len(parent_ids), 3)
AssertionError: 1 != 3

======================================================================
FAIL: test_multiple_requests_without_traceparent (__main__.AdvancedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 831, in test_multiple_requests_without_traceparent
    self.assertEqual(len(parent_ids), 3)
AssertionError: 1 != 3

----------------------------------------------------------------------
Ran 40 tests in 1.009s

FAILED (failures=3, errors=1)

type(self).__name__,
format_trace_id(self.trace_id),
format_span_id(self.span_id),
repr(self.trace_state),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"trace_state={!r}".format(self.trace_state)

@@ -1,5 +1,4 @@
# Copyright 2019, OpenTelemetry Authors
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to remove this line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably my fat finger, will remove.

}
trap on-shutdown EXIT
cd ./target/trace-context/test
python test.py http://127.0.0.1:5000/verify-tracecontext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: empty line before EOF

@Oberon00 Oberon00 dismissed their stale review October 23, 2019 08:43

Dismissing my review, as I don't have time to follow this at the moment

@toumorokoshi toumorokoshi force-pushed the feature/tracecontext-integration-test branch from 791baaf to bdb607b Compare October 25, 2019 05:02
Copy link
Member

@mauriciovasquezbernal mauriciovasquezbernal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great to me that the tracecontext test is now integrated.

I left some general comments.

@@ -14,6 +14,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import json

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, weird that linting didn't catch that

@@ -44,7 +46,9 @@


@app.route("/")
def hello():
def index():
"""An example which starts a span within the span created for

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice comment but I think it's not related to this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I previously added it when the validation server was joined. I've rebased in c24t's changes so this will disappear.

# typing.Dict's update is not recognized by pylint:
# https://github.com/PyCQA/pylint/issues/2420
tracestate[key] = value # pylint:disable=E1137
if len(tracestate) > _TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this check be moved inside the loop by using a counter?
Having the check outside means the parsing of all headers is done before. Maybe this could be a treat for DoS attacks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably DoS could be achieved other ways, like adding a huge list of values that need to be parsed by regex. but agreed this could be a quick shortcut out.

@@ -62,6 +62,7 @@
"""

import enum
import random

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's weird.. shouldn't linting have caught that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove.

},
)
self.assertEqual(span_context, trace.INVALID_SPAN_CONTEXT)
self.assertNotEqual(span_context.span_id, "1234567890123456")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this assertion is not wrong, it is implicit in the previous one.

@@ -213,3 +169,31 @@ def test_propagate_invalid_context(self):
output = {} # type:typing.Dict[str, str]
FORMAT.inject(trace.INVALID_SPAN_CONTEXT, dict.__setitem__, output)
self.assertFalse("traceparent" in output)

def test_tracestate_empty_header(self):
"""Do not propagate invalid trace context.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment accurate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry copy-pasted boilerplate. will fix.

@@ -301,7 +300,6 @@ def set_status(self, status: trace_api.Status) -> None:

def generate_span_id() -> int:
"""Get a new random span ID.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason to remove these empty lines?

from opentelemetry.ext import http_requests
from opentelemetry.ext.wsgi import OpenTelemetryMiddleware
from opentelemetry.sdk.trace import Tracer
from opentelemetry.sdk.trace.export import (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the consol exporter used for?
To have debug information in case the tests fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. If you look at the output of the build, it actually logs the trace information as the test is executing, which was really helpful when it's a bit harder to pdb into separate processes.

@@ -0,0 +1,75 @@
#!/usr/bin/env python3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts about the location of this file. Shouldn't it be on opentelemetry-api/tests/context/propagation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be a good place, but I felt it wasn't accurate as currently the tracecontext test suite would fail without SDK behavior (specifically creating new spans when a spancontext is invalid from the propagator).

Verifying that our tracecontext is compliant with the w3c
tracecontext reference is valuable. Adding a tox command
to verify that the TraceContext propagator adheres to the
w3c spec.
As the tracecontext spec calls for the creation of new, valid
spans in the case of recieving invalid data from headers,
it is necessary to have functions that generate valid span
and trace ids.
…m ones.

This fixes all the errors, leaving 6 failures.
The tracecontexthttptextformat now adheres completely to the w3c
tracecontext test suite.

moving the test endpoint to a non-root, to
ensure that the basic example is clear.

Adding unit tests to test_tracecontexhttptextformat
that were helpful.
moving the generate span / trace id methods back to API. no longer needed due to open-telemetry#235

moving test service to it's own module.

modifying shell script to use bourne shell, using posix standard location
Ensuring resources installed to the target directory are not included
in style and linting.

Modifying tox invocation to include python version to ensure it's
called by travis-ci.

Fixing tests that are no longer valid due to previous changes (
tracecontext returning INVALID_SPAN, start_as_current_span called)
@toumorokoshi toumorokoshi force-pushed the feature/tracecontext-integration-test branch from 8d24e0a to 16a95d5 Compare October 28, 2019 20:50
@toumorokoshi toumorokoshi force-pushed the feature/tracecontext-integration-test branch from 16a95d5 to 2bf7d12 Compare October 28, 2019 20:54
Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I think there's still some cleanup to do in the tox file and script (in particular, running the test server as #228 (comment)), but the tests are valuable enough that I think we ought to merge them in and follow up with improvements.

Copy link
Member

@mauriciovasquezbernal mauriciovasquezbernal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I think further improvements can be done in follow up PRs so we can have it for our today's release.

@toumorokoshi
Copy link
Member Author

@mauriciovasquezbernal it looks like you approved the changes, but not the full PR. can you approve the full PR? Currently only c24t is listed as approved.

@mauriciovasquezbernal
Copy link
Member

@toumorokoshi I am not an approver on this repo so mine doesn't count. I just marked it as approved to signal that it looks good to me.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants