Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add general timeout mechanism in the export layer #385

Conversation

dgzlopes
Copy link
Contributor

@dgzlopes dgzlopes commented Jan 28, 2020

This PR implements:

  • A general timeout mechanism in utils.py using signal. Example usage:
from opentelemetry.sdk.util import timeout_in_seconds
try:
  with timeout_in_seconds(5):
    func_example()
except TimeoutError:
  print('func_example took too long to complete')

Added set_timeout_signal_handler() for it's usage on multithreaded code (all signal handler functions must be set in the main thread).

  • Timeouts to exports in SimpleExportSpanProcessor() and BatchExportSpanProcessor().

Closes #346

Signed-off-by: Daniel González Lopes [email protected]

Signed-off-by: Daniel González Lopes <[email protected]>
Signed-off-by: Daniel González Lopes <[email protected]>
@dgzlopes dgzlopes requested a review from a team January 28, 2020 15:52
@@ -66,18 +70,23 @@ class SimpleExportSpanProcessor(SpanProcessor):
passes ended spans directly to the configured `SpanExporter`.
"""

def __init__(self, span_exporter: SpanExporter):
def __init__(
self, span_exporter: SpanExporter, timeout: int = None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to have no timeout as the default timeout?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for a default timeout, maybe 60 sec that could return an warning to the user that something's gone wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! 60s sounds like a reasonable time to wait :)

Signed-off-by: Daniel González Lopes <[email protected]>
@codecov-io
Copy link

codecov-io commented Jan 28, 2020

Codecov Report

Merging #385 into master will decrease coverage by 0.16%.
The diff coverage is 70.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #385      +/-   ##
==========================================
- Coverage   85.22%   85.06%   -0.17%     
==========================================
  Files          38       38              
  Lines        1916     1942      +26     
  Branches      226      230       +4     
==========================================
+ Hits         1633     1652      +19     
- Misses        218      224       +6     
- Partials       65       66       +1
Impacted Files Coverage Δ
...sdk/src/opentelemetry/sdk/trace/export/__init__.py 84.48% <60%> (-2.31%) ⬇️
opentelemetry-sdk/src/opentelemetry/sdk/util.py 85.14% <81.25%> (-0.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0173263...8a18978. Read the comment docs.



@contextmanager
def timeout_in_seconds(seconds=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of having a general purpose timeout mechanism is great, but keep in mind that the specification requires export and shutdown to not block indefinitely. This approach (using a @contextmanager to make a context object that is used when export and shutdown are called) puts the responsibility of being specification compliant on the code that does the calling of export and shutdown instead of putting the responsibility on these 2 methods themselves not to block indefinitely.

Please consider making a decorator that is then applied directly to the functions export and shutdown. In this way, they can be called directly and the caller won't have to worry about making them not block indefinitely, something like this:

def timeout(function):

    def inner(*args, timeout=90, **kwargs):

        print(timeout)

        function(*args, **kwargs)

    return inner


@timeout
def export(first, second):
    print(first)
    print(second)


export(1, 2, timeout=9)
9
1
2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exporters might have a better way to implement timeouts, e.g. in the zipkin exporter we'd pass a timeout arg to requests.post which sets a timeout on the underlying socket.

I think a general purpose timeout in processor is fine, but exporters should implement timeout logic themselves: #346 (comment).



@contextmanager
def timeout_in_seconds(seconds=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is better if you document the function attribute instead of using the name of the function as documentation for the attribute itself, something like this:

def timeout(time):
    """
    ...
    time: time to wait before timing out, in seconds
    """
    ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ocelotl you and @toumorokoshi can duke it out over hungarian notation. 😄

def set_timeout_signal_handler():
"Signal timeout setter."

def signal_handler_function(signum, frame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a function, there is no need to add _function to its name. The name of the function should describe what the function does, in this case, it raises a TimeoutError.

@@ -26,6 +28,31 @@
from collections import Sequence


def set_timeout_signal_handler():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function defines a inner function and then registers it into signal, but it is being called in an __init__. Keep in mind that you most likely want to do this only once, not every time that an object is initialized. Consider getting rid of this function and just do the registering into signal when this module is imported.

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a great idea to have a general-purpose timeout in span processors, but I don't think we should do this with signals. As written, this is going to interfere with applications that use SIGALRM, e.g. by firing the signal after the instrumented app registered a different handler.

You could implement this instead using cancelable Futures or async awaitables.

try:
yield
finally:
signal.alarm(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to be a general-purpose timeout func, shouldn't it reset the old signal handler after cancelling the alarm?

@@ -131,6 +141,9 @@ def __init__(
None
] * self.max_export_batch_size # type: typing.List[typing.Optional[Span]]
self.worker_thread.start()
self.timeout = timeout
# used by general timeout mechanism
set_timeout_signal_handler()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're setting the handler here because export runs in a worker thread? But since we initialize the span processor right at startup, this is going to change the SIGALRM handler for the entire life of the process, not just while we're waiting for an export to timeout.

@c24t
Copy link
Member

c24t commented May 21, 2020

We could still use a timeout like this, but not using SIGALRM. Closing for now, we can consider a different implementation later.

@c24t c24t closed this May 21, 2020
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020
* chore: gRPC example using proto-loader

* chore: gRPC example using proto-loader

* fix: review comments

* fix: we use latest OT package versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trace (and all) exporters should have an export timeout
5 participants