Make GlobalTracer uses more threadsafe #186

ribrewguy · 2017-09-22T16:17:31Z

I have a multithreaded app which causes some unpredictable behaviors when trying to get a globally registered tracer. A specific complication appears to have some relationship to byteman when using byteman rules. I have found that sometimes I call GlobalTracer.get() and receive a noop, while other times I get a tracer initialized from some byteman execution. I attempted to resolve this by conditionally registering the tracer myself:

        Tracer tracer;
        if (GlobalTracer.isRegistered()) {
            tracer = GlobalTracer.get();
        } else {
            tracer = TracerResolver.resolveTracer();
            GlobalTracer.register(tracer);
        }

However, this was affected by a race condition somewhere. I was able to resolve it by synchronizing on the GlobalTracer.class:

        Tracer tracer;
        synchronized (GlobalTracer.class) {
            if (GlobalTracer.isRegistered()) {
                tracer = GlobalTracer.get();
            } else {
                tracer = TracerResolver.resolveTracer();
                GlobalTracer.register(tracer);
            }
        }

but this seems like something that would be nice to incorporate into the API itself. Note that this still causes the agent to log an exception:

! java.lang.IllegalStateException: There is already a current global Tracer registered.
! at io.opentracing.util.GlobalTracer.register(GlobalTracer.java:92)
! at io.opentracing.contrib.agent.OpenTracingHelper.initTracer(OpenTracingHelper.java:80)

but it appears to work anyway.

I'll submit a method proposal for streamlining this shortly.

The text was updated successfully, but these errors were encountered:

ribrewguy · 2017-09-22T16:20:57Z

It's rough, but here's the general idea: ribrewguy@42faf24

wu-sheng · 2017-09-23T00:45:43Z

@RediJedi By using either public static synchronized Tracer getOrRegister(...) and synchronized (GlobalTracer.class), I can resolve the race condition, but you are facing the performance problem. Every time you try to get the tracer, you must get the Lock first, one way or another.

I prefer the current API. And for your circumstance, you can try to initial/resolver/register the GlobalTracer first. For a multithreaded app, I think you need multi threads in providing service or business logic, but you can use single thread to process the tracer initialization stage.

ribrewguy · 2017-09-23T01:20:07Z

While you're right, I'd venture to guess that many non trivial apps these days operate in such an environment that it may be unclear where, if ever, the global tracer had been set. In these cases, if you want your system to not crash with the above error, you have little choice to do otherwise. I'd be willing to bet that in the number of cases you would end up synchronizing the impact would be minimal. If you wanted to I suppose you could double check lock it for better average performance too. But again, I'm betting the real world impact would be negligible for the trade off of stability.

In the end, not having something like this in the API forces users to code around it which is not insurmountable, but will no doubt catch folk unaware.

yurishkuro · 2017-09-23T01:28:57Z

@RediJedi what's an example where the main() function of the application does not have control over the order of initialization?

wu-sheng · 2017-09-23T01:31:31Z

I think double check lock would be better for both resolved your concern and my performance concern, right? May be you can send a pr about that, and the team can discuss more.

BTW, impact is only a big deal for high throughputs server, but only for every app. But this kind of app depends on tracing very much. Please understand my concern. Thanks.

wu-sheng · 2017-09-23T01:33:37Z

what's an example where the main() function of the application does not have control over the order of initialization?

@yurishkuro I think @RediJedi 's opinion is, how to use is not clear, so he try to resolve this in the way of implementation codes.

yurishkuro · 2017-09-23T01:43:02Z

that doesn't really explain the issue to me. I prefer not to jump to quick-fix solutions without knowing what the issue is. "something to do with byteman" - what exactly? Byteman is used in javaagent, but javaagent's role should be limited to instrumenting the frameworks, not initializing the global tracer. It seems there should always be a way to control the order of initialization of the app.

ribrewguy · 2017-09-23T02:08:28Z

You're not wrong. No one should jump to quick fixes. As you are no doubt aware debugging multithreaded applications can be difficult. What I have observed through numerous trials is that in my Dropwizard application, I can initialize the tracer using TraceResolver.resolveTracer and then set that into GlobalTracer during Dropwizard's initialization phase and sometimes this will work with the OpenTracingHelper printing a recoverable stack trace regarding the global tracer having already been set, but other times the byteman instrumentation, through the OpenTracingHelper, has already set the global reference by the time I attempt to. This causes the Dropwizard app to bomb out irrecoverably.

In the larger context the problem is evident when creating components for use throughout a large code base. When creating Dropwizard bundles - tho this would apply to spring libraries or any "module" where the behavior of the caller is unknown - that you want to trace you need to get the global tracer so that all spans are nested correctly. However you cannot be certain as to whether that has already been initialized by simply calling isRegistered in some cases because in a larger system you cannot be sure who got to it first.

I believe this is what I'm experiencing based on the unpredictable nature of the behavior I'm observing. I've run dozens of tests and could not get it to run properly without the sync block. So it is a strong correlation for my application.

Again, I think in the larger context of applications that are built out of many components each having their own view of the world it would be nice to have a safer option to retrieve/set the global tracer built into the API.

Alternatively, I first explored the option of ensuring that the global tracer was initialized with the implementation from TracerResolver.resolveTracer, but the project dependency direction was suboptimal. Having a working tracer return from the get would have prevented me from even needing to set it myself and subsequently finding difficulties doing so in a multi-threaded world. The noop was an "interesting" choice there. It failed the test of least surprise at any rate. Because it doesn't error or log any obvious warnings I had no idea it wouldn't do what I expected: give me an existing, or initialize a new, tracer to use globally via my implementation. But that's somewhat tangential to this issue.

yurishkuro · 2017-09-23T03:28:42Z

However you cannot be certain as to whether that has already been initialized by simply calling isRegistered in some cases because in a larger system you cannot be sure who got to it first.

That sounds a lot like a mix-up of responsibilities. Instrumentation has no business checking for or registering the tracer, that's the job of the main() function and/or whatever initialization process/framework the application uses. But I have to admit, I don't know well the lifecycle of the bytecode manipulation done by the bytemap.

objectiser · 2017-09-23T10:06:13Z

Resolving the tracer inside the agent is a convenience for when the application has no tracing logic. The only way it has to obtain the tracer is via the TracerResolver mechanism.

Therefore the other option, if the application wishes to take control of initialising the tracer, is to not use the TracerResolver mechanism - and therefore the agent will just make use of any tracer that has been explicitly registered by the application.

@RediJedi Just checking - did you try not registering the tracer in your code, and just relying on the initialisation via the agent? Did that cause any issues with tracing the service invocations?

ribrewguy · 2017-09-25T12:33:55Z

@objectiser yes I did. The behavior was inconsistent. When not initializing the tracer it would sometimes work and others not. So I removed it and that would sometimes work and sometimes not. I assumed there was a race condition somewhere.

I took note of the classes taking part in my process and while investigating the OpenTracingHelper I found that it would init a global tracer if none was present. I noted that while the GlobalTracer.isRegistered and GlobalTracer.register methods are already synchronized, there is no guarantee that you can safely call the latter after the former in any environment unless it is completely singly threaded or you know where every interaction with OT is. Frankly, I do not believe these are reasonably expectations in apps these days - at least not apps using Rx-this, reactive-that, futures, etc. through multiple modules developed across teams or by third parties.

I noted that a sync on the class would ensure a reliable check and set. I tested it and now I see reliable behavior. I made this new method suggestion because I believe my use case is common enough that the API should recognize that it can occur, and should provide developers with a means to address it in a simple way that doesn't require knowing as much about the internals of the GlobalTracer.

To expand on that last point: In this case I knew from my IDE that the static methods in question were synced so I knew they would use the class as the lock and I synced accordingly. However, if these methods chose to use an internal lock object, then I would have had to look into the source to get an understanding of how I could correctly block this race condition. I think we can all agree that is undesirable. The developers using the API should not need to concern themselves with the internals. So given this method, a simple javadoc on the existing methods or in the class usage docs regarding the possibility of race conditions and a reference to the new method that would allow them to handle it would be developer-friendly, IMO.

sjoerdtalsma · 2017-09-25T13:09:22Z

Would the following implementation resolve this problem without having to resort to public static synchronized Tracer getOrRegister(...) ?

public static synchronized boolean registerIfAbsent(Tracer t) {
    if (!isRegistered()) {
        register(t);
        return true;
    } else {
        return false;
    }
}

ribrewguy · 2017-09-25T13:28:54Z

@sjoerdtalsma that would work - it's basically the other side of the same coin. Depends on preferred semantics, I guess. As far as the locking, that's an orthogonal concern. In either case the possible race condition needs to be handled. The question is how it's handled. I've updated my proposed method to use the double-check locking: ribrewguy@d10d91e.

I'd like to note that although some concern has been raised about the synchronized nature of the proposed process, both isRegistered and register are already synchronized. While it's true that holding the sync block open a tiny bit longer may have some overhead versus calling the two separate sync'd methods alone (tho 2 sync calls is probably slower than 1), we're really majoring in the minors here.

sjoerdtalsma · 2017-09-25T18:08:14Z

Having the get unsynchronized is more important I guess.

carlosalberto · 2018-04-13T13:06:19Z

Trying to re-take this issue: sounds like maybe @sjoerdtalsma's take with registerIfAbsent() is the 'safe' way to go here? To me, as an external observer, registering a Tracer could happen multiple times, and this case would be perfectly covered with the mentioned call.

And I'm afraid that getOrRegister() may be abused - usually you would need to do a registration + immediate usage of the Tracer only in a few places, making this an uncommon scenario.

If this could work, I could cook a PR. Let me know.

sjoerdtalsma · 2018-04-13T13:13:18Z

Just a thought, with java 8 being common nowadays, would the following make sense?

public interface TracerSupplier {
    Tracer supply();
}

public static synchronized boolean registerIfAbsent(TracerSupplier supplier) {
    if (supplier != null && !isRegistered()) {
        register(supplier.supply());
        return true;
    } else {
        return false;
    }
}

public static boolean registerIfAbsent(final Tracer t) {
    return registerIfAbsent(new TracerSupplier() {
        public Tracer supply() {
            return t;
        }
    });
}

That way actually initializing a tracer could be put in a lambda like:

GlobalTracer.registerIfAbsent(() -> TracerResolver.resolveTracer());

or even

GlobalTracer.registerIfAbsent(TracerResolver::resolveTracer);

sjoerdtalsma · 2018-04-13T13:36:58Z

Also, I think register() should be deprecated in this case because semantically it should only be called if absent (i.e. the new method).
Then, for 1.0.0 we can make it a private method.

sjoerdtalsma · 2018-04-13T13:39:58Z

@carlosalberto @RediJedi should I wrap up the above in a quick PR and continue the discussion there?

As discussed in opentracing#186 replace `void register(Tracer)` with a more atomic `boolean registerIfAbsent(Tracer)` method.

carlosalberto · 2018-04-13T15:53:24Z

@sjoerdtalsma Sounds great to me!

sjoerdtalsma · 2018-04-13T16:04:01Z

Please feel free to comment on #269 of course

sjoerdtalsma · 2018-06-06T06:04:35Z

This issue is several months old, and review of #269 seems to have reached a consensus.
@RediJedi It may be a good idea to comment on the PR if you want this merged.

@deprecated

* Make GlobalTracer uses more threadsafe As discussed in #186 replace `void register(Tracer)` with a more atomic `boolean registerIfAbsent(Tracer)` method. * Throw NPE (with message) for registerIfAbsent(null) calls * Add '@deprecated' annotation + replace nested if by '&&'. * Remove registerIfAbsent(Tracer) as unwanted duplicate * Replace TracerSupplier interface with Callable<Tracer> * Document thrown exceptions * Combine javadoc for checked + unchecked exceptions from provider * Wording for rethrown exceptions from provider * Add brackets around 'if' statement * Add unit tests for registerIfAbsent + restore tests for register methods * Clarify wording used in unit test

carlosalberto · 2018-07-13T01:21:53Z

@RediJedi @sjoerdtalsma It looks like we should close this issue now that #269 has been merged. I will close it by next week if you guys are ok with it (or we could close it sooner ;) )

sjoerdtalsma · 2018-07-13T04:21:58Z

I’m ok with closing this

…

Op 13 jul. 2018 om 03:21 heeft Carlos Alberto Cortez ***@***.***> het volgende geschreven: @RediJedi @sjoerdtalsma It looks like we should close this issue now that #269 has been merged. I will close it by next week if you guys are ok with it (or we could close it sooner ;) ) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

pavolloffay · 2018-07-13T08:21:31Z

I will close it and open a new one for adding a registerIfAbsent(Tracer tracer)

carlosalberto self-assigned this Apr 10, 2018

sjoerdtalsma mentioned this issue Apr 13, 2018

Make GlobalTracer uses more threadsafe #269

Merged

pavolloffay closed this as completed Jul 13, 2018

pavolloffay mentioned this issue Jul 13, 2018

Add registerIfAbsent(Tracer tracer) to GlobalTracer #286

Closed

pavolloffay mentioned this issue Apr 26, 2019

Add a global registry open-telemetry/opentelemetry-java#218

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make GlobalTracer uses more threadsafe #186

Make GlobalTracer uses more threadsafe #186

ribrewguy commented Sep 22, 2017

ribrewguy commented Sep 22, 2017

wu-sheng commented Sep 23, 2017

ribrewguy commented Sep 23, 2017

yurishkuro commented Sep 23, 2017

wu-sheng commented Sep 23, 2017

wu-sheng commented Sep 23, 2017

yurishkuro commented Sep 23, 2017

ribrewguy commented Sep 23, 2017

yurishkuro commented Sep 23, 2017

objectiser commented Sep 23, 2017 •

edited

Loading

ribrewguy commented Sep 25, 2017

sjoerdtalsma commented Sep 25, 2017 •

edited

Loading

ribrewguy commented Sep 25, 2017

sjoerdtalsma commented Sep 25, 2017

carlosalberto commented Apr 13, 2018

sjoerdtalsma commented Apr 13, 2018 •

edited

Loading

sjoerdtalsma commented Apr 13, 2018

sjoerdtalsma commented Apr 13, 2018

carlosalberto commented Apr 13, 2018

sjoerdtalsma commented Apr 13, 2018

sjoerdtalsma commented Jun 6, 2018

carlosalberto commented Jul 13, 2018

sjoerdtalsma commented Jul 13, 2018 via email

pavolloffay commented Jul 13, 2018

Make GlobalTracer uses more threadsafe #186

Make GlobalTracer uses more threadsafe #186

Comments

ribrewguy commented Sep 22, 2017

ribrewguy commented Sep 22, 2017

wu-sheng commented Sep 23, 2017

ribrewguy commented Sep 23, 2017

yurishkuro commented Sep 23, 2017

wu-sheng commented Sep 23, 2017

wu-sheng commented Sep 23, 2017

yurishkuro commented Sep 23, 2017

ribrewguy commented Sep 23, 2017

yurishkuro commented Sep 23, 2017

objectiser commented Sep 23, 2017 • edited Loading

ribrewguy commented Sep 25, 2017

sjoerdtalsma commented Sep 25, 2017 • edited Loading

ribrewguy commented Sep 25, 2017

sjoerdtalsma commented Sep 25, 2017

carlosalberto commented Apr 13, 2018

sjoerdtalsma commented Apr 13, 2018 • edited Loading

sjoerdtalsma commented Apr 13, 2018

sjoerdtalsma commented Apr 13, 2018

carlosalberto commented Apr 13, 2018

sjoerdtalsma commented Apr 13, 2018

sjoerdtalsma commented Jun 6, 2018

carlosalberto commented Jul 13, 2018

sjoerdtalsma commented Jul 13, 2018 via email

pavolloffay commented Jul 13, 2018

objectiser commented Sep 23, 2017 •

edited

Loading

sjoerdtalsma commented Sep 25, 2017 •

edited

Loading

sjoerdtalsma commented Apr 13, 2018 •

edited

Loading