Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

Make GlobalTracer uses more threadsafe #186

Closed
ribrewguy opened this issue Sep 22, 2017 · 24 comments
Closed

Make GlobalTracer uses more threadsafe #186

ribrewguy opened this issue Sep 22, 2017 · 24 comments
Assignees

Comments

@ribrewguy
Copy link

I have a multithreaded app which causes some unpredictable behaviors when trying to get a globally registered tracer. A specific complication appears to have some relationship to byteman when using byteman rules. I have found that sometimes I call GlobalTracer.get() and receive a noop, while other times I get a tracer initialized from some byteman execution. I attempted to resolve this by conditionally registering the tracer myself:

        Tracer tracer;
        if (GlobalTracer.isRegistered()) {
            tracer = GlobalTracer.get();
        } else {
            tracer = TracerResolver.resolveTracer();
            GlobalTracer.register(tracer);
        }

However, this was affected by a race condition somewhere. I was able to resolve it by synchronizing on the GlobalTracer.class:

        Tracer tracer;
        synchronized (GlobalTracer.class) {
            if (GlobalTracer.isRegistered()) {
                tracer = GlobalTracer.get();
            } else {
                tracer = TracerResolver.resolveTracer();
                GlobalTracer.register(tracer);
            }
        }

but this seems like something that would be nice to incorporate into the API itself. Note that this still causes the agent to log an exception:

! java.lang.IllegalStateException: There is already a current global Tracer registered.
! at io.opentracing.util.GlobalTracer.register(GlobalTracer.java:92)
! at io.opentracing.contrib.agent.OpenTracingHelper.initTracer(OpenTracingHelper.java:80)

but it appears to work anyway.

I'll submit a method proposal for streamlining this shortly.

@ribrewguy
Copy link
Author

It's rough, but here's the general idea: ribrewguy@42faf24

@wu-sheng
Copy link
Member

@RediJedi By using either public static synchronized Tracer getOrRegister(...) and synchronized (GlobalTracer.class), I can resolve the race condition, but you are facing the performance problem. Every time you try to get the tracer, you must get the Lock first, one way or another.

I prefer the current API. And for your circumstance, you can try to initial/resolver/register the GlobalTracer first. For a multithreaded app, I think you need multi threads in providing service or business logic, but you can use single thread to process the tracer initialization stage.

@ribrewguy
Copy link
Author

While you're right, I'd venture to guess that many non trivial apps these days operate in such an environment that it may be unclear where, if ever, the global tracer had been set. In these cases, if you want your system to not crash with the above error, you have little choice to do otherwise. I'd be willing to bet that in the number of cases you would end up synchronizing the impact would be minimal. If you wanted to I suppose you could double check lock it for better average performance too. But again, I'm betting the real world impact would be negligible for the trade off of stability.

In the end, not having something like this in the API forces users to code around it which is not insurmountable, but will no doubt catch folk unaware.

@yurishkuro
Copy link
Member

@RediJedi what's an example where the main() function of the application does not have control over the order of initialization?

@wu-sheng
Copy link
Member

I think double check lock would be better for both resolved your concern and my performance concern, right? May be you can send a pr about that, and the team can discuss more.

BTW, impact is only a big deal for high throughputs server, but only for every app. But this kind of app depends on tracing very much. Please understand my concern. Thanks.

@wu-sheng
Copy link
Member

what's an example where the main() function of the application does not have control over the order of initialization?

@yurishkuro I think @RediJedi 's opinion is, how to use is not clear, so he try to resolve this in the way of implementation codes.

@yurishkuro
Copy link
Member

that doesn't really explain the issue to me. I prefer not to jump to quick-fix solutions without knowing what the issue is. "something to do with byteman" - what exactly? Byteman is used in javaagent, but javaagent's role should be limited to instrumenting the frameworks, not initializing the global tracer. It seems there should always be a way to control the order of initialization of the app.

@ribrewguy
Copy link
Author

You're not wrong. No one should jump to quick fixes. As you are no doubt aware debugging multithreaded applications can be difficult. What I have observed through numerous trials is that in my Dropwizard application, I can initialize the tracer using TraceResolver.resolveTracer and then set that into GlobalTracer during Dropwizard's initialization phase and sometimes this will work with the OpenTracingHelper printing a recoverable stack trace regarding the global tracer having already been set, but other times the byteman instrumentation, through the OpenTracingHelper, has already set the global reference by the time I attempt to. This causes the Dropwizard app to bomb out irrecoverably.

In the larger context the problem is evident when creating components for use throughout a large code base. When creating Dropwizard bundles - tho this would apply to spring libraries or any "module" where the behavior of the caller is unknown - that you want to trace you need to get the global tracer so that all spans are nested correctly. However you cannot be certain as to whether that has already been initialized by simply calling isRegistered in some cases because in a larger system you cannot be sure who got to it first.

I believe this is what I'm experiencing based on the unpredictable nature of the behavior I'm observing. I've run dozens of tests and could not get it to run properly without the sync block. So it is a strong correlation for my application.

Again, I think in the larger context of applications that are built out of many components each having their own view of the world it would be nice to have a safer option to retrieve/set the global tracer built into the API.

Alternatively, I first explored the option of ensuring that the global tracer was initialized with the implementation from TracerResolver.resolveTracer, but the project dependency direction was suboptimal. Having a working tracer return from the get would have prevented me from even needing to set it myself and subsequently finding difficulties doing so in a multi-threaded world. The noop was an "interesting" choice there. It failed the test of least surprise at any rate. Because it doesn't error or log any obvious warnings I had no idea it wouldn't do what I expected: give me an existing, or initialize a new, tracer to use globally via my implementation. But that's somewhat tangential to this issue.

@yurishkuro
Copy link
Member

However you cannot be certain as to whether that has already been initialized by simply calling isRegistered in some cases because in a larger system you cannot be sure who got to it first.

That sounds a lot like a mix-up of responsibilities. Instrumentation has no business checking for or registering the tracer, that's the job of the main() function and/or whatever initialization process/framework the application uses. But I have to admit, I don't know well the lifecycle of the bytecode manipulation done by the bytemap.

@objectiser
Copy link
Contributor

objectiser commented Sep 23, 2017

Resolving the tracer inside the agent is a convenience for when the application has no tracing logic. The only way it has to obtain the tracer is via the TracerResolver mechanism.

Therefore the other option, if the application wishes to take control of initialising the tracer, is to not use the TracerResolver mechanism - and therefore the agent will just make use of any tracer that has been explicitly registered by the application.

@RediJedi Just checking - did you try not registering the tracer in your code, and just relying on the initialisation via the agent? Did that cause any issues with tracing the service invocations?

@ribrewguy
Copy link
Author

@objectiser yes I did. The behavior was inconsistent. When not initializing the tracer it would sometimes work and others not. So I removed it and that would sometimes work and sometimes not. I assumed there was a race condition somewhere.

I took note of the classes taking part in my process and while investigating the OpenTracingHelper I found that it would init a global tracer if none was present. I noted that while the GlobalTracer.isRegistered and GlobalTracer.register methods are already synchronized, there is no guarantee that you can safely call the latter after the former in any environment unless it is completely singly threaded or you know where every interaction with OT is. Frankly, I do not believe these are reasonably expectations in apps these days - at least not apps using Rx-this, reactive-that, futures, etc. through multiple modules developed across teams or by third parties.

I noted that a sync on the class would ensure a reliable check and set. I tested it and now I see reliable behavior. I made this new method suggestion because I believe my use case is common enough that the API should recognize that it can occur, and should provide developers with a means to address it in a simple way that doesn't require knowing as much about the internals of the GlobalTracer.

To expand on that last point: In this case I knew from my IDE that the static methods in question were synced so I knew they would use the class as the lock and I synced accordingly. However, if these methods chose to use an internal lock object, then I would have had to look into the source to get an understanding of how I could correctly block this race condition. I think we can all agree that is undesirable. The developers using the API should not need to concern themselves with the internals. So given this method, a simple javadoc on the existing methods or in the class usage docs regarding the possibility of race conditions and a reference to the new method that would allow them to handle it would be developer-friendly, IMO.

@sjoerdtalsma
Copy link
Contributor

sjoerdtalsma commented Sep 25, 2017

Would the following implementation resolve this problem without having to resort to public static synchronized Tracer getOrRegister(...) ?

public static synchronized boolean registerIfAbsent(Tracer t) {
    if (!isRegistered()) {
        register(t);
        return true;
    } else {
        return false;
    }
}

@ribrewguy
Copy link
Author

@sjoerdtalsma that would work - it's basically the other side of the same coin. Depends on preferred semantics, I guess. As far as the locking, that's an orthogonal concern. In either case the possible race condition needs to be handled. The question is how it's handled. I've updated my proposed method to use the double-check locking: ribrewguy@d10d91e.

I'd like to note that although some concern has been raised about the synchronized nature of the proposed process, both isRegistered and register are already synchronized. While it's true that holding the sync block open a tiny bit longer may have some overhead versus calling the two separate sync'd methods alone (tho 2 sync calls is probably slower than 1), we're really majoring in the minors here.

@sjoerdtalsma
Copy link
Contributor

Having the get unsynchronized is more important I guess.

@carlosalberto carlosalberto self-assigned this Apr 10, 2018
@carlosalberto
Copy link
Collaborator

Trying to re-take this issue: sounds like maybe @sjoerdtalsma's take with registerIfAbsent() is the 'safe' way to go here? To me, as an external observer, registering a Tracer could happen multiple times, and this case would be perfectly covered with the mentioned call.

And I'm afraid that getOrRegister() may be abused - usually you would need to do a registration + immediate usage of the Tracer only in a few places, making this an uncommon scenario.

If this could work, I could cook a PR. Let me know.

@sjoerdtalsma
Copy link
Contributor

sjoerdtalsma commented Apr 13, 2018

Just a thought, with java 8 being common nowadays, would the following make sense?

public interface TracerSupplier {
    Tracer supply();
}

public static synchronized boolean registerIfAbsent(TracerSupplier supplier) {
    if (supplier != null && !isRegistered()) {
        register(supplier.supply());
        return true;
    } else {
        return false;
    }
}

public static boolean registerIfAbsent(final Tracer t) {
    return registerIfAbsent(new TracerSupplier() {
        public Tracer supply() {
            return t;
        }
    });
}

That way actually initializing a tracer could be put in a lambda like:

GlobalTracer.registerIfAbsent(() -> TracerResolver.resolveTracer());

or even

GlobalTracer.registerIfAbsent(TracerResolver::resolveTracer);

@sjoerdtalsma
Copy link
Contributor

Also, I think register() should be deprecated in this case because semantically it should only be called if absent (i.e. the new method).
Then, for 1.0.0 we can make it a private method.

@sjoerdtalsma
Copy link
Contributor

@carlosalberto @RediJedi should I wrap up the above in a quick PR and continue the discussion there?

sjoerdtalsma added a commit to sjoerdtalsma/opentracing-java that referenced this issue Apr 13, 2018

Verified

This commit was signed with the committer’s verified signature.
sjoerdtalsma Sjoerd Talsma
As discussed in opentracing#186 replace `void register(Tracer)` with a more
atomic `boolean registerIfAbsent(Tracer)` method.
@carlosalberto
Copy link
Collaborator

@sjoerdtalsma Sounds great to me!

@sjoerdtalsma
Copy link
Contributor

Please feel free to comment on #269 of course

@sjoerdtalsma
Copy link
Contributor

This issue is several months old, and review of #269 seems to have reached a consensus.
@RediJedi It may be a good idea to comment on the PR if you want this merged.

pavolloffay pushed a commit that referenced this issue Jul 10, 2018
* Make GlobalTracer uses more threadsafe

As discussed in #186 replace `void register(Tracer)` with a more
atomic `boolean registerIfAbsent(Tracer)` method.

* Throw NPE (with message) for registerIfAbsent(null) calls

* Add '@deprecated' annotation + replace nested if by '&&'.

* Remove registerIfAbsent(Tracer) as unwanted duplicate

* Replace TracerSupplier interface with Callable<Tracer>

* Document thrown exceptions

* Combine javadoc for checked + unchecked exceptions from provider

* Wording for rethrown exceptions from provider

* Add brackets around 'if' statement

* Add unit tests for registerIfAbsent + restore tests for register methods

* Clarify wording used in unit test
@carlosalberto
Copy link
Collaborator

@RediJedi @sjoerdtalsma It looks like we should close this issue now that #269 has been merged. I will close it by next week if you guys are ok with it (or we could close it sooner ;) )

@sjoerdtalsma
Copy link
Contributor

sjoerdtalsma commented Jul 13, 2018 via email

@pavolloffay
Copy link
Member

I will close it and open a new one for adding a registerIfAbsent(Tracer tracer)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants