Race condition caused by variable scope getting lifted from a multithreaded context #14948

jrevels · 2016-02-05T19:52:14Z

I ran into this very strange bug while exploring the new multithreading stuff. The smallest reproducible case I can come up with:

using Base.Threads

function right(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    @threads for t in 1:nthreads()
        w = ws[t]
        for i in 1:length(v)
            w[i] = v[i]
        end
    end
    w2 = 1 # unused binding to an unused variable name
    return ws
end

function wrong(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    @threads for t in 1:nthreads()
        w = ws[t]
        for i in 1:length(v)
            w[i] = v[i]
        end
    end
    w = 1 # unused binding to the name of the previously used loop variable
    return ws
end

Note that the only difference between the above two functions is w2 = 1 and w = 1, and that neither variable is actually used after being assigned.

Running the above in the REPL:

julia> v = collect(1:3)

julia> right(v) # gives the correct output every time
([1,2,3],[1,2,3],[1,2,3],[1,2,3])

julia> wrong(v)
([4624170704,4658607408,4624170720],[4563189968,4574717712,4563189968],[1,2,3],[1,2,3])

julia> wrong(v)
([1,2,3],[1,2,3],[1,2,3],[4624170800,4658607472,4624170816])

julia> wrong(v)
([1,2,3],[1,2,3],[1,2,3],[1,2,3])

julia> wrong(v)
([1,4658458992,4624197552],[1,2,3],[4624197648,4624197664,4624197680],[1,2,3])

It appears that some iterations of the multithreaded loop in wrong(v) aren't always being executed, or possibly that the binding of the loop variable w to ws[t] is broken. Either way, one can induce the correct behavior by making sure the post-loop binding to 1 doesn't match the loop variable name.

I thought maybe that this was a escaping/scoping bug in _threadsfor, but I didn't see anything that jumped out at me.

julia> versioninfo()
Julia Version 0.5.0-dev+2502
Commit baf336f (2016-02-04 18:01 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-4288U CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1

@mlubin

The text was updated successfully, but these errors were encountered:

yuyichao · 2016-02-05T20:22:10Z

Hmm, without looking into the detail, my guess is that the assignment to w lift the scope of the variable and effectively introduces a race condition between threads (since all of them are modifying and reading the variable) Does it help if you protect the access of w with a lock?

yuyichao · 2016-02-05T20:24:10Z

i.e. something like

function wrong1(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    @threads for t in 1:nthreads()
        for i in 1:length(v)
            w = ws[t]
            w2 = w
            w2[i] = v[i]
        end
    end
    w = 1 # unused binding to the name of the previously used loop variable
    return ws
end

vs

function wrong2(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    lock = SpinLock()
    @threads for t in 1:nthreads()
        for i in 1:length(v)
            lock!(lock)
            w = ws[t]
            w2 = w
            unlock!(lock)
            w2[i] = v[i]
        end
    end
    w = 1 # unused binding to the name of the previously used loop variable
    return ws
end

jrevels · 2016-02-05T20:39:32Z

The lock fixes it, I think you're right about the scope lifting of w. Thus, another way to fix this - without introducing the lock - is by declaring w as local inside the loop. I guess this is semantically consistent with Julia's current scoping behavior, but it means that the current scoping behavior is scary in a multithreaded context.

yuyichao · 2016-02-05T20:52:51Z

If we want the behavior to be as close as possible to a non-threading version, we could make all assignments in a @thread local instead (no matter threading is enabled or not). Hopefully if a loop is declared as @thread it usually shouldn't be to assign to any local variable. However, in case it is actually what the user want (find a unique match in a threaded loop maybe?) I think there should be a way to do that (like python's nonlocal).

This doesn't help if the user is calling another closure that assigning to another local variable in the outer scope but I don't think we can fix that without completely break the scoping rule.

JeffBezanson · 2016-02-05T20:58:39Z

This is actually what the hacky localize_vars thing was introduced for --- to copy bindings so that assignments can't leak outside of a certain block. Originally for the distributed case, where you don't want binding relationships to depend on which machine the code runs on.

kpamnany · 2016-02-05T23:20:27Z

You do want to be able to access shared variables in an outer scope from inside a parallel loop; it's a simple and useful pattern. I don't think we can force local-only assignments.

I think the right answer is to generate a warning when a variable's scope is lifted after there has been an assignment to it. Practical?

JeffBezanson · 2016-02-06T00:38:01Z

What do you mean by "local-only assignments"? For example, I'm not sure what this code is meant to do:

x = 1
@threads for i = 1:n
    x = i
    f(x)
end

In this case, why is it useful to share x among the threads?

yuyichao · 2016-02-06T00:47:00Z

And I think this can be done with sth like,

x = Ref(0)
@threads for i in 1:n
    f(i) && x[] = 1
end

even if we "localize" all the variables in a @threads block.

kpamnany · 2016-02-06T01:01:49Z

By "local-only assignments", I was referring to @yuyichao's suggestion to make all assignments in an @threads block local. I don't think this is the right answer because it is useful to share outer-scope variables among threads. It doesn't seem useful to share x among the threads for the case you mention, but:

x = Atomic()
@threads for i = 1:n
    atomic_add!(x, foo(i))
end
bar(x[])

Or:

x = ConcurrentMap()
@threads for i = 1:n
    if foo(i)
        append!(x, (bar(i), baz(i))) # or something like this
    end
end
# iterate through x doing something

yuyichao · 2016-02-06T01:04:32Z

Both of the code you have does not have assignment in the loop. There's mutations of mutable object but no assignment so they won't be affected at all.

kpamnany · 2016-02-06T01:06:51Z

Given foo(x) returns true for only one element of some collection C, then:

p = false
@threads for i = 1:length(C)
    if foo(C[i])
        p = true
    end
end
if p
# do something

The principle is the same.

yuyichao · 2016-02-06T01:07:14Z

This pattern can be covered with Ref I mentioned above.

kpamnany · 2016-02-06T01:08:09Z

Sure. There are all sorts of ways to get those semantics, but are they clear semantics? My feeling is not.

yuyichao · 2016-02-06T01:16:04Z

What's not clear about this?

I think allowing assignment to local variable by default is not very clear given the example in the original post. Raising a warning based on some guessing doesn't seem too nice either since the actual problem is not about where the variable is defined in the enclosing scope but that there's code that works with single threaded (that reuses a variable in the enclosing scope) but not with multi-threading.

The alternative schematic is that you can't assign to a local variable in a @threads block. That seems pretty clear to me (even though it's different from a normal for loop. OTOH it's closer to a parallel for loop, which is basically what a @threads loop is). On top of this, we only need a way to pass a value from the @threads to the outer-scope, which can either be done with a keyword like python's nonlocal or explicitly with a local mutable object in the outer scope. Comparing these two approaches, I think using a Ref explicitly is more flexible and has a very clear schematics.

kpamnany · 2016-02-06T01:33:51Z

To me, the following two snippets are inconsistent.

x = 0
@threads for i = 1:n
    if foo(i)
        x = threadid()
    end
end
# x = 0

x = Int[0]
@threads for i = 1:n
    if foo(i)
        x[1] = threadid()
    end
end
# x[1] != 0

If there is a well-understood difference between assignment to a variable and mutation via a call among Julia programmers, then I guess this is just me.

Still, the other thing to consider is that the serial elision of the first loop (i.e. remove just @threads) will have different behavior.

yuyichao · 2016-02-06T01:38:13Z

difference between assignment to a variable and mutation via a call

I think this is a very fundamental distinction in the schematics.

Still, the other thing to consider is that the serial elision of the first loop (i.e. remove just @threads) will have different behavior.

It doesn't have too. Same with the behavior on a non-threading build.

kpamnany · 2016-02-06T01:43:45Z

It doesn't have too. Same with the behavior on a non-threading build.

Not sure what you mean? The localization approach does mean that the behavior of the code will be different if you run the loop serially vs. if you run it in parallel.

One of the nicest things about Cilk is serial equivalence (https://software.intel.com/en-us/blogs/2012/04/07/serial-equivalence-of-cilk-plus-programs). I'm not sure we actually have it with Julia threads at the moment, but it is a worthwhile goal.

JeffBezanson · 2016-02-06T01:45:01Z

I think I agree with @kpamnany here; with threads it is possible and efficient to keep the same scope behavior we have now. The only downside is race conditions, but those are possible anyway with assignment to arrays.

I think the only reason an issue was filed here is that a variable can be added to a scope by an assignment that occurs after a use of the variable. I can see why this is confusing, but it's what we've always done. It tends to make a lot of sense in a case like

function f()
   g(x) = h(x)
   h(x) = g(x)
end

yuyichao · 2016-02-06T01:49:59Z

The localization approach does mean that the behavior of the code will be different if you run the loop serially vs. if you run it in parallel.

No, it means that the behavior is different when @threads is added but as long as it's in a @threads block, the behavior will be the same.

I think the only reason an issue was filed here is that a variable can be added to a scope by an assignment that occurs after a use of the variable

I don't agree. There can be code like,

# <...>
x = ... # this is as likely to happen as if x is used as a temporary variable after the loop
# <...>
@threads for ...
    # <...>
    x = ...
    # <...>
end
# don't care about the value of x here

The code uses x as a variable name inside and outside the threading block (maybe because it is a name that makes sense for such thing) and the code works fine when running with a single thread and fails when there's more than one thread.

JeffBezanson · 2016-02-06T01:52:32Z

Yes, it's really easy to write parallel code with race conditions. Nothing new there.

yuyichao · 2016-02-06T01:57:47Z

Sure. I'm fine if we just say this is a user error and we will do nothing about it (and the user should add local to all the variables that will otherwise be accidentally shared).

I just don't think the problem is the order and we shouldn't be adding a warning for such case.

JeffBezanson · 2016-02-06T02:06:39Z

I just don't think the problem is the order

I'm very glad to hear that, because I don't really want to change this :)

kpamnany · 2016-02-06T02:13:10Z

Can you explain this please?

function f()
   g(x) = h(x)
   h(x) = g(x)
end

In the absence of understanding of the above, I still think that lifting a variable's scope after use of that variable (lexically speaking) should cause a warning. There's no guesswork in such a rule, but I guess there are other priorities.

A race condition checker should be on the TODO list somewhere. :-/

JeffBezanson · 2016-02-06T02:15:45Z

@kpamnany In that code h is referenced before it is introduced by assignment, and this is an idiomatic way to write mutually-recursive inner functions.

threads · 2016-02-06T14:07:23Z

Dont use @threads onless you want me in on this guys...

On Sat, Feb 6, 2016, 03:16 Jeff Bezanson [email protected] wrote:

@kpamnany https://github.com/kpamnany In that code h is referenced
before it is introduced by assignment, and this is an idiomatic way to
write mutually-recursive inner functions.

—
Reply to this email directly or view it on GitHub
#14948 (comment).

yuyichao · 2016-02-06T14:09:03Z

@threads Sorry, I guess I forgot to quote it in code block when quoting the reply ...

eschnett · 2016-02-06T15:32:53Z

Why does the code following @threads behave like a block? It really should behave like a function instead. It's a cute syntax that you can just throw an @threads in front of a for loop, but the reality is that this introduces a rather deep change, and one has to be quite careful about which loops remain correct when doing so. As we've learned, it's not even sufficient to only study the content of the loop, and this is deeply surprising even to experts.

We can of course keep the syntax, but we can define that the internal representation corresponds to a lambda expression, so that all local variable assignments remain hidden. This transformation can still be done when multi-threading is disabled; it's actually probably useful for SIMD-vectorization etc. as well.

carnaval · 2016-02-06T16:46:01Z

it is lowered to a lambda and has the exact same scoping behavior. whether this behavior (mutating closed over variables) is a good idea is another question. I think in some cases you do want that.

sirinath · 2016-03-21T12:25:44Z

Just to add my 2c. I think should warn or even prevent on access to shared variable or overlapping array ranges in different threads or any parallel code. If you can deduce outer scope access is mutually exclusive or race condition free by using a lock or CAS operation then only it compile normally. Also the same checking infrastructure can be used in other parallelization constructs.

carnaval · 2016-03-21T15:00:24Z

This problem (deciding dynamic array ranges overlap) is undecidable. Worse, it's hard for all but the simplest cases. You get to pick between a deluge of false positives or a safety feature that almost never triggers.

The solution to this issue must be a consistent rule. It's fine for SMP programming to be hard if you want to do it manually, let's just find a way to avoid mistakes that are avoidable, such as the one in this issue, rather than making a full blown static race condition checker :-)

sirinath · 2016-03-21T15:29:53Z

What every works. Languages like Nim (http://nim-lang.org/) uses a proof assistant to check these conditions. Having said this as you say perhaps you can start with the most common and gradually increase what concurrency errors you might get perhaps using a formal approach so one day nearly every case is checked.

kpamnany · 2016-04-15T20:56:30Z

I think the conclusion here is that this is intended behavior and there are no current plans to change it. If so, are you happy to close this @jrevels?

jrevels · 2016-04-17T16:40:43Z

The consensus here makes sense to me, but I could see this issue being a common pitfall for new users, so I'd like to make sure that it eventually gets a mention in our multithreading documentation (which doesn't yet exist, but I'm assuming will one day once multithreading has stabilized). I could close this and open a new docs issue, if people prefer, otherwise I'll just leave this issue open.

sirinath · 2016-04-18T09:39:33Z

More help from the compiler with an error message when there is a definitive race condition and a warning when there is a potential race condition would be the best way forward. You can add it to the docs but best is to know this when you are compiling and there is a issue with the code.

vtjnash · 2016-07-12T16:55:26Z

Closing since this is by-design and fixed by resolving the race-condition.

andreasnoack · 2016-11-17T15:51:11Z

Combined with our assignment rules in if/else blocks, e.g.

if use_threads
    @threads for i = 1:nthreads()
        Xlocal = copy(X) # because I want a thread local copy of X
    end
else
    Xlocal = X # I don't need a copy when I don't use threads but suddenly `Xlocal` in the other branch is not local anymore.
end

I think the current behavior of @threads will cause some confusion for Julia users that are not very familiar with scoping rules. If that is a group we care about, I think we should reopen this issue.

jrevels · 2016-11-17T20:43:49Z

Even if this behavior doesn't change, this issue was closed despite presenting a still uncompleted action item - that's why I left it open in the first place. I'm assuming that nobody would argue against documenting this behavior explicitly (if this is the behavior we're keeping)? Since that documentation still doesn't exist, I'm going to reopen this.

kpamnany · 2016-11-17T21:30:37Z

@andreasnoack: the comment in your code snippet is truncated, so I'm not sure what the confusion is?

andreasnoack · 2016-11-17T21:55:14Z

@kpamnany Sorry. Have updated the comment. The problem is that the use of Xlocal in the second branch changes the behavior of Xlocal in the first branch.

kpamnany · 2016-11-18T00:02:34Z

OpenMP deals with this by supporting private and shared clauses to indicate which of the variables that are already in scope should have per-thread copies and which should not. But these are the sort of fiddly details that hamper ease of use, and I don't think we should add controls for every such thing.

I believe that once some documentation is in, this will not be very confusing. We can certainly leave the issue open to remind us (me I guess) to write this up.

yuyichao · 2016-11-18T00:21:27Z

We have local, but the main difference between us and C/C++ is that we don't require explicit variable declaration so the scope of a variable is less clear for the reader.

andreasnoack · 2016-11-19T17:52:15Z

In my applications, it seems as if local variable are more common, e.g. the code that made me post here would require ten locals. The solution I ended up with wrapped all of the threaded code in a function like

if use_threads
    fooThreads!(a,b,c,d,e)
else
    fooSeq!(a,b,c,d,e)
end

with

function fooThreads!
    @threads for i in 1:nthreads()
        ...
    end
end

I've ended up with a similar pattern in other use cases of @threads where it helped performance so I'm wondering if the behavior of the @threads for is the most useful possible at the moment.

kpamnany · 2016-11-19T18:22:59Z

I'd try to not have anything like use_threads -- when you run serially, nthreads() is 1, so the code should just work as it does in the multithreaded case. Which is the whole point of serial equivalence. Is that not a viable approach?

andreasnoack · 2016-11-19T18:37:38Z

In the final version, it should be fine to control threading with JULIA_NUM_THREADS only but here I made the branch to have an easy way to investigate how @threads affected the results. It is then ironic that the threaded code only had problems because I added a branch to check if it was correct.

NHDaly · 2019-12-19T06:10:17Z

I guess this is more or less closed by #33119 (and #30896 is basically a duplicate of this issue), but this was left open because we lack documentation explaining how to avoid this problem. I guess now we should add documentation explaining how to use $-interpolation in @async and @spawn to avoid this variable lifting.

In fact, it looks like @spawn still isn't mentioned at all in the https://docs.julialang.org/en/v1.4-dev/manual/parallel-computing/ page, so there's probably a lot of work to be done on documentation..

dandanua · 2024-02-18T17:06:29Z

I think this issue has nothing to do with threads. The confusion comes from the scoping rules - the assignment to a variable after the cycle makes this variable non-local in the cycle. So, it's about ordering. And from a newbie perspective this is not intuitive. This is also inconsistent with the interactive global scoping rules (for an obvious reason - we can't know the future global scope).

I've edited the second part of this comment since I found this motivation for the scoping rule. This is sensible indeed. BTW, that comment explains the rule more clearly than docs. I think docs should emphasize on this more since Julia is different from many other languages in this regard.

eschnett · 2024-02-20T14:55:46Z

It would make sense to emit a compiler warning if a variable leaves a scope only because of how it is used later in the same function. That could be extended by a simple mechanism to declare local variables further up in a function.

Stefan Karpinski's argument (semantics should be independent of order) implies that a variables' scope begins before it is used the first time. This is counter-intuitive inside a function where statements are always executed in order, and it catches people by surprise.

jrevels added the multithreading Base.Threads and related functionality label Feb 5, 2016

jrevels changed the title ~~Inconsistent variable binding behavior when using @threads~~ Race condition caused by variable scope getting lifted from a multithreaded context Feb 5, 2016

KristofferC mentioned this issue Feb 6, 2016

Explicit capture of variables in a closure. #14959

Open

mauro3 mentioned this issue Feb 8, 2016

Issues with scope-documentation #14903

Open

4 tasks

jiahao mentioned this issue Apr 19, 2016

Comment on method addition change from 0.4 to 0.5 #15950

Closed

yuyichao mentioned this issue Jun 28, 2016

Scoping issue in @testset #17170

Closed

vtjnash closed this as completed Jul 12, 2016

yuyichao mentioned this issue Aug 30, 2016

added n-loops and tuple argument ability to @threads #18278

Closed

jrevels reopened this Nov 17, 2016

andreasnoack mentioned this issue Dec 13, 2016

Stop using @threads? JuliaStats/GLM.jl#169

Closed

yuyichao mentioned this issue Jun 9, 2017

always create new bindings for for loop iteration variables #22314

Closed

KristofferC mentioned this issue May 9, 2023

Multithreaded loops can return incorrect output in 1.8.5 #49707

Closed

adienes mentioned this issue Oct 6, 2024

Shadowing in closures in let blocks influences bindings in neighboring functions #56002

Open

Race condition caused by variable scope getting lifted from a multithreaded context #14948

Race condition caused by variable scope getting lifted from a multithreaded context #14948

Comments

jrevels commented Feb 5, 2016

yuyichao commented Feb 5, 2016

yuyichao commented Feb 5, 2016

jrevels commented Feb 5, 2016

yuyichao commented Feb 5, 2016

JeffBezanson commented Feb 5, 2016

kpamnany commented Feb 5, 2016

JeffBezanson commented Feb 6, 2016

yuyichao commented Feb 6, 2016

kpamnany commented Feb 6, 2016

yuyichao commented Feb 6, 2016

kpamnany commented Feb 6, 2016

yuyichao commented Feb 6, 2016

kpamnany commented Feb 6, 2016

yuyichao commented Feb 6, 2016

kpamnany commented Feb 6, 2016

yuyichao commented Feb 6, 2016

kpamnany commented Feb 6, 2016

JeffBezanson commented Feb 6, 2016

yuyichao commented Feb 6, 2016

JeffBezanson commented Feb 6, 2016

yuyichao commented Feb 6, 2016

JeffBezanson commented Feb 6, 2016

kpamnany commented Feb 6, 2016

JeffBezanson commented Feb 6, 2016

threads commented Feb 6, 2016

yuyichao commented Feb 6, 2016

eschnett commented Feb 6, 2016

carnaval commented Feb 6, 2016

sirinath commented Mar 21, 2016

carnaval commented Mar 21, 2016

sirinath commented Mar 21, 2016

kpamnany commented Apr 15, 2016

jrevels commented Apr 17, 2016

sirinath commented Apr 18, 2016

vtjnash commented Jul 12, 2016

andreasnoack commented Nov 17, 2016 • edited Loading

jrevels commented Nov 17, 2016

kpamnany commented Nov 17, 2016

andreasnoack commented Nov 17, 2016

kpamnany commented Nov 18, 2016

yuyichao commented Nov 18, 2016

andreasnoack commented Nov 19, 2016

kpamnany commented Nov 19, 2016

andreasnoack commented Nov 19, 2016

NHDaly commented Dec 19, 2019

dandanua commented Feb 18, 2024 • edited Loading

eschnett commented Feb 20, 2024

andreasnoack commented Nov 17, 2016 •

edited

Loading

dandanua commented Feb 18, 2024 •

edited

Loading