-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heisenbug in "test guarded Random.seed!" #42752
Comments
I thought I was seeing this quite a bit on builds at random this week |
Is it architecture-specific? |
When I run the test in question with the global seed given from the build log, I get the same result as the julia> function testfun()
Random.seed!(0x2f5d1b5dc893fdf5ee2a1080933cd16d)
@testset "test guarded Random.seed!" begin
seed = rand(UInt)
orig = copy(Random.default_rng())
@test guardseed(()->rand(), seed) == guardseed(()->rand(),
seed)
@test guardseed(()->rand(Int), seed) ==
guardseed(()->rand(Int), seed)
r1, r2 = MersenneTwister(0), MersenneTwister(0)
a, b = guardseed(r1) do
Random.seed!(r1, 0)
rand(r1), rand(r1, Int)
end::Tuple{Float64,Int}
c, d = guardseed(r2) do
Random.seed!(r2, 0)
rand(r2), rand(r2, Int)
end::Tuple{Float64,Int}
@test a == c == rand(r1) == rand(r2)
@test b == d == rand(r1, Int) == rand(r2, Int)
@show orig
@show Random.default_rng()
@test orig == Random.default_rng()
@test rand(orig) == rand()
end
end
testfun (generic function with 1 method)
julia> testfun()
orig = Xoshiro(0x947b9af2ba269897, 0x95ae8d9a31382b32, 0x8a532dc2baa61f2e, 0x7213512241a4dfbd)
Random.default_rng() = TaskLocalRNG()
Test Summary: | Pass Total
test guarded Random.seed! | 6 6
Test.DefaultTestSet("test guarded Random.seed!", Any[], 6, false, false) Sadly we don't have the other internal state in the log :/ I checked all combinations of 2 flipped bits, so 65536 RNGs, but none showed a match. We do have the result of the next We'd basically have to solve this: s0::UInt, s3::UInt
tmp = s0 + s3
res = (((tmp << 23) | (tmp >> 41)) + s0) >> 11
res == 0x00191bda5154fb80 which I got by seeing that the random values are generated via
This is due to the |
FWIW I also saw it with 1.7.0-rc2 on x86_64, but with different values:
https://kojipkgs.fedoraproject.org//work/tasks/9369/77709369/build.log (This was the first time I built 1.7.0-rc2 so it may not be so rare.) |
Maybe something to do with #42201? Although that was a while ago. |
If we modify the test here to show the true state of I tried to debug this by letting Z3 spit out a bunch of number pairs that would create those values, but there wasn't really a pattern in them. Please excuse the bad quality - I hadn't yet setup my screenshotting tool. This screenshot shows the number of times a bit flip had to have occured in the generated numbers in each nibble (4-bit group) of each UInt. Z3 spit out 15 values with exactly 9 bitflips (non-consecutive) and didn't find any possible values with fewer. The largest number of bitflips possible to still find solutions is 18 (but again no pattern in those). |
Is this expected/related behavior? [sukera@tempman ~]$ julia -q -t1
julia> t = current_task()
Task (runnable) @0x00007f6723fdc010
julia> state = (t.rngState0, t.rngState1, t.rngState2, t.rngState3)
(0xf6e7025b1a108853, 0x1949f9d36755c8cf, 0xdc13b6c30461a2b5, 0xb199766836484420)
julia> using Random
julia> copy(Random.default_rng()) # yep, same state as the rng in our task
Xoshiro(0xf6e7025b1a108853, 0x1949f9d36755c8cf, 0xdc13b6c30461a2b5, 0xb199766836484420)
julia> Random.default_rng()
TaskLocalRNG()
julia> function Base.show(io::IO, ::MIME"text/plain", x::Random.TaskLocalRNG)
t = current_task()
write(io, "TaskLocalRNG(")
show(io, MIME"text/plain"(), t.rngState0)
write(io, ',')
show(io, MIME"text/plain"(), t.rngState1)
write(io, ',')
show(io, MIME"text/plain"(), t.rngState2)
write(io, ',')
show(io, MIME"text/plain"(), t.rngState3)
print(io, ')')
end
julia> state1 = (t.rngState0, t.rngState1, t.rngState2, t.rngState3) # ok, our state didn't change..
(0xf6e7025b1a108853, 0x1949f9d36755c8cf, 0xdc13b6c30461a2b5, 0xb199766836484420)
julia> t = current_task() # still the same task
Task (runnable) @0x00007f6723fdc010
julia> state1 = (t.rngState0, t.rngState1, t.rngState2, t.rngState3) # still the same state
(0xf6e7025b1a108853, 0x1949f9d36755c8cf, 0xdc13b6c30461a2b5, 0xb199766836484420)
julia> copy(Random.default_rng()) # should just copy rng of current task, so far so good
Xoshiro(0xf6e7025b1a108853, 0x1949f9d36755c8cf, 0xdc13b6c30461a2b5, 0xb199766836484420)
julia> Random.default_rng() # ?! shouldn't this be the same state we just copied from?
TaskLocalRNG(0x6c4866be67e41b35,0x190098b53e17879a,0xe342dc7b9c59fdeb,0xb156895802bc4cb0)
julia> using Base.Threads
julia> Threads.nthreads()
1
julia> versioninfo()
Julia Version 1.8.0-DEV.798
Commit 3ae25059fb (2021-10-22 06:13 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake) I have no idea why the copy differs from the original - as far as I can tell, even the current task is the same, so no new RNG was split off! Am I doing something wrong with printing here? It's basically the same as for regular Xoshiro, and printing itself doesn't seem to change RNG either: julia> Random.default_rng()
TaskLocalRNG(0x6c4866be67e41b35,0x190098b53e17879a,0xe342dc7b9c59fdeb,0xb156895802bc4cb0)
julia> Random.default_rng()
TaskLocalRNG(0x6c4866be67e41b35,0x190098b53e17879a,0xe342dc7b9c59fdeb,0xb156895802bc4cb0)
julia> Random.default_rng()
TaskLocalRNG(0x6c4866be67e41b35,0x190098b53e17879a,0xe342dc7b9c59fdeb,0xb156895802bc4cb0) |
Ok, when I just |
These numbers don't appear to be random, but is advanced exactly 8 numbers from where it was supposed to be generating:
|
which means the 2 states were probably:
|
Very interesting! That's also consistent with the other observation posted above statesjulia> orig = Xoshiro(0x98eb785bda6da52d, 0xa71f2324bc045d76, 0x9739d712cfb7c053, 0xa001e265c6af9015)
Xoshiro(0x98eb785bda6da52d, 0xa71f2324bc045d76, 0x9739d712cfb7c053, 0xa001e265c6af9015)
julia> rand(orig)
0.9755253173603154
julia> rand(orig)
0.6218595743287779
julia> rand(orig)
0.18401662900007387
julia> rand(orig)
0.4942449913515099
julia> rand(orig)
0.19740142179010944
julia> rand(orig)
0.5657614748780961
julia> rand(orig)
0.5698126213082862
julia> rand(orig)
0.6621331406776867
julia> orig
Xoshiro(0x23cdaf44f4fd1c54, 0x84dec7f31d026a9c, 0x6fdbeb04a2abd21a, 0x4cd9c15a6e89152c)
julia> rand(orig)
0.45116343295850014 so I'd be inclined to say that finding what advances RNGs by exactly 8 numbers makes it likely to find the cause. |
The |
I just saw what appears to be the same bug, on Since this is on |
Yep, exhibits the same behavior of being 8 |
Still broken, but somewhat less so?? (or more so??) https://build.julialang.org/#/builders/2/builds/76 We are still generating the wrong random number, and it is still off by 8, but now the contents of the copy and the Task are exactly the same. |
I got an rr trace: https://buildkite.com/julialang/julia-master/builds/11189#1be3920d-621e-439a-8d43-7be5ba357a1b Please note that the rr trace will be deleted from Buildkite in six months.
|
Here is the cause of the fault: (rr) bt
#0 0x00007f64c6633201 in jl_genrandom (rngState=0x7f64661a1468) at /cache/build/default-amdci4-4/julialang/julia-master/src/task.c:726
#1 rng_split (to=0x7f64632c9880, from=0x7f64661a1430) at /cache/build/default-amdci4-4/julialang/julia-master/src/task.c:753
#2 ijl_new_task (start=0x7f646283de60, completion_future=<optimized out>, ssize=<optimized out>) at /cache/build/default-amdci4-4/julialang/julia-master/src/task.c:794
#3 0x00007f64af996040 in _Task () at boot.jl:444
#4 Task () at task.jl:5
#5 Task () at task.jl:5
#6 errormonitor () at task.jl:539
#7 julia_process_worker_67608 () at /cache/build/default-amdci4-4/julialang/julia-master/usr/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:328
#8 0x00007f64af99c00e in send_del_client_no_lock () at /cache/build/default-amdci4-4/julialang/julia-master/usr/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:306
#9 julia_finalize_ref_67633 () at /cache/build/default-amdci4-4/julialang/julia-master/usr/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:103
#10 0x00007f64af99c06c in jfptr_finalize_ref_67634.clone_1 () at abstractarray.jl:668
#11 0x00007f64c660b977 in _jl_invoke (world=<optimized out>, mfunc=<optimized out>, nargs=1, args=0x7f64a31503f8, F=0x7f64b2d320d0 <jl_system_image_data+36638224>)
at /cache/build/default-amdci4-4/julialang/julia-master/src/gf.c:2373
#12 ijl_apply_generic (F=<optimized out>, args=args@entry=0x7f64a31503f8, nargs=nargs@entry=1) at /cache/build/default-amdci4-4/julialang/julia-master/src/gf.c:2574
#13 0x00007f64c6669d3d in jl_apply (nargs=2, args=0x7f64a31503f0) at /cache/build/default-amdci4-4/julialang/julia-master/src/julia.h:1838
#14 run_finalizer (o=<optimized out>, ff=<optimized out>, ct=<optimized out>) at /cache/build/default-amdci4-4/julialang/julia-master/src/gc.c:280
#15 0x00007f64c666acd5 in jl_gc_run_finalizers_in_list (ct=ct@entry=0x7f64661a1430, list=list@entry=0x7f64a3150570) at /cache/build/default-amdci4-4/julialang/julia-master/src/gc.c:370
#16 0x00007f64c666af69 in run_finalizers (ct=0x7f64661a1430) at /cache/build/default-amdci4-4/julialang/julia-master/src/gc.c:400
#17 run_finalizers (ct=0x7f64661a1430) at /cache/build/default-amdci4-4/julialang/julia-master/src/gc.c:378
#18 0x00007f64c666c1e9 in jl_gc_run_pending_finalizers (ct=<optimized out>, ct@entry=0x7f64661a1430) at /cache/build/default-amdci4-4/julialang/julia-master/src/gc.c:411
#19 0x00007f64c5cce88c in jl_mutex_unlock (lock=<optimized out>) at /cache/build/default-amdci4-4/julialang/julia-master/src/julia_locks.h:131
#20 jl_generate_fptr_impl (mi=<optimized out>, world=33829) at /cache/build/default-amdci4-4/julialang/julia-master/src/jitlayers.cpp:361
#21 0x00007f64c660aad8 in jl_compile_method_internal (world=33829, mi=0x7f649bd55410) at /cache/build/default-amdci4-4/julialang/julia-master/src/gf.c:2106
#22 jl_compile_method_internal (mi=0x7f649bd55410, world=33829) at /cache/build/default-amdci4-4/julialang/julia-master/src/gf.c:2047
#23 0x00007f64c660bb38 in _jl_invoke (world=33829, mfunc=<optimized out>, nargs=2, args=0x7f64a31508a8, F=0x7f64b5232a70 <jl_system_image_data+75438000>)
at /cache/build/default-amdci4-4/julialang/julia-master/src/gf.c:2384
#24 ijl_apply_generic (F=<optimized out>, args=args@entry=0x7f64a31508a8, nargs=nargs@entry=2) at /cache/build/default-amdci4-4/julialang/julia-master/src/gf.c:2574
#25 0x00007f64c662c7bd in jl_apply (nargs=3, args=0x7f64a31508a0) at /cache/build/default-amdci4-4/julialang/julia-master/src/julia.h:1838
#26 do_call (args=args@entry=0x7f649bd39078, nargs=nargs@entry=3, s=s@entry=0x7f64a3151090) at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:126
#27 0x00007f64c662bf77 in eval_value (e=e@entry=0x7f64a26eca70, s=s@entry=0x7f64a3151090) at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:215
#28 0x00007f64c662d0ac in eval_stmt_value (s=0x7f64a3151090, stmt=0x7f64a26eca70) at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:594
#29 eval_body (stmts=stmts@entry=0x3ebbdc00, s=s@entry=0x7f64a3151090, ip=<optimized out>, ip@entry=79, toplevel=toplevel@entry=1)
at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:594
#30 0x00007f64c662d750 in eval_body (stmts=stmts@entry=0x3ebbdc00, s=s@entry=0x7f64a3151090, ip=<optimized out>, ip@entry=40, toplevel=toplevel@entry=1)
at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:522
#31 0x00007f64c662d750 in eval_body (stmts=stmts@entry=0x3ebbdc00, s=s@entry=0x7f64a3151090, ip=<optimized out>, ip@entry=38, toplevel=toplevel@entry=1)
at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:522
#32 0x00007f64c662d750 in eval_body (stmts=<optimized out>, s=s@entry=0x7f64a3151090, ip=<optimized out>, ip@entry=0, toplevel=toplevel@entry=1)
at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:522
#33 0x00007f64c662e386 in jl_interpret_toplevel_thunk (m=m@entry=0x7f6463dff0d0, src=0x7f6468b50d90) at /cache/build/default-amdci4-4/julialang/julia-master/src/interpreter.c:750
#34 0x00007f64c66507df in jl_toplevel_eval_flex (m=m@entry=0x7f6463dff0d0, e=<optimized out>, fast=fast@entry=1, expanded=expanded@entry=0)
at /cache/build/default-amdci4-4/julialang/julia-master/src/toplevel.c:912
#35 0x00007f64c665163b in jl_toplevel_eval_flex (m=m@entry=0x7f6463dff0d0, e=e@entry=0x7f646886cd50, fast=fast@entry=1, expanded=expanded@entry=0)
at /cache/build/default-amdci4-4/julialang/julia-master/src/julia.h:996
#36 0x00007f64c6652b0c in ijl_toplevel_eval (m=m@entry=0x7f6463dff0d0, v=v@entry=0x7f646886cd50) at /cache/build/default-amdci4-4/julialang/julia-master/src/toplevel.c:921
#37 0x00007f64c6652c2b in ijl_toplevel_eval_in (m=0x7f6463dff0d0, ex=0x7f646886cd50) at /cache/build/default-amdci4-4/julialang/julia-master/src/toplevel.c:971
#38 0x00007f64b03eaa42 in eval () at boot.jl:370
#39 japi1_include_string_30440 () at loading.jl:1297 |
Just as a record, I saw:
https://build.julialang.org/#/builders/7/builds/4863/steps/5/logs/stdio
in #42725. It vanished after re-running the test by merging
master
even though there doesn't seem a change that could fix the result a4903fd...76c2431.The text was updated successfully, but these errors were encountered: