Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scary error when generating large number of RuntimeGeneratedFunctions #13

Open
tisztamo opened this issue Oct 13, 2020 · 5 comments
Open

Comments

@tisztamo
Copy link

tisztamo commented Oct 13, 2020

The MWE fills an array with simple generated functions and calls them in a loop. Works for me when n=50_000, but not when n=65535.

julia> using RuntimeGeneratedFunctions
       RuntimeGeneratedFunctions.init(@__MODULE__)

       function genfuncs(n)
           fs = Any[]
           for i=1:n
               a = rand()
               ex = :(function f(x)
                   return x * $a
               end)
               f = @RuntimeGeneratedFunction(ex)
               push!(fs, f)
           end
           return fs
       end

       function calc(fs)
           sum = 0.0
           for i = eachindex(fs) 
               sum += fs[i](42.0)
           end
           return sum
       end

       fs = genfuncs(100_000)
       calc(fs)
ERROR: UndefVarError: mul_float not defined
Stacktrace:
 [1] * at ./float.jl:405 [inlined]
 [2] macro expansion at ./REPL[1]:9 [inlined]
 [3] macro expansion at /home/krisztian/.julia/packages/RuntimeGeneratedFunctions/fIcZp/src/RuntimeGeneratedFunctions.jl:80 [inlined]
 [4] macro expansion at ./none:0 [inlined]
 [5] generated_callfunc at ./none:0 [inlined]
 [6] (::RuntimeGeneratedFunctions.RuntimeGeneratedFunction{var"#_RuntimeGeneratedFunctions_ModTag",(0x92, 0xb5, 0xf6, 0xed, 0xbb, 0x56, 0xdf, 0xe8, 0x21, 0x03, 0xee, 0xa3, 0xe5, 0x65, 0x4c, 0xf8, 0x80, 0xb3, 0xe2, 0x57, 0x59, 0x5a, 0x10, 0x91, 0xd3, 0x39, 0xc9, 0xe2, 0x7b, 0xc5, 0x09, 0xcd, 0xbf, 0xa5, 0xcd, 0x04, 0xa1, 0x99, 0xac, 0x0d, 0xf0, 0x49, 0x08, 0x53, 0xf3, 0x01, 0x0e, 0xff, 0x94, 0x1e, 0x19, 0x76, 0xd1, 0x78, 0xe1, 0x43, 0xc3, 0xa5, 0x47, 0xc1, 0xda, 0x0b, 0x8a, 0xf4),(:x,)})(::Float64) at /home/krisztian/.julia/packages/RuntimeGeneratedFunctions/fIcZp/src/RuntimeGeneratedFunctions.jl:68
 [7] calc(::Array{Any,1}) at ./REPL[1]:20
 [8] top-level scope at REPL[1]:26

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
@ChrisRackauckas
Copy link
Member

Uhh... does Julia start to discard functions if we define too many? I'll have to ask some people. This is odd to say the least.

@c42f
Copy link
Contributor

c42f commented May 5, 2022

I spent an hour looking into this and reading the C code for method specializations in the runtime.

I wasn't able to find anything obvious yet. One interesting thing I noticed was that inspecting one of the failing functions with Cthulhu.jl seemed to work fine. But then I got different inconsistent UndefVarErrors such as

julia> fs[65534](1.0)
ERROR: UndefVarError: REPL[3] not defined

and

julia> fs[65533](1.0)
ERROR: UndefVarError: /home/chris/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl not defined

which does suggest some weird corruption in the Julia runtime.

I thought this might have something to do with the large number of method specializations of RuntimeGeneratedFunctions.generated_callfunc that we cause the compiler to generate here. After playing around a bit in the REPL I also see that the specializations list has grown quite a lot bigger than the boundary approximately around 2^16 where the failures seem to start. Not sure what to make of that yet.

julia> length(methods(RuntimeGeneratedFunctions.generated_callfunc).ms[1].specializations)
132387

@pfitzseb
Copy link

pfitzseb commented May 5, 2022

This very much looks like an UInt16 overflow, since

julia> fs[2](1.0) == fs[65538](1.0)
true

julia> fs[3](1.0) == fs[65539](1.0)
true

julia> fs[4](1.0) == fs[65540](1.0)
true

65532-65537 aren't callable:

┌ Error: iteration 65532
│   exception = UndefVarError: mul_float not defined
└ @ Main REPL[14]:8
┌ Error: iteration 65534
│   exception = mul_float: types of a and b must match
└ @ Main REPL[14]:8
┌ Error: iteration 65535
│   exception = mul_float: types of a and b must match
└ @ Main REPL[14]:8
┌ Error: iteration 65536
│   exception = UndefVarError: /home/pfitzseb/.julia/packages/RuntimeGeneratedFunctions/KrkGo/src/RuntimeGeneratedFunctions.jl not defined
└ @ Main REPL[14]:8
┌ Error: iteration 65537
│   exception = UndefVarError: REPL[3] not defined
└ @ Main REPL[14]:8

@pfitzseb
Copy link

pfitzseb commented May 5, 2022

Jeff suggested running a debug build to check whether this code trips any assertions, and indeed, it does:

julia> calc(fs)
10000
20000
30000
40000
50000
60000
julia-debug: /home/pfitzseb/Documents/Git/julia/src/ircode.c:344: jl_encode_value_: Assertion `id <= UINT16_MAX' failed.

https://github.com/JuliaLang/julia/blob/902a5c199d590552758f7a91cc75c47ea67de5f2/src/ircode.c#L344

I spend a few minutes trying to adjust a bunch of ints to size_ts, but that didn't work out so well.

Patch
diff --git a/src/ircode.c b/src/ircode.c
index 73e99f2281..05cdb02533 100644
--- a/src/ircode.c
+++ b/src/ircode.c
@@ -42,7 +42,7 @@ static void tagged_root(rle_reference *rr, jl_ircode_state *s, int i)
static void literal_val_id(rle_reference *rr, jl_ircode_state *s, jl_value_t *v) JL_GC_DISABLED
{
   jl_array_t *rs = s->method->roots;
-    int i, l = jl_array_len(rs);
+    size_t i, l = jl_array_len(rs);
   if (jl_is_symbol(v) || jl_is_concrete_type(v)) {
       for (i = 0; i < l; i++) {
           if (jl_array_ptr_ref(rs, i) == v)
@@ -341,9 +341,9 @@ static void jl_encode_value_(jl_ircode_state *s, jl_value_t *v, int as_literal)
               write_uint8(s->s, id);
           }
           else {
-                assert(id <= UINT16_MAX);
+                assert(id <= UINT32_MAX);
               write_uint8(s->s, TAG_LONG_METHODROOT);
-                write_uint16(s->s, id);
+                write_uint32(s->s, id);
           }
           return;
       }
@@ -603,11 +603,11 @@ static jl_value_t *jl_decode_value(jl_ircode_state *s) JL_GC_DISABLED
       key = read_uint64(s->s);
       tag = read_uint8(s->s);
       assert(tag == TAG_METHODROOT || tag == TAG_LONG_METHODROOT);
-        return lookup_root(s->method, key, tag == TAG_METHODROOT ? read_uint8(s->s) : read_uint16(s->s));
+        return lookup_root(s->method, key, tag == TAG_METHODROOT ? read_uint8(s->s) : read_uint32(s->s));
   case TAG_METHODROOT:
       return lookup_root(s->method, 0, read_uint8(s->s));
   case TAG_LONG_METHODROOT:
-        return lookup_root(s->method, 0, read_uint16(s->s));
+        return lookup_root(s->method, 0, read_uint32(s->s));
   case TAG_SVEC: JL_FALLTHROUGH; case TAG_LONG_SVEC:
       return jl_decode_value_svec(s, tag);
   case TAG_COMMONSYM:
diff --git a/src/julia_internal.h b/src/julia_internal.h
index 21c46129b7..2ee6ed073c 100644
--- a/src/julia_internal.h
+++ b/src/julia_internal.h
@@ -538,7 +538,7 @@ void jl_resolve_globals_in_ir(jl_array_t *stmts, jl_module_t *m, jl_svec_t *spar
JL_DLLEXPORT void jl_add_method_root(jl_method_t *m, jl_module_t *mod, jl_value_t* root);
void jl_append_method_roots(jl_method_t *m, uint64_t modid, jl_array_t* roots);
int get_root_reference(rle_reference *rr, jl_method_t *m, size_t i);
-jl_value_t *lookup_root(jl_method_t *m, uint64_t key, int index);
+jl_value_t *lookup_root(jl_method_t *m, uint64_t key, size_t index);
int nroots_with_key(jl_method_t *m, uint64_t key);

int jl_valid_type_param(jl_value_t *v);
diff --git a/src/method.c b/src/method.c
index 7325670bd7..3df10f20fe 100644
--- a/src/method.c
+++ b/src/method.c
@@ -404,7 +404,7 @@ static void jl_code_info_set_ir(jl_code_info_t *li, jl_expr_t *ir)

   // Flags that need to be copied to slotflags
   const uint8_t vinfo_mask = 8 | 16 | 32 | 64;
-    int i;
+    size_t i;
   for (i = 0; i < nslots; i++) {
       jl_value_t *vi = jl_array_ptr_ref(vis, i);
       jl_sym_t *name = (jl_sym_t*)jl_array_ptr_ref(vi, 0);
@@ -590,7 +590,7 @@ JL_DLLEXPORT jl_code_info_t *jl_code_for_staged(jl_method_instance_t *linfo)

       // If this generated function has an opaque closure, cache it for
       // correctness of method identity
-        for (int i = 0; i < jl_array_len(func->code); ++i) {
+        for (size_t i = 0; i < jl_array_len(func->code); ++i) {
           jl_value_t *stmt = jl_array_ptr_ref(func->code, i);
           if (jl_is_expr(stmt) && ((jl_expr_t*)stmt)->head == jl_new_opaque_closure_sym) {
               linfo->uninferred = jl_copy_ast((jl_value_t*)func);
@@ -1090,7 +1090,7 @@ int get_root_reference(rle_reference *rr, jl_method_t *m, size_t i)

// get a root, given its key and index relative to the key
// this is the relocatable way to get a root from m->roots
-jl_value_t *lookup_root(jl_method_t *m, uint64_t key, int index)
+jl_value_t *lookup_root(jl_method_t *m, uint64_t key, size_t index)
{
   if (!m->root_blocks) {
       assert(key == 0);
diff --git a/src/support/rle.h b/src/support/rle.h
index f85d9f35c4..3d47d15bb8 100644
--- a/src/support/rle.h
+++ b/src/support/rle.h
@@ -34,7 +34,7 @@ int rle_iter_increment(rle_iter_state *state, /* number of items */ size_t len,
/* indexing */
typedef struct {
   uint64_t key;
-    int index;     // number of preceding items in the list with the same key
+    size_t index;     // number of preceding items in the list with the same key
} rle_reference;

void rle_index_to_reference(rle_reference *rr, /* item index */ size_t i, uint64_t *rletable, size_t npairs, uint64_t key0);

This fails with

julia> fs = genfuncs(70_000);
julia-debug: /home/pfitzseb/Documents/Git/julia/src/julia.h:989: jl_array_ptr_ref: Assertion `i < jl_array_len(a)' failed.

signal (6): Aborted
in expression starting at REPL[5]:1
__pthread_kill_implementation at /usr/lib/libc.so.6 (unknown line)
raise at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
__assert_fail_base.cold at /usr/lib/libc.so.6 (unknown line)
__assert_fail at /usr/lib/libc.so.6 (unknown line)
jl_array_ptr_ref at /home/pfitzseb/Documents/Git/julia/src/julia.h:989
lookup_root at /home/pfitzseb/Documents/Git/julia/src/method.c:1097
jl_decode_value at /home/pfitzseb/Documents/Git/julia/src/ircode.c:610
jl_decode_value_expr at /home/pfitzseb/Documents/Git/julia/src/ircode.c:512
jl_decode_value at /home/pfitzseb/Documents/Git/julia/src/ircode.c:630
jl_decode_value_array at /home/pfitzseb/Documents/Git/julia/src/ircode.c:455
jl_decode_value at /home/pfitzseb/Documents/Git/julia/src/ircode.c:625
ijl_uncompress_ir at /home/pfitzseb/Documents/Git/julia/src/ircode.c:835

and I didn't figure out why m->roots isn't properly growing.

The more realistic and immediate solution is probably to change init to pre-create multiple generated functions based on an additional user-defined parameter.

@pfitzseb
Copy link

pfitzseb commented May 6, 2022

JuliaLang/julia#45173 fixes this for (at least) n = 2e5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants