Consistent crash attempting to embed Julia in Java via dynamically loading libjulia.so #36092

Closed
cnuernber opened this issue May 31, 2020 · 14 comments

Comments

@cnuernber

I am playing with JNA bindings and consistently getting a crash on 1.4.2 if jl_init__threading is called at all and then followed with 1 or 2 System.gc() calls.

The crash I am seeing is:

fatal: error thrown and no exception handler available.
ReadOnlyMemoryError()

We are setting up the system with a julia stack related variable set:
https://github.com/cnuernber/julia-clj/blob/master/dockerfiles/Dockerfile#L32

Potentially we should be building julia from source in the docker container with some different variables set related to stack management.

Again, this is simply calling jl_init__threading and then System.gc() a few times. Oddly enough, it is consistent when connecting via a remote repl but not when using a repl from the command line.

If I startup a command line repl and initialize julia then the crash happens at the moment a remote repl connects; potentially this is spawning another thread or something along those lines.

Related issues are (at least) #32700 and #31104

Given we are running from a docker container in the first place all options are on the table; rebuilding julia, rebuilding openjdk, etc. I really appreciate any help with this and have some interest in accessing the excellent solvers (and GPU programming) in Julia via this pathway.

@mkitti
Contributor

mkitti commented Oct 14, 2020

Hi @cnuernber, my sense is that the signal handling of Java and Julia is getting crossed. We may need to set jl_options.handle_signals = JL_OPTIONS_HANDLE_SIGNALS_OFF as in https://github.com/JuliaLang/julia/blob/master/src/jloptions.c#L651

@cnuernber
Author

cnuernber commented Oct 14, 2020

I agree, this looks very promising. It looks like I can call jl_parse_opts in libjulia before I attempt to initialize the system and see if that stabilizes it some.

I also recently realized that the original pathway in javacall.jl, where Julia hosts the JVM rather than the JVM hosting Julia, may work better, as that may disable some of the JVM's less community oriented features. It is a little tough then to find the symbols to bind to for 2-way bindings but still very solvable. I do know that a similar pathway in Python (Python hosting the JVM) allows some use cases that can't happen otherwise. Getting a Clojure repl started from javacall.jl and attempting the same GC loop would also be interesting.

@mkitti
Contributor

mkitti commented Oct 15, 2020

jl_options is also exported in the DLL / shared library.

https://github.com/JuliaLang/julia/blob/master/src/julia.h#L1941-L1942

@mkitti
Contributor

mkitti commented Nov 10, 2020

Signal chaining facilities may also be important when embedding in JVM:
https://docs.oracle.com/javase/8/docs/technotes/guides/vm/signal-chaining.html

@cnuernber
Author

cnuernber commented Nov 23, 2020

Success! That indeed stopped the memory error:

julia-clj.core> (disable-julia-signals!)
Nov 23, 2020 11:31:19 PM clojure.tools.logging$eval153$fn__156 invoke
INFO: Library julia found at [:system "julia"]
nil
julia-clj.core> (println (.toString (JLOptions. (find-julia-symbol "jl_options"))))
JLOptions(native@0x7fab48a9ede0) (184 bytes) {
  byte quiet@0x0=0x0
  byte banner@0x1=0xFF
  Pointer julia_bindir@0x8=null
  Pointer julia_bin@0x10=null
  Pointer cmds@0x18=null
  Pointer image_file@0x20=null
  Pointer cpu_target@0x28=null
  int nthreads@0x30=0x0000
  int nprocs@0x34=0x0000
  Pointer machine_file@0x38=null
  Pointer project@0x40=null
  byte isinteractive@0x48=0x0
  byte color@0x49=0x0
  byte historyfile@0x4A=0x1
  byte startupfile@0x4B=0x0
  byte compile_enabled@0x4C=0x1
  byte code_coverage@0x4D=0x0
  byte malloc_log@0x4E=0x0
  byte opt_level@0x4F=0x2
  byte debug_level@0x50=0x1
  byte check_bounds@0x51=0x0
  byte depwarn@0x52=0x0
  byte warn_overwrite@0x53=0x0
  byte can_inline@0x54=0x1
  byte polly@0x55=0x1
  Pointer trace_compile@0x58=null
  byte fast_math@0x60=0x0
  byte worker@0x61=0x0
  Pointer cookie@0x68=null
  byte handle_signals@0x70=0x0
  byte use_sysimage_native_code@0x71=0x1
  byte use_compiled_modules@0x72=0x1
  Pointer bindto@0x78=null
  Pointer outputbc@0x80=null
  Pointer outputunoptbc@0x88=null
  Pointer outputo@0x90=null
  Pointer outputasm@0x98=null
  Pointer outputji@0xA0=null
  Pointer output_code_coverage@0xA8=null
  byte incremental@0xB0=0x0
  byte image_file_specified@0xB1=0x0
  byte warn_scope@0xB2=0x1
}
nil
julia-clj.core> (jl_init__threading)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> (System/gc)
nil
julia-clj.core> 

cnuernber/libjulia-clj@cf720be
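
For readers following along, here is a minimal sketch of what disabling the signal handlers amounts to at the byte level. It is not the code from the commit above. A plain heap ByteBuffer stands in for the native jl_options struct so the example runs without libjulia or JNA; the struct size (184 bytes) and the handle_signals offset (0x70) are copied from the dump in this comment and should be assumed to differ on other Julia versions. The class and method names are hypothetical. Against the real struct, JNA's Pointer.setByte(offset, value) on the jl_options address would presumably play the role of the put call.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: a heap buffer stands in for the native jl_options struct.
final class JlOptionsSketch {
    // Constant values as defined in julia.h.
    static final byte JL_OPTIONS_HANDLE_SIGNALS_OFF = 0;
    static final byte JL_OPTIONS_HANDLE_SIGNALS_ON = 1;

    // Taken from the struct dump in this thread (184 bytes, handle_signals@0x70).
    // Assumption: this layout only holds for the Julia build that produced the dump.
    static final int STRUCT_SIZE = 184;
    static final int HANDLE_SIGNALS_OFFSET = 0x70;

    /** Clears the handle_signals byte. With the real struct this must happen before jl_init. */
    static void disableSignals(ByteBuffer options) {
        options.put(HANDLE_SIGNALS_OFFSET, JL_OPTIONS_HANDLE_SIGNALS_OFF);
    }

    /** Reports whether the handle_signals byte is anything other than OFF. */
    static boolean signalsEnabled(ByteBuffer options) {
        return options.get(HANDLE_SIGNALS_OFFSET) != JL_OPTIONS_HANDLE_SIGNALS_OFF;
    }

    public static void main(String[] args) {
        ByteBuffer options = ByteBuffer.allocate(STRUCT_SIZE).order(ByteOrder.nativeOrder());
        // Simulate a struct whose signal handling is still switched on.
        options.put(HANDLE_SIGNALS_OFFSET, JL_OPTIONS_HANDLE_SIGNALS_ON);
        disableSignals(options);
        System.out.println("handle_signals enabled? " + signalsEnabled(options));
    }
}
```

Hard-coding the offset is fragile; the JLOptions structure mapping shown in the dump avoids that by letting JNA compute field offsets from the declared field order.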

@cnuernber
Author

cnuernber commented Nov 24, 2020

OK, on to next crash:

  • any call to jl_get_global or jl_get_module_binding crashes with two of the system modules: jl_base_module and jl_core_module.

Also jl_module_name returns a symbol with crap where the name should be. But creating a new symbol returns a new symbol.

jl_eval_string can return something that can be read back:

libjulia-clj.impl.base> (-> (jl_eval_string "sqrt(2.0)")
                            (jl_unbox_float64))
1.4142135623730951

It seems like something still is not initialized.

@cnuernber
Author

I think I figured it out. Symbols returned from JNA's getGlobalVariableAddress are ptr-to-ptrs: basemod is located at the pointer stored at the global address, not at the global address itself.
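
A small sketch of the extra dereference, for anyone hitting the same crash. A ByteBuffer plays the role of process memory so this runs without libjulia; the addresses (8 and 40) and the class name are made up for illustration. With JNA, the equivalent step would presumably be calling getPointer(0) on the Pointer returned by getGlobalVariableAddress before handing it to functions such as jl_get_global.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: a small buffer plays the role of process memory.
final class PtrToPtrSketch {
    /** Loads the 64-bit address stored at the symbol's address (one dereference). */
    static long derefGlobal(ByteBuffer memory, int symbolAddress) {
        return memory.getLong(symbolAddress);
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        int symbolAddress = 8;    // hypothetical address of the exported global variable
        long moduleAddress = 40L; // hypothetical address of the module struct itself
        memory.putLong(symbolAddress, moduleAddress);

        // Wrong: treating symbolAddress as if it were the module pointer.
        // Right: load the pointer stored at symbolAddress and pass that value on.
        long baseModule = derefGlobal(memory, symbolAddress);
        System.out.println("symbol at " + symbolAddress + ", module at " + baseModule);
    }
}
```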

@cnuernber
Author

Closing this issue for now. Will open a new one if I get truly stuck again but it looks like things are working according to the documentation.

@cnuernber
Author

@mkitti - Thanks for your help :-). Will be in touch if I get something interesting.

@mkitti
Contributor

mkitti commented Nov 25, 2020

This is awesome. I'm very interested in seeing how this works out going forward.

Are the Java and Clojure parts currently integrated or is it possible to have a more generic JNA/JNR Java embedding of Julia and a separate Clojure part?

Some other notes:

  1. If you need to use JNI from Julia, take a look at the JNI.jl submodule in JavaCall.jl. Especially note the init_current_vm function.
    https://github.com/JuliaInterop/JavaCall.jl/blob/master/src/JNI.jl

  2. If you get the chance, I'm wondering how well this works on Windows and Mac. Things get hairy there, especially when using multithreading / multitasking. JULIA_COPY_STACKS=1 causes Julia to crash immediately on Windows for me (although I can get the REPL operational for a while with julia --banner=no).

  3. I'm doing some work with java.nio over in "Buffer-backed access (full implementation)", imglib/imglib2#299. I'm getting interested in using nio Buffer to share data between JVM and Julia.

@cnuernber
Author

cnuernber commented Nov 25, 2020

Thank you, I appreciate all of this. I got past all the blocking issues and now I can scan a Julia module and call functions out of it as if they were native Clojure functions:

user> (require '[libjulia-clj.impl.base :as base])
nil
user> (base/initialize!)
Nov 25, 2020 12:52:59 PM clojure.tools.logging$eval7895$fn__7898 invoke
INFO: Library /home/chrisn/dev/cnuernber/libjulia-clj/julia-1.5.3/lib/libjulia.so found at [:system "/home/chrisn/dev/cnuernber/libjulia-clj/julia-1.5.3/lib/libjulia.so"]
:okNov 25, 2020 12:53:00 PM clojure.tools.logging$eval7895$fn__7898 invoke
INFO: Reference thread starting

user> (require '[libjulia-clj.modules.Base :as Base])
nil
user> (Base/sqrt 2.0)
1.4142135623730951
user> 

I think next up will be trying to see how hard zerocopy of ND datastructures is.

If you are interested, the same underlying numerics system (https://github.com/cnuernber/dtype-next) allows:

And has an extremely performant dataframe abstraction.

This seems to start to parallel your imglib pathways. Note that nio buffers are limited to about 2 billion entries because they are indexed by int. I switched to straight sun.misc.Unsafe pathways and have things like the dataset library working with buffers larger than 2GB (large text).

Julia is like the pinnacle of everything I mentioned immediately above :-) at least in terms of a language to write new numeric code in.

Getting any amount of the above working in pure java would be hideous from a time/LOC perspective but what I would recommend is I get as far as I can and then we hire someone to start moving pieces into pure Java.

Honestly I think it is just easier to put a Java wrapper over the Clojure than anything else. Then at least you aren't typing a godawful amount :-) and the base Clojure libraries are small and stable. Certainly nothing like the giant mess that is Scala.

@mkitti
Contributor

mkitti commented Nov 25, 2020

I had noticed you had written JLOptions in Java, but I see most of the JNA interface is in Clojure.

There was a prior extension to ImgLib2 (better known as part of the ImageJ2 / FIJI distribution) using Unsafe that was also used to support Python mappings:
https://github.com/imglib/imglib2-unsafe
I am wary of using Unsafe because it seems there are now active efforts to phase it out:
https://blogs.oracle.com/javamagazine/the-unsafe-class-unsafe-at-any-speed

My workaround for large arrays has been using com.sun.jna.Memory and obtaining ByteBuffer views of portions of the memory block as needed.
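
To illustrate the indexing idea behind that workaround, here is a rough stdlib-only sketch. It is a stand-in, not the JNA approach itself: instead of one native Memory block with ByteBuffer views taken at long offsets, it allocates several separate heap ByteBuffers and maps a long index onto (chunk, offset within chunk). The class name and the tiny chunk size are hypothetical and chosen so the example runs quickly.

```java
import java.nio.ByteBuffer;

// Sketch only: long-indexed byte access layered over int-indexed buffers.
final class ChunkedBytes {
    private final ByteBuffer[] chunks;
    private final int chunkSize;
    private final long totalBytes;

    ChunkedBytes(long totalBytes, int chunkSize) {
        this.totalBytes = totalBytes;
        this.chunkSize = chunkSize;
        // Round up so a final partial chunk is included.
        int count = (int) ((totalBytes + chunkSize - 1) / chunkSize);
        chunks = new ByteBuffer[count];
        long remaining = totalBytes;
        for (int i = 0; i < count; i++) {
            int size = (int) Math.min((long) chunkSize, remaining);
            chunks[i] = ByteBuffer.allocate(size);
            remaining -= size;
        }
    }

    long size() {
        return totalBytes;
    }

    int chunkCount() {
        return chunks.length;
    }

    byte get(long index) {
        // The long index splits into a chunk number and an int offset inside it.
        return chunks[(int) (index / chunkSize)].get((int) (index % chunkSize));
    }

    void put(long index, byte value) {
        chunks[(int) (index / chunkSize)].put((int) (index % chunkSize), value);
    }

    public static void main(String[] args) {
        // Tiny sizes for demonstration; real use would pick chunks near the int limit.
        ChunkedBytes bytes = new ChunkedBytes(10L, 4);
        bytes.put(9L, (byte) 7);
        System.out.println("chunks=" + bytes.chunkCount() + " value=" + bytes.get(9L));
    }
}
```

The same index arithmetic applies when the windows are views over a single native allocation; only the way each window is obtained changes.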

This is all a very exciting development. I'm aware only of JNI based efforts to do this so far but that requires that one has a compiler on hand:
https://github.com/rssdev10/julia4j

Cheers!

@cnuernber
Author

I am also excited! You have been extremely helpful and found the missing piece!

There could be some more in Java for sure if I want to hand-code the java file. DirectMapped JNA is much faster than the generic find-symbol interface I use so it will make sense at some point to encode a few (or all) of the specific JNA methods into a Java class file. That just involves a compilation step and is thus a lot slower than when I can define functions ad-hoc with no meaningful recompilation step. The Clojure REPL is a quite powerful tool for exploration of codebases and maps really well to the JNA project :-).

Unsafe works on java8 and it works well on GraalVM, that author's issues aren't ones I encounter. I personally consider that blog post fairly silly as a very large portion of the java ecosystem (like spark and hadoop) sit on netty and the apache arrow project has an unsafe backend that works well on JDK8 and Graal. In any case changing the low level layer that essentially says 'write/read byte at this location' isn't exactly rocket science when the time comes; we aren't using it to define classes. It just needs to read and write the bytes... ;-). Maybe I should write a blog post: Life - unsafe at any speed - but no need to wet the bed.

I tried that swig project and it failed to compile for me.

I would like people who like Julia to enjoy using Clojure and vice versa, and I for (great) personal reasons have limited time. But I think the potential is certainly there. Ideally if you have JULIA_HOME set the library just works in general across multiple Julia versions with no changes. That is one thing I would like and has really helped libpython-clj not be a complete nightmare when working with things like Conda. There is no compilation step at all nor project version changes in order to support different versions of Python (3.5-3.9) so that maps well to some set of desktop users. Aside we use Docker and k8 for production/orchestration.
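
As a sketch of what "just works if JULIA_HOME is set" could look like, the helper below builds a candidate path to the shared library from a Julia install directory. This is not how libjulia-clj does it; the class and method names are hypothetical. The Linux layout (lib/libjulia.so) matches the log output earlier in this thread, while the macOS and Windows file names and directories are assumptions that would need checking on those platforms.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Sketch only: derive a candidate libjulia path from a Julia install directory.
final class JuliaLibraryLocator {
    /** Returns the assumed library location for the given install dir and OS name. */
    static Path libraryPath(String juliaHome, String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            // Assumption: Windows builds place the DLL under bin.
            return Paths.get(juliaHome, "bin", "libjulia.dll");
        }
        if (os.startsWith("mac")) {
            // Assumption: macOS builds use the .dylib suffix under lib.
            return Paths.get(juliaHome, "lib", "libjulia.dylib");
        }
        // Matches the path in the log above: <julia dir>/lib/libjulia.so
        return Paths.get(juliaHome, "lib", "libjulia.so");
    }

    public static void main(String[] args) {
        String home = System.getenv("JULIA_HOME");
        if (home == null) {
            System.out.println("JULIA_HOME is not set");
            return;
        }
        System.out.println(libraryPath(home, System.getProperty("os.name")));
    }
}
```

Because nothing here depends on the Julia version number, the same lookup would apply to any install that JULIA_HOME points at.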

I will need other people to test/stabilize other operating systems. I run Ubuntu and we use Docker for production and I don't game which means I only ever use or develop with 1 operating system.

@cnuernber
Author

Initial outline working with API documents pulled from the online Julia documentation:
https://github.com/cnuernber/libjulia-clj
