Make JNA Binding available to Java clients? #3

Closed
subes opened this issue Dec 25, 2021 · 51 comments

Comments

@subes

subes commented Dec 25, 2021

Hi,

I would be interested in integrating Julia via JNA here: https://github.com/invesdwin/invesdwin-context-julia/

Currently I am trying to use Julia4j, which gives memory access errors after a few commands (maybe you have an idea about what causes those) and lacks error handling:
rssdev10/julia4j#2

Could you maybe provide a JNA layer that can be used from java code without clojure? Would be very much appreciated.

@cnuernber
Owner

cnuernber commented Dec 25, 2021

I would be interested in a java wrapper that you could use without directly calling clojure.

Here is a blogpost about how the ffi system works in case you are interested.

@cnuernber
Owner

I do have an idea about the crashing - https://cnuernber.github.io/libjulia-clj/signals.html.

@subes
Author

subes commented Dec 25, 2021

Thanks for the input, I will try the signal chaining workaround. I guess with that one does not require the j_options function to disable signal handling (since Julia requires that for multithreading as I understood from your explanation).

Regarding Java16-FFI. I would prefer something that is compatible up to Java 8. Since there are still lots of companies stuck at Java 8 or Java 11.

So if you prefer JNR over JNA, that is fine: it does not matter to me what I integrate as long as it gets the job done. From what I understand, JNR might be a bit faster in situations where lots of calls are made into Julia.

I guess it would be easy for you to port this to java, since you already have lots of experience with this. :)
I would be happy to integrate and test it then.

@cnuernber
Owner

It would be difficult to port it directly to java - that is a bad assumption. Clojure is much more compact than java and furthermore the ffi layer it is built upon is a large part of the system. dtype-next is a complex piece of engineering that enables good support for algorithms across jvm-heap and native-heap datastructures. That is the piece that makes libraries like these so quick and it is a foundational piece - exactly the type that is not easily replaceable. If you would like to wrap this library in a pure java layer so users don't need to use Clojure to use it I would be interested in helping.

@subes
Author

subes commented Dec 25, 2021

Well, if it is possible to use libjulia-clj from java, then I am fine with that. What do I need to do to initialize/call it?
I know from kotlin libraries that it is possible to use it as a normal jar from java. Examples being mapdb and okhttp.

@cnuernber
Owner

Definitely possible. Glad you are open minded - will respond in detail soon.

@subes
Author

subes commented Dec 26, 2021

Regarding the signal chaining workaround. That did the trick. Though Julia still causes JVM crashes when calling it from other threads. Thus I created a workaround to always call julia from the same executor thread. Currently implemented with julia4j.
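
A minimal sketch of the "always call Julia from the same executor thread" workaround, using only the JDK. The class name `JuliaThread` and its `call` method are hypothetical and not taken from invesdwin-context-julia; the `Callable` stands in for whatever binding call (julia4j or libjulia-clj) is actually made.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: runs every submitted task on one dedicated thread. */
final class JuliaThread implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "julia-worker");
        t.setDaemon(true); // do not keep the JVM alive just for this thread
        return t;
    });

    /** Blocks the caller until the task has finished on the worker thread. */
    public <T> T call(Callable<T> task) {
        try {
            return executor.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the Julia thread", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed on the Julia thread", e.getCause());
        }
    }

    @Override
    public void close() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        try (JuliaThread julia = new JuliaThread()) {
            // Placeholder task; a real one would call into the Julia binding here.
            System.out.println(julia.call(() -> Thread.currentThread().getName()));
        }
    }
}
```

One caveat of this shape: calling `call` from a task that is already running on the worker thread would block forever, so nested calls need to be avoided or detected.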

To call libjulia-clj from java I guess we need :gen-class directives in the clojure files: https://stackoverflow.com/questions/2181774/calling-clojure-from-java
With the current version of libjulia-clj I can not access anything from Java (imports can not find any classes).

@cnuernber
Owner

I am glad the signal chaining worked for you - that was some in-depth information that took some work to figure out.

The best way to call clojure from java is to use the extremely minimal public api. This is so minimal that calling things like the initialize function would require some work such as constructing keywords and most likely persistent maps or something along those lines.

The gen-class pathway is possible, but the above API would work without changes to the published jar and without requiring any AOT - it would, however, require more boilerplate initially. Given that julia4j is working for you, are you interested in pursuing this further or should we close this issue?

@subes
Author

subes commented Dec 26, 2021

I would still like to have multiple options integrated into invesdwin-context-julia. JNA has the benefit of not requiring a native compiled dll/so (which I currently only have compiled for linux using julia4j). Also error handling and getting string responses from julia is lacking with julia4j at the moment.

@cnuernber
Owner

OK that is encouraging. Also note that libjulia-clj has zero copy support for dense nd objects so you can actually share memory between java and julia although you have to keep in mind that julia is column major while the underlying ND object system I use is row-major.
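
To illustrate the row-major versus column-major point, here is a small sketch of the index arithmetic for a dense 2D array held in one flat buffer. It does not use libjulia-clj at all; the class and method names are made up for the example.

```java
/** Index arithmetic for a dense 2D array stored in one flat buffer. */
final class Layout {
    /** Row-major: the elements of one row are adjacent in memory. */
    static int rowMajor(int row, int col, int rows, int cols) {
        return row * cols + col;
    }

    /** Column-major (Julia's convention): the elements of one column are adjacent. */
    static int colMajor(int row, int col, int rows, int cols) {
        return col * rows + row;
    }

    public static void main(String[] args) {
        // The 2x3 matrix [[1,2,3],[4,5,6]] written row by row.
        double[] flat = {1, 2, 3, 4, 5, 6};
        // Reading the same buffer as a column-major 3x2 matrix gives the transpose:
        // element (i, j) of the 3x2 view equals element (j, i) of the 2x3 original.
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 2; j++) {
                double viaColMajor = flat[colMajor(i, j, 3, 2)];
                double viaRowMajor = flat[rowMajor(j, i, 2, 3)];
                System.out.println(viaColMajor + " == " + viaRowMajor);
            }
        }
    }
}
```

This is why a shared buffer needs no copy but appears transposed on the other side: the bytes stay where they are and only the interpretation of the shape changes.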

Do you have a minimal java project where you are trying these things out?

@subes
Author

subes commented Dec 26, 2021

Here is the project that I prepared with a dependency to libjulia-clj:
https://github.com/invesdwin/invesdwin-context-julia/tree/main/invesdwin-context-julia-parent/invesdwin-context-julia-runtime-libjuliaclj

Zero copy support would be great. Regarding the minimal API provided by Clojure, I guess gen-class is preferable as it is generated and can not break by getting out of sync with the libjulia-clj implementation.

@cnuernber
Owner

For gen-class to work we need a precompile step. Clojure libraries don't package the actual class files as they dynamically compile the .clj files upon require.

My recommendation is for me to create a small gen-class-based class that I will test with the unit testing system, and for the build system of your invesdwin libjuliaclj bindings to run a simple compilation step that writes all of the required class files to any desired directory. If you then package those class files with your jar, the java import step will work.

@cnuernber
Owner

Or perhaps I can upload a version of libjulia with the gen-class and class files in it. I think this may be the best step for now. I will reach out when I have something that I think will work.

@subes
Author

subes commented Dec 26, 2021

Sounds good.

@cnuernber
Owner

cnuernber commented Dec 26, 2021

First attempt - jar - https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-3

API docs - https://cnuernber.github.io/libjulia-clj/libjulia-clj.java-api.html

So, there should be a class in package libjulia_clj named java_api:

  -rw-rw-rw-      1992  26-Dec-2021  10:52:10  libjulia_clj/java_api.class

The functions should be without the - prefix, that just indicates to clojure not to mangle the names in any way.

It is easiest to export the env var JULIA_HOME just as in the local env script. You can also pass "julia-home" in as the key in the options map to initialize.

Small example unit test is here.

@subes
Author

subes commented Dec 26, 2021

I don't understand. Is it supposed to work by calling libjulia_clj.java_api.main("(japi/-initialize (jvm-map/hash-map {"n-threads" 8}))")?
I don't see other public methods or a way to get back return values.

@subes
Author

subes commented Dec 26, 2021

Trying it:

    public static void main(final String[] args) {
        libjulia_clj.java_api.main(new String[] { "(japi/-initialize (jvm-map/hash-map {\"n-threads\" 8}))" });
    }

Results in:

Exception in thread "main" java.lang.UnsupportedOperationException: libjulia-clj.java-api/-main not defined
	at libjulia_clj.java_api.main(Unknown Source)
	at de.invesdwin.context.julia.runtime.libjuliaclj.internal.UnsafeJuliaEngineWrapper.main(UnsafeJuliaEngineWrapper.java:49)

This is what the generated class looks like:

// Warning: No line numbers available in class file
/*  */ 
/*  */ import clojure.lang.IFn;
/*  */ import clojure.lang.RT;
/*  */ import clojure.lang.Util;
/*  */ import clojure.lang.Var;
/*  */ 
/*  */ public class java_api {
/*  */   private static final Var main__var = Var.internPrivate("libjulia-clj.java-api", "-main");
/*  */   
/*  */   private static final Var equals__var = Var.internPrivate("libjulia-clj.java-api", "-equals");
/*  */   
/*  */   private static final Var toString__var = Var.internPrivate("libjulia-clj.java-api", "-toString");
/*  */   
/*  */   private static final Var hashCode__var = Var.internPrivate("libjulia-clj.java-api", "-hashCode");
/*  */   
/*  */   private static final Var clone__var = Var.internPrivate("libjulia-clj.java-api", "-clone");
/*  */   
/*  */   static {
/*  */     Util.loadWithClass("/libjulia_clj/java_api", java_api.class);
/*  */   }
/*  */   
/*  */   public boolean equals(Object paramObject) {
/*  */     equals__var.isBound() ? (IFn)equals__var.get() : null;
/*  */     return ((equals__var.isBound() ? (IFn)equals__var.get() : null) != null) ? ((Boolean)((IFn)(equals__var.isBound() ? (IFn)equals__var.get() : null)).invoke(this, paramObject)).booleanValue() : super.equals(paramObject);
/*  */   }
/*  */   
/*  */   public String toString() {
/*  */     toString__var.isBound() ? (IFn)toString__var.get() : null;
/*  */     return ((toString__var.isBound() ? (IFn)toString__var.get() : null) != null) ? (String)((IFn)(toString__var.isBound() ? (IFn)toString__var.get() : null)).invoke(this) : super.toString();
/*  */   }
/*  */   
/*  */   public int hashCode() {
/*  */     hashCode__var.isBound() ? (IFn)hashCode__var.get() : null;
/*  */     return ((hashCode__var.isBound() ? (IFn)hashCode__var.get() : null) != null) ? ((Number)((IFn)(hashCode__var.isBound() ? (IFn)hashCode__var.get() : null)).invoke(this)).intValue() : super.hashCode();
/*  */   }
/*  */   
/*  */   public Object clone() {
/*  */     clone__var.isBound() ? (IFn)clone__var.get() : null;
/*  */     return ((clone__var.isBound() ? (IFn)clone__var.get() : null) != null) ? ((IFn)(clone__var.isBound() ? (IFn)clone__var.get() : null)).invoke(this) : super.clone();
/*  */   }
/*  */   
/*  */   public static void main(String[] paramArrayOfString) {
/*  */     if ((main__var.isBound() ? (IFn)main__var.get() : null) != null) {
/*  */       ((IFn)(main__var.isBound() ? (IFn)main__var.get() : null)).applyTo(RT.seq(paramArrayOfString));
/*  */     } else {
/*  */       throw new UnsupportedOperationException("libjulia-clj.java-api/-main not defined");
/*  */     } 
/*  */   }
/*  */ }

@subes
Author

subes commented Dec 26, 2021

This seems to work:

    public static void main(final String[] args) {
        final HashMap<String, Object> initParams = new HashMap<String, Object>() {
            {
                put("n-threads", 8);
            }
        };
        final Object call = libjulia_clj.java_api__init.const__3.invoke(initParams);
        System.out.println(call);
    }

Using this class:

/*    */  
/*    */    
/*    */    public static final Var const__0;
/*    */    public static final AFn const__1;
/*    */    public static final AFn const__2;
/*    */    public static final Var const__3;
/*    */    public static final AFn const__12;
/*    */    public static final Var const__13;
/*    */    public static final AFn const__16;
/*    */    public static final Var const__17;
/*    */    public static final AFn const__20;
/*    */    public static final Var const__21;
/*    */    public static final AFn const__24;
/*    */    public static final Var const__25;
/*    */    public static final AFn const__28;
/*    */    public static final Var const__29;
/*    */    public static final AFn const__32;
/*    */    
/*    */    public static void __init0() {
/*    */      const__0 = RT.var("clojure.core", "in-ns");
/*    */      const__1 = (AFn)Symbol.intern(null, "libjulia-clj.java-api");
/*    */      const__2 = (AFn)Symbol.intern(null, "clojure.core");
/*    */      const__3 = RT.var("libjulia-clj.java-api", "-initialize");
/*    */      const__12 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(Symbol.intern(null, "options")) })), RT.keyword(null, "doc"), "Initialize the julia interpreter.  See documentation for [[libjulia-clj.julia/initialize!]].\n  Options may be null or must be a map of string->value for one of the supported initialization\n  values.\n\n  Example:\n\n```clojure\n  (japi/-initialize (jvm-map/hash-map {\"n-threads\" 8}))\n```", RT.keyword(null, "line"), Integer.valueOf(13), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__13 = RT.var("libjulia-clj.java-api", "-runString");
/*    */      const__16 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(((IObj)Symbol.intern(null, "data")).withMeta(RT.map(new Object[] { RT.keyword(null, "tag"), Symbol.intern(null, "String") }))) })), RT.keyword(null, "doc"), "Run a string in Julia returning a jvm object if the return value is simple or\n  a julia object if not.  The returned object will have a property overloaded\n  toString method for introspection.", RT.keyword(null, "line"), Integer.valueOf(31), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__17 = RT.var("libjulia-clj.java-api", "-inJlContext");
/*    */      const__20 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(((IObj)Symbol.intern(null, "fn")).withMeta(RT.map(new Object[] { RT.keyword(null, "tag"), Symbol.intern(null, "Function") }))) })), RT.keyword(null, "doc"), "Execute a function in a context where all julia objects created will be released\n  just after the function returns.  The function must return pure JVM data - it cannot\n  return a reference to a julia object.", RT.keyword(null, "line"), Integer.valueOf(39), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__21 = RT.var("libjulia-clj.java-api", "-namedTuple");
/*    */      const__24 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(((IObj)Symbol.intern(null, "data")).withMeta(RT.map(new Object[] { RT.keyword(null, "tag"), Symbol.intern(null, "Map") }))) })), RT.keyword(null, "doc"), "Create a julia named tuple.  This is required for calling keyword functions.  The\n  path for calling keyword functions looks something like:\n\n  * `data` - must be an implementation of java.util.Map with strings as keys.\n\n```clojure\n(let [add-fn (jl \"function teste(a;c = 1.0, b = 2.0)\n    a+b+c\nend\")\n          kwfunc (jl \"Core.kwfunc\")\n          add-kwf (kwfunc add-fn)]\n      (is (= 38.0 (add-kwf (jl/named-tuple {'b 10 'c 20})\n                           add-fn\n                           8.0)))\n      (is (= 19.0 (add-kwf (jl/named-tuple {'b 10})\n                           add-fn\n                           8.0)))\n      (is (= 11.0 (add-kwf (jl/named-tuple)\n                           add-fn\n                           8.0)))\n\n      (is (= 38.0 (add-fn 8.0 :b 10 :c 20)))\n      (is (= 19.0 (add-fn 8 :b 10)))\n      (is (= 11.0 (add-fn 8))))\n```", RT.keyword(null, "line"), Integer.valueOf(49), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__25 = RT.var("libjulia-clj.java-api", "-createArray");
/*    */      const__28 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(Symbol.intern(null, "datatype"), Symbol.intern(null, "shape"), Symbol.intern(null, "data")) })), RT.keyword(null, "doc"), "Return julia array out of the tuple of datatype, shape, and data.\n\n  * `datatype` - must be one of the strings `[\"int8\" \"uint8\" \"int16\" \"uin16\"\n  \"int32\" \"uint32\" \"int64\" \"uint64\" \"float32\" \"float64\"].\n  * `shape` - an array or implementation of java.util.List that specifies the row-major\n  shape intended of the data.  Note that Julia is column-major so this data will appear\n  transposed when printed via Julia.\n  * `data` may be a java array or an implementation of java.util.List.  Ideally data is\n  of the same datatype as data.", RT.keyword(null, "line"), Integer.valueOf(79), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__29 = RT.var("libjulia-clj.java-api", "-arrayToJVM");
/*    */      const__32 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(Symbol.intern(null, "jlary")) })), RT.keyword(null, "doc"), "Returns a map with three keys - shape, datatype, and data.  Shape is an integer array,\n  datatype is a string denoting one of the supported datatypes, and data is a primitive\n  array of data.", RT.keyword(null, "line"), Integer.valueOf(96), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */    }
/*    */    
/*    */    static {
/*    */      __init0();
/*    */      Compiler.pushNSandLoader(RT.classForName("libjulia_clj.java_api__init").getClassLoader());
/*    */      try {
/*    */        load();
/*    */      } finally {
/*    */        Var.popThreadBindings();
/*    */      } 
/*    */    } }

Though is there maybe a way to properly name those handles?

@subes
Author

subes commented Dec 26, 2021

Also, would it be possible to have a fallback available so that I can set JULIA_HOME based on a system property instead of an ENV variable? Then I could configure it programmatically before calling libjulia-clj.

@subes
Author

subes commented Dec 26, 2021

I got my tests green with this:
https://github.com/invesdwin/invesdwin-context-julia/blob/main/invesdwin-context-julia-parent/invesdwin-context-julia-runtime-libjuliaclj/src/main/java/de/invesdwin/context/julia/runtime/libjuliaclj/internal/UnsafeJuliaEngineWrapper.java
[screenshot]
I also needed the workaround of always using the same thread. initparams.put("signals-enabled?", false) did not work as an alternative to the LD_PRELOAD workaround. Maybe the ? is too much?

Also the error handling looks good:
[screenshot]

So good job so far!

With the JULIA_HOME system property or startup parameter (to configure it programmatically) I would be very happy. Naming the const__x handles a bit better would be a bonus.

@cnuernber
Owner

Ah, I must have done something wrong w/r/t exporting the various functions. The functions should be something like public static Object initialize(Object options) ...

@subes
Author

subes commented Dec 26, 2021

Also this is by far the fastest integration so far:

Libjulia-clj:
[screenshot]

Julia4j:
[screenshot]

JuliaCaller:
[screenshot]

Jajub:
[screenshot]

Though I wonder why is it so much faster?

  • With JuliaCaller and Jajub I think it might be the REPL+Pipe/Socket overhead
  • With Julia4j it might be the workaround of getting strings via writing/reading files

@subes
Author

subes commented Dec 26, 2021

Also I noticed you developed: https://github.com/clj-python/libpython-clj
Maybe we could also export java classes for that so I can integrate it into: https://github.com/invesdwin/invesdwin-context-python

@cnuernber
Owner

Sure, libpython-clj could have nearly identical bindings but let's get this process down first. There is lots more where that is concerned...

  • avclj is bindings to the ffmpeg shared libraries so you can encode/decode video.

  • We also have a data processing library that is damn fast. I explain why it is so fast here - for an independent developer to smash the bigger toolkits is no small feat.

  • The larger Clojure scicloj community has an ml package that can use the sklearn learners and comes with good smile bindings by default :-).

When I generate bindings for JNA every binding I generate is direct mapped. This gets similar speed to JNR. For granular function access JDK-17 is about twice as fast but it comes with some serious caveats in how it loads shared libraries. For graal native we can directly link the library into the final executable so that is another perf boost but it doesn't support generic callbacks so that is a serious weakness - hopefully the graal system supports the JDK-17 foreign api at some point.

I will rebuild the jar with correct symbols so the interfacing code isn't quite so harsh.

@cnuernber
Owner

New jar is up: https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-4.

This should just have normal public static methods.

@subes
Author

subes commented Dec 26, 2021

scicloj looks really cool!

Would be happy to integrate that here (closed source though):
[screenshot]

Here is the open source performance-related stuff that I am working on:

@cnuernber
Owner

That is a very impressive set of modules. tmd is much faster and more sophisticated than tablesaw and it supports things like memory mapped arrow files - something the arrow java SDK itself doesn't support. I had never heard of jquantlib before but I have users who would be interested in that.

The NoSQL database is interesting - I would argue that the machinery around DAO's isn't worth it - keeping things as columns will lead to faster processing times in general. In some specifics I could see things being different but my experience, and as I indicate in my talk, is that columns allow hotspot to emit vectorizing instructions while processing the data in row-major DAO form with objects or otherwise does not. I think this is probably an age-old argument that has many tradeoffs and caveats.
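
A small sketch of the two layouts being compared, computing the same bid/ask spread from a row-of-objects form and from a columnar form. The names are invented for the example, and nothing here is measured: whether the JIT actually emits SIMD instructions for the columnar loop depends on the JVM version and hardware.

```java
/** Computes the same result from a row-of-objects layout and a columnar layout. */
final class SpreadLayouts {
    /** Row form: one heap object per tick, as a DAO-style record would be. */
    static final class Tick {
        final double ask;
        final double bid;

        Tick(double ask, double bid) {
            this.ask = ask;
            this.bid = bid;
        }
    }

    /** Each iteration follows a reference to a separately allocated object. */
    static double[] spreadFromRows(Tick[] ticks) {
        double[] out = new double[ticks.length];
        for (int i = 0; i < ticks.length; i++) {
            out[i] = ticks[i].ask - ticks[i].bid;
        }
        return out;
    }

    /** Each iteration reads contiguous primitives, a loop shape a JIT may vectorize. */
    static double[] spreadFromColumns(double[] ask, double[] bid) {
        double[] out = new double[ask.length];
        for (int i = 0; i < ask.length; i++) {
            out[i] = ask[i] - bid[i];
        }
        return out;
    }

    public static void main(String[] args) {
        Tick[] rows = {new Tick(1.5, 1.0), new Tick(2.5, 2.0)};
        double[] ask = {1.5, 2.5};
        double[] bid = {1.0, 2.0};
        System.out.println(java.util.Arrays.toString(spreadFromRows(rows)));
        System.out.println(java.util.Arrays.toString(spreadFromColumns(ask, bid)));
    }
}
```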

Weka is GPLv2 so we stay away from it. Note that isn't LGPL - it is the full GPL.

What is the primary user interface to this system?

@cnuernber
Owner

cnuernber commented Dec 26, 2021

Another library then that you may like in the fast-data-pathways is tmducken which binds to duckdb at the C level. That one is just barely fleshed out but it works and is quite quick.

@subes
Author

subes commented Dec 26, 2021

Here is some documentation about my usage of GPL'd code: https://github.com/invesdwin/invesdwin-context-r#license-discussion
tl;dr: only use it for testing, it's a deployment/redistribution concern (personal usage is unaffected). If you want the gray-area-solution: wrap it in a CLI application (then it runs similar to gnu tools).

The NoSQL storage is only for keeping data compressed locally to a computation node. For processing there are way better formats (as you also suggest). I am using precomputed primitive arrays for my strategy generation stuff. Though I also have other backtesting engines that use multiple layers of selfoptimizing lookback caches on top of the database. For cases where data does not fit into memory or large portfolios are tested.

@subes
Author

subes commented Dec 26, 2021

Also, from what I understand, nowadays SMILE is also GPL, not LGPL anymore.

@subes
Author

subes commented Dec 26, 2021

Regarding DuckDB: anything that requires SQL parsing is too slow for my requirements (from what I have tested so far). Though I would be happy to create a benchmark if there is a java binding. Another no-go is the lack of compression when handling tick data.

@subes
Author

subes commented Dec 26, 2021

The named methods look good now. Regarding JULIA_HOME I would suggest this:
[screenshot]

@cnuernber
Owner

That all makes a lot of sense. I do have a duckdb pathway - there is a way to test but I need to go for now. You can pass in "julia-home" as a key in the map and it will supersede the env var. I agree that the name you suggest makes more sense.
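
A sketch of how a Java client could resolve the Julia home programmatically and put it into the options map. The "julia-home" and "n-threads" keys are the ones named in this thread; the system property name `julia.home` and the class `JuliaOptions` are assumptions made up for the example, and the call that consumes the map is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds the string-keyed options map for initialization. */
final class JuliaOptions {
    /** Assumed property name; the thread does not define one. */
    static final String PROPERTY = "julia.home";

    /** Prefers the property value, then the JULIA_HOME env var; null if neither is set. */
    static String resolveHome(Map<String, String> properties, Map<String, String> env) {
        String fromProperty = properties.get(PROPERTY);
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return fromProperty;
        }
        return env.get("JULIA_HOME");
    }

    /** Leaves "julia-home" out when it is unknown so the library can apply its own lookup. */
    static Map<String, Object> initOptions(String juliaHome, int threads) {
        Map<String, Object> options = new HashMap<>();
        if (juliaHome != null) {
            options.put("julia-home", juliaHome);
        }
        options.put("n-threads", threads);
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        String value = System.getProperty(PROPERTY);
        if (value != null) {
            properties.put(PROPERTY, value);
        }
        // The resulting map would then be handed to the binding's initialize call.
        System.out.println(initOptions(resolveHome(properties, System.getenv()), 8));
    }
}
```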

@subes
Author

subes commented Dec 26, 2021

This works:
[screenshot]

I don't care about the name. julia-home follows the naming pattern of the other variables. So it is fine like this.

@subes
Author

subes commented Dec 26, 2021

Regarding primary user interface:

@cnuernber
Owner

Sweet, thanks, this is all fascinating.

@subes
Author

subes commented Dec 26, 2021

Just watching some of your talks. Amazing work you are doing!

Not to be too greedy, but if it is possible to create a java binding for scicloj then I would also love to have a binding for: https://github.com/alanmarazzi/panthera

;)

@subes
Author

subes commented Dec 27, 2021

I have benchmarked DuckDB, Hsqldb, H2 and SQLite via their JDBC drivers. Results have been added to the performance table below: https://github.com/invesdwin/invesdwin-context-persistence/blob/master/README.md#timeseries-module

DuckDB only outperforms the other embedded SQL-Databases in iterator speed (using ORDER BY ASC; unordered is ~3x as fast). The other metrics are worse. I would not expect a native binding to make it much more suitable for timeseries queries (get/getlatest). But I can still benchmark that if you provide a java binding; it could perform as well as QuestDB in iterator speed.
Though timeseries data has different requirements for storage than columnar data which is more commonly used with machine learning. For timeseries data the write speed is as important for data pipelines in live trading or when loading/calculating from tick streams.

Also regarding your comment above about DAOs, those are only part of the integration modules for JPA. The NoSQL database has nothing to do with DAOs or SQL.

@subes
Author

subes commented Dec 27, 2021

I guess we can close this issue. I created a follow up issue here: clj-python/libpython-clj#191

subes closed this as completed Dec 27, 2021
@cnuernber
Owner

The timeseries implementation is really interesting. If I read the explanation correctly you store compressed batches of data in levelDB which means you get great read/write performance but you need a small caching layer in between the user and the data so the user doesn't see the compression system.

This is somewhat similar to how parquet works in that parquet compresses each column separately as you write it and it writes out chunks of each column on demand so you get this interleaved mix of column chunks on disk that get decompressed as you iterate through the data.

Your method seems clearer and simpler. I would bet that the per-column compression of parquet gets better overall compression, but it makes any sort of random access extremely difficult - I decompress the entire record set, which is usually a few hundred MB or so, when I read a parquet file. I don't think overall there is a faster method than what you came up with, especially with the in-memory cache in front of it, although there most likely could be tweaks here or there for particular column types. Do you store each chunk in row-major or column-major format? And when you decompress it, do you decompress into records or columnwise into primitive arrays?

@cnuernber
Owner

I incorrectly defined the java-api interface for inJlContext. New AOT version is up - https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-5.

@subes
Author

subes commented Dec 27, 2021

Thanks, I upgraded to the new version.

Regarding the architecture of the database: LevelDB is only used as an index for the segment lookup. The segments are stored separately in a file. Writes are done with FileChannel, reads are done using a memory-mapped file (which makes heavy use of the OS file cache). Storing large payloads does not work with LevelDB (too much write amplification). Also the index is boosted via a write-through in-memory cache. The file is append-only (only the last segment can be rewritten). Each segment is compressed using LZ4 High or Fast depending on configuration.

Each segment contains 10k objects. Serialization/deserialization is done using an implementation of ISerde using an IByteBuffer abstraction (https://github.com/invesdwin/invesdwin-util#byte-buffers).
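
A minimal sketch of the storage shape described here, an append-only file written through a FileChannel and read back through a memory-mapped region. It is not the invesdwin implementation: the class name is invented, segments are stored uncompressed (LZ4 would need a third-party library), and the offsets returned by `append` stand in for what the LevelDB index would hold.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only segment file: channel writes, memory-mapped reads. */
final class SegmentFile {
    /** Appends one segment and returns the file offset at which it starts. */
    static long append(Path file, byte[] segment) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            long offset = channel.size(); // in append mode every write lands at the end
            ByteBuffer buffer = ByteBuffer.wrap(segment);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps only the requested region, so the OS file cache serves repeated reads. */
    static byte[] read(Path file, long offset, int length) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            mapped.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"),
                "segments-" + System.nanoTime() + ".bin");
        long first = append(file, new byte[] {1, 2, 3});
        long second = append(file, new byte[] {4, 5});
        System.out.println(first + ", " + second + ", " + read(file, second, 2).length);
        file.toFile().delete();
    }
}
```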

The operations that are needed to be fast with time series data are:

  • getLatest: uses the index to find the proper segment. Then searches that segment for the correct value. From there normally one starts to iterate
  • Iterate: can be done forward and reverse. Forward is easier. The caches always fetch a bunch of elements to keep a dynamic self optimizing lookback: https://github.com/invesdwin/invesdwin-util#caches
  • For iterating backwards the FileBufferCache (an application level segment cache replicating the OS file cache, though on uncompressed and unmarshalled data) boosts performance a lot. Otherwise backwards iteration and cache misses in the lookback caches would be very expensive. It also boosts forward iteration between threads a lot. E.g. each thread runs a separate backtest, IO only occurs for the first thread that requires the data.
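
The getLatest lookup described above can be sketched with JDK collections: an ordered index finds the segment, then a binary search inside that segment finds the value. This is an in-memory stand-in with made-up names; it stores only timestamps and skips the file, compression, and cache layers.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical index: first timestamp of a segment -> sorted timestamps in it. */
final class SegmentIndex {
    private final TreeMap<Long, long[]> segments = new TreeMap<>();

    /** Registers a non-empty segment whose timestamps are sorted ascending. */
    void addSegment(long[] sortedTimes) {
        segments.put(sortedTimes[0], sortedTimes);
    }

    /** Returns the latest stored time that is <= query, or Long.MIN_VALUE if none. */
    long getLatest(long query) {
        // Step 1: the index picks the last segment that starts at or before the query.
        Map.Entry<Long, long[]> entry = segments.floorEntry(query);
        if (entry == null) {
            return Long.MIN_VALUE;
        }
        // Step 2: search inside that segment.
        long[] times = entry.getValue();
        int pos = Arrays.binarySearch(times, query);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1,
            // and the element just before the insertion point is the answer.
            pos = -pos - 2;
        }
        return times[pos];
    }

    public static void main(String[] args) {
        SegmentIndex index = new SegmentIndex();
        index.addSegment(new long[] {10, 20, 30});
        index.addSegment(new long[] {40, 50});
        System.out.println(index.getLatest(25));
        System.out.println(index.getLatest(35));
    }
}
```

Forward iteration would continue from the found position and move on to the next segment via `higherEntry`; reverse iteration would use `lowerEntry`.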

Column vs Row access is a consideration to make better use of cpu prefetching. Though the current solution already handles that on multiple layers and using zero copy at multiple stages. Though when the data is in the application level FileBufferCache I currently store the data as heap objects (thus columnar data). The alternative would be to use the flyweight pattern in a special ISerde implementation that projects the data from the underlying decompressed byte[] array.

  • There is a task for that: try to use flyweight pattern with FileBufferCache invesdwin/invesdwin-context-persistence#12
  • This could theoretically squeeze out an additional 30% performance.
  • Though this would have the drawback that the lifecycle of the uncompressed objects is bound to the lifecycle of the byte[] in the FileBufferCache. Evicting the FileBufferCache will then not free the byte[] as long as any one object is still referenced. Thus memory consumption will grow very large, because sometimes user code is only interested in keeping one object of the segment (e.g. an important value, like the first value of a backtest). There should be ways to balance this, but I have not yet started the refactoring to tackle this (since the 30% don't seem to be worth the effort yet).
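
A sketch of what the flyweight idea could look like, using a plain `java.nio.ByteBuffer` instead of the ISerde and IByteBuffer abstractions. The class name, the record layout, and the reduction to three fields (time, ask, bid) are simplifications for the example.

```java
import java.nio.ByteBuffer;

/** Hypothetical flyweight: a movable view over fixed-width tick records. */
final class TickFlyweight {
    static final int RECORD_BYTES = Long.BYTES + 2 * Double.BYTES; // time, ask, bid

    // Holding this reference keeps the whole decompressed segment reachable,
    // which is the lifecycle drawback discussed above.
    private final ByteBuffer segment;
    private int offset;

    TickFlyweight(ByteBuffer segment) {
        this.segment = segment;
    }

    /** Points the view at another record without allocating a new object. */
    TickFlyweight moveTo(int index) {
        offset = index * RECORD_BYTES;
        return this;
    }

    long time() {
        return segment.getLong(offset);
    }

    double ask() {
        return segment.getDouble(offset + Long.BYTES);
    }

    double bid() {
        return segment.getDouble(offset + Long.BYTES + Double.BYTES);
    }

    /** Packs columns into the record layout that the view expects. */
    static ByteBuffer encode(long[] time, double[] ask, double[] bid) {
        ByteBuffer buffer = ByteBuffer.allocate(time.length * RECORD_BYTES);
        for (int i = 0; i < time.length; i++) {
            buffer.putLong(time[i]).putDouble(ask[i]).putDouble(bid[i]);
        }
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer segment = encode(new long[] {1L, 2L},
                new double[] {1.5, 2.5}, new double[] {1.0, 2.0});
        TickFlyweight tick = new TickFlyweight(segment);
        System.out.println(tick.moveTo(1).time() + " " + tick.ask() + " " + tick.bid());
    }
}
```

User code that wants to keep a single tick beyond the lifetime of the segment would have to copy the fields out into a standalone object, which is one way to balance the memory concern.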

The uncompressed objects are normally Ticks (time, ask, bid, askVolume, bidVolume) or Bars (time, open, high, low, close). For my machine learning engine I actually transform those objects into primitive arrays for each value separately. Thus using Columnar instead of Row based access as you suggest to get the in-memory-speed that I need for hundreds of thousands to millions of backtests per second (on the CPU to be clear). I extract more columnar primitive arrays for results of indicators and reuse calculations. Boolean expressions are stored even better via bitsets which allow combining &&/|| conditions very fast without having to do complex calculations of indicators multiple times. The engine uses as much memory as possible to test strategies as fast as possible. Also any pointer dereferencing is poison at those speeds.
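
The bitset idea can be sketched with `java.util.BitSet`: each condition is evaluated once per bar and stored as one bit, and combined conditions become word-wise AND/OR operations. The class and method names are invented, and the simple threshold stands in for a real indicator.

```java
import java.util.BitSet;

/** Hypothetical helper that precomputes conditions as bitsets and combines them. */
final class SignalBits {
    /** Evaluates the condition once per bar; bit i is set when values[i] > threshold. */
    static BitSet above(double[] values, double threshold) {
        BitSet bits = new BitSet(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] > threshold) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** a && b for every bar, without touching the indicator values again. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone(); // BitSet.and mutates, so work on a copy
        out.and(b);
        return out;
    }

    /** a || b for every bar. */
    static BitSet or(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone();
        out.or(b);
        return out;
    }

    public static void main(String[] args) {
        BitSet first = above(new double[] {1, 5, 3, 7}, 2);
        BitSet second = above(new double[] {9, 1, 8, 8}, 5);
        System.out.println(and(first, second));
        System.out.println(or(first, second));
    }
}
```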

Though for general live trading or complex portfolio scenarios, memory is more a concern. In that case the Row based access is kept and each data point has its own lifecycle. Pushing/Pulling new market data is easier that way and plotting old data can be queried dynamically in a thread safe (but slower) engine. I also have an engine that can use the columnar storage for complex portfolio backtests. Though it is limited by memory easily. I am working on a new circular buffer engine that uses columnar storage but is live capable (moving windows in primitive arrays) and allows thread safe access (required for plotting and semi automated trading). So in the end I also follow the principle there that more options allow better flexibility, so use the execution engine that best suits the task. Similar to being able to choose the language integration framework that best suits the task. Though doing this for backtesting and execution engines requires lots of testing to make sure that all engines produce the same outputs. I have thousands of test cases with gigabytes of reference files to ensure that. Also what I am writing here is only the tip of the iceberg of optimizations. Making backtests faster opens so many possibilities. The faster our tools become, the harder the problems we can solve.
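
As a rough illustration of a moving window in a primitive array, here is a single-column ring buffer. It is only a sketch with invented names and is not thread safe, so it leaves out exactly the thread-safe access that the engine described above needs.

```java
/** Hypothetical fixed-capacity moving window over one primitive column. */
final class MovingWindow {
    private final double[] values;
    private int next; // slot that the next push overwrites
    private int size;

    MovingWindow(int capacity) {
        values = new double[capacity];
    }

    /** Adds the newest value and drops the oldest one once the window is full. */
    void push(double value) {
        values[next] = value;
        next = (next + 1) % values.length;
        if (size < values.length) {
            size++;
        }
    }

    /** age 0 is the newest value, age size() - 1 the oldest one still retained. */
    double get(int age) {
        if (age < 0 || age >= size) {
            throw new IndexOutOfBoundsException("age " + age);
        }
        return values[Math.floorMod(next - 1 - age, values.length)];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        MovingWindow window = new MovingWindow(3);
        for (double v : new double[] {1, 2, 3, 4}) {
            window.push(v);
        }
        System.out.println(window.get(0) + " " + window.get(2) + " " + window.size());
    }
}
```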

With the primitive representation of the indicators, it would be possible to just load them into a GPU (the CPU handles all the complex calculations and data loading). Then let the GPU do what it does best: combine strategies in thousands of threads and find combinations that perform well based on some simple fitness criterion. After some evolutionary process (or some other ML approach), the GPU then outputs the best candidates (not much CUDA/OpenCL code is required for this anymore, and it can be highly optimized). The CPU can then filter them and do risk management and robustness analysis to combine them into fully automated portfolios.

But let's leave it at that. I guess I drifted a bit too deep, sorry for that. :D

@cnuernber
Owner

cnuernber commented Dec 27, 2021

That is great, I haven't digested all of it yet but I have an unrelated question. How sensitive are you to startup times? I would like to do without AOT while keeping the same Java interface you are using. This will result in 2-3 second compilation times but is overall simpler and avoids some versioning issues.

@cnuernber cnuernber reopened this Dec 27, 2021
@subes
Author

subes commented Dec 27, 2021

Startup times are not too important.

@cnuernber
Owner

New libjulia-clj version is up with no AOT - https://clojars.org/cnuernber/libjulia-clj

@subes
Author

subes commented Dec 28, 2021

This is the correct link: https://clojars.org/com.cnuernber/libjulia-clj
Here are the new test timings:
[screenshot]
Before:
[screenshot]

So compilation seems to take about 7 seconds. Though I wonder why the parallel test got slower. Maybe clojure now has some additional overhead for each call to determine if lazy compilation is still needed?

@cnuernber
Owner

Perhaps. Is that startup time OK with you? Personally it drives me crazy, but the only other option is to have a parallel release of all of my jars, since precompiled classes interact poorly with development-time Clojure practices (the class loaders get confused).

@subes
Author

subes commented Dec 28, 2021

Here are new tests against 1.000-aot-beta-5:
[screenshot]
And new tests against 1.000-aot-beta-4:
[screenshot]

So the compilation is more like 4 seconds. There is no runtime overhead. So everything is fine. Sometimes my notebook gets slower after returning from hibernation/suspend. I guess after a reboot the speed will be faster again.

@cnuernber
Owner

Alright! We could eliminate those 4 seconds but as I said for at least this version it would be nice to be able to punt on that one.

@subes
Author

subes commented Dec 28, 2021

Yes, those 4 seconds are ok. Normally the processes run a bit longer. The platform takes a few seconds to initialize anyway with its bootstrap, and Load-Time-Weaving of AspectJ also adds some overhead. And since libjulia-clj/libpython-clj are optional dependencies anyway and both only get compiled when the functionality is accessed, everything is fine.

If one wants fast startup times, the JVM is not the right tool. Either work around it with GraalVM or implement the tool in Golang or something else that starts up instantly.

@cnuernber
Owner

Agreed there. I make sure all of my tools work with graal native and I have tested various things such as processing a parquet dataset to an arrow dataset and such. The JIT is weaker but startup is nearly instant.
