Make JNA Binding available to Java clients? #3
I would be interested in a java wrapper that you could use without directly calling clojure. Here is a blogpost about how the ffi system works in case you are interested.
I do have an idea about the crashing - https://cnuernber.github.io/libjulia-clj/signals.html.
Thanks for the input, I will try the signal chaining workaround. I guess with that one does not require the j_options function to disable signal handling (since Julia requires that for multithreading, as I understood from your explanation). Regarding the Java 16 FFI: I would prefer something that is compatible down to Java 8, since there are still lots of companies stuck on Java 8 or Java 11. So JNR over JNA is fine if you prefer it; it does not matter to me what I integrate as long as it gets the job done. JNR might be a bit faster from what I understand for situations where lots of calls are made into Julia. I guess it would be easy for you to port this to java, since you already have lots of experience with this. :)
It would be difficult to port it directly to java - that is a bad assumption. Clojure is much more compact than java and furthermore the ffi layer it is built upon is a large part of the system. dtype-next is a complex piece of engineering that enables good support for algorithms across jvm-heap and native-heap datastructures. That is the piece that makes libraries like these so quick and it is a foundational piece - exactly the type that is not easily replaceable. If you would like to wrap this library in a pure java layer so users don't need to use Clojure to use it I would be interested in helping.
Well, if it is possible to use libjulia-clj from java, then I am fine with that. What do I need to do to initialize/call it?
Definitely possible. Glad you are open minded - will respond in detail soon.
Regarding the signal chaining workaround: that did the trick. Though Julia still causes JVM crashes when calling it from other threads, so I created a workaround to always call julia from the same executor thread (currently implemented with julia4j). To call libjulia-clj from java I guess we need
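The executor-thread workaround mentioned above can be sketched as follows; this is a minimal illustration (class and thread names are made up for the example), confining every Julia call to one dedicated thread since the Julia runtime crashes when called from arbitrary JVM threads:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical sketch: route all Julia calls through a single dedicated
// thread so the Julia runtime never sees a foreign JVM thread.
public final class JuliaThreadConfinement {
    private static final ExecutorService JULIA_THREAD =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "julia-runtime");
                t.setDaemon(true);
                return t;
            });

    // Run the given task on the single Julia thread and block for its result.
    public static <T> T call(Supplier<T> task) {
        try {
            Future<T> f = JULIA_THREAD.submit(task::get);
            return f.get();
        } catch (Exception e) {
            throw new IllegalStateException("Julia call failed", e);
        }
    }
}
```

Any number of caller threads can then share the Julia runtime safely, at the cost of serializing all Julia work onto that one thread.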
I am glad the signal chaining worked for you - that was some in-depth information that took some work to figure out. The best way to call clojure from java is to use the extremely minimal public api. This is so minimal that calling things like the initialize function would require some work such as constructing keywords and most likely persistent maps or something along those lines. The gen-class pathway is possible but the above API would work without changes to the published jar nor requiring any AOT - it would, however, require more boilerplate initially. Given that julia4j is working for you, are you interested in pursuing this further or should we close this issue?
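The "extremely minimal public api" referred to here is clojure.java.api.Clojure. A hedged sketch of what the described boilerplate could look like; the namespace and function name (`libjulia-clj.julia` / `initialize!`) and the option key are assumptions taken from the library's docs and may differ:

```java
import clojure.java.api.Clojure;
import clojure.lang.IFn;

// Sketch of calling a Clojure library through Clojure's minimal public
// Java API. Namespace/function names below are assumptions, not confirmed
// by this thread.
public final class JuliaFromJava {
    public static void main(String[] args) {
        // require the namespace so its vars are compiled and bound
        IFn require = Clojure.var("clojure.core", "require");
        require.invoke(Clojure.read("libjulia-clj.julia"));

        // Clojure.read builds the keywordized options map for us,
        // which is exactly the "constructing keywords and persistent
        // maps" boilerplate mentioned above.
        IFn initialize = Clojure.var("libjulia-clj.julia", "initialize!");
        initialize.invoke(Clojure.read("{:julia-home \"/opt/julia\"}"));
    }
}
```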
I would still like to have multiple options integrated into invesdwin-context-julia. JNA has the benefit of not requiring a native compiled dll/so (which I currently only have compiled for linux using julia4j). Also error handling and getting string responses from julia is lacking with julia4j at the moment.
OK that is encouraging. Also note that libjulia-clj has zero copy support for dense nd objects so you can actually share memory between java and julia although you have to keep in mind that julia is column major while the underlying ND object system I use is row-major. Do you have a minimal java project where you are trying these things out?
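The row-major vs column-major caveat above boils down to how a flat shared buffer is indexed on each side. A small illustrative sketch (names are mine, not from libjulia-clj):

```java
// For a rows x cols matrix stored in one flat buffer, the JVM side
// (row-major) and the Julia side (column-major) compute different flat
// offsets for the same logical element - so zero-copy sharing means either
// transposing logically or converting indices like this.
public final class MajorOrder {
    // Flat offset of element (row, col) in row-major layout.
    public static int rowMajorIndex(int row, int col, int rows, int cols) {
        return row * cols + col;
    }

    // Flat offset of the same element in column-major layout.
    public static int colMajorIndex(int row, int col, int rows, int cols) {
        return col * rows + row;
    }
}
```

For a 3x4 matrix, element (1, 2) sits at flat offset 6 in row-major order but offset 7 in column-major order, so reading the same buffer naively from both sides silently transposes the data.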
Here is the project that I prepared with a dependency to libjulia-clj: Zero copy support would be great. Regarding the minimal API provided by Clojure, I guess gen-class is preferable as it is generated and cannot break by getting out of sync with the libjulia-clj implementation.
For gen-class to work we need a precompile step. Clojure libraries don't package the actual class files as they dynamically compile the .clj files upon require. My recommendation is for me to create a small gen-class-based class that I will test with the unit testing system, and for the build system of your invesdwin libjulia-clj bindings to run a simple compilation step that creates all of the required class files in any desired directory; if you then package those class files with your jar the java import step will work.
Or perhaps I can upload a version of libjulia with the gen-class and class files in it. I think this may be the best step for now. I will reach out when I have something that I think will work.
Sounds good.
First attempt - jar - https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-3
API docs - https://cnuernber.github.io/libjulia-clj/libjulia-clj.java-api.html
So, there should be a class in package
-rw-rw-rw- 1992 26-Dec-2021 10:52:10 libjulia_clj/java_api.class
The functions should be without the
It is easiest to export the env var JULIA_HOME just as in the local env script. You can also pass "julia-home" in as the key in the options map to initialize. Small example unit test is here.
I don't understand. Is it supposed to work by calling |
Trying it:
Results in:
This is what the generated class looks like:
This seems to work:
Using this class:
Though is there maybe a way to properly name those handles?
Also would it be possible to have a fallback available so that I can set JULIA_HOME based on a system property instead of an ENV variable? Then I could configure it programmatically before calling libjulia-clj.
I got my tests green with this: Also the error handling looks good: So good job so far! With the JULIA_HOME system property or startup parameter (to configure it programmatically) I would be very happy. Naming the const__x handles a bit better would be a bonus.
Ah, I must have done something wrong w/r/t exporting the various functions. The functions should be something like public static Object initialize(Object options) ...
Also I noticed you developed: https://github.com/clj-python/libpython-clj
Sure, libpython-clj could have nearly identical bindings but let's get this process down first. There is lots more where that is concerned...
When I generate bindings for JNA every binding I generate is direct mapped. This gets similar speed to JNR. For granular function access JDK-17 is about twice as fast but it comes with some serious caveats in how it loads shared libraries. For graal native we can directly link the library into the final executable so that is another perf boost but it doesn't support generic callbacks so that is a serious weakness - hopefully the graal system supports the JDK-17 foreign api at some point. I will rebuild the jar with correct symbols so the interfacing code isn't quite so harsh.
New jar is up: https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-4. This should just have normal public static methods.
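With the fixed jar, usage could look roughly like this. `initialize(Object options)` is confirmed above; the `runString` method name and the "julia-home" option key are assumptions drawn from the linked API docs and later comments in this thread, so treat this as a sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch against the AOT-compiled libjulia_clj.java_api class.
// Method names other than initialize are assumptions from the API docs.
import libjulia_clj.java_api;

public final class JavaApiExample {
    public static void main(String[] args) {
        Map<String, Object> options = new HashMap<>();
        // "julia-home" as an options key is mentioned later in this thread;
        // it falls back to the JULIA_HOME env var if absent.
        options.put("julia-home", System.getenv("JULIA_HOME"));
        java_api.initialize(options);

        Object result = java_api.runString("1 + 1");
        System.out.println(result);
    }
}
```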
scicloj looks really cool! Would be happy to integrate that here (closed source though): Here is the open source performance related stuff that I am working on:
That is a very impressive set of modules. The NoSQL database is interesting - I would argue that the machinery around DAOs isn't worth it - keeping things as columns will lead to faster processing times in general. In some specifics I could see things being different but my experience, as I indicate in my talk, is that columns allow hotspot to emit vectorizing instructions while processing the data in row-major DAO form with objects or otherwise does not. I think this is probably an age-old argument that has many tradeoffs and caveats. Weka is GPLv2 so we stay away from it. Note that it isn't LGPL - it is the full GPL. What is the primary user interface to this system?
Another library that you may like in the fast-data pathways is tmducken, which binds to duckdb at the C level. That one is just barely fleshed out but it works and is quite quick.
Here is some documentation about my usage of GPL'd code: https://github.com/invesdwin/invesdwin-context-r#license-discussion The NoSQL storage is only for keeping data compressed locally to a computation node. For processing there are way better formats (as you also suggest). I am using precomputed primitive arrays for my strategy generation stuff. Though I also have other backtesting engines that use multiple layers of self-optimizing lookback caches on top of the database, for cases where data does not fit into memory or large portfolios are tested.
Also, from what I understand, nowadays SMILE is also GPL, not LGPL anymore.
Regarding DuckDB: anything that requires SQL parsing is too slow for my requirements (from what I have tested so far). Though I would be happy to create a benchmark if there is a java binding. Another no-go is the lack of compression when handling tick data.
That all makes a lot of sense. I do have a duckdb pathway - there is a way to test but I need to go for now. You can pass in "julia-home" as a key in the map and it will supersede the env var. Agreed, the name you suggest makes more sense.
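The requested resolution order (a system property overriding the JULIA_HOME env var, so the location can be set programmatically) can be sketched like this; the property name "julia.home" is illustrative, not something libjulia-clj defines:

```java
// Sketch of a configuration fallback: prefer a system property set
// programmatically at startup, fall back to the environment variable.
public final class JuliaHomeResolver {
    // Testable core: pick the property if present and non-empty.
    public static String resolve(String systemProperty, String envVar) {
        if (systemProperty != null && !systemProperty.isEmpty()) {
            return systemProperty;
        }
        return envVar;
    }

    // Convenience overload reading the real property/env var.
    public static String resolve() {
        return resolve(System.getProperty("julia.home"),
                       System.getenv("JULIA_HOME"));
    }
}
```

The resolved value would then be passed as the "julia-home" key in the initialize options map, which (per the comment above) supersedes the env var.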
Regarding primary user interface:
Sweet, thanks, this is all fascinating.
Just watching some of your talks. Amazing work you are doing! Not to be too greedy, but if it is possible to create a java binding for scicloj then I would also love to have a binding for: https://github.com/alanmarazzi/panthera ;)
I have benchmarked DuckDB, Hsqldb, H2 and SQLite via their JDBC drivers. Results have been added to the performance table below: https://github.com/invesdwin/invesdwin-context-persistence/blob/master/README.md#timeseries-module DuckDB only outperforms the other embedded SQL databases in iterator speed (using ORDER BY ASC; unordered is ~3x as fast). The other metrics are worse. I would not expect a native binding to make it much more suitable for timeseries queries (get/getlatest), but I can still benchmark that if you provide a java binding; it could perform as well as QuestDB in iterator speed. Also, regarding your comment above about DAOs: those are only part of the integration modules for JPA. The NoSQL database has nothing to do with DAOs or SQL.
I guess we can close this issue. I created a follow up issue here: clj-python/libpython-clj#191
The timeseries implementation is really interesting. If I read the explanation correctly you store compressed batches of data in levelDB which means you get great read/write performance but you need a small caching layer in between the user and the data so the user doesn't see the compression system. This is somewhat similar to how parquet works in that parquet compresses each column separately as you write it and it writes out chunks of each column on demand so you get this interleaved mix of column chunks on disk that get decompressed as you iterate through the data. Your method seems clearer and simpler but I would bet that the per-column compression of parquet gets better overall compression, though it makes any sort of random access extremely difficult - I decompress the entire record set, which is usually a few hundred MBs or so, when I read a parquet file. I don't think overall there is a faster method than what you came up with, especially with the in-memory cache in front of it, although there most likely could be tweaks here or there for particular column types. Do you store each chunk in row-major or column-major format? And when you decompress it, do you decompress into records or columnwise into primitive arrays?
I incorrectly defined the java-api interface for inJlContext. New AOT version is up - https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-5.
Thanks, I upgraded to the new version. Regarding the architecture of the database: LevelDB is only used as an index for the segment lookup. The segments are stored separately in a file. Writes are done with FileChannel, reads are done using a memory-mapped file (makes heavy use of the OS file cache). Storing large payloads does not work with LevelDB (too much write amplification). Also the index is boosted via a Write-Through-In-Memory-Cache. The file is append-only (only the last segment can be rewritten). Each segment is compressed using LZ4 High or Fast depending on configuration. Each segment contains 10k objects. Serialization/deserialization is done using an implementation of ISerde using an IByteBuffer abstraction (https://github.com/invesdwin/invesdwin-util#byte-buffers). The operations that are needed to be fast with time series data are:
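The compressed-segment scheme described above (each 10k-object segment compressed as one block, decompressed on demand) can be sketched as follows. This uses java.util.zip's Deflater purely as a stand-in for LZ4, since the actual implementation uses LZ4 High/Fast via its own ISerde/IByteBuffer abstractions:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in sketch: compress/decompress one serialized segment as a block.
public final class SegmentCodec {
    public static byte[] compress(byte[] segment) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED); // "Fast" mode
        deflater.setInput(segment);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt segment", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Repetitive tick data compresses well as whole segments, which is why the index (LevelDB) only needs to map lookup keys to segment offsets in the append-only file.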
Column vs Row access is a consideration to make better use of cpu prefetching. Though the current solution already handles that on multiple layers and using zero copy at multiple stages. Though when the data is in the application level FileBufferCache I currently store the data as heap objects (thus columnar data). The alternative would be to use the flyweight pattern in a special ISerde implementation that projects the data from the underlying decompressed byte[] array.
The uncompressed objects are normally Ticks (time, ask, bid, askVolume, bidVolume) or Bars (time, open, high, low, close). For my machine learning engine I actually transform those objects into primitive arrays for each value separately. Thus using Columnar instead of Row based access as you suggest to get the in-memory-speed that I need for hundreds of thousands to millions of backtests per second (on the CPU to be clear). I extract more columnar primitive arrays for results of indicators and reuse calculations. Boolean expressions are stored even better via bitsets which allow combining &&/|| conditions very fast without having to do complex calculations of indicators multiple times. The engine uses as much memory as possible to test strategies as fast as possible. Also any pointer dereferencing is poison at those speeds. Though for general live trading or complex portfolio scenarios, memory is more a concern. In that case the Row based access is kept and each data point has its own lifecycle. Pushing/Pulling new market data is easier that way and plotting old data can be queried dynamically in a thread safe (but slower) engine. I also have an engine that can use the columnar storage for complex portfolio backtests. Though it is limited by memory easily. I am working on a new circular buffer engine that uses columnar storage but is live capable (moving windows in primitive arrays) and allows thread safe access (required for plotting and semi automated trading). So in the end I also follow the principle there that more options allow better flexibility, so use the execution engine that best suits the task. Similar to being able to choose the language integration framework that best suits the task. Though doing this for backtesting and execution engines requires lots of testing to make sure that all engines produce the same outputs. I have thousands of test cases with gigabytes of reference files to ensure that. 
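The bitset idea described above (precomputing each boolean indicator condition per bar, then combining &&/|| conditions without re-evaluating indicators) can be sketched with java.util.BitSet; class and method names here are illustrative:

```java
import java.util.BitSet;

// Sketch: each BitSet holds one precomputed boolean condition per bar,
// so combining conditions becomes word-wide AND/OR instead of repeated
// indicator calculations.
public final class ConditionBitsets {
    // cond1 && cond2 over all bars, without mutating the inputs.
    public static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // cond1 || cond2 over all bars.
    public static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }
}
```

Since BitSet operates 64 bars at a time per long word and involves no pointer dereferencing per bar, this fits the stated goal of hundreds of thousands of backtests per second on the CPU.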
Also what I am writing here is only the tip of the iceberg of optimizations. Making backtests faster opens so many possibilities. The faster our tools become, the harder the problems we can solve. With the primitive representation of the indicators, it would be possible to just load them into a GPU (the CPU handles all the complex calculations and data loading). Then let the GPU do what it does best: combine strategies in thousands of threads and find combinations that perform well based on some simple fitness criterion. The GPU then gives the best candidates as the output after some evolutionary (or some other ML approach) process (the Cuda/OpenCL code required here is not so much anymore and can be highly optimized). The CPU can then filter them, do risk management and robustness analysis to combine them into fully automated portfolios. But let's leave it at that. I guess I drifted a bit too deep, sorry for that. :D
That is great, I haven't digested all of it yet but I have an unrelated question. How sensitive are you to startup times? I would like to do without AOT but keep the same java interface you are using. This will result in 2-3 second compilation times but is overall simpler and avoids some versioning issues.
Startup times are not too important.
New libjulia-clj version is up with no AOT - https://clojars.org/cnuernber/libjulia-clj
This is the correct link: https://clojars.org/com.cnuernber/libjulia-clj So compilation seems to take about 7 seconds. Though I wonder why the parallel test got slower. Maybe clojure now has some additional overhead for each call to determine if lazy compilation is still needed?
Perhaps. Is that startup time OK with you? Personally it drives me crazy but the only other option is to have a parallel release of all of my jars, as precompiled classes interact poorly with development time clojure practices since the class loaders get confused.
Alright! We could eliminate those 4 seconds but as I said for at least this version it would be nice to be able to punt on that one.
Yes, those 4 seconds are ok. Normally the processes run a bit longer. The platform takes a few seconds to initialize anyway with its bootstrap, and Load-Time-Weaving of AspectJ also adds some overhead. Though since libjulia-clj/libpython-clj are anyway optional dependencies and both only get compiled when the functionality is accessed, everything is fine. If one wants to have fast startup times, the JVM is not the right tool. Either work around with GraalVM or implement the tool in Golang or something else that starts up instantly.
Agreed there. I make sure all of my tools work with graal native and I have tested various things such as processing a parquet dataset to an arrow dataset and such. The JIT is weaker but startup is nearly instant.
Hi,
I would be interested to integrate julia via JNA here: https://github.com/invesdwin/invesdwin-context-julia/
Currently I tried to use Julia4j, which gives memory access errors (maybe you have an idea about what causes those) after a few commands and lacks error handling:
rssdev10/julia4j#2
Could you maybe provide a JNA layer that can be used from java code without clojure? Would be very much appreciated.