Make JNA Binding available to Java clients #191
As discussed here: cnuernber/libjulia-clj#3
Please generate some Java classes for libpython-clj so I can integrate it here: https://github.com/invesdwin/invesdwin-context-python/tree/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj
Thanks a lot! |
This is great :-). Is this going to be used in the same classloader as the libjulia-clj java bindings? I ask because, if so, I need to release an AOT-compiled dtype-next. For the first test I can just regenerate the dtype-next classes, but after we confirm this is working I will need to release AOT versions of the dependent libraries. |
Initial jar - https://clojars.org/clj-python/libpython-clj/versions/2.004-aot The initialize function calls the python executable from the command line and gets it to output the setup information. Python has a somewhat more involved setup than just a shared-library path, as it also needs to know some level of module root path. Let's see if this works and we can tweak. Small unit test showing the functionality - https://github.com/clj-python/libpython-clj/blob/master/test/libpython_clj2/java_api_test.clj |
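For orientation, a minimal sketch of what booting the binding from Java might look like. Only the libpython_clj2.java_api class name and the existence of an initialize function come from this thread; the null-for-default-options argument is an assumption, not a confirmed signature:

```java
// Hypothetical sketch: initialize() shells out to the python executable
// to discover the shared library and module root paths, as described above.
public class BootExample {
    public static void main(String[] args) {
        // Passing null for default options is an assumption.
        libpython_clj2.java_api.initialize(null);
        // ... python interop from here on ...
    }
}
```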
Thanks a lot, I will test it as soon as possible. |
I made some progress here: https://github.com/invesdwin/invesdwin-context-python/blob/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj/src/main/java/de/invesdwin/context/python/runtime/libpythonclj/internal/PythonEngine.java Though I can't figure out how to get/set globals. I get this exception:
getAttr/setAttr also do not work:
I also tried just using "globals" as a string instead of using the returned value from the persistent map from
|
Ah, this is a lot simpler than expected, one just has to put/get from/into the PersistentMap. |
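For later readers, a sketch of the put/get approach. It assumes the runStringAsFile entry point named later in this thread returns the globals as a java.util.Map; the exact return type is an assumption:

```java
// The returned persistent map proxies the python globals dict, so plain
// Map operations read and write globals.
java.util.Map<Object, Object> globals =
        (java.util.Map<Object, Object>) libpython_clj2.java_api.runStringAsFile("x = 1 + 2");
Object x = globals.get("x");   // read a python global
globals.put("y", 42);          // set a python global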
True, but there isn't a way to get the globals map in a stand-alone fashion, which I can see would be useful. And you did find an error in the api :-). I tried to expose the python objects as their java equivalents, so a python dict is returned as an implementation of java.util.Map, and tuples/lists implement java.util.List and java.util.RandomAccess, etc. There is a somewhat complex pathway in Clojure for when you want to copy a python value completely into the JVM, such as copying a JSON object, or when you want to bridge/proxy it, like what people want to do with modules. |
Also I want to expose a withGIL function so you can capture the GIL once and do a set of things. This is similar to inContext with the exception that it doesn't attempt to release all objects allocated within the scope. |
This is how I get the globals right now:
|
Yep, that will work. |
And yes, withGIL would be great. Though a lock/unlock function would be better so I can do:
|
OK, makes sense. Then you can make your own withGIL if you need it... |
I would just wrap it in an implementation of ILock (which my client code already uses). |
New API is up - it has two new functions for GIL management (as well as a fixed setItem call) - https://clojars.org/clj-python/libpython-clj/versions/2.004-aot-1. Note that the python ensureGIL call returns an integer that must be passed into releaseGIL, so ensureGIL is 'reentrant', but you would have to keep track of that integer in your ILock impl. New api fns are |
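A sketch of what such an ILock-style wrapper could look like. The ensureGIL/releaseGIL names and the integer token come from the comment above; the hosting class, exact signatures, and the no-nested-locking simplification are assumptions:

```java
// Tracks the reentrancy token per thread; assumes lock/unlock calls are
// balanced and not nested within one thread (a nested version would need
// a per-thread stack of tokens instead).
public class GilLock {
    private final ThreadLocal<Integer> token = new ThreadLocal<>();

    public void lock() {
        token.set((Integer) libpython_clj2.java_api.ensureGIL());
    }

    public void unlock() {
        libpython_clj2.java_api.releaseGIL(token.get());
        token.remove();
    }
}
```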
Why doesn't this:
throw an exception? I have no PYTHON_HOME env variable set. |
Good question. It should throw. Will check in a moment. For sure python2 isn't supported, and probably not pypy, as I am sitting directly on the shared library. Also, not all python distributions even come with the shared library; a lot of them just compile the symbols into the python executable, which is why we have an embedded pathway so you can boot up your system from python itself. On top of that there are something like 5 package managers for python, and various ones require various tweaks - we have collected what we know into the environments document. |
New release is up (as is my son so I have to run for a bit :-) ) - https://clojars.org/clj-python/libpython-clj/versions/2.005 This release does runtime AOT, so you should see the same 4 seconds as the libjulia version. One pathway we have optimized fairly thoroughly is taking a large nested JSON object and converting it to a JVM datastructure. Aside from that I will be curious as to how libpython-clj stacks up. Panthera was built against a much older version of libpython-clj2. I can reach out to Alan and see if he is interested in taking it further, but there is no use case where it is faster than tech.ml.dataset to my knowledge, and many where it is slower, although for your users especially there are so many libraries available for pandas, especially for quant stuff, that it may make the difference. Here is a quick test of the system from the java api perspective. |
Sorry - I made a mistake in the jar definition! I checked and the new release has the class files - https://clojars.org/clj-python/libpython-clj. |
2.006 works, thanks! Would it be possible to sandbox interpreters in libpython-clj like it is possible in Jep (http://ninia.github.io/jep/javadoc/3.9/jep/SubInterpreter.html)? Currently, when I disable exclusive locking with libpython-clj, the state between threads gets mixed, which makes the testcases red (since I guess all threads use the same interpreter and share globals). So threads have to use the python binding one after the other. The design doc talks about multiple interpreters, but I don't see an API for that: https://clj-python.github.io/libpython-clj/design.html It would also be interesting whether sandboxing like this could be possible with libjulia-clj. |
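For comparison, the Jep sandboxing referenced above looks roughly like this, based on the linked 3.9 javadoc (minor details vary between Jep versions):

```java
import jep.Jep;
import jep.JepException;
import jep.SubInterpreter;

// Each SubInterpreter owns its own globals/module namespace, so state set
// on one thread's interpreter is invisible to the others.
try (Jep interp = new SubInterpreter()) {
    interp.eval("x = 21 * 2");
    Object x = interp.getValue("x"); // 42, local to this interpreter
} catch (JepException e) {
    throw new RuntimeException(e);
}
```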
Regarding panthera, scicloj.ml or a future binding for keras:
Doesn't libpython-clj support generating clojure bindings for python APIs? Maybe this generator could be extended to also generate bindings for java? Or are those generated bindings not high-level enough, and is that the reason someone needs to package them as panthera or scicloj.ml with some manually coded sugar? The question also goes in this direction: should a java binding be made for panthera/scicloj.ml, like you did for libjulia-clj and libpython-clj with a generated static wrapper/facade, or can this be done at a different layer? |
These are great thoughts, glad we have the start of something going.

SubInterpreters

Your explanation makes sense to me. I have stayed away from this pathway, as no one has asked for it and it causes some level of operational risk if there are multiple interpreters and someone does

That design document was written very early, and the very first version of libpython-clj contained some basic multi-interpreter support, but no one used it, and the way people were using libpython-clj meant multiple interpreters were going to cause issues. I should correct the document. Honestly I would love to add that, as it is the kind of thing I like to do. It would take some thought and careful engineering, but probably not that many lines of code.

Java Wrappers for Python Libraries

libpython-clj contains a pretty good system for generating meaningful metadata from python libraries. Upon this system we built a runtime code-generation facility - require-python - and the static code-generation facility you mentioned. It wouldn't take too much effort to make a java library from the same metadata, and it would have good javadoc comments, but it would be primarily typeless, so a bit of an odd java interface. As for panthera, it was written before the static code-generation facility existed - the code generation now is good enough IMO that it isn't a requirement. I am not sure how member variables of python instances would translate, so my guess is that for a high-quality library you would still need to wrap various class types to make the member functions clear to intellisense, if that is important.

Large Features Missing From Current Java Bindings
Many of the things mentioned above will need to be carefully thought through in terms of their interaction with multiple interpreters. |
I guess such large refactorings/redesigns are not too high priority. So if you want to do them, I will test/incorporate them. Though if someone wants that functionality, they could just use Jep instead of libpython-clj. Jep also solves the modules thing via a feature called "shared modules", though also with some warnings. I was now able to do some benchmarks: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#results It seems the performance is rather bad because of some overhead in clojure, though more significantly because of some inefficient native string parsing: I don't know what that code does exactly, but I guess looking up functions via a hash map (since I guess that is what the code does) could improve the performance a lot. Or maybe it is the overhead of always returning the Map<Object, Object> for the current global/local dicts. In that case maybe a second method |
Hmm. So one thing is we are running a script to get the global dict and not caching it, which is odd. But in general I wouldn't call into python that way. It would look more like:

```java
pyEngine.eval("def calcSpread(bid,ask):\n\treturn ask-bid\n\n");
clojure.lang.IFn calcSpread = (clojure.lang.IFn) pyEngine.getGlobal("calcSpread");
// in the hot loop:
double result = (Double) calcSpread.invoke(bid, ask);
```

That wouldn't parse anything at all once things got going. |
This is just a simple example to see which library causes the most overhead. Note the text below the results: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#solution There are definitely better ways to write production code for this. ^^ |
Right, but my point is that the performance results aren't indicative of what will actually happen, because of the calling convention. |
In your profile there are things I can fix. I am not certain that anything should be calling find-pylib-fn or the sequence operator or a few other things from what I see. Caching the global map would probably make a large difference, and furthermore caching the conversion of the map keys to python objects and using them directly in the get would also probably be quicker. Ideally you only parse the string once and return some level of parsed thing, which is something I hadn't considered before. But all of that will still be quite a bit slower than just creating a function and calling it directly. |
Here is a benchmark of the function convention:
=> 182.01/ms Python calls with 104.07/ms ticks (due to libpython-clj startup being included in the ticks measure, otherwise it should be 1-1) |
And here is another version that keeps the GIL lock the whole time:
204.62/ms Python calls with 134.98/ms ticks |
The thing is, a byte[][] isn't a matrix type - it could be ragged. A byte[] combined with a shape, such as an int array, is a matrix type. |
Ok, I should look into using createArray anyway. I also wanted to use that in the libjulia-clj integration, but did not go for that optimization yet. |
I am setting up things to auto-call that in setGlobal for primitive arrays and primitive arrays-of-arrays in the case where the array-of-arrays has a constant inner length. There is also a copyData call now, so if you have an allocated numpy array you can quick-copy an appropriately typed flat array of data into it. That copy pathway will not work with arrays-of-arrays. New release is up - 2.012. This contains copyData and an updated setGlobal pathway that will convert simple arrays automagically into numpy arrays via the createArray pathway. |
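A sketch of the flat-array-plus-shape convention discussed above. createArray, copyData, and setGlobal are named in this thread, but the argument order and types here are assumptions:

```java
// A 2x3 matrix as flat data plus a shape, instead of a ragged double[][].
// createArray("float64", shape, data) argument order is an assumption.
double[] flat = {1, 2, 3, 4, 5, 6};
Object ndarray = libpython_clj2.java_api.createArray("float64", new int[]{2, 3}, flat);
libpython_clj2.java_api.setGlobal("mat", ndarray);

// copyData refills the preallocated numpy array from an appropriately
// typed flat array, avoiding reallocation in hot loops.
libpython_clj2.java_api.copyData(ndarray, new double[]{6, 5, 4, 3, 2, 1});
```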
I tried it a bit, but I found no good way to make numpy-arrays-by-default work across engines.
I think it would be better to also make libjulia-clj transmit lists per default. And maybe have some way to opt in to automatic numpy transmission (could be a system property like "libpython_clj.manual_gil", or a setGlobalNumpy(...) or setGlobal(key, value, numpy=true) overload). I think it would be best to use the least common denominator here (even if numpy might be an interesting optimization), because it changes the semantics in unexpected ways when trying to reuse scripts. One could also imagine restricted environments or offline installations that don't have numpy support or cannot install numpy. Or just let users explicitly use createArray for the numpy fastpath. Either way, it should be opt-in to use numpy. Also, here is a benchmark that shows that appending data is faster with python lists: Here is another benchmark with random numbers: |
I could see lists working better for small things for sure - very small. Honestly I like the opt-in approach the most, as it is the simplest from my end, and the opt-in pattern works in general. I think for like 99% of the meaningful use cases it will be the slowest path, especially if you are running a model or something, as you have to go to numpy anyway, but as you said the users can solve that. The deeper integration, and especially zero-copy, are distinct advantages of using libpython-clj or libjulia-clj, but they are specializations. And the fastest pathway would involve preallocating a set of numpy arrays and copying into them repeatedly, not just recreating them via setting globals. I am fine backing off and going back to lists for setGlobal - that is at least standardized, in the sense that it will behave the same with function calls. |
Release |
Would be great if we can get the conversions of |
Sure, also to have a consistent API. You are running into something that libpython-clj supports, in that by default it allows both proxying python objects to java and copying them. runStringAsFile wasn't wrapping the return value correctly. What getGlobal and runStringAsFile do is return proxied objects. There is another api fn, copyToJVM, that will always ensure the data is in the JVM in the correct format. This is optimized for the case of returning lists or returning things such as nested json objects. In any case, both will return an implementation of java.util.List whether it is proxied or not. 2.014 is up and contains a fix for runStringAsInput. |
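In code, the proxy-versus-copy distinction might be used like this. getGlobal and copyToJVM are named above; the exact return types are assumptions:

```java
// Proxied: a live view over the python list, each access crosses into python.
java.util.List proxied = (java.util.List) libpython_clj2.java_api.getGlobal("mylist");

// Copied: materialized once into plain JVM data, no further python calls.
java.util.List copied = (java.util.List) libpython_clj2.java_api.copyToJVM(proxied);
```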
This works great now. I updated the benchmarks: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#results

And here is the updated benchmark for the fastcallable function optimization:

```java
public class PythonStrategy extends StrategySupport {
    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private AutoCloseable calcSpread;
    private ILock gilLock;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        // pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        // pythonEngine = JythonScriptTaskEnginePython.newInstance();
        // pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();
        pythonEngine.eval("def calcSpread(bid,ask):\n\treturn abs(ask-bid)\n\n");
        gilLock = pythonEngine.getSharedLock();
        gilLock.lock();
        final IPythonEngineWrapper unwrap = (IPythonEngineWrapper) pythonEngine.unwrap();
        final IFn calcSpreadFunction = (IFn) unwrap.get("calcSpread");
        calcSpread = libpython_clj2.java_api.makeFastcallable(calcSpreadFunction);
        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        final double pythonSpread = Doubles.checkedCast(
                libpython_clj2.java_api.call(calcSpread, lastTick.getAskAbsolute(), lastTick.getBidAbsolute()));
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            try {
                calcSpread.close();
            } catch (final Exception e) {
                throw new RuntimeException(e);
            }
            gilLock.unlock();
            pythonEngine.close();
            pythonEngine = null;
        }
    }
}
```

333.32/ms python calls with 271.22/ms ticks (about 3x more ticks per second due to fewer python calls per tick) |
Here is a benchmark for keeping the GIL locked without the fastcallable function:

```java
public class PythonStrategy extends StrategySupport {
    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private ILock gilLock;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        // pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        // pythonEngine = JythonScriptTaskEnginePython.newInstance();
        // pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();
        gilLock = pythonEngine.getSharedLock();
        gilLock.lock();
        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        pythonEngine.getInputs().putDouble("ask", lastTick.getAskAbsolute());
        countPythonCalls++;
        pythonEngine.getInputs().putDouble("bid", lastTick.getBidAbsolute());
        countPythonCalls++;
        pythonEngine.eval("spread = abs(ask-bid)");
        countPythonCalls++;
        final double pythonSpread = pythonEngine.getResults().getDouble("spread");
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            gilLock.unlock();
            pythonEngine.close();
            pythonEngine = null;
        }
    }
}
```

614.23/ms python calls with 139.61/ms ticks |
This is 2-3 times faster than Jep, so really good job there! You definitely improved the available python-integration landscape for the JVM. This is a win for the java community and a great achievement for you personally. :) The only reason to use Jep instead of libpython-clj now is when one wants to do multithreading with sub-interpreters. That is the only situation where Jep can be faster right now. I guess each interpreter should have its own separate GIL, though according to this, multiple interpreters seem to share the same GIL: https://github.com/ninia/jep/wiki/Jep-and-the-GIL Also, I am looking into integrating clojure via https://github.com/ato/clojure-jsr223 to make libraries like scicloj or tech.ml.dataset available. With that there might be no need to write a java binding for these libraries, apart from allowing usage by people that don't want to write clojure. Though I don't know how the clojure ScriptEngine performs; I guess if it reuses the compiled scripts it should work ok. |
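A sketch of the clojure-jsr223 route via the standard javax.script API. The engine name "Clojure" is an assumption taken from that project, not verified here:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class CljEngineExample {
    public static void main(String[] args) throws ScriptException {
        // Look up the Clojure engine registered by clojure-jsr223.
        ScriptEngine clj = new ScriptEngineManager().getEngineByName("Clojure");
        Object sum = clj.eval("(reduce + (range 10))"); // => 45
        System.out.println(sum);
    }
}
```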
Just found another alternative for integrating python, via JavaCPP (https://github.com/bytedeco/javacpp-presets/tree/master/cpython). They also have a sort-of integration for scipy: https://github.com/bytedeco/javacpp-presets/tree/master/scipy Though I don't know how usable that is. Seems very low level. |
tech.ml.dataset is a pandas equivalent - something that doesn't otherwise exist on the JVM. It has extremely fast grouping, aggregations, and joins. The issue with making a java api for it is that its api is very broad - as is pandas' or dplyr's. You would end up with, I think, hundreds of java functions, and without someone definitively saying they would attempt it and make the javadoc nice and everything, it would just be a ton of work when you can just use things from Clojure and be done with it. scicloj and the ml subsystem are, I think, a lot more generally useful, but you need tmd for those... In your space scicloj allows people to efficiently use xgboost, which is a damn fast and pretty good general model for lots of problems. In addition you can use any model from scipy or smile (before GPL - I didn't know that!), but that is in general setting up a new ML community with docs and everything; we tried it and it is a ton of work, and again, what is the value-add of going to Java when the system is designed for Clojure anyway? You are talking years of work, or at least a year, when you could just learn Clojure, do what you need to, and be done with it. My concern with javacpp is and always has been hardcoding it to a specific python version or environment. Perhaps it finds Python, but I don't think so, and the system we built over a lot of time and tears to find python across various python environments such as pyenv and conda means a lot of things just work that absolutely drive you crazy with other python integrations. Thanks for your patience with this! I love these types of benchmarks, but it always takes a while to figure them out and put the best foot forward! This is truly a fantastic issue, and now libpython-clj has a solid Java API and we know it is well thought out and efficient. That is a big step forward. The Java API has features in it, btw, that the public Clojure one doesn't, specifically the fast execution of scripts and manual GIL control, so that is interesting. |
Smile is still LGPL; I checked again and the files in smile-core, for example, still contain the LGPL header. I think they only put specific parts under GPL (something called SMILE Shell, I think, but I did not check further). The main license file in the repo is now just a bit confusing because it gives the impression that it is GPL. The website and documentation do not say what the license situation is. And yes, I agree that using the clojure integration is easier. I don't know if the API would even translate well. |
As far as a step forward for the JVM, I think the julia integration is also key. The JVM just doesn't create great vectorized code, and that is something the Julia compiler does very well. Julia really is complementary to the JVM, while Python is interesting solely due to the libraries; anything done in Python could be done in the JVM, while for Julia that isn't true. For some really out-there stuff check out kmeans-mnist. The implementation is a really tight integration, passing the data via zero-copy between Julia and the JVM and using each where it has the strongest advantage, sort of weaving between the two of them. I also tried out TVM for a while, but it just isn't general enough to really take the JVM forward. Julia is, however. For a fantastic paper on optimizing literally any computational problem, I think this one really hits the nail on the head, specifically pages 34-43. |
Seems that was wrong again: version 2.6.0 is LGPL: The current snapshot version is GPL: |
Yep, Haifeng just wants to get paid for his work and hasn't figured out how to do that. Hopefully he finds success via dual licensing but I think that is pretty tough. |
In fact, if I could get a group of people together, I would love to get a JVM version of Julia working using JDK-17's vector intrinsics. I think it is now possible to equal the speed of other systems, especially if we can get the LLVM vectorized bytecode and convert it to JDK vector intrinsics. So you would have the same julia compiler frontend and just have it output optimized JVM bytecode. |
Oh, it's all LGPL - that is huge. Thanks for finding that out; that means upgrading is OK :-). |
Why is upgrading ok when they switch from LGPL to GPL? Upgrading to 2.6.0 is fine, beyond that not anymore. |
Sorry, I misread it. You are right, GPL is a bad deal for most businesses who are trying to make money from software. Yep, thanks, I think we are at 2.6.0. |
Something like Renjin for Julia might be an interesting idea. Though Renjin has demand because R is horribly slow; Julia does not seem to be so slow. I don't know if people also use JRuby because of the speed improvements. Though being able to mix julia code with java directly could be beneficial. |
I think it could be interesting. We can now, just through libjulia-clj. But it has some drawbacks, in that Julia does things at times with the callstack, so callbacks back into the JVM don't always work, regardless of the integration. Julia's compiler is about 100 times more involved than R's, so I think trying to rebuild their compiler is a minefield, but LLVM used to have a JVM bytecode generation pathway and it could always be rebuilt. With JDK-17 vector intrinsics, or even without, some of those optimizations are pretty powerful. |
I guess you mean this bytecode generator: https://github.com/davidar/lljvm How does TVM compare to tornadoVM, aparapi or rootbeer1? Is it only for matrix functions, or could one also e.g. write a backtesting engine on the GPU with that? From what I have seen, these things with automagic generation from java bytecode to opencl/cuda always have some quirkiness or incomprehensible problems. Until now I thought that a better approach might be to code an opencl/cuda program directly and integrate it on a higher level, instead of trying to mix computation between CPU/GPU on a function level (similar to the other language integrations that I am doing). Regarding the machine learning backtests: currently I am already quite fast on the CPU with the generated strategies (which can also be easily scaled via cluster/cloud computing). I have to work further on robustness techniques to combat curve fitting for now. But at some point it will be interesting to also use the engine on tick data or larger portfolios. That is something where I think performance could be improved by at least 10x if one goes from CPU to GPU in an intelligent way. Regarding monetization of open source products (as e.g. haifeng might be trying now):
I will read that paper you suggested later, thanks for the link! I am always interested to read interesting things. |
With TVM you literally program an AST and then ask TVM to compile that AST in various different ways. So for something where you are controlling the code generation (generated strategies) it may work, but it isn't a general programming language, so it doesn't support everything you can think of. It is a very specialized programming language that keeps you within the bounds of something that can be translated to a GPU. Those other methods (impressively!) look like they take byte code and auto-generate the GPU bindings. With TVM you have to program the AST yourself. This requires you to know the TVM micro-language and stick to it, both of which are quite tough IMO. For TVM you have three steps.
Definitely nothing automatic about it. And when I tested CPU pathways against Julia, I found I was able to get more performance, for at least mnist, via Julia. I was, however, able to create a simple image-resize algorithm that performed better than openCV on both CPU and GPU (both CUDA and OpenCL, with CUDA being notably faster), but this took a fascinating yet nearly heroic effort. TVM is specialized heavily towards optimizing neural-network processing graphs, so if your problem looks like a convolutional neural network then it will be better than anything else. Again, my honest opinion based on my various explorations is that for general-purpose processing Julia beats everything else. |
I haven't found a better llvm wrapper and the one you point to I believe is pretty far out of date. |
@subes - Found a tensorflow quant finance library - https://github.com/google/tf-quant-finance Seems relevant to your interests :-) |
Thanks, looks interesting. Nowadays there is also a tensorflow binding for java: https://www.tensorflow.org/jvm/install Since I already have 3 suitable generators, I am currently focusing on other parts of the process. Currently I am working on machine-learning- and solver-based portfolio algorithms. Among others, I have implemented these over the last few weeks and am currently testing them in my automated processes:
I have a pitch for you: |
The problem with the java tensorflow binding is that it doesn't include the python layers, which have a ton of functionality. For instance that library, tf-quant-finance, relies on tensorflow-probability, which is a nontrivial python layer on top of tensorflow. So to get meaningful support of tensorflow on the JVM you need to run the python layers. This is also true for mxnet, which I like a lot better architecturally -- their add-on systems for language processing, for example, are very nontrivial and completely done in python. pytorch is another example, although for whatever reason pytorch doesn't work well with libpython-clj. I would love a screenshare to see the app in its full glory. Do you have an email address you like to use for these things? |
[email protected] or [email protected] is fine. I guess we could do the screen share in zoom. I am also on skype (gsubes). Just pick a time slot and a medium that suits you well. |