
Make JNA Binding available to Java clients #191

Closed
subes opened this issue Dec 27, 2021 · 100 comments

@subes

subes commented Dec 27, 2021

As discussed here: cnuernber/libjulia-clj#3
Please generate some Java classes for libpython-clj so I can integrate it here: https://github.com/invesdwin/invesdwin-context-python/tree/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj

Thanks a lot!

@cnuernber
Collaborator

This is great :-). Is this going to be used in the same classloader as the libjulia-clj java bindings?

The reason I ask is that if so, I need to release an AOT dtype-next version and make the libjulia and python versions depend on it so that we don't duplicate class files all over the place - the libjulia AOT version I made had the required dtype-next class files generated inline.

For the first test I can just regenerate the dtype-next classes but after we confirm this is working I will need to release aot versions of dependent libraries if this is going to be used in the same classloader as the libjulia-clj java bindings.

@cnuernber
Collaborator

cnuernber commented Dec 27, 2021

Initial jar - https://clojars.org/clj-python/libpython-clj/versions/2.004-aot
API docs - https://clj-python.github.io/libpython-clj/libpython-clj2.java-api.html

The initialize function calls the python executable from command line and gets it to output the setup information. Python has a bit more involved setup than just a shared library path as it also needs to know some level of module root path. Let's see if this works and we can tweak.
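
A minimal sketch of what that initialization might look like from Java. The `PythonInit` class and the interpreter path are illustrative, and the `"python-executable"` option is the one used later in this thread; the actual `initialize` call is commented out since it needs the AOT jar plus a local CPython install:

```java
import java.util.HashMap;
import java.util.Map;

// Builds the parameter map for libpython_clj2.java_api.initialize.
// "/usr/bin/python3" is illustrative; pass whichever interpreter should
// be used instead of relying on it being on the PATH.
final class PythonInit {
    static Map<String, Object> buildInitParams(final String pythonExecutable) {
        final Map<String, Object> initParams = new HashMap<>();
        initParams.put("python-executable", pythonExecutable);
        return initParams;
    }

    public static void main(final String[] args) {
        // Requires the libpython-clj AOT jar and a local CPython install:
        // libpython_clj2.java_api.initialize(buildInitParams("/usr/bin/python3"));
    }
}
```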

Small unit test showing functionality - https://github.com/clj-python/libpython-clj/blob/master/test/libpython_clj2/java_api_test.clj

@subes
Author

subes commented Dec 27, 2021

Thanks a lot, I will test it as soon as possible.

  • Yes both Julia and Python should be able to be used in the same JVM process.
  • Is there an option to define a different path to the python executable? I would like to make this configurable so it does not need to be on the PATH, though the default value would be "python3" or something like that from the PATH. => found the settings for that already: https://clj-python.github.io/libpython-clj/libpython-clj2.python.html#var-initialize.21

@subes
Author

subes commented Dec 28, 2021

I made some progress here: https://github.com/invesdwin/invesdwin-context-python/blob/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj/src/main/java/de/invesdwin/context/python/runtime/libpythonclj/internal/PythonEngine.java

Though I can't figure out how to get/set globals. I get this exception:

2021-12-28 14:59:13.994 [ |7-7:InputsAndResult] ERROR de.invesdwin.ERROR.process                                   - processing #00000007
de.invesdwin.context.log.error.LoggedRuntimeException: #00000007 clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        ... 12 omitted, see following cause or error.log
Caused by - clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        at clojure.lang.AFn.throwArity(AFn.java:429)
        at clojure.lang.AFn.invoke(AFn.java:40)
        at libpython_clj2.java_api$_setItem.invokeStatic(java_api.clj:106)
        at libpython_clj2.java_api$_setItem.invoke(java_api.clj:104)
        at libpython_clj2.java_api.setItem(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.PythonEngine.set(PythonEngine.java:48) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putString(LibpythoncljScriptTaskInputsPython.java:57) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript$1.populateInputs(HelloWorldScript.java:29) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:42) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        ... 8 more, see error.log

getAttr/setAttr also does not work:

2021-12-28 15:03:03.024 [ |7-2:InputsAndResult] ERROR de.invesdwin.ERROR.process                                   - processing #00000002
de.invesdwin.context.log.error.LoggedRuntimeException: #00000002 java.lang.Exception: AttributeError: 'dict' object has no attribute 'hello'

        ... 12 omitted, see following cause or error.log
Caused by - java.lang.Exception: AttributeError: 'dict' object has no attribute 'hello'

        at libpython_clj2.python.ffi$check_error_throw.invokeStatic(ffi.clj:687)
        at libpython_clj2.python.ffi$check_error_throw.invoke(ffi.clj:685)
        at libpython_clj2.python.base$fn__10297.invokeStatic(base.clj:144)
        at libpython_clj2.python.base$fn__10297.invoke(base.clj:114)
        at libpython_clj2.python.protocols$fn__10100$G__10095__10109.invoke(protocols.clj:57)
        at libpython_clj2.python.bridge_as_jvm$generic_python_as_map$reify__10853$fn__10856.invoke(bridge_as_jvm.clj:281)
        at libpython_clj2.python.bridge_as_jvm$generic_python_as_map$reify__10853.set_attr_BANG_(bridge_as_jvm.clj:281)
        at libpython_clj2.python$set_attr_BANG_$fn__11372.invoke(python.clj:199)
        at libpython_clj2.python$set_attr_BANG_.invokeStatic(python.clj:199)
        at libpython_clj2.python$set_attr_BANG_.invoke(python.clj:196)
        at libpython_clj2.java_api$_setAttr.invokeStatic(java_api.clj:85)
        at libpython_clj2.java_api$_setAttr.invoke(java_api.clj:82)
        at libpython_clj2.java_api.setAttr(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.PythonEngine.set(PythonEngine.java:48) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putString(LibpythoncljScriptTaskInputsPython.java:57) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript$1.populateInputs(HelloWorldScript.java:29) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:42) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        ... 8 more, see error.log

I also tried just using "globals" as a string instead of using the returned value from the persistent map from runString, but that also fails:

2021-12-28 15:04:17.408 [ |7-10:InputsAndResul] ERROR de.invesdwin.ERROR.process                                   - processing #00000010
de.invesdwin.context.log.error.LoggedRuntimeException: #00000010 clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        ... 12 omitted, see following cause or error.log
Caused by - clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        at clojure.lang.AFn.throwArity(AFn.java:429)
        at clojure.lang.AFn.invoke(AFn.java:40)
        at libpython_clj2.java_api$_setItem.invokeStatic(java_api.clj:106)
        at libpython_clj2.java_api$_setItem.invoke(java_api.clj:104)
        at libpython_clj2.java_api.setItem(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.PythonEngine.set(PythonEngine.java:48) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putString(LibpythoncljScriptTaskInputsPython.java:57) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript$1.populateInputs(HelloWorldScript.java:29) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:42) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        ... 8 more, see error.log

@subes
Author

subes commented Dec 28, 2021

Ah, this is a lot simpler than expected; one just has to put/get from/into the PersistentMap.

@cnuernber
Collaborator

True, but there isn't a way to get the globals map in a stand-alone fashion, which I can see would be useful. And you did find an error in the api :-).

I tried to expose the python objects as their java equivalents, so a python dict will be returned as an implementation of java.util.Map, and tuples/lists implement java.util.List and java.util.RandomAccess, etc. There is a somewhat complex pathway in Clojure for whether you want to copy a python value completely into the JVM, such as copying a JSON object, or bridge/proxy it like what people want to do with modules.

@cnuernber
Collaborator

Also I want to expose a withGIL function so you can capture the GIL once and do a set of things. This is similar to inContext with the exception that it doesn't attempt to release all objects allocated within the scope.

@subes
Author

subes commented Dec 28, 2021

This is how I get the globals right now:

    @SuppressWarnings("unchecked")
    private Map<Object, Object> getGlobals() {
        final Map<?, ?> mainModule = libpython_clj2.java_api.runString("");
        return (Map<Object, Object>) mainModule.get("globals");
    }

@cnuernber
Collaborator

Yep, that will work.

@subes
Author

subes commented Dec 28, 2021

And yes, withGIL would be great. Though a lock/unlock function would be better so I can do:

gilLock.lock();
try{
...
} finally{
    gilLock.unlock();
}

@cnuernber
Collaborator

OK, makes sense. Then you can make your own withGIL if you need it...

@subes
Author

subes commented Dec 28, 2021

I would just wrap it in an implementation of ILock (which my client code already uses).

@cnuernber
Collaborator

cnuernber commented Dec 28, 2021

New API is up - it has two new functions for GIL management (as well as a fixed setItem call) - https://clojars.org/clj-python/libpython-clj/versions/2.004-aot-1.

Note that the python ensureGIL call returns an integer that must be passed into releaseGIL, so ensureGIL is 'reentrant', but you would have to keep track of that integer in your ILock impl.

New api fns are int lockGIL() and void unlockGIL(int).
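
A sketch of wrapping these two calls into the lock()/unlock() style discussed above. The `GilApi` interface here is a hypothetical stand-in for `libpython_clj2.java_api`'s `lockGIL`/`unlockGIL` so the example stays self-contained; the per-thread LIFO token bookkeeping is the part that matters for reentrancy:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for libpython_clj2.java_api.lockGIL()/unlockGIL(int).
interface GilApi {
    int lockGIL();

    void unlockGIL(int token);
}

// Reentrant lock-style wrapper: each lock() keeps the returned token so the
// matching unlock() can hand it back in LIFO order.
final class GilLock {
    private final GilApi api;
    private final ThreadLocal<Deque<Integer>> tokens =
            ThreadLocal.withInitial(ArrayDeque::new);

    GilLock(final GilApi api) {
        this.api = api;
    }

    public void lock() {
        tokens.get().push(api.lockGIL());
    }

    public void unlock() {
        api.unlockGIL(tokens.get().pop());
    }
}
```

This shape plugs straight into an ILock implementation as described earlier in the thread.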

@subes
Author

subes commented Dec 28, 2021

Why doesn't this:

final Map<String, Object> initParams = new HashMap<>();
initParams.put("python-executable", "someWrongCommand");
libpython_clj2.java_api.initialize(initParams);

throw an exception? I have no PYTHON_HOME env variable set.
Also I guess python2 and pypy are not supported?

@cnuernber
Collaborator

cnuernber commented Dec 28, 2021

Good question. It should throw. Will check in a moment.

For sure python2 isn't supported, and probably not pypy, as I am sitting directly on the shared library. Also not all python distributions even come with the shared library; a lot of them just compile the symbols into the python executable, which is why we have an embedded pathway so you can boot up your system from python itself.

On top of that there are like 5 package managers for python and various ones require various tweaks - we have collected what we know into the environments document.

@cnuernber
Collaborator

cnuernber commented Dec 28, 2021

New release is up (as is my son so I have to run for a bit :-) ) - https://clojars.org/clj-python/libpython-clj/versions/2.005

This release does runtime AOT so you should see the same 4 seconds as the libjulia version. One pathway we have optimized fairly thoroughly is taking a large nested JSON object and converting it to a JVM datastructure. Aside from that I will be curious as to how libpython-clj stacks up.

Panthera was built against a much older version of libpython-clj2. I can reach out to Alan and see if he is interested in taking it further, but to my knowledge there is no use case where it is faster than tech.ml.dataset and many where it is slower. Still, especially for your users there are so many libraries available for pandas, especially for quant stuff, that it may make the difference.

Here is a quick test of the system from the java api perspective.

@subes
Author

subes commented Dec 28, 2021

The new release is missing the java class libpython_clj2.java_api:
(screenshot omitted)

@cnuernber
Collaborator

Sorry - I made a mistake in the jar definition! I checked and the new release has the class files - https://clojars.org/clj-python/libpython-clj.

@subes
Author

subes commented Dec 28, 2021

2.006 works, thanks!

Would it be possible to sandbox interpreters in libpython-clj like it is possible in Jep (http://ninia.github.io/jep/javadoc/3.9/jep/SubInterpreter.html)?
The goal would be to be able to pool interpreter instances. In Jep one additionally has to bind them to a thread or else one gets issues with the GIL. The idea would be to have separate globals state per interpreter and do the multithreading from java, letting Python itself decide which thread needs access to the GIL. Native python modules like numpy do most of their work without holding the GIL, so letting the python code decide when to acquire the GIL at a finer granularity could be faster for multi-threading with multiple interpreters.

Currently when I disable exclusive locking with libpython-clj, the state between threads gets mixed, which makes the testcases red (since I guess all threads use the same interpreter and share globals). So threads have to use the python binding one after the other. The design doc talks about multiple interpreters, but I don't see an API for that: https://clj-python.github.io/libpython-clj/design.html

Would also be interesting if a sandboxing like this could be possible with libjulia-clj.

@subes
Author

subes commented Dec 28, 2021

Regarding panthera, scicloj.ml or a future binding for keras:

  • worst case I can keep using pandas/sklearn/keras in python code that is called via scripts from java
  • though having a high level API in java for those would be a major benefit

Doesn't libpython-clj support generating clojure bindings for python APIs? Maybe this generator could be extended to also generate bindings for java? Or are those generated bindings not high level enough, and is that the reason why someone needs to package them as panthera or scicloj.ml with some manually coded sugar? The question also goes in this direction: should a java binding be made for panthera/scicloj.ml like you did for libjulia-clj and libpython-clj with a generated static wrapper/facade, or can this be done at a different layer?

@cnuernber
Collaborator

These are great thoughts, glad we have the start of something going.

SubInterpreters

Your explanation makes sense to me. I have stayed away from this pathway, since no one was asking for it and it causes some level of operational risk. If there are multiple interpreters and someone does Object np = java_api.importModule("numpy"); then there is a question as to which interpreter they are talking to, and it would be easy for someone to then call a method with data from the wrong interpreter. I agree with your reasoning and it matches my understanding -- my prediction is that it takes a somewhat more advanced user to get any concrete gains with that method.

That design document was written very early and the very first version of libpython-clj contained some basic multi-interpreter support but no one used it and the way they were using libpython-clj meant multiple interpreters were going to cause issues. I should correct the document.

Honestly I would love to add that as it is the kind of thing I like to do. It would take some thought and careful engineering but probably not that many lines of code.

Java Wrappers for Python Libraries

libpython-clj contains a pretty good system for generating meaningful metadata from python libraries. Upon this system we built a runtime code generation facility - require-python and the static code generation facility you mentioned.

It wouldn't take too much effort to make a java library from the same metadata, and it would have good javadoc comments, but it would be primarily typeless - a bit of an odd java interface. In terms of panthera, it was written before the static code generation facility; the code generation now is good enough IMO that it isn't a requirement. I am not sure how member variables of python instances would translate, so my guess is that for a high quality library you would still need to wrap various class types to make the member functions clear to intellisense, if that is important.

Large Features Missing From Current Java Bindings

  • with - Python has a notion of with that I didn't expose but it does exist in the clojure version and could be exposed.
  • classes - creating python classes is involved, but you can create python objects derived from one. In fact, if you pass in something derived from the standard java classes it is exposed to python as an object derived from the appropriate abc interface.

Many of the things mentioned above will need to be carefully thought through in terms of their interaction with multiple interpreters.

@subes
Author

subes commented Dec 28, 2021

I guess such large refactorings/redesigns are not too high priority. So if you want to do them, I will test/incorporate them. Though if someone wants that functionality, they could just use Jep instead of libpython-clj. Jep also solves the modules thing via a feature called "shared modules", though also with some warnings.

I was now able to do some benchmarks: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#results

It seems the performance is rather bad because of some overhead in clojure, though more significantly some inefficient native string parsing:
(profiler screenshot omitted)

Dunno what that code does exactly, but I guess a map to look up functions by hash (since I guess this is what the code does) could improve the performance a lot. Or maybe it is the overhead of always returning the Map<Object, Object> for the current global/local dicts. In that case maybe a second method void exec(String) could improve the speed a lot here.

@cnuernber
Collaborator

cnuernber commented Dec 28, 2021

Hmm. So one thing is we are running a script to get the global dict and not caching it, which is odd.
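
Caching that lookup on the Java side is straightforward; a minimal generic sketch (the `CachedLookup` name and double-checked locking are illustrative, not part of the libpython-clj API), usable e.g. to fetch the globals map once and reuse it:

```java
import java.util.function.Supplier;

// Memoizes an expensive lookup (e.g. running a script once to obtain the
// Python globals dict) so repeated calls don't redo the work each time.
final class CachedLookup<T> implements Supplier<T> {
    private final Supplier<T> source;
    private volatile T cached;

    CachedLookup(final Supplier<T> source) {
        this.source = source;
    }

    @Override
    public T get() {
        T local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached;
                if (local == null) {
                    // Only the first caller pays for the real lookup.
                    local = source.get();
                    cached = local;
                }
            }
        }
        return local;
    }
}
```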

But in general I wouldn't call into python that way. It would look more like:

   pyEngine.eval("def calcSpread(bid,ask):\n\treturn ask-bid\n\n")
   clojure.lang.IFn calcSpread = (clojure.lang.IFn) pyEngine.getGlobal("calcSpread")
   loop:
      result = (Double) calcSpread.invoke(bid, ask)
   end-loop

That wouldn't parse anything at all once things got going.

@subes
Author

subes commented Dec 28, 2021

This is just a simple example to see which library causes the most overhead. Note the text below the results: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#solution

There are definitely better ways to write production code for this. ^^

@cnuernber
Collaborator

Right, but my point is that the performance results aren't indicative of what will actually happen because of the calling convention.

@cnuernber
Collaborator

In your profile there are things I can fix. From what I see I am not certain that anything should be calling find-pylib-fn or the sequence operator, among a few other things. Caching the global map would probably make a large difference, and caching the conversion of the map keys to python objects and using them directly in the get would probably be quicker as well. Ideally you only parse the string once and return some level of parsed thing, which is something I hadn't considered before.

But all of that will still be quite a bit slower than just creating a function and calling it directly.

@subes
Author

subes commented Dec 28, 2021

Here a benchmark of the function convention:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private IFn calcSpread;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();

        pythonEngine.eval("def calcSpread(bid,ask):\n\treturn abs(ask-bid)\n\n");
        final IPythonEngineWrapper unwrap = (IPythonEngineWrapper) pythonEngine.unwrap();
        calcSpread = (clojure.lang.IFn) unwrap.get("calcSpread");

        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        final double pythonSpread = Doubles
                .checkedCast(calcSpread.invoke(lastTick.getAskAbsolute(), lastTick.getBidAbsolute()));
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            pythonEngine.close();
            pythonEngine = null;
        }
    }

}

=> 182.01/ms Python calls with 104.07/ms ticks (due to libpython-clj startup being included in the ticks measure, otherwise it should be 1-1)

(profiler screenshot omitted)

@subes
Author

subes commented Dec 28, 2021

And here another version that keeps the GIL-lock all the time:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private IFn calcSpread;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();

        GilLock.INSTANCE.lock();
        pythonEngine.eval("def calcSpread(bid,ask):\n\treturn abs(ask-bid)\n\n");
        final IPythonEngineWrapper unwrap = (IPythonEngineWrapper) pythonEngine.unwrap();
        calcSpread = (clojure.lang.IFn) unwrap.get("calcSpread");

        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        final double pythonSpread = Doubles
                .checkedCast(calcSpread.invoke(lastTick.getAskAbsolute(), lastTick.getBidAbsolute()));
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            pythonEngine.close();
            pythonEngine = null;
        }
        GilLock.INSTANCE.unlock();
    }

}

204.62/ms Python calls with 134.98/ms ticks

(profiler screenshot omitted)

@cnuernber
Collaborator

The thing is a byte[][] isn't a matrix type - it could be ragged. A byte[] combined with a shape such as an int array is a matrix type.

@subes
Author

subes commented Jan 2, 2022

Ok, I should look into using createArray anyhow; I also wanted to use that in the libjulia-clj integration but did not go for that optimization yet.

@cnuernber
Collaborator

I am setting up things to auto-call that in setGlobal for primitive arrays and primitive arrays-of-arrays in the case where the array-of-arrays has a constant inner length.

There is also a copyData call now, so if you have an allocated numpy array you can quickly copy an appropriately typed flat array of data into it. That copy pathway will not work with arrays-of-arrays.

New release is up - 2.012. This contains copyData and an updated setGlobal pathway that will convert simple arrays automagically into numpy arrays via the createArray pathway.

@subes
Author

subes commented Jan 2, 2022

I tried it a bit, but I found no good way to make numpy arrays as the default work across engines.

  • Jython: no numpy supported and only python 2.x; transmits Lists per default
  • Jep: transmits Lists per default
  • Py4J: supports python 2.x, pypy and python 3.x; transmits Lists per default
  • libpython-clj: python 3.x; transmits numpy arrays per default

I think it would be better to also make libpython-clj transmit lists per default, and maybe have some way to opt in to automatic numpy transmission (could be a system property like "libpython_clj.manual_gil", a setGlobalNumpy(...), or a setGlobal(key, value, numpy=true) overload). I think it would be best to use the least common denominator here (even if numpy might be an interesting optimization) because it changes the semantics in unexpected ways when trying to reuse scripts. One could also imagine restricted environments or offline installations that don't have numpy support or cannot install numpy.

Or just let users explicitly use createArray for the numpy fastpath. Either way, using numpy should be opt-in.

Also here a benchmark that shows that appending data is faster with python lists:
https://towardsdatascience.com/python-lists-are-sometimes-much-faster-than-numpy-heres-a-proof-4b3dad4653ad
I would also guess that python lists could be faster than numpy for small lists/arrays (since native calls are not needed; dunno if those are similarly expensive to JNI calls)?

Here another benchmark with random numbers:
https://towardsdatascience.com/is-numpy-really-faster-than-python-aaa9f8afb5d7
It seems the breakeven is at about 20 elements for this specific case.
(benchmark chart omitted)

@cnuernber
Collaborator

I could see lists working better for small things for sure - very small. I like the opt-in approach honestly the most as it is the simplest from my end and the opt-in pattern works in general. I think for like 99% of the use cases that are meaningful it will be the slowest one, especially if you are running a model or something as you have to go to numpy anyway but as you said the users can solve that.

The deeper integration and especially zero-copy are distinct advantages of using libpython-clj or libjulia-clj, but they are specializations. The fastest pathway would involve preallocating a set of numpy arrays and copying into them repeatedly, not recreating them by setting globals.

I am fine backing off and going back to lists for setGlobal - that is at least standardized in terms of it will behave the same with function calls.

@cnuernber
Collaborator

cnuernber commented Jan 2, 2022

Release 2.013 is up that disables the auto-numpy pathway of setGlobal.

@subes
Author

subes commented Jan 2, 2022

setGlobal now works great. However, runStringAsInput now returns a Pointer {:address 0x00007F35520C97C0 } instead of a byte[] or byte[][]. The workaround demonstrated below works:

    @Override
    public Object get(final String variable) {
        IScriptTaskRunnerPython.LOG.debug("get %s", variable);
        gilLock.lock();
        try {
            //does not work due to pointer being returned:
            //return libpython_clj2.java_api.runStringAsInput(variable);
            //workaround works:
            libpython_clj2.java_api.runStringAsFile("__ans__ = " + variable);
            return libpython_clj2.java_api.getGlobal("__ans__");
        } finally {
            gilLock.unlock();
        }
    }

Would be great if we could get the conversions of getGlobal into runStringAsInput as well, so I don't need the extra call to have the testcases green.

@cnuernber
Collaborator

cnuernber commented Jan 2, 2022

Sure, also to have a consistent API. You are running a bit into something that libpython-clj supports: by default it allows both proxying python objects to java and copying them. runStringAsFile wasn't wrapping the return value correctly.

What getGlobal and runStringAsFile do is return proxied objects. There is another api fn, copyToJVM, that will always ensure the data is in the JVM in the correct format. This is optimized for the case of returning lists or things such as nested json objects. In either case both will return an implementation of java.util.List, whether it is proxied or not.

2.014 is up and contains a fix for runStringAsInput.

@subes
Author

subes commented Jan 3, 2022

This works great now. I updated the benchmarks: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#results

And here the updated benchmark for the fastcallable function optimization:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private AutoCloseable calcSpread;
    private ILock gilLock;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();

        pythonEngine.eval("def calcSpread(bid,ask):\n\treturn abs(ask-bid)\n\n");
        gilLock = pythonEngine.getSharedLock();
        gilLock.lock();
        final IPythonEngineWrapper unwrap = (IPythonEngineWrapper) pythonEngine.unwrap();
        final IFn calcSpreadFunction = (IFn) unwrap.get("calcSpread");
        calcSpread = libpython_clj2.java_api.makeFastcallable(calcSpreadFunction);

        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        final double pythonSpread = Doubles.checkedCast(
                libpython_clj2.java_api.call(calcSpread, lastTick.getAskAbsolute(), lastTick.getBidAbsolute()));
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            try {
                calcSpread.close();
            } catch (final Exception e) {
                throw new RuntimeException(e);
            }
            gilLock.unlock();
            pythonEngine.close();
            pythonEngine = null;
        }
    }

}

333.32/ms python calls with 271.22/ms ticks (about 3x more ticks per second due to fewer python calls per tick)
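The win from makeFastcallable is the classic pattern of paying the lookup/wrapping cost once, outside the hot loop, and keeping only a cheap invocation inside it. A rough JVM-side analogy (not libpython-clj code) is resolving a MethodHandle once and invoking the pre-resolved handle per tick:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FastcallableAnalogy {
    // The "Python function" stand-in.
    public static double calcSpread(double bid, double ask) {
        return Math.abs(ask - bid);
    }

    public static void main(String[] args) throws Throwable {
        // One-time setup, analogous to makeFastcallable: resolve the target once.
        MethodHandle calcSpread = MethodHandles.lookup().findStatic(
                FastcallableAnalogy.class, "calcSpread",
                MethodType.methodType(double.class, double.class, double.class));

        // Hot loop: invoke the pre-resolved handle, no per-call lookup cost.
        double spread = (double) calcSpread.invokeExact(1.50, 1.25);
        System.out.println(spread);
    }
}
```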

@subes
Author

subes commented Jan 3, 2022

Here is a benchmark that keeps the GIL locked without the fastcallable function:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private ILock gilLock;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();
        gilLock = pythonEngine.getSharedLock();
        gilLock.lock();
        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        pythonEngine.getInputs().putDouble("ask", lastTick.getAskAbsolute());
        countPythonCalls++;
        pythonEngine.getInputs().putDouble("bid", lastTick.getBidAbsolute());
        countPythonCalls++;
        pythonEngine.eval("spread = abs(ask-bid)");
        countPythonCalls++;
        final double pythonSpread = pythonEngine.getResults().getDouble("spread");
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            gilLock.unlock();
            pythonEngine.close();
            pythonEngine = null;
        }
    }

}

614.23/ms python calls with 139.61/ms ticks

@subes
Author

subes commented Jan 3, 2022

This is 2-3 times faster than Jep. So really good job there! You definitely improved the available python integration landscape for the JVM. This is a win for the java community and a great achievement for you personally. :)

The only remaining reason to use Jep instead of libpython-clj is multithreading with sub-interpreters; that is the only situation where Jep can be faster right now. One would expect each interpreter to have its own GIL, though according to this, multiple sub-interpreters seem to share the same GIL: https://github.com/ninia/jep/wiki/Jep-and-the-GIL
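Why a single shared GIL defeats sub-interpreter parallelism can be sketched with a plain ReentrantLock (a conceptual model, not Jep internals): if every Python call must hold the same lock, two "interpreter" threads are safe but execute their work strictly one at a time, so adding threads adds no throughput.

```java
import java.util.concurrent.locks.ReentrantLock;

public class SharedGilSketch {
    // Both "interpreters" share one GIL, so their work serializes.
    static final ReentrantLock gil = new ReentrantLock();
    static int counter = 0;

    static Runnable interpreter(int iterations) {
        return () -> {
            for (int i = 0; i < iterations; i++) {
                gil.lock();      // every Python call must acquire the shared GIL
                try {
                    counter++;   // stands in for executing Python bytecode
                } finally {
                    gil.unlock();
                }
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(interpreter(100_000));
        Thread b = new Thread(interpreter(100_000));
        a.start(); b.start();
        a.join(); b.join();
        // No lost updates: the shared lock serialized every access.
        System.out.println(counter);
    }
}
```

Per-interpreter GILs would correspond to each thread holding its own lock, which is exactly what would allow true parallel execution.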

Also, I am looking into integrating Clojure via https://github.com/ato/clojure-jsr223 to make libraries like scicloj or tech.ml.dataset available. With that there might be no need to write a Java binding for these libraries, apart from allowing usage by people who don't want to write Clojure. Though I don't know how the Clojure ScriptEngine performs; I guess if it reuses the compiled scripts it should work ok.

@subes
Author

subes commented Jan 3, 2022

Just found another alternative to integrate python via JavaCPP (https://github.com/bytedeco/javacpp-presets/tree/master/cpython). They also have a sort-of integration for scipy: https://github.com/bytedeco/javacpp-presets/tree/master/scipy

Though I don't know how usable that is. It seems very low level.

@cnuernber
Collaborator

tech.ml.dataset is a pandas equivalent, something that doesn't otherwise exist on the JVM. It has extremely fast grouping, aggregations, and joins. The issue with making a Java API for it is that its API is very broad, as is pandas' or dplyr's. You would end up with, I think, hundreds of Java functions, and without someone definitively committing to attempt it and make the javadoc nice and everything, it would just be a ton of work when you can use it from Clojure and be done with it.

scicloj and the ML subsystem are, I think, a lot more generally useful, but you need tmd for those... In your space scicloj allows people to efficiently use xgboost, which is a damn fast and pretty good general model for lots of problems. In addition you can use any model from scipy or smile (the versions before GPL - I didn't know about that!), but that amounts to setting up a new ML community with docs and everything; we tried it and it is a ton of work. And again, what is the value-add of going to Java when the system is designed for Clojure anyway? You are talking years of work, or at least a year, when you could just learn Clojure, do what you need to, and be done with it.

My concern with javacpp is, and always has been, hardcoding to a specific Python version or environment. Perhaps it finds Python dynamically, but I don't think so. The system we built over a lot of time and tears to find Python across various environments such as pyenv and conda means a lot of things just work that absolutely drive you crazy with other Python integrations.

Thanks for your patience with this! I love these types of benchmarks but it always takes a while to figure them out and put the best foot forward! This is truly a fantastic issue and now libpython-clj has a solid Java API and we know it is well thought out and efficient. That is a big step forward. The Java API has features, btw, that the public Clojure one doesn't, specifically the fast execution of scripts and manual GIL control, so that is interesting.

@subes
Author

subes commented Jan 3, 2022

Smile is still LGPL; I checked again and the files in smile-core, for example, still contain the LGPL header. I think they only put specific parts under GPL (something called Smile Shell I think, but I did not check further). The main license file in the repo is now just a bit confusing because it gives the impression that everything is GPL. Neither the website nor the documentation explains the license situation.

And yes, I agree that using clojure integration is easier. I don't know if the API will even translate well.

@cnuernber
Collaborator

As far as a step forward for the JVM, I think the Julia integration is also key. The JVM just doesn't create great vectorized code, and that is something the Julia compiler does very well. Julia really is complementary to the JVM, while Python is interesting solely for its libraries; anything done in Python could be done in the JVM, while for Julia that isn't true. For some really out-there stuff check out kmeans-mnist. The implementation is a really tight integration, passing the data zero-copy between Julia and the JVM and using each runtime where it has the strongest advantage, sort of weaving between the two of them.

I also tried out TVM for a while but they just aren't general enough to really take the JVM forward. Julia is, however.

For optimizing literally any computational problem, I think this paper really hits the nail on the head, specifically pages 34-43.

@subes
Author

subes commented Jan 3, 2022

Seems that was wrong again; version 2.6.0 is LGPL:
[screenshot: LGPL license header in the 2.6.0 sources]

The current snapshot version is GPL:
https://github.com/haifengl/smile/blob/master/core/src/main/java/smile/association/AssociationRule.java
[screenshot: GPL license header in the snapshot sources]

@cnuernber
Collaborator

Yep, Haifeng just wants to get paid for his work and hasn't figured out how to do that. Hopefully he finds success via dual licensing but I think that is pretty tough.

@cnuernber
Collaborator

cnuernber commented Jan 3, 2022

In fact, if I could get a group of people together I would love to get a JVM version of Julia working using JDK-17's vector intrinsics. I think it is now possible to equal the speed of other systems, especially if we can take the LLVM vectorized bytecode and convert it to JDK vector intrinsics. So you would have the same Julia compiler frontend and just have it output optimized JVM bytecode.

@cnuernber
Collaborator

Oh, it's all LGPL - that is huge. Thanks for finding that out; that means upgrading is OK :-).

@subes
Author

subes commented Jan 3, 2022

Why would upgrading be OK when they switch from LGPL to GPL? Upgrading to 2.6.0 is fine; beyond that it isn't anymore.

@cnuernber
Collaborator

Sorry, I misread it. You are right, GPL is a bad deal for most businesses who are trying to make money from software. Yep, thanks, I think we are at 2.6.0.

@subes
Author

subes commented Jan 3, 2022

Something like Renjin for Julia might be an interesting idea, though Renjin has demand because R is horribly slow, and Julia does not seem to be. I don't know whether people use JRuby for the speed improvements either.

Though being able to mix Julia code with Java directly could be beneficial.

@cnuernber
Collaborator

cnuernber commented Jan 3, 2022

I think it could be interesting. We can do that now through libjulia-clj, but it has some drawbacks: Julia at times does things with the callstack, so callbacks back into the JVM don't always work, regardless of the integration.

Julia's compiler is about 100 times more involved than R's, so I think trying to rebuild their compiler is a minefield. But LLVM used to have a JVM bytecode generation pathway, and it could always be rebuilt. With JDK-17 vector intrinsics, or even without them, some of those optimizations are pretty powerful.

@subes
Author

subes commented Jan 3, 2022

I guess you mean this bytecode generator: https://github.com/davidar/lljvm
Or is there a more up-to-date one?

How does TVM compare to TornadoVM, Aparapi, or rootbeer1? Is it only for matrix functions, or could one also, e.g., write a backtesting engine on the GPU with it? From what I have seen of tools that do automagic generation from Java bytecode to OpenCL/CUDA, one always runs into some quirkiness or incomprehensible problems. Until now I thought a better approach might be to code an OpenCL/CUDA program directly and integrate it at a higher level, instead of trying to mix computation between CPU and GPU at the function level (similar to the other language integrations I am doing).

Regarding the machine learning backtests: currently I am already quite fast on the CPU with the generated strategies (which can also be easily scaled via cluster/cloud computing). For now I have to work further on robustness techniques to combat curve fitting. But at some point it will be interesting to also use the engine on tick data or larger portfolios; that is somewhere I think performance could be improved by at least 10x by going from CPU to GPU in an intelligent way.

Regarding monetization of open-source projects (as e.g. Haifeng might be trying now):

  • one has to do it primarily for the fun of it and to give back to the community
  • it makes it easier to market oneself as a consultant (like a designer's portfolio)
  • beyond that, one has to treat it like a product or build offerings on top of it. Ideally a client builds a core product on top of the project and decides to sponsor development or pays for support. But I think the real money lies only in building products on top of it to market separately, and for that you need more than one person. The tools underneath are rather worthless to try to monetize, IMHO.

I will read that paper you suggested later, thanks for the link! I am always interested in reading interesting things.

@cnuernber
Collaborator

With TVM you literally program an AST and then ask TVM to compile that AST various different ways. So for something where you are controlling the code generation (generated strategy) it may work but it isn't a general programming language so it doesn't support everything you can think of. It is a very specialized programming language that keeps you within the bounds of something that can be translated to a GPU.

Those other tools (impressively!) look like they take bytecode and auto-generate the GPU bindings. With TVM you have to program the AST yourself. This requires you to know the TVM micro-language and stick to it, both of which are quite tough IMO.

For TVM you have three steps.

  1. Define the algorithm using the TVM AST.
  2. Schedule the algorithm, which applies various transformations such as loop fusion.
  3. Compile the algorithm to the hardware of your choice.

Definitely nothing automatic about it. And when I tested CPU pathways against Julia, I found I was able to get more performance, for MNIST at least, via Julia.

I was, however, able to create a simple image-resize algorithm that performed better than OpenCV on both CPU and GPU (both CUDA and OpenCL, with CUDA being notably faster), but this took a fascinating yet nearly heroic effort.

TVM is specialized heavily towards optimizing neural network processing graphs so if your problem looks like a convolutional neural network then it will be better than anything else. Again my honest opinion based on my various explorations is that for general purpose processing Julia beats everything else.

@cnuernber
Collaborator

I haven't found a better LLVM wrapper, and the one you point to is, I believe, pretty far out of date.

@cnuernber
Collaborator

@subes - Found a TensorFlow quant-finance library: https://github.com/google/tf-quant-finance

Seems relevant to your interests :-)

@subes
Author

subes commented Jan 21, 2022

Thanks, looks interesting. Nowadays there is also a TensorFlow binding for Java: https://www.tensorflow.org/jvm/install
I will put this into my tickets as a reminder. I want to integrate neural networks and deep learning into the process and add functions to train/use such algorithms in my expression language. The neural networks would be used for price forecasting (I have more forecasting techniques in my backlog) or to generate artificial-intelligence indicators. These can either be used as an alternative to the genetic programming I am currently doing or be embedded into the genetic programming (to create advanced hybrid processes).

Since I already have 3 suitable generators, I am currently focusing on other parts of the process, at the moment machine-learning- and solver-based portfolio algorithms. Among others, I have implemented the following over the last few weeks and am currently testing them in my automated processes:

  • Stochastic Frontier Analysis
  • Data Envelopment Analysis
  • Principal Component Analysis
  • Stochastic Dominance
  • ZScore
  • Kelly Criterion
  • Markowitz
  • Black Litterman
  • OptimalF (Ralph Vince)
  • SecureF (Ralph Vince)
  • SafeF (Howard Bandy)

I have a pitch for you:
The platform is supposed to become something like WEKA, but for financial trading strategies and portfolio management. It is supposed to be free for researchers, though by invitation/request only at the moment. I am working together with my university to maybe acquire funding via grants in order to get some student assistant positions for this. And it might be useful as a teaching instrument in economics courses once the UI is finished. Though that is just the more open path this could take; I am also following some more proprietary alpha-seeking paths with this for institutional clients. If you think you could be interested in such things, we could do a screen-sharing session sometime, and if you like it I can give you access. ;)

@cnuernber
Collaborator

The problem with the Java TensorFlow binding is that it doesn't include the Python layers, which have a ton of functionality. For instance that library, tf-quant-finance, relies on tensorflow-probability, which is a nontrivial Python layer on top of TensorFlow. So to get meaningful support of TensorFlow on the JVM you need to run the Python layers. This is also true for mxnet, which I like a lot better architecturally -- their add-on systems for language processing, for example, are very nontrivial and completely done in Python. pytorch is another example, although for whatever reason pytorch doesn't work well with libpython-clj.

I would love a screenshare to see the app in its full glory. Do you have an email address you like to use for these things?

@subes
Author

subes commented Jan 22, 2022

[email protected] or [email protected] is fine. I guess we could do the screen share in zoom. I am also on skype (gsubes). Just pick a time slot and a medium that suits you well.
