This repository contains the code for demonstrating various machine learning models. It currently powers Robot-Reviewer, a system for automatically extracting Risk of Bias from Randomized Controlled Trial publications.
In essence it's Vortext stripped down to essentials, with the added ability to run predictions in various programming languages. At some point the functionality of the two repositories will hopefully converge. We are currently looking for seed investors to develop this further; if you're interested, drop us a line at vortext.systems.
See the Spá repository for an overview of the technology used.
Unlike the regular Vortext repository, this set-up is a bit more involved. To run the predictions we take a polyglot approach, for a fairly simple reason: to keep the predictions congruent with the client-side PDF.js we had to run PDF.js on the server side as well (via NodeJS), but JavaScript doesn't have nice machine learning libraries the way Python and R do, so we wanted to use those languages for the machine learning. However, JavaScript, Python, and R are terrible languages for scalable full-stack development, so we picked Clojure (a Lisp) for that.
To make this work we opted for a custom Remote Procedure Call (RPC) framework. Clojure is our glue and runs all the web-facing parts, but it calls the NodeJS and Python (or, in the future, R) processes over ZeroMQ. The dependent processes can run stand-alone, but in practice we start them as children of the Clojure app. To call the different processes (services, really) we use the Majordomo pattern.
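Stripped of the ZeroMQ transport, the Majordomo idea is simply a broker that routes each request to a worker by service name, with all payloads travelling as strings. A minimal pure-Python sketch of that routing (hypothetical — the `Broker` class and service names are illustrative, not part of this repository):

```python
class Broker:
    """Toy Majordomo-style broker: routes requests to workers by service name."""

    def __init__(self):
        self.services = {}

    def register(self, name, worker):
        # In the real pattern a worker announces itself over ZeroMQ;
        # here registration is a plain dictionary entry.
        self.services[name] = worker

    def request(self, name, payload):
        # Payloads are strings, mirroring the handlers shown below.
        return self.services[name](payload)

broker = Broker()
broker.register("py.add_one", lambda s: str(int(s) + 1))
broker.register("js.multiply", lambda s: str(int(s) * 2))

print(broker.request("py.add_one", "20"))   # "21"
print(broker.request("js.multiply", "21"))  # "42"
```

The real broker adds what this sketch omits: heartbeating, reconnects, and worker discovery over ZeroMQ sockets.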
We recommend reading up on the ZeroMQ documentation (which doubles as an excellent introduction to distributed systems) when developing this part of the software. We chose ZeroMQ for its wide range of supported languages, its light weight, and its proven effectiveness. However, ZeroMQ has its own hang-ups, and we are actively considering alternatives.
But how do you tie all this together without it becoming a complete mess? Well, admittedly it is a bit of a mess now, but for different reasons. Anyway, the way we tie it together is by using Directed Acyclic Graphs (DAGs) as an abstraction over the different processing steps. We stole this idea from Prismatic Graph and Apache Storm, so if you want to develop your own predictors it is vital to read up on "Graph: Abstractions for Structured Computation". We call these graphs topologies (a term borrowed from Storm).
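For intuition, such a topology can be modelled as a map from node names to functions whose parameter names declare their dependencies; a node runs as soon as all of its inputs are available. The following is a hypothetical miniature of the idea in Python, not the actual Clojure implementation:

```python
import inspect

def run_topology(topology, **inputs):
    """Evaluate every node of a DAG once its named dependencies are ready."""
    results = dict(inputs)
    remaining = dict(topology)
    while remaining:
        ready = [name for name, fn in remaining.items()
                 if all(dep in results
                        for dep in inspect.signature(fn).parameters)]
        if not ready:
            raise ValueError("cycle or missing input in topology")
        for name in ready:
            fn = remaining.pop(name)
            deps = inspect.signature(fn).parameters
            results[name] = fn(**{d: results[d] for d in deps})
    return results

# Same shape as the example topology: increment, then double, then label.
topology = {
    "source": lambda body: body.strip(),
    "incremented": lambda source: str(int(source) + 1),
    "doubled": lambda incremented: str(int(incremented) * 2),
    "sink": lambda doubled: "result:" + doubled,
}

print(run_topology(topology, body=" 20 ")["sink"])  # result:42
```

In the real system the node bodies dispatch to the Python and NodeJS services over RPC instead of running in-process, but the dependency-driven evaluation order is the same.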
A topology defines the flow from `source` to `sink`. The `source` is the incoming HTTP POST request; the `sink` is the data to be returned.
To define this flow you can make a topology like this:
```clojure
(def topology
  {:source      (fnk [body] (slurp body))
   :incremented (fnk [source] (py "example.add_one" source))
   :doubled     (fnk [incremented] (js "example/multiply.js" incremented))
   :sink        (fnk [doubled] (str "result:" (String. doubled)))})
```
This will take the body, increment the number in Python, and double it in JavaScript. The Python file looks like this:
```python
import sys
sys.path.append('../../multilang/python')

from abstract_handler import AbstractHandler

class Handler(AbstractHandler):
    title = "Add one"

    def __init__(self):
        # Setup here
        print("Hello, I'm adding one")

    def handle(self, input):
        return str(int(input) + 1)
```
And similarly the JavaScript:
```javascript
// You can do all sorts of set-up here
console.log("Hi!, I'll multiply the input");

function handler(input) {
  return (parseInt(input, 10) * 2) + ""; // Must return a string
}

module.exports = handler;
```
The `handle` function gets the raw input body and returns the new number as a string. We make no assumptions about the serialization of the input and output; we currently use JSON ourselves, but have used Protocol Buffers in the past.
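Because serialization is left entirely to the handlers, switching formats only means encoding and decoding inside `handle`. A hypothetical JSON variant of the add-one handler (the `value` field name is illustrative, not a convention of this repository):

```python
import json

class JsonAddOneHandler:
    """Like the add-one handler above, but with a JSON payload."""

    def handle(self, input):
        payload = json.loads(input)         # e.g. '{"value": 41}'
        payload["value"] = int(payload["value"]) + 1
        return json.dumps(payload)          # handlers always return strings

handler = JsonAddOneHandler()
print(handler.handle('{"value": 41}'))  # {"value": 42}
```

As long as every node in a topology agrees on the format, the RPC layer itself never needs to know what the strings contain.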
Take a look in the multilang folder for implementation details. The example topology is found here.
The `ebm` branch of this repository contains a more complete example, currently used to run Risk of Bias and PICO predictions.
To add new predictors it is recommended to branch off `develop` and create your own topology folders.
The topologies themselves get called by name from the dispatcher on the client-side (e.g. here in the ebm branch), but this is subject to change.
To develop the server you need Leiningen, which can be installed with Homebrew. We require at least Java JVM/JDK 1.8 and Leiningen 2.4.
```shell
brew update            # make sure you have recent versions
brew install leiningen # install via Homebrew
```
```shell
git clone <this repo>
cd <your folder>
lein deps                               # retrieve project dependencies
git submodule update --init --recursive

# Compile the PDF.js files
cd resources/public/scripts/spa/pdfjs
brew install node                       # install nodejs via Homebrew
npm install
node make singlefile generic
```
Furthermore, to make the RPC layer work we require the following:
```shell
# OSX
brew install zeromq

# Alternatively, from source
wget http://download.zeromq.org/zeromq-4.0.5.tar.gz
tar zxvf zeromq-4.0.5.tar.gz
cd zeromq-4.0.5 && ./configure && make && make install
cd .. && rm -rf zeromq-4.0.5

# NodeJS RPC dependencies
npm install q underscore zmq atob commander

# Python RPC dependencies
pip install pyzmq argparse
```
And, of course, any NodeJS or Python dependencies required by the topology (such as scikit-learn, nltk, etc.).
To start the system run `lein run start --port 8080`, which will start the server on port 8080.
The server side is written in Clojure. If you are new to Clojure the code might look unfamiliar, but Clojure is a wonderful language, and if you are interested in learning more we recommend the following resources:
We use Luminus as a basis for many parts, so we recommend reading their documentation as well.
Currently this is a research object. The API and organizational structure are subject to change. Comments and suggestions are much appreciated. For code contributions: fork, branch, and send a pull request.
Vortext Demo is open source, and licensed under GPLv3.