This repository contains the code for demonstrating various machine learning models. It currently powers Robot-Reviewer, a system for automatically extracting Risk of Bias from Randomized Controlled Trial publications.
In essence it's Vortext stripped down to essentials, with the added ability to run predictions in various programming languages. At some point the functionality of the two repositories will hopefully converge. We are currently looking for seed investors to develop this further; if you're interested, drop us a line at vortext.systems.
See the Spá repository for an overview of the technology used.
Unlike the regular Vortext repository, this set-up is a bit more involved. To run the predictions we take a polyglot approach, for a fairly simple reason: to keep the predictions congruent with the client-side PDF.js we had to run PDF.js on the server side as well (via NodeJS), but JavaScript doesn't have nice machine learning libraries the way Python and R do, so we wanted to use those languages for the machine learning. However, JavaScript, Python, and R are terrible languages for scalable full-stack development, so we picked Clojure (a Lisp) for that.
To make this work we opted for a custom Remote Procedure Call (RPC) framework. Clojure is our glue and runs all the web-facing parts, but it calls the NodeJS and Python (or, in the future, R) processes over ZeroMQ. The dependent processes can run stand-alone, but in practice we start them as children of the Clojure app. To call the different processes (services, really) we use the Majordomo pattern.
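Stripped of the ZeroMQ transport, the Majordomo idea is simply a broker that routes each request to a worker by service name, with all payloads travelling as strings. A minimal pure-Python sketch of that routing (hypothetical — the `Broker` class and service names are illustrative, not part of this repository):

```python
class Broker:
    """Toy Majordomo-style broker: routes requests to workers by service name."""

    def __init__(self):
        self.services = {}

    def register(self, name, worker):
        # In the real pattern a worker announces itself over ZeroMQ;
        # here registration is a plain dictionary entry.
        self.services[name] = worker

    def request(self, name, payload):
        # Payloads are strings, mirroring the handlers shown below.
        return self.services[name](payload)

broker = Broker()
broker.register("py.add_one", lambda s: str(int(s) + 1))
broker.register("js.multiply", lambda s: str(int(s) * 2))

print(broker.request("py.add_one", "20"))   # "21"
print(broker.request("js.multiply", "21"))  # "42"
```

The real broker adds what this sketch omits: heartbeating, reconnects, and worker discovery over ZeroMQ sockets.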
We recommend reading up on the ZeroMQ documentation (which doubles as an excellent introduction to distributed systems) when developing this part of the software. We chose ZeroMQ for its wide range of supported languages, its light weight, and its proven effectiveness. However, ZeroMQ has its own hang-ups, and we are actively considering alternatives.
But how do you tie all this together without it becoming a complete mess? Well, admittedly it is a bit of a mess now, but for different reasons. Anyway, the way we tie it together is by using Directed Acyclic Graphs (DAGs) as an abstraction over the different processing steps. We stole this idea from Prismatic Graph and Apache Storm, so if you want to develop your own predictors it is vital to read up on "Graph: Abstractions for Structured Computation". We call these graphs topologies (a term borrowed from Storm).
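For intuition, such a topology can be modelled as a map from node names to functions whose parameter names declare their dependencies; a node runs as soon as all of its inputs are available. The following is a hypothetical miniature of the idea in Python, not the actual Clojure implementation:

```python
import inspect

def run_topology(topology, **inputs):
    """Evaluate every node of a DAG once its named dependencies are ready."""
    results = dict(inputs)
    remaining = dict(topology)
    while remaining:
        ready = [name for name, fn in remaining.items()
                 if all(dep in results
                        for dep in inspect.signature(fn).parameters)]
        if not ready:
            raise ValueError("cycle or missing input in topology")
        for name in ready:
            fn = remaining.pop(name)
            deps = inspect.signature(fn).parameters
            results[name] = fn(**{d: results[d] for d in deps})
    return results

# Same shape as the example topology: increment, then double, then label.
topology = {
    "source": lambda body: body.strip(),
    "incremented": lambda source: str(int(source) + 1),
    "doubled": lambda incremented: str(int(incremented) * 2),
    "sink": lambda doubled: "result:" + doubled,
}

print(run_topology(topology, body=" 20 ")["sink"])  # result:42
```

In the real system the node bodies dispatch to the Python and NodeJS services over RPC instead of running in-process, but the dependency-driven evaluation order is the same.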
A topology defines the flow from `source` to `sink`. The `source` is the incoming HTTP POST request; the `sink` is the data to be returned.
To define this flow you can make a topology like this:
```clojure
(def topology
  {:source      (fnk [body] (slurp body))
   :incremented (fnk [source] (py "example.add_one" source))
   :doubled     (fnk [incremented] (js "example/multiply.js" incremented))
   :sink        (fnk [doubled] (str "result:" (String. doubled)))})
```
This will take the body, increment the number in Python, and double it in JavaScript. The Python file looks like this:
```python
import sys
sys.path.append('../../multilang/python')

from abstract_handler import AbstractHandler

class Handler(AbstractHandler):
    title = "Add one"

    def __init__(self):
        # Setup here
        print("Hello, I'm adding one")

    def handle(self, input):
        return str(int(input) + 1)
```
And similarly the JavaScript:
```javascript
// You can do all sorts of set-up here
console.log("Hi!, I'll multiply the input");

function handler(input) {
  return (parseInt(input, 10) * 2) + ""; // Must return a string
}

module.exports = handler;
```
The `handle` function gets the raw input body and returns the new number as a string. We make no assumptions about the serialization of the input and output; we currently use JSON ourselves, but have used Protocol Buffers in the past.
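Because serialization is left entirely to the handlers, switching formats only means encoding and decoding inside `handle`. A hypothetical JSON variant of the add-one handler (the `value` field name is illustrative, not a convention of this repository):

```python
import json

class JsonAddOneHandler:
    """Like the add-one handler above, but with a JSON payload."""

    def handle(self, input):
        payload = json.loads(input)         # e.g. '{"value": 41}'
        payload["value"] = int(payload["value"]) + 1
        return json.dumps(payload)          # handlers always return strings

handler = JsonAddOneHandler()
print(handler.handle('{"value": 41}'))  # {"value": 42}
```

As long as every node in a topology agrees on the format, the RPC layer itself never needs to know what the strings contain.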
Take a look in the multilang folder for implementation details. The example topology is found here.
The `ebm` branch of this repository contains a more complete example, currently used to run Risk of Bias and PICO predictions.
To add new predictors it is recommended to branch off `develop` and create your own topology folders.
The topologies themselves get called by name from the dispatcher on the client-side (e.g. here in the ebm branch), but this is subject to change.
To develop the server you need Leiningen, which can be installed with Homebrew. We require at least Java JVM/JDK 1.8 and Leiningen 2.4.
```shell
brew update            # make sure you have recent versions
brew install leiningen # install via Homebrew
```
```shell
git clone <this repo>
cd <your folder>
lein deps                               # retrieve project dependencies
git submodule update --init --recursive

# Compile the PDF.js files
cd resources/public/scripts/spa/pdfjs
brew install node                       # install nodejs via Homebrew
npm install
node make singlefile generic
```
Furthermore, to make the RPC layer work we require the following:
```shell
# OSX
brew install zeromq

# Alternatively, from source
wget http://download.zeromq.org/zeromq-4.0.5.tar.gz
tar zxvf zeromq-4.0.5.tar.gz
cd zeromq-4.0.5 && ./configure && make && make install
cd .. && rm -rf zeromq-4.0.5

# NodeJS RPC dependencies
npm install q underscore zmq atob commander

# Python RPC dependencies
pip install pyzmq argparse
```
And, of course, any NodeJS or Python dependencies required by the topology (such as scikit-learn, nltk, etc.).
To start the system run `lein run start --port 8080`, which will start the server on port 8080.
The server side is written in Clojure. If you are new to Clojure the code might look unfamiliar, but Clojure is a wonderful language, and if you are interested in learning more we recommend the following resources:
We use Luminus as a basis for many parts, so we recommend reading their documentation as well.
Currently this is a research object. The API and organizational structure are subject to change. Comments and suggestions are much appreciated. For code contributions: fork, branch, and send a pull request.
Vortext Demo is open source, and licensed under GPLv3.