Vortext is a system that allows you to upload PDF documents, and annotate them with various extractions. In essence it’s a very simple web-based document management system.
It relies on Spá for its client-side functionality.
Vortext is under heavy development; the idea is to ultimately pair the management of extractions from documents with customizable machine learning predictors. This way we hope to ease the burden of extracting data from literature, as is often the case in the biomedical sciences and law. At this point, however, Vortext does not do any predictions. See the Vortext Demo repository for a system that does. If you are interested in helping us with these ideas, drop us a line at vortext.systems.
The server side is written in Clojure and uses PostgreSQL as the database. If you are new to Clojure the code might look unfamiliar. But, Clojure is a wonderful language, and if you are interested in learning more we recommend the following resources:
We use Luminus as a basis for many parts, so we recommended reading their documentation as well.
See the Spá repository for an overview of used technology.
To develop the server we require leiningen which can be installed with Homebrew. We require at least Java JVM/JDK 1.8 and Leiningen 2.4.
brew update # make sure you have recent versions brew install leiningen # install via Homebrew
git clone <this repo> cd <your folder> lein deps # retrieve project dependencies git submodule update --init --recursive # Compile the PDF.js files cd resources/public/scripts/spa/pdfjs brew install node # install nodejs via Homebrew npm install node make generic
To prevent some bugs and ensure future compatibility we convert the PDF documents to PDF/A-2 (PDF archive) before storing them. To do this we use GhostScript. If you have not yet installed GhostScript run brew install ghostscript
.
We’re using PostgreSQL as the database.
The database settings can be configured with the environment variables specified by environ in project.clj.
The default database is spa
with user/pass spa/develop
.
You’ll obviously need to change this in production.
CREATE DATABASE spa; CREATE USER spa WITH PASSWORD 'develop'; GRANT ALL PRIVILEGES ON DATABASE spa TO spa;
To populate the database tables run lein migrate
.
If you’re running OS X and are looking for a easy way to run PostgreSQL, we recommend Postgres.app.
To run the server use
lein trampoline run start # will run the server DEV=true lein trampoline run start # will run in development mode
It will run on port 8080 by default.
The easiest way to deploy Vortext is to create an uberjar
and deploy that.
Run lein uberjar
to create a stand-alone version that you can call with java -jar vortext.jar start
.
This jar can then be run as a service with things like upstart, systemd or whatever your taste is.
It is also recommended to minify the assets in production. We use RequireJS r.js for this.
To install r.js run npm install -g requirejs
.
Run the following before building the uberjar.
cd resources r.js -o build.js
By default the production jar will serve the assets from the build
folder, in the development it will serve from public
. To prevent the production jar from serving the build
folder (because you haven’t minified the assets) run the server with DEV=1 java -jar vortext.jar start
, this is NOT recommended.
See ideas or the other issues.
Currently this is a research object. The API and organizational structure are subject to change. Comments and suggestions are much appreciated. For code contributions: fork, branch, and send a pull request.
Vortext is open source, and licensed under GPLv3. See LICENSE.md for more information.