Vortext Annotate is a platform for managing extractions from PDF documents

vortext/vortext-annotate

Background

Vortext is a system that allows you to upload PDF documents, and annotate them with various extractions. In essence it’s a very simple web-based document management system.

It relies on Spá for its client-side functionality.

Vortext is under heavy development. The idea is to ultimately pair the management of extractions from documents with customizable machine learning predictors. This way we hope to ease the burden of extracting data from literature, a task that is common in the biomedical sciences and law. At this point, however, Vortext does not make any predictions. See the Vortext Demo repository for a system that does. If you are interested in helping us with these ideas, drop us a line at vortext.systems.

Technical overview

Server side

The server side is written in Clojure and uses PostgreSQL as the database. If you are new to Clojure the code might look unfamiliar, but Clojure is a wonderful language. If you are interested in learning more, we recommend the following resources:

We use Luminus as a basis for many parts, so we recommend reading their documentation as well.

Client side

See the Spá repository for an overview of the technology used.

Development prerequisites

Mac OS X

Developing the server requires Leiningen, which can be installed with Homebrew. You need at least JDK 1.8 and Leiningen 2.4.

brew update # make sure you have recent versions
brew install leiningen # install via Homebrew
git clone <this repo>
cd <your folder>
lein deps # retrieve project dependencies
git submodule update --init --recursive

# Compile the PDF.js files
cd resources/public/scripts/spa/pdfjs
brew install node # install nodejs via Homebrew
npm install
node make generic
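Before running the steps above, it can help to confirm that the command-line tools they rely on are installed. The following is a minimal sketch, not part of the repository; the helper name missing_tools is hypothetical, and the tool list is simply read off the commands above.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper, not part of Vortext.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools used by the setup commands in this section.
missing="$(missing_tools git java lein node npm)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All prerequisite tools found"
fi
```

To check the minimum versions by hand, run java -version and lein version and compare the output with the requirements stated above.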

To prevent some bugs and ensure future compatibility we convert the PDF documents to PDF/A-2 (PDF archive) before storing them. To do this we use Ghostscript. If you have not yet installed Ghostscript, run brew install ghostscript.
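To try the conversion by hand, a Ghostscript call along the following lines should work. This is a sketch under assumptions: the flags come from Ghostscript's general PDF/A support and are not necessarily the ones Vortext passes internally, and both the helper name to_pdfa and the GS override variable are hypothetical.

```shell
#!/bin/sh
# Convert the PDF at $1 to PDF/A-2 and write it to $2.
# Assumption: these flags are NOT taken from the Vortext source.
# GS is a hypothetical override for the Ghostscript binary (default: gs).
to_pdfa() {
  "${GS:-gs}" -dPDFA=2 -dBATCH -dNOPAUSE \
    -sDEVICE=pdfwrite \
    -sColorConversionStrategy=RGB \
    -dPDFACompatibilityPolicy=1 \
    "-sOutputFile=$2" "$1"
}

# Example (requires Ghostscript and an existing paper.pdf):
#   to_pdfa paper.pdf paper-a.pdf
```

Strictly conformant PDF/A output may additionally need a PDFA_def.ps file with an ICC profile, as described in the Ghostscript documentation; the call above leaves that out for brevity.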

Database

We’re using PostgreSQL as the database. The database settings can be configured with the environment variables specified by environ in project.clj. The default database is spa with user/pass spa/develop. You’ll obviously need to change this in production.

CREATE DATABASE spa;
CREATE USER spa WITH PASSWORD 'develop';
GRANT ALL PRIVILEGES ON DATABASE spa TO spa;

To populate the database tables run lein migrate. If you’re running OS X and are looking for an easy way to run PostgreSQL, we recommend Postgres.app.
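Before running the migration it can be useful to verify that the database and user created above are reachable. The sketch below builds a connection URI from the standard libpq variables (PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE) and falls back to the development defaults named above. The helper name db_uri and the localhost:5432 defaults are assumptions, and these are not the environment variables that Vortext itself reads; those are defined in project.clj.

```shell
#!/bin/sh
# Build a PostgreSQL connection URI for a quick connectivity check.
# Defaults: database spa, user spa, password develop (from this README).
# Assumption: the server listens on localhost:5432.
db_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "${PGUSER:-spa}" "${PGPASSWORD:-develop}" \
    "${PGHOST:-localhost}" "${PGPORT:-5432}" "${PGDATABASE:-spa}"
}

db_uri
```

With psql installed, psql "$(db_uri)" -c 'SELECT 1' should succeed once the database and user exist.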

Run

To run the server use

lein trampoline run start # will run the server
DEV=true lein trampoline run start # will run in development mode

It will run on port 8080 by default.
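Startup can take a little while, so a script that depends on the server may want to wait until port 8080 answers. A minimal sketch, assuming curl is installed; wait_for_server is a hypothetical helper, and any HTTP response at all is treated as "up".

```shell
#!/bin/sh
# Poll a URL once per second until it answers or the attempts run out.
# Usage: wait_for_server URL [ATTEMPTS]   (ATTEMPTS defaults to 30)
# Returns 0 as soon as the server sends any HTTP response, 1 otherwise.
wait_for_server() {
  url="$1"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    # Sleep only if another attempt follows.
    [ "$i" -lt "$attempts" ] && sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example:
#   wait_for_server http://localhost:8080/ && echo "Vortext is up"
```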

To deploy

The easiest way to deploy Vortext is to create an uberjar and deploy that. Run lein uberjar to create a stand-alone version that you can call with java -jar vortext.jar start. This jar can then be run as a service with upstart, systemd, or a service manager of your choice.
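For systemd, a unit file could look roughly like the one printed by the sketch below. The install path /opt/vortext/vortext.jar, the service user vortext, the Java location /usr/bin/java, and the helper name vortext_unit are all hypothetical; adjust them to your setup.

```shell
#!/bin/sh
# Print a minimal systemd unit for the uberjar to stdout.
# Usage: vortext_unit [JAR_PATH] [SERVICE_USER]
# Assumption: the defaults below are placeholders, not project conventions.
vortext_unit() {
  jar="${1:-/opt/vortext/vortext.jar}"
  user="${2:-vortext}"
  cat <<EOF
[Unit]
Description=Vortext Annotate
After=network.target

[Service]
User=$user
ExecStart=/usr/bin/java -jar $jar start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

vortext_unit
```

The output can be saved as /etc/systemd/system/vortext.service, followed by systemctl daemon-reload and systemctl enable --now vortext. Database settings would still need to be supplied through the environment, for example with Environment= lines in the [Service] section.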

It is also recommended to minify the assets in production. We use RequireJS r.js for this.

To install r.js run npm install -g requirejs. Run the following before building the uberjar.

cd resources
r.js -o build.js

By default the production jar serves the assets from the build folder, while development mode serves them from public. To stop the production jar from serving the build folder (for example because you haven’t minified the assets), run the server with DEV=1 java -jar vortext.jar start. This is NOT recommended.

Future work

See ideas or the other issues.

Contributing

Currently this is a research object. The API and organizational structure are subject to change. Comments and suggestions are much appreciated. For code contributions: fork, branch, and send a pull request.

License

Vortext is open source, and licensed under GPLv3. See LICENSE.md for more information.