(The command norma is being obsoleted and most commands will be of the form ami-*.)
Note. The commandline syntax is being migrated. See AMI-STEM and the more recent docs, AMI-DOCS.
A tool to convert a variety of inputs into normalized, tagged, XHTML (with embedded/linked SVG and PNG where appropriate). The initial emphasis is on scholarly publications but much of the technology is general.
This is a bundle of all norma and ami functionality to transform PDFs and XML into structured semantic HTML. It's alpha (2018-09) and we have 4-6 testers, each with different projects. This runs on a simple commandline; see the AMI-STEM page for more details.
For a simple introduction and a description of how to install binaries of the software please see: here
(UPDATE 20190122)
For simply running AMI (not building), use the repository ami-jars. This repo will be updated frequently (at least till end 2019-02). If git is installed, "git clone https://github.com/petermr/ami-jars.git" checks out the project.
There are two approaches:
ami-jars provides an uber-jar ('jar-with-dependencies') which can be run from the commandline:
java -cp <jar> <mainClass>
For this you need to know which main classes map onto the commands.
The distrib has two directories:
repo, which contains all the required jar files.
bin, which contains all the scripts. Set your PATH to include this directory.
If your PATH is set correctly, then executing ami-pdf on the commandline will run the ami-pdf module. (Later we hope to make these submodules on the commandline.)
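As a rough sketch of that setup, the snippet below puts the scripts directory on the shell's search path and checks whether ami-pdf can be found. It assumes that the directory has to be on PATH for the scripts to run by name, and that the distrib lives in $HOME/ami-jars with bin directly beneath it; the AMI_HOME variable is a name invented for this example, so adjust it to wherever your checkout actually is.

```shell
# Assumption: the distrib was checked out to $HOME/ami-jars (AMI_HOME is a made-up name).
AMI_HOME="${AMI_HOME:-$HOME/ami-jars}"

# Put the scripts directory at the front of PATH so ami-pdf etc. resolve by name.
export PATH="$AMI_HOME/bin:$PATH"

# Report whether the shell can now locate the ami-pdf script.
if command -v ami-pdf >/dev/null 2>&1; then
  echo "ami-pdf found at $(command -v ami-pdf)"
else
  echo "ami-pdf not found; check that $AMI_HOME/bin exists and its scripts are executable"
fi
```

To make the change permanent, the export line would normally go into your shell start-up file.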
Norma can be built with maven3 and requires java 1.8 or greater. If you are building for the first time, or if your mods are minor, you can skip the integration tests. The normal tests take a minute or two. To skip all tests (the build then takes 20 secs or so), run:
mvn install -Dmaven.test.skip=true
If you're interested in contributing please take a look at: CONTRIBUTING.md
Norma will convert legacy files into scholarly html. It converts files that are in a CProject structure. This enables it to process multiple papers in a single run without overwriting files. It also keeps all the data from each paper together in its own CTree. This includes metadata about the paper, images that may have been extracted from the paper and supplementary files such as tables.
To convert a CProject full of NLM XML files, such as those you might have downloaded from EuropePMC with getpapers, you can run:
norma --project <CProject folder> --input fulltext.xml --output scholarly.html --transform nlm2html
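To make the layout concrete, here is a minimal sketch that builds a throwaway CProject and lists the CTrees that have a fulltext.xml but no scholarly.html yet. The directory names (PMC0000001, PMC0000002) and the file contents are placeholders invented for illustration; only the file names fulltext.xml and scholarly.html come from the command above. The script does not call norma itself.

```shell
#!/bin/sh
# Build a temporary CProject with two hypothetical CTrees (names are made up).
project="$(mktemp -d)"
mkdir -p "$project/PMC0000001" "$project/PMC0000002"
echo '<article/>' > "$project/PMC0000001/fulltext.xml"
echo '<article/>' > "$project/PMC0000002/fulltext.xml"

# Pretend the second paper has already been converted.
echo '<html/>' > "$project/PMC0000002/scholarly.html"

# Collect CTrees that have an input file but no output file yet.
pending=""
for ctree in "$project"/*/; do
  if [ -f "${ctree}fulltext.xml" ] && [ ! -f "${ctree}scholarly.html" ]; then
    pending="$pending $(basename "$ctree")"
  fi
done
echo "CTrees still to convert:$pending"
```

Here only PMC0000001 is reported, because each paper keeps its input and output side by side in its own directory. Pointing the loop at a real CProject folder instead of the temporary one gives a quick check of which papers a norma run has not yet covered.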