HyperCollate is a prototype collation engine that is able to handle in-text variation (i.e. textual variation within one witness) if it’s marked with the TEI/XML elements <del>, <add>, <subst>, <app> or <rdg>. HyperCollate considers these elements as an interruption in the linear flow of the text: when its parser encounters a <del>, <add>, <subst>, <app> or <rdg> element, the stream of text tokens splits up into two or more branches. During the alignment process, HyperCollate looks at each branch and selects the one with the most matches. This means you do not have to linearize or “flatten” your TEI/XML transcription: the multiple layers within an individual witness can be preserved.
When you install and run HyperCollate, you create a work environment in your browser. This work environment (or “server”) is a local environment, which means that only you can access it. In other words: you have a private environment to experiment with HyperCollate at liberty.
NOTE: these instructions are optimised for macOS users.
HyperCollate is easy to install and to use. Below we give you three options to install it: downloading a JAR or a WAR file, or building it yourself.
Briefly put, the JAR file only requires Java 8 or higher, installed as part of the Java Development Kit (JDK), to run. You can download the JDK from the Oracle website. For the WAR file you need a web application server such as Apache Tomcat. If you don’t know which to choose, we recommend you install the pre-built JAR (option 1).
Because you install HyperCollate via the command line, you need some basic knowledge of how the command line works. If you are unfamiliar with the command line, there are some good tutorials here and here.
NOTE: make sure you organise your folders in a structured way. For example: create a separate folder in your home directory that you call hypercollate. You can read up on “file system hygiene” here.
- Make sure you have installed the Java Development Kit (JDK).
- In your terminal or command prompt, navigate to the directory from which you want to run HyperCollate.
- Download the JAR from https://cdn.huygens.knaw.nl/hyper-collate-server.jar to the HyperCollate directory.
- Run
java -jar hyper-collate-server.jar server
- The server will start on a random available port. Look for the following lines in the output, which list the URL of the server:
*************************************************************
** Starting HyperCollate Server at http://localhost:<port> **
*************************************************************
Open this URL (starting with http://) in your browser.
- That’s it! HyperCollate is running on your computer. You can now start using HyperCollate.
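If you start the server from a script rather than by hand, the URL can be picked out of the startup output automatically. The following is a minimal sketch that assumes the banner has exactly the form quoted above; the function name extract_server_url is hypothetical and not part of HyperCollate.

```python
import re

# Matches the banner line the server prints on startup, as quoted above.
BANNER_PATTERN = re.compile(r"Starting HyperCollate Server at (http://\S+?)\s*\*\*")


def extract_server_url(output):
    """Return the server URL found in the startup output, or None if absent."""
    match = BANNER_PATTERN.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Port 2018 is only an example; the real port is chosen at startup.
    sample = (
        "*************************************************************\n"
        "** Starting HyperCollate Server at http://localhost:2018 **\n"
        "*************************************************************\n"
    )
    print(extract_server_url(sample))
```

The function returns None when no banner is found, so a calling script can keep reading output until the server has finished starting.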
In principle, the server of HyperCollate uses a random port that may differ each time depending on which port is available. If you prefer to have the server use the same port each time, you can set it up as follows:
- Download an example config file from https://raw.githubusercontent.com/HuygensING/hyper-collate/master/hyper-collate-server/config.yml to the HyperCollate directory.
- Set the baseURI and port parameters in the config file.
- Run
java -jar hyper-collate-server.jar server config.yml
If you want to find out more about which custom port to use, you’ll find some useful documentation here and here.
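Before fixing a port in the config file, it can help to check that nothing else on your machine is already using it. The sketch below is one possible way to do that with Python's standard socket module; it is not part of HyperCollate, and the port number 2018 is only an example taken from the URLs used elsewhere in this document.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if we can bind to host:port, i.e. no other process holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding fails when the port is already in use (or not permitted).
            return False
    return True


if __name__ == "__main__":
    candidate = 2018  # example port; pick whichever you intend to configure
    state = "free" if is_port_free(candidate) else "in use"
    print(f"Port {candidate} is {state}")
```

The check only reflects the moment it runs: another program could still claim the port before you start the server.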
- On your command line prompt or terminal, navigate to the directory from which you want to run HyperCollate.
- Download the war from https://cdn.huygens.knaw.nl/hyper-collate.war to the current directory.
- Download an example config file from https://raw.githubusercontent.com/HuygensING/hyper-collate/master/hyper-collate-war/hypercollate.xml to the same directory.
- Change the Context docBase, Context path and the values for projectDir and baseURI in hypercollate.xml as needed.
- Copy hypercollate.xml to $TOMCAT_HOME/conf/[Engine]/[Host]/ (e.g. /opt/tomcat8/conf/Catalina/localhost/).
- In your browser, go to the baseURI URL from hypercollate.xml.
- On your command line prompt or terminal, navigate to the directory from which you want to run HyperCollate.
- Run
mvn package
to build the hyper-collate-server JAR and WAR.
- To use the JAR, run
cd hyper-collate-server
java -jar target/hyper-collate-server-1.0-SNAPSHOT.jar server config.yml
to start the server with the settings from config.yml.
- In your browser, open http://localhost:2018/
The WAR can be found in hyper-collate-war/target, an example config file in hyper-collate-war/hypercollate.xml.
Follow steps 3 - 5 from Option 1.
During the installation steps above you created a server on a port on your local machine. HyperCollate runs on this server and you can interact with the program through a REST-based API.
API stands for “application programming interface”. You can read more about it here; in the context of HyperCollate you simply need to understand that an API is a part of the server that receives requests and sends out responses.
The API of HyperCollate is RESTful, which means that we interact with it according to the Hypertext Transfer Protocol (HTTP). You can read more about REST and RESTful here.
You have two options: either you interact with HyperCollate via the command line, or via a graphical interface. Both options are explained in detail below, so if you don’t know which one to choose, don’t worry and read on.
In both cases, though, you interact with the server through REST calls. For HyperCollate you make use of four REST calls:
- POST (creating)
- PUT (updating / modifying)
- GET (getting)
- DELETE (removing)
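To make the idea of a REST call concrete, here is a small sketch in Python that builds such a request without sending it. It uses only the standard urllib.request module; the helper name build_request is hypothetical, and the URL assumes a server on port 2018 as in the examples in this document.

```python
import urllib.request

# The four HTTP methods listed above, with the meaning given for each.
METHODS = {
    "POST": "creating",
    "PUT": "updating / modifying",
    "GET": "getting",
    "DELETE": "removing",
}


def build_request(method, url, body=None):
    """Build (but do not send) an HTTP request for one of the four methods."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    data = body.encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, method=method)


if __name__ == "__main__":
    request = build_request("GET", "http://localhost:2018/collations")
    print(request.get_method(), request.full_url)
    # To actually send it to a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```

Separating the building of a request from sending it makes it easy to inspect what would go over the wire before a server is involved.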
The installation of HyperCollate also comes with a folder /collations in which we provide several small XML files that you can use to test HyperCollate, such as /w1-rain.xml.
Of course you are welcome to create your own collations. You can save them in the /collations folder or in a folder you create for the occasion. In that case, make sure that when you use HyperCollate, you provide a path pointing to that folder.
In the Swagger interface (see below) you can get an overview of all collations with the command GET /collations. You can also see them by navigating to http://localhost:<port>/collations (replace <port> with the port HyperCollate is running on).
If you selected option 1 above (= install the prebuilt JAR) and you opened the URL in your browser, you should get the following page:
For the GUI, select the option API. You’re taken to HyperCollate’s user interface (built with the Swagger UI tool).
Below we outline what steps you take to create a new collation with HyperCollate and visualize the results:
PUT /collations/{name}
Click on Try it out and provide your collation with a new name. Click on Execute.
This should return response code 201 - created, with a URL to the collation in the location header.
PUT /collations/{name}/witnesses/{sigil}
Click on Try it out, enter the name of your collation, a sigil for the witness and your witness data in the “Witness Source” field, and click on Execute.
This should return response code 204. This means that the server has successfully fulfilled the request and that there is no additional content to send in the response body. Your witness is now on the server. You can add more witnesses by entering a new sigil and replacing the contents of the Witness Source field with the next witness.
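The request that the Swagger form sends for a witness can also be prepared in a script. The sketch below builds it with Python's standard library; the function names witness_request and upload_succeeded are hypothetical, the Content-Type value mirrors the curl examples later in this document, and the base URL assumes a server on port 2018.

```python
import urllib.parse
import urllib.request


def witness_request(base_url, collation, sigil, xml):
    """Build the PUT request that uploads one witness to a collation."""
    path = "/collations/{}/witnesses/{}".format(
        urllib.parse.quote(collation, safe=""),
        urllib.parse.quote(sigil, safe=""),
    )
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=xml.encode("utf-8"),  # the witness XML is the request body
        headers={"Content-Type": "text/xml; charset=UTF-8"},
        method="PUT",
    )


def upload_succeeded(status):
    """A successful witness upload is answered with 204, as described above."""
    return status == 204


if __name__ == "__main__":
    witness_a = ("<xml>The rain in <del>Cataluña</del><add>Spain</add> "
                 "falls mainly on the plain.</xml>")
    request = witness_request("http://localhost:2018", "testcollation", "A", witness_a)
    print(request.get_method(), request.full_url)
    # With a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(upload_succeeded(response.status))
```

Calling witness_request once per sigil, each time with a different XML string, corresponds to adding several witnesses to the same collation.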
You can visualise an individual witness as a variant graph via GET /collations/{name}/witnesses/{sigil}.dot in the Swagger interface.
GET /collations/{name}/ascii_table
Click on Try it out and enter the name of the collation. Click on Execute. This should return response code 200 - OK. The response body has a table of the collated text using ASCII.
This should return the table:
┌───┬────────────┬────────────┬─┬────────────────────┬──────────┬─┐
│[A]│            │[+]    Spain│ │                    │          │ │
│   │The_rain_in_│[-] Cataluña│_│falls_mainly_on_the_│plain     │.│
├───┼────────────┼────────────┼─┼────────────────────┼──────────┼─┤
│[B]│            │            │ │                    │[+]  plain│ │
│   │The_rain_in_│Spain_      │ │falls_mainly_on_the_│[-] street│.│
└───┴────────────┴────────────┴─┴────────────────────┴──────────┴─┘
In this table the <del>eted text is indicated with [-], and the <add>ed text with [+]. Significant whitespace in the witnesses is indicated with _.
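The cell conventions just described can be read back by a script. The following sketch classifies the contents of a single table cell; the function name describe_cell is hypothetical, and the sketch assumes that every underscore stands for whitespace, so a literal underscore in a witness would be misread.

```python
def describe_cell(cell):
    """Classify one table cell and restore its whitespace.

    Returns a (kind, text) tuple where kind is "added", "deleted" or "plain".
    """
    markers = {"[+]": "added", "[-]": "deleted"}
    stripped = cell.strip()  # drop the padding that aligns the table columns
    kind = markers.get(stripped[:3], "plain")
    if kind != "plain":
        stripped = stripped[3:].strip()  # remove the marker and its padding
    # Underscores stand for significant whitespace in the witness text.
    return kind, stripped.replace("_", " ")


if __name__ == "__main__":
    for cell in ["[+]    Spain", "[-] Cataluña", "The_rain_in_", "plain     "]:
        print(describe_cell(cell))
```

Splitting a full table row on the │ character yields the cells that this function expects.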
The .dot file outputs the collation as a variant graph.
GET /collations/{name}.dot
Click on Try it out, enter the name of your collation and click on Execute.
This should return response code 200 - OK. The response body has the .dot representation of the collation graph.
This should return the response body:
digraph CollationGraph{
labelloc=b
t000 [label="";shape=doublecircle,rank=middle]
t001 [label="";shape=doublecircle,rank=middle]
t002 [label=<A,B: The␣rain␣in␣<br/>A,B: <i>/xml</i>>]
t003 [label=<A,B: plain<br/>A: <i>/xml</i><br/>B: <i>/xml/add</i><br/>>]
t004 [label=<A,B: .<br/>A,B: <i>/xml</i>>]
t005 [label=<A: Cataluña<br/>A: <i>/xml/del</i>>]
t006 [label=<A: Spain<br/>B: Spain␣<br/>A: <i>/xml/add</i><br/>B: <i>/xml</i><br/>>]
t007 [label=<A: ␣<br/>A: <i>/xml</i>>]
t008 [label=<A,B: falls␣mainly␣on␣the␣<br/>A,B: <i>/xml</i>>]
t009 [label=<B: street<br/>B: <i>/xml/del</i>>]
t000->t002[label="A,B"]
t002->t005[label="A"]
t002->t006[label="A,B"]
t003->t004[label="A,B"]
t004->t001[label="A,B"]
t005->t007[label="A"]
t006->t007[label="A"]
t006->t008[label="B"]
t007->t008[label="A"]
t008->t003[label="A,B"]
t008->t009[label="B"]
t009->t004[label="B"]
}
Which, when rendered as png using the dot tool from Graphviz or using GraphvizOnline, gives:
In this representation, significant whitespace in the witnesses is represented as ␣. You can turn this off by adding ?emphasize-whitespace=false to the url.
The markup of the witnesses is represented as separate lines in the node with, per witness, the xpath to the text. For example, the first text node in the collation graph with ( A,B: The_rain_in_ / A,B: /xml ) indicates that the matched text "The rain in " has markup xml in both witnesses. Again, you can turn off the markup lines by adding ?hide-markup=true to the url.
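To illustrate how the lines inside a node are organised, the sketch below splits one node label into the text and the markup path per witness. It assumes the label format shown in the .dot output above (lines separated by <br/>, markup paths wrapped in <i>...</i>) and takes as input only the part between label=< and the closing >; the function name parse_node_label is hypothetical.

```python
import re


def parse_node_label(label):
    """Split a node label into per-witness text and per-witness markup path."""
    text, markup = {}, {}
    for line in label.split("<br/>"):
        if ": " not in line:
            continue  # skip the empty piece left by a trailing <br/>
        sigils, content = line.split(": ", 1)
        italic = re.fullmatch(r"<i>(.*)</i>", content)
        target = markup if italic else text
        for sigil in sigils.split(","):
            target[sigil] = italic.group(1) if italic else content
    return text, markup


if __name__ == "__main__":
    label = "A: Spain<br/>B: Spain␣<br/>A: <i>/xml/add</i><br/>B: <i>/xml</i><br/>"
    text, markup = parse_node_label(label)
    print(text)
    print(markup)
```

For this node the result shows that the matched text sits inside an <add> element in witness A but directly under the root element in witness B.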
If you have Graphviz's dot executable installed, you can get a .png or .svg image directly from the server by replacing the .dot extension in the url with .png or .svg, respectively.
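The extension and the two query options described above can be combined when composing the url of a collation graph. The sketch below shows one way to do so; the function name graph_url is hypothetical, and joining both options in a single query string with & is an assumption, since the text only shows each option on its own.

```python
import urllib.parse


def graph_url(base_url, collation, extension="dot",
              emphasize_whitespace=True, hide_markup=False):
    """Compose the url of a collation graph in .dot, .png or .svg form."""
    if extension not in ("dot", "png", "svg"):
        raise ValueError(f"unsupported extension: {extension}")
    params = {}
    if not emphasize_whitespace:
        params["emphasize-whitespace"] = "false"
    if hide_markup:
        params["hide-markup"] = "true"
    url = "{}/collations/{}.{}".format(
        base_url.rstrip("/"), urllib.parse.quote(collation, safe=""), extension
    )
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


if __name__ == "__main__":
    # Assumes a server on port 2018, as in the examples in this document.
    print(graph_url("http://localhost:2018", "testcollation", "svg", hide_markup=True))
```

The defaults reproduce the plain .dot url, so options only appear in the query string when they differ from the server's default behaviour as described above.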
GET /collations/{name}/witnesses/{sigil}.dot
GET /collations/{name}/witnesses/{sigil}.png
GET /collations/{name}/witnesses/{sigil}.svg
Click on Try it out, enter the name of your collation and the sigil of the witness, and click on Execute. This should return response code 200 - OK. The response body has the .dot, .png or .svg representation of the witness. This should return an image like this:
To group the text nodes per markup combination, add ?join-tokens=true to the url.
This should return an image like this:
You can also interact with the HyperCollate server via the command line. You can do this in the programming language of your choice or with curl, a command line tool often used to interact with RESTful APIs.
Below, we’ll give examples using curl.
IMPORTANT: Make sure to run the curl commands in a new terminal window, not in the window where HyperCollate is running. We recommend you simply open a new tab in your terminal (with cmd + t), so that you are in the right directory.
curl -X PUT --header 'Content-Type: application/json' --header 'Accept: text/plain; charset=UTF-8' 'http://localhost:2018/collations/testcollation'
curl -X PUT --header 'Content-Type: text/xml; charset=UTF-8' --header 'Accept: application/json; charset=UTF-8' -d '<xml>The rain in <del>Cataluña</del><add>Spain</add> falls mainly on the plain.</xml>' 'http://localhost:2018/collations/testcollation/witnesses/A'
curl -X PUT --header 'Content-Type: text/xml; charset=UTF-8' --header 'Accept: application/json; charset=UTF-8' -d '<xml>The rain in Spain falls mainly on the <del>street</del><add>plain</add>.</xml>' 'http://localhost:2018/collations/testcollation/witnesses/B'
curl -X GET --header 'Accept: text/plain' 'http://localhost:2018/collations/testcollation/ascii_table'
This should return the response body:
┌───┬────────────┬────────────┬─┬────────────────────┬──────────┬─┐
│[A]│            │[+]    Spain│ │                    │          │ │
│   │The_rain_in_│[-] Cataluña│_│falls_mainly_on_the_│plain     │.│
├───┼────────────┼────────────┼─┼────────────────────┼──────────┼─┤
│[B]│            │            │ │                    │[+]  plain│ │
│   │The_rain_in_│Spain_      │ │falls_mainly_on_the_│[-] street│.│
└───┴────────────┴────────────┴─┴────────────────────┴──────────┴─┘
In this table the <del>eted text is indicated with [-], and the <add>ed text with [+]. Significant whitespace in the witnesses is indicated with _.
The .dot file outputs the collation as a variant graph.
curl -X GET --header 'Accept: text/plain' 'http://localhost:2018/collations/testcollation.dot'
This should return the response body:
digraph CollationGraph{
labelloc=b
t000 [label="";shape=doublecircle,rank=middle]
t001 [label="";shape=doublecircle,rank=middle]
t002 [label=<A,B: The␣rain␣in␣<br/>A,B: <i>/xml</i>>]
t003 [label=<A,B: plain<br/>A: <i>/xml</i><br/>B: <i>/xml/add</i><br/>>]
t004 [label=<A,B: .<br/>A,B: <i>/xml</i>>]
t005 [label=<A: Cataluña<br/>A: <i>/xml/del</i>>]
t006 [label=<A: Spain<br/>B: Spain␣<br/>A: <i>/xml/add</i><br/>B: <i>/xml</i><br/>>]
t007 [label=<A: ␣<br/>A: <i>/xml</i>>]
t008 [label=<A,B: falls␣mainly␣on␣the␣<br/>A,B: <i>/xml</i>>]
t009 [label=<B: street<br/>B: <i>/xml/del</i>>]
t000->t002[label="A,B"]
t002->t005[label="A"]
t002->t006[label="A,B"]
t003->t004[label="A,B"]
t004->t001[label="A,B"]
t005->t007[label="A"]
t006->t007[label="A"]
t006->t008[label="B"]
t007->t008[label="A"]
t008->t003[label="A,B"]
t008->t009[label="B"]
t009->t004[label="B"]
}
Which, when rendered as png using the dot tool from Graphviz or using GraphvizOnline, gives:
In this representation, significant whitespace in the witnesses is represented as ␣. (You can turn this off by adding ?emphasize-whitespace=false to the url.)
The markup of the witnesses is represented as separate lines in the node with, per witness, the xpath to the text.
For example, the first text node in the collation graph with ( A,B: The_rain_in_ / A,B: /xml ) indicates that the matched text "The rain in " has markup xml in both witnesses. You can turn off the markup lines by adding ?hide-markup=true to the url.
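Because the .dot response is plain text, a script can also read the structure of the variant graph from it. The sketch below extracts the edges and lists where a given witness can go from a node; it assumes the edge syntax shown in the response body above, and the function names parse_edges and successors are hypothetical.

```python
import re

# Matches edge lines such as: t002->t005[label="A"]
EDGE_PATTERN = re.compile(r'(\w+)->(\w+)\[label="([^"]*)"\]')


def parse_edges(dot_text):
    """Return (source, target, sigils) for every edge in the .dot text."""
    return [
        (source, target, label.split(","))
        for source, target, label in EDGE_PATTERN.findall(dot_text)
    ]


def successors(edges, node, sigil):
    """List the nodes that the given witness can move to from `node`."""
    return [target for source, target, sigils in edges
            if source == node and sigil in sigils]


if __name__ == "__main__":
    sample = (
        't000->t002[label="A,B"]\n'
        't002->t005[label="A"]\n'
        't002->t006[label="A,B"]\n'
        't006->t008[label="B"]\n'
    )
    edges = parse_edges(sample)
    print(successors(edges, "t002", "A"))
    print(successors(edges, "t002", "B"))
```

In the example graph, witness A has two successors after the first text node because its text branches into a deleted and an added reading, while witness B has only one.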
If you have Graphviz's dot executable installed, you can get a .png or .svg image directly from the server by replacing the .dot extension in the url with .png or .svg, respectively.
GET /collations/{name}/witnesses/{sigil}.dot
GET /collations/{name}/witnesses/{sigil}.png
GET /collations/{name}/witnesses/{sigil}.svg
curl -X GET --header 'Accept: image/svg+xml' 'http://localhost:2018/collations/testcollation/witnesses/A.svg'
This should return an image similar to this:
To group the text nodes per markup combination, add ?join-tokens=true to the url.
curl -X GET --header 'Accept: image/svg+xml' 'http://localhost:2018/collations/testcollation/witnesses/A.svg?join-tokens=true'
This should return an image similar to this:
If you chose the WAR option above, you probably know what to do. The WAR just exposes a Swagger file without a UI, at the /swagger.json or /swagger.yaml endpoints.
If you want to stop the HyperCollate program, you can simply close the browser window in which the Swagger interface is running and stop the server process by typing ctrl + c in the terminal window where HyperCollate is running.
The next time you want to start HyperCollate, you don’t have to follow all installation steps again. Simply navigate on your command line to the folder with the hyper-collate-server.jar file and run the command java -jar hyper-collate-server.jar server. HyperCollate is ready for use again!