Hypercane is a framework for building algorithms for sampling mementos from a web archive collection. Hypercane is the entry point of the Dark and Stormy Archives (DSA) toolkit. A user can generate samples with Hypercane and then view those samples via the Web Archive Storytelling tool Raintale, thus allowing the user to automatically summarize a web archive collection as a few small samples visualized as a social media story.
The possibilities with Hypercane do not stop there. Hypercane provides a set of actions that let users explore a web archive collection in different ways. This README provides an overview of these actions; more detailed documentation is forthcoming.
LANL C number: C22029
Hypercane requires MongoDB for caching. Install MongoDB as appropriate for your environment first. Hypercane will no longer work without a caching database.
If you would like to use the RPM installer for RHEL 8 and CentOS 8 systems:
- Install MongoDB for CentOS 8/RHEL 8. MongoDB does not come with the CentOS/RHEL distributions, so you will need to add a new repository to your system.
- Download the RPM and save it to the Linux server (e.g., hypercane-0.20211022230926-1.el8.x86_64.rpm).
- Type
dnf install hypercane-0.20211022230926-1.el8.x86_64.rpm
- Type
systemctl start hypercane-django.service
This installer only works on Unix and Linux.
- Install MongoDB on a system accessible to the server chosen for Hypercane and record the URL for this MongoDB install. Hypercane will no longer work without a caching database.
- Download the latest release of Hypercane
- Run
./install-hypercane.sh -- --mongodb-url [MONGODB_URL]
where MONGODB_URL is the URL recorded in step 1
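Before passing a URL to the installer, it can help to confirm that the URL is well formed. The following is a minimal Python sketch and is not part of Hypercane: the function name check_mongodb_url is hypothetical, and the sketch assumes a single-host URL of the form mongodb://host:port/, falling back to MongoDB's default port of 27017 when no port is given.

```python
from urllib.parse import urlsplit

DEFAULT_MONGODB_PORT = 27017  # MongoDB's standard default port


def check_mongodb_url(url):
    """Return (host, port) for a single-host mongodb:// URL, or raise ValueError.

    Hypothetical helper for illustration only; Hypercane does not provide it.
    """
    parts = urlsplit(url)
    if parts.scheme != "mongodb":
        raise ValueError(f"expected a mongodb:// URL, got: {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host found in: {url!r}")
    # urlsplit reports port as None when the URL does not specify one.
    return parts.hostname, parts.port or DEFAULT_MONGODB_PORT


if __name__ == "__main__":
    host, port = check_mongodb_url("mongodb://localhost/")
    print(f"MongoDB host={host} port={port}")
```

The sketch only inspects the text of the URL. It does not contact the server, and it does not handle multi-host (replica set) connection strings.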
Hypercane comes with a web user interface (WUI) providing a more user-friendly method of executing Hypercane. The WUI is a web application. Starting this web application depends on your Unix/Linux system.
To start the Hypercane WUI on a generic Unix system:
/opt/hypercane/start-hypercane-wui.sh
- Install MongoDB
- Clone this repository
- Change into the cloned directory
- Type
pip install --upgrade pip
because the next step only works with the latest version of pip
- Type
pip install -r requirements.txt
to ensure that you install the correct dependency library versions
- Type
python -m spacy download en_core_web_sm
to download a language pipeline for entity detection (Note: attempts to automate this step inside setup.py have not been successful)
- Type
pip install . --use-feature=in-tree-build
This grants access to the hc command, which provides the functionality of Hypercane.
By default, the Hypercane WUI uses SQLite, which does not perform well when multiple users log into the same Hypercane WUI system. For an optimal user experience, the Hypercane WUI can be connected to a Postgres database.
- Install Postgres on a system accessible to the server that the Hypercane WUI is running on. Record that system's host and the port Postgres is running on -- the default port is 5432.
- Log into Postgres and create a database for Hypercane.
- Create a user and password.
- Grant all privileges on the database from step 2 to the user from step 3.
- Run
/opt/hypercane/hypercane-gui/set-hypercane-database.sh --dbuser [DBUSER] --dbname [DBNAME] --dbhost [DBHOST] --dbport [DBPORT]
with DBUSER replaced by the user created in step 3, DBNAME replaced by the database created in step 2, and DBHOST and DBPORT replaced by the host and port recorded in step 1. The script will prompt you for the password.
- Restart Hypercane as appropriate for your system.
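Step 1 requires that the Postgres server be reachable from the host running the Hypercane WUI. As a quick sanity check before running the script, a minimal Python sketch like the following can test whether a TCP connection to DBHOST and DBPORT succeeds. The function name is_reachable is hypothetical and not part of Hypercane.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 5432 is the default Postgres port; substitute your own DBHOST and DBPORT.
    print(is_reachable("localhost", 5432))
```

A successful connection only shows that something is listening on that port. It does not confirm that the database, user, or password are correct.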
For optimal process control, the Hypercane WUI can use a queueing service like RabbitMQ.
- Install RabbitMQ on a system accessible to the server that the Hypercane WUI is running on. Record that system's hostname and the port that RabbitMQ is running on -- the default port is 5672.
- Run
/opt/hypercane/hypercane-gui/set-hypercane-queueing-service.sh --amqp-url amqp://[HOST]:[PORT]/
where HOST is the host of the RabbitMQ server and PORT is its port
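To illustrate how HOST and PORT combine into the value passed to --amqp-url, here is a minimal Python sketch. The helper names amqp_url and queueing_command are hypothetical and not part of Hypercane; the script path and flag are taken from the step above, and the sketch assumes a plain hostname or IPv4 address.

```python
SCRIPT = "/opt/hypercane/hypercane-gui/set-hypercane-queueing-service.sh"


def amqp_url(host, port=5672):
    """Return amqp://HOST:PORT/ after basic validation of host and port."""
    # Simplification: hosts containing ':' (such as IPv6 literals) are rejected.
    if not host or "/" in host or ":" in host:
        raise ValueError(f"invalid host: {host!r}")
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"amqp://{host}:{port}/"


def queueing_command(host, port=5672):
    """Build the argument list for the queueing-service configuration script."""
    return [SCRIPT, "--amqp-url", amqp_url(host, port)]


if __name__ == "__main__":
    print(" ".join(queueing_command("localhost")))
```

The resulting list could be handed to subprocess.run on the server where Hypercane is installed. The sketch itself only builds and prints the command.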
Hypercane allows you to perform actions on web archive collections, TimeMaps, or lists of Mementos.
For example, the following sample action executes the true-random command to randomly sample mementos from the TimeMaps supplied by timemap-file.txt and writes the URI-Ms to random-mementos.txt:
hc sample true-random -i timemaps -a timemap-file.txt -o random-mementos.txt
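For users who want to drive Hypercane from a script, the same invocation can be assembled in Python. This is a minimal sketch under stated assumptions: the helper names build_sample_command and run_if_available are hypothetical, and only the options shown in the example above (-i, -a, -o) are used.

```python
import shlex
import shutil
import subprocess


def build_sample_command(command, input_type, input_arg, output_file):
    """Build the argument list for an `hc sample` invocation."""
    return ["hc", "sample", command,
            "-i", input_type, "-a", input_arg, "-o", output_file]


def run_if_available(argv):
    """Run argv only if its executable is on PATH; return the exit code or None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    argv = build_sample_command(
        "true-random", "timemaps", "timemap-file.txt", "random-mementos.txt"
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(argv))
```

Passing the arguments as a list avoids shell quoting problems if file names contain spaces. Calling run_if_available(argv) would execute the command on a system where hc is installed.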
At the moment, the following actions are supported:
- sample - generate a sample from the collection with various commands; some of the commands may execute various filter, cluster, score, and order actions
- report - generate a report on the collection according to various commands; different commands provide information on collection metadata or provide statistics on the collection
- synthesize - synthesize a web archive collection into a directory containing files, such as WARCs or other files
- identify - produce a list of identifiers (URIs) from the collection based on the input; the different commands indicate the type of web resource desired
- filter - filter the given collection according to the criteria specified by the given command
- cluster - group the documents identified from the input into clusters; different commands provide different clustering algorithms
- score - score the mementos from the input based on the command issued
- order - order the mementos from the input based on the command issued
To discover the list of commands associated with an action, use the --help command-line option. For example, to discover the commands associated with the filter action, type hc filter --help.
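To survey every action at once, the help invocation can be generated for each action named in this README. The following Python sketch is illustrative only: the ACTIONS list is copied from the actions described above, and the function name help_commands is hypothetical.

```python
import shlex

# Action names as listed in this README.
ACTIONS = ["sample", "report", "synthesize", "identify",
           "filter", "cluster", "score", "order"]


def help_commands(actions=ACTIONS):
    """Return one `hc ACTION --help` argument list per action."""
    return [["hc", action, "--help"] for action in actions]


if __name__ == "__main__":
    for argv in help_commands():
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell where hc is installed, or the argument lists can be passed to subprocess.run.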
- Build the software as specified in the Installing Hypercane - Using Docker subsection above
- Create a working directory for your project
- Copy docker-compose.yml into your working directory
- Type
docker-compose run hypercane
- Run your desired commands; output will appear within your working directory
- When done, exit from the hypercane container by running
exit
- To stop and remove all the services (such as the cache), run
docker-compose down
- Create a virtualenv
- Clone this repository
- Type
./test/installer/create-test-containers.sh
- Type
./test/installer/run-centos-install-start-shell.sh
- Create a virtualenv
- Clone this repository
- Type
./test/installer/create-test-containers.sh
- Type
./test/installer/run-ubuntu-install-start-shell.sh
We are working on additional sampling algorithms and options for the advanced actions. Please feel free to submit issues and pull requests at https://github.com/oduwsdl/hypercane.
© 2022. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.