> if someone ever reported that there was an orang-utan in the Library, the wizards would probably go and ask the Librarian if he'd seen it.
>
> Terry Pratchett, *Night Watch*
OOK is a structural search engine for data cubes.
Typically, search engines let users find data by matching terms in the dataset metadata. For example, a query like "Balance of Payments" would need to match that publication's title or summary.

OOK goes deeper, indexing the reference data terms that describe and identify each numerical observation within a data cube. This lets users find data with queries like "imports of cars from Germany". Users can search without first needing to know how data was packaged and published.
OOK also understands the structure of data cubes so users can cross-filter for different facets, asking things like "what's the trade-off between geographic precision and recency?".
OOK is powered by linked data written to match the PublishMyData Application Profile. We extract data from a triplestore using SPARQL, then transform it into compacted and framed JSON-LD before loading it into Elasticsearch for querying. The ETL process and front-end are written in Clojure.
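To make the shape of that pipeline concrete, here is a minimal sketch. All of the names in it (`select-observations`, `frame-jsonld`, `observation-frame`, `bulk-index!`) are hypothetical stand-ins for the real extract/transform/load steps, not OOK's actual API:

```clojure
;; A minimal sketch of the ETL flow. Every function and var here is a
;; hypothetical placeholder, not OOK's real code.
(defn etl!
  "Extract RDF via SPARQL, frame it as JSON-LD, and load it into Elasticsearch."
  [{:keys [sparql-endpoint es-conn]}]
  (->> (select-observations sparql-endpoint) ; extract: run SPARQL against the triplestore
       (frame-jsonld observation-frame)      ; transform: compact + frame into JSON-LD documents
       (bulk-index! es-conn "observation"))) ; load: bulk-insert into an Elasticsearch index
```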
OOK uses Elasticsearch as its database.
We provide a docker-compose file for running Elasticsearch in your local development environment. The `bin/cider` and `bin/test` scripts provide a demonstration.
You can bring up a test index on port 9201 like this:

```sh
docker-compose up -d elasticsearch-test
```
Or one for dev on port 9200 like this:

```sh
docker-compose up -d elasticsearch-development
```
You can bring the services down with:

```sh
docker-compose down
```
You might also like to use the `docker-compose start` and `docker-compose stop` commands. To see what's running, use `docker-compose ps`.
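For example, using the development service defined above:

```sh
docker-compose stop elasticsearch-development   # stop the container without removing it
docker-compose start elasticsearch-development  # start it again
docker-compose ps                               # list service status
```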
The front-end is written in ClojureScript. You'll need to compile this to JavaScript.

Using a recent version of the Yarn package manager, you can install the JavaScript dependencies:

```sh
yarn install
```
Then compile the CLJS to JS:

```sh
yarn compile
```
If you want to develop the CLJS, you can have yarn watch for changes and recompile as necessary:

```sh
yarn watch
```

or, if you also want the tests:

```sh
yarn watch-all
```
With the shadow-cljs watcher running, the CLJS tests are run and reloaded at `localhost:8021`.
The application server is written in Clojure. You can run it locally by starting a Clojure REPL with the dev alias using e.g. `bin/repl` (or `bin/cider` if you're using Emacs/CIDER). Within the REPL, you can load and start the server with:

```clojure
(dev)
(go)
```
Visit `localhost:3000` in your browser (or whatever port you set if you override it in `env/dev/resources/local.edn`).
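Purely as a hypothetical sketch of such an override (the key name below is an assumption, not taken from this project's config; check the system configuration for the real keys):

```clojure
;; env/dev/resources/local.edn -- hypothetical example; the real key name
;; comes from the system's integrant config, not from this sketch.
{:web/port 3001}
```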
You'll need to run the ETL pipeline to populate your Elasticsearch database.
We provide configurations for extracting data from the Integrated Data Service.
For a small set of fixtures you can use:

```sh
clojure -X:dev:ook.etl/fixtures
```
Or to load all trade datasets you can use:

```sh
clojure -X:dev:ook.etl/trade
```
You can check that the indices have some documents loaded with:

```sh
curl -X GET "localhost:9200/_cat/indices?v=true"
```
Alternatively, you can create an integrant profile with the `:ook.etl/load` component, which will populate the database when the system is started. Use `:ook.etl/target-datasets` to scope the data to a vector of `pmdcat:Dataset` URIs (e.g. `resources/fixture/data.edn`) or provide a SPARQL query to set the scope (e.g. `resources/trade/data.edn`).
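As a rough, hypothetical sketch of such a profile fragment (only the two integrant keys are named in this README; the empty config map and the dataset URI are placeholder assumptions, not values from the repository):

```clojure
;; Hypothetical profile fragment. The empty config map and the example URI
;; are placeholders -- see resources/fixture/data.edn for a real configuration.
{:ook.etl/load {}
 :ook.etl/target-datasets ["http://example.org/datasets/example-trade-dataset"]}
```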
Run the tests with the alias:

```sh
clojure -M:dev:test
```
ClojureScript tests can be built and viewed in dev as described above. To build/run them from the command line you need to have Chrome installed and run:

```sh
yarn build-ci
node_modules/karma/bin/karma start --single-run
```

Or, if you have the karma CLI installed globally, just:

```sh
karma start --single-run
```
This runs the CLJS tests in a way that can be reported programmatically for CI.
See the deployment readme.
We download RDF using the drafter client.
Since we're only using the public endpoint by default, the AUTH0 credentials are ignored. You can configure an `AUTH0_SECRET` environment variable with a dummy value if you like.
If you need to use a draft endpoint, then you can specify AUTH0 credentials using e.g. a secret key for the ook application (e.g. on cogs staging or idp beta). You can store this locally in an encrypted file:

```sh
echo VALUE_OF_THE_SECRET | gpg -e -r YOUR_PGP_ID > env/dev/resources/secrets/AUTH0_SECRET.gpg
```
You can use this pattern and the `ook.concerns.integrant/secret` reader to encrypt other secrets.
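As a hypothetical illustration only (the reader's exact argument format is an assumption, not documented here), referencing an encrypted secret from an EDN config might look something like:

```clojure
;; Hypothetical usage: assumes the reader takes the secret's name and decrypts
;; env/dev/resources/secrets/<NAME>.gpg when the system starts.
{:auth0/client-secret #ook.concerns.integrant/secret "AUTH0_SECRET"}
```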
Copyright © 2021 Swirrl
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.