Scripts and microservice to feed an ElasticSearch instance with Wikidata and Inventaire entities (see entities map) and keep them up to date, in order to answer questions like "give me all the humans with a name starting with xxx" in a super snappy way, typically for the needs of an autocomplete field.
For the Wikidata-only version, see the archived branch #wikidata-subset-search-engine.
See setup to install dependencies:
- NodeJS >= v6.4
- ElasticSearch (this repo was developed targeting ElasticSearch v2.4, but it should work with newer versions with some minimal changes)
- Nginx
- Let's Encrypt
- already installed in any good *nix system: curl, gzip
see Wikidata and Inventaire per-entity import
3 ways to import Wikidata entities data into your ElasticSearch instance
To update any entity, simply re-add it, typically by posting its URI (ex: 'wd:Q180736' for a Wikidata entity, or 'inv:9cf5fbb9affab552cd4fb77712970141' for an Inventaire one) to the server
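As a rough illustration of the re-add step described above, the following sketch validates a list of entity URIs and prepares a POST request for them. The host, port, path, and the `{ uris: [...] }` payload shape are all assumptions made for the example, not the server's documented interface, and the URI patterns are only inferred from the two sample URIs in the text.

```javascript
const http = require('http')

// ASSUMPTION: URI shapes inferred from the examples 'wd:Q180736'
// and 'inv:9cf5fbb9affab552cd4fb77712970141'
const uriPatterns = [ /^wd:Q\d+$/, /^inv:[0-9a-f]{32}$/ ]
const isEntityUri = uri => uriPatterns.some(pattern => pattern.test(uri))

// ASSUMPTION: hypothetical server location and payload shape,
// adjust to whatever your running server actually expects
const assumedHost = 'localhost'
const assumedPort = 3213

const buildUpdateRequest = uris => {
  const invalid = uris.filter(uri => !isEntityUri(uri))
  if (invalid.length > 0) throw new Error(`invalid URIs: ${invalid.join(', ')}`)
  const body = JSON.stringify({ uris })
  const options = {
    host: assumedHost,
    port: assumedPort,
    path: '/',
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'content-length': Buffer.byteLength(body)
    }
  }
  return { options, body }
}

// Sends the request and resolves with the HTTP status code
const postUris = uris => new Promise((resolve, reject) => {
  const { options, body } = buildUpdateRequest(uris)
  const req = http.request(options, res => {
    res.resume() // discard the response body
    res.on('end', () => resolve(res.statusCode))
  })
  req.on('error', reject)
  req.end(body)
})
```

Calling `postUris([ 'wd:Q180736' ])` would then re-submit that entity, provided the assumed endpoint matches your setup.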
To un-index entities that were mistakenly added, pass the path of a results JSON file, expected to contain an array of ids. The documents matching those ids will be deleted:
# Wikidata example
index=wikidata
type=humans
ids_json_array=./queries/results/mistakenly_added_wikidata_humans_ids.json
npm run delete-from-results $index $type $ids_json_array

# Inventaire example
index=entities-prod
type=works
ids_json_array=./queries/results/mistakenly_added_inventaire_works_ids.json
npm run delete-from-results $index $type $ids_json_array
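To make the expected input more concrete, here is a minimal sketch of how an array of ids read from such a results file could be turned into a request body for the ElasticSearch `_bulk` API. This only illustrates the idea: the function names are hypothetical, and the repo's own `delete-from-results` script may well proceed differently.

```javascript
const fs = require('fs')

// Reads a results file that is expected to contain a JSON array of ids,
// e.g. [ "Q1", "Q2" ]
const readIds = path => {
  const ids = JSON.parse(fs.readFileSync(path, 'utf8'))
  if (!Array.isArray(ids)) throw new Error('expected a JSON array of ids')
  return ids
}

// Builds a newline-delimited _bulk body with one delete action per id.
// The _bulk API requires the body to end with a newline.
const buildBulkDeleteBody = (index, type, ids) => {
  if (ids.length === 0) return ''
  return ids
    .map(id => JSON.stringify({ delete: { _index: index, _type: type, _id: id } }))
    .join('\n') + '\n'
}
```

The resulting string could then be sent with a POST request to the `_bulk` endpoint of your ElasticSearch instance.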
You can import dumps from the inventaire.io production ElasticSearch instance:
# Download Wikidata dump
wget -c https://dumps.inventaire.io/wd/elasticsearch/wikidata_data.json.gz
gzip -d wikidata_data.json.gz
# elasticdump should have been installed when running `npm install`
# --limit: increases the batch size
./node_modules/.bin/elasticdump --input=./wikidata_data.json --output=http://localhost:9200/wikidata --limit 2000
# Same for Inventaire
wget -c https://dumps.inventaire.io/inv/elasticsearch/entities_data.json.gz
gzip -d entities_data.json.gz
./node_modules/.bin/elasticdump --input=./entities_data.json --output=http://localhost:9200/entities --limit 2000
curl "http://localhost:9200/wikidata/humans/_search?q=Victor%20Hugo"
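The same kind of query can be prepared from NodeJS. The sketch below builds the search URL used in the curl example and reduces a standard `_search` response to ids and scores. The helper names are hypothetical, and the host is assumed to be the local instance used throughout this document.

```javascript
// ASSUMPTION: local ElasticSearch instance, as in the curl example
const elasticHost = 'http://localhost:9200'

// Builds a URI search URL, escaping the search text
const buildSearchUrl = (index, type, text) => {
  return `${elasticHost}/${index}/${type}/_search?q=${encodeURIComponent(text)}`
}

// Keeps only the id and score of each hit from a _search response,
// returning an empty array if the response has no hits
const summarizeHits = response => {
  const hits = (response.hits && response.hits.hits) || []
  return hits.map(hit => ({ id: hit._id, score: hit._score }))
}
```

For instance, `buildSearchUrl('wikidata', 'humans', 'Victor Hugo')` produces the URL queried by the curl command above.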
We are developing and maintaining tools to work with Wikidata from NodeJS, the browser, or simply the command line, with quality and ease of use at heart. Any donation will be interpreted as a "please keep going, your work is very much needed and awesome. PS: love". Donate
- wikidata-sdk: a JavaScript tool suite to query and work with Wikidata data, heavily used by wikidata-cli
- wikidata-edit: Edit Wikidata from NodeJS
- wikidata-cli: The command-line interface to Wikidata
- wikidata-filter: A command-line tool to filter a Wikidata dump by claim
- wikidata-taxonomy: Command-line tool to extract taxonomies from Wikidata
- Other Wikidata external tools:
Do you know inventaire.io? It's a web app to share books with your friends, built on top of Wikidata! And it's libre software too.
AGPL-3.0