ws23 turns a web search result into a set of RDF triples; its name stands for Web Search to Triples. It was developed for didactic purposes, as a minimal imitation of Sig.ma. Just type what you want and ws23 will gather all the markup triples from the first pages returned by a Google search.
The first draft was developed in under 3 hours, which was possible thanks to third-party libraries and services such as:
- rdflib: a Python library for working with RDF
- google.py: a simple Python Google search interface
- Any23 (Anything to Triples): an Apache project available both as a library and as a web service
- RQ: a simple Python library for queueing jobs and processing them in the background (requires Redis >= 2.6.0)
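The way these pieces fit together can be sketched roughly as follows. This is only an illustration of the search, extract and merge flow, not the actual ws23 code: `search_to_triples`, `fake_search` and `fake_extract` are hypothetical names, triples are represented as plain string tuples rather than rdflib objects, and the two stand-in functions take the place of the real Google search and Any23 calls.

```python
from itertools import islice
from typing import Callable, Iterable, Set, Tuple

# Assumption: a triple is modelled as three plain strings for illustration.
Triple = Tuple[str, str, str]


def search_to_triples(
    query: str,
    limit: int,
    search: Callable[[str], Iterable[str]],
    extract: Callable[[str], Iterable[Triple]],
) -> Set[Triple]:
    """Merge the triples extracted from the first `limit` search results."""
    triples: Set[Triple] = set()
    for url in islice(search(query), limit):
        try:
            triples.update(extract(url))
        except Exception as error:
            # A single page that fails to parse should not abort the run.
            print(f"skipping {url}: {error}")
    return triples


# Hypothetical stand-ins for the search engine and the triple extractor.
def fake_search(query: str) -> Iterable[str]:
    yield from (
        "http://example.org/a",
        "http://example.org/b",
        "http://example.org/c",
    )


def fake_extract(url: str) -> Iterable[Triple]:
    return [(url, "http://purl.org/dc/terms/title", "Example page")]


triples = search_to_triples("web search to triples", 2, fake_search, fake_extract)
for triple in sorted(triples):
    print(triple)
```

Passing the search and extraction steps in as functions keeps the sketch independent of any particular library; in a real run they would wrap the search interface and the Any23 service, and the extraction step could be handed to a job queue.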
The easiest and safest way to use ws23 is to download the source code and then run the ws23/ws23.py script from the command line:
~$ python ws23/ws23.py "QUERY" [NUMBER OF RESULTS]
The first argument is the query you want to perform (e.g. "web search to triples", "Ascoli Calcio", etc.) and the second argument is the number of pages you want to consider, so if you specify a value of 10, only the first 10 Google Search results will be parsed and triplified using Any23.
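A script with this command-line shape could read its two arguments along these lines. This is a minimal sketch using the standard argparse module, not the actual ws23.py code: the function name `parse_args`, the argument names and the default of 10 results are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a required query and an optional number of results."""
    parser = argparse.ArgumentParser(
        prog="ws23.py",
        description="Web Search to Triples",
    )
    parser.add_argument("query", help="the search query, quoted if it contains spaces")
    parser.add_argument(
        "results",
        nargs="?",      # the second argument is optional
        type=int,
        default=10,     # assumption: the real default is not documented here
        help="number of search results to parse",
    )
    return parser.parse_args(argv)


args = parse_args(["Ascoli Calcio", "5"])
print(args.query, args.results)
```

Quoting the query on the command line matters: without the quotes, the shell would split "Ascoli Calcio" into two separate arguments.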