Queries and tools used to compare qEndpoint against other systems
These queries are based on the Wikidata_SPARQL_Logs, mainly Interval 7.
This repository contains many old scripts. The most important ones are work.py and test3.ipynb; the others are kept to show the previous experiments we tried. We moved our work to a remote qEndpoint endpoint to have a fair comparison against the others.
- "rec" = Recursive (=Queries with a path query)
To run these scripts on your own without using our sorted dataset, you first need to download and uncompress the query log file into wdlogsh.tsv.
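As a rough illustration of reading the uncompressed log, here is a minimal sketch. It assumes the file is tab-separated with a header row, that the query is in the first column, and that queries are URL-encoded; none of this is stated here, so check it against your copy of the file. The function name `read_queries` and its parameters are hypothetical and are not taken from work.py.

```python
import csv
import io
from urllib.parse import unquote_plus


def read_queries(tsv_file, limit=None, query_column=0, has_header=True):
    """Read up to `limit` decoded queries from an open TSV file object.

    Assumptions (not confirmed by this repository): tab-separated rows,
    an optional header row, URL-encoded query text in `query_column`.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    queries = []
    for row in reader:
        if limit is not None and len(queries) >= limit:
            break
        # Skip rows that are too short or have an empty query cell.
        if len(row) <= query_column or not row[query_column]:
            continue
        queries.append(unquote_plus(row[query_column]))
    return queries


if __name__ == "__main__":
    # Tiny in-memory stand-in for wdlogsh.tsv (made-up rows).
    demo = io.StringIO(
        "anonymizedQuery\ttimestamp\n"
        "SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D\t2017-01-01\n"
    )
    print(read_queries(demo))
```

With a real file, the same function could be called as `read_queries(open("wdlogsh.tsv", encoding="utf-8", newline=""), limit=100_000)`.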
This is the script used to query the endpoints.
To configure it, open the file and search for the following lines:
- # <<CONFIG POINT 1>>: after this line you can configure tests without recursive queries or with only recursive queries (aka path queries), or increase the number of queries used from the wdlogsh.tsv file (currently 100k queries).
- # <<CONFIG POINT 2>>: after this line you can configure the endpoints to which the queries are sent.
The results are written to results.json (non-recursive) and results_rec.json (recursive) in the following format:
{
  "engines": [
    {
      "id": "sparql endpoint id",
      "name": "sparql endpoint name",
      "time1": [
        number
      ],
      "number_result1": [
        number
      ],
      "error1": [
        boolean
      ]
    }
  ]
}
- The ith element of time1 is the time taken to run query i.
- The ith element of number_result1 is the number of results returned by query i.
- The ith element of error1 indicates whether query i threw an error.
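To show how these parallel arrays can be read together, here is a minimal sketch that summarizes each engine. The function `summarize` is hypothetical and is not part of the notebooks in this repository. The time unit is whatever work.py records, which is not specified here, so the sketch reports raw values.

```python
import json
from statistics import mean, median


def summarize(results):
    """Return one summary row per engine from a results.json-style dict."""
    rows = []
    for engine in results["engines"]:
        times = engine["time1"]
        counts = engine["number_result1"]
        errors = engine["error1"]
        # Only keep timings of queries that did not throw an error.
        ok_times = [t for t, err in zip(times, errors) if not err]
        rows.append({
            "id": engine["id"],
            "name": engine["name"],
            "queries": len(times),
            "errors": sum(1 for err in errors if err),
            "mean_time": mean(ok_times) if ok_times else None,
            "median_time": median(ok_times) if ok_times else None,
            "total_results": sum(c for c, err in zip(counts, errors) if not err),
        })
    return rows


if __name__ == "__main__":
    # Made-up sample in the documented format; with a real run, use
    # json.load(open("results.json")) instead.
    sample = json.loads(
        '{"engines": [{"id": "e1", "name": "Engine 1",'
        ' "time1": [1.0, 3.0, 9.0], "number_result1": [2, 4, 0],'
        ' "error1": [false, false, true]}]}'
    )
    for row in summarize(sample):
        print(row)
```

Excluding failed queries from the timing statistics is a design choice of this sketch; a failed query may return quickly or only after a timeout, so mixing it in would skew the averages either way.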
Script used to get information from a run against a remote endpoint vs the others.
Old script getting the information from a run against a local endpoint vs the others.
Same as test.ipynb with more values.
Script used to run the experiments for "Easily setting up a local Wikidata SPARQL endpoint using the qEndpoint". These results were obtained on the start of the query logs, with a local endpoint compared against remote endpoints. Combined with the fact that the first queries were easier to run than the others, this means these values can't really be taken into account.
We took Interval 7 of the Wikidata_SPARQL_Logs and randomly picked 100k queries from it. These queries can be found in query_dump_100k.json. We then tried the first queries on all the endpoints. Once 10k queries had been sent without any error, we obtained the query_dump_10k_valid.json dataset; query_dump_10k_failed.json contains all the failing queries together with all the valid queries.
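The selection procedure described above can be sketched roughly as follows. This is an illustration under assumptions, not the code that produced the dumps: `sample_queries`, `split_until_valid`, and the `runs_everywhere` callback (which stands in for "the query ran without error on every endpoint") are hypothetical names, and the seed is arbitrary.

```python
import random


def sample_queries(queries, k, seed=0):
    """Randomly pick up to k queries without replacement (reproducible)."""
    rng = random.Random(seed)
    return rng.sample(queries, min(k, len(queries)))


def split_until_valid(queries, runs_everywhere, target):
    """Try queries in order until `target` of them are valid.

    Returns (valid, tried): `valid` holds the queries for which
    `runs_everywhere` returned True, `tried` holds every query attempted
    so far, i.e. the failing ones together with the valid ones.
    """
    valid, tried = [], []
    for query in queries:
        if len(valid) >= target:
            break
        tried.append(query)
        if runs_everywhere(query):
            valid.append(query)
    return valid, tried


if __name__ == "__main__":
    # Made-up stand-ins: 20 fake queries, "valid" when the index is even.
    pool = [f"q{i}" for i in range(20)]
    picked = sample_queries(pool, 10, seed=42)
    valid, tried = split_until_valid(picked, lambda q: int(q[1:]) % 2 == 0, 3)
    print(valid, tried)
```

In a real run, `runs_everywhere` would send the query to each configured endpoint and return False as soon as one of them reports an error.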
The benchmark comparing qEndpoint with other systems using the Berlin SPARQL Benchmark (BSBM) is available in the bsbm-bench directory.