Squirrel is a crawler for the linked web. It provides several tools to search and collect data from the heterogeneous content of the linked web.
Documentation, tutorials and more: https://dice-group.github.io/squirrel.github.io/
- Java 1.8
- Apache Maven 3.6.0
- Docker 19.03.12
or
- ORCA benchmark on the HOBBIT platform
Clone the repository in a directory of your choice with:
git clone https://github.com/dice-group/Squirrel
Change into the Squirrel directory and start the RabbitMQ and MongoDB containers:
docker-compose up -d mongodb rabbit
Set up your seeds in the file seed/seeds.txt, then start the frontier and one worker instance with:
docker-compose up frontier worker1
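Before starting the frontier, it can help to sanity-check the seed file. The sketch below is not part of Squirrel; it assumes that seed/seeds.txt holds one absolute HTTP(S) URI per line, which should be confirmed against the documentation. The function name `check_seeds` and the sample URI are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def check_seeds(lines):
    """Split seed lines into (valid, rejected).

    Assumption: the seed file holds one absolute http(s) URI per line.
    """
    valid, rejected = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parsed = urlparse(line)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(line)
        else:
            rejected.append(line)
    return valid, rejected


# Hypothetical sample content, standing in for seed/seeds.txt.
sample = [
    "https://dbpedia.org/resource/Squirrel\n",
    "\n",
    "not-a-uri\n",
]
valid, rejected = check_seeds(sample)
print("valid:", valid)
print("rejected:", rejected)
```

To check a real file, the same function could be called on an open file handle, for example `check_seeds(open("seed/seeds.txt"))`.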
A BibTeX entry for citing Squirrel is available at: https://www.bibsonomy.org/bibtex/29fe2ef0c2e1908276d424c1ca3e06cbf/dice-research
- Go to https://master.project-hobbit.eu/
- Register an account or log in to an existing one
- Go to "Benchmarks"
- Select "ORCA" in the Benchmark list
- Select the system and set all parameters as listed below (the values can also be found by following the links in the paper):
Parameter | Effectiveness | Efficiency |
---|---|---|
Average crawl delay | 0 | 0 |
Average node degree | 20 | 20 |
Average ratio of disallowed resources | 0 | 0 |
Average resource degree | 9 | 9 |
Disallowed resources | 0 | 0 |
Dump file compression ratio | 0.3 | 0 |
Node size definition | Static | Static |
Number of nodes | 100 | 200 |
RDF dataset size | 1000 | 1000 |
Seed | 20200318 | 20200318 |
Use N3 dumps | true | true |
Use NT dumps | true | true |
Use RDF/XML dumps | true | true |
Use TTL dumps | true | true |
Weight of CKAN node occurrence | 5 | 0 |
Weight of dereferencing HTTP node occurrence | 21 | 100 |
Weight of HTTP dump file node occurrence | 40 | 0 |
Weight of RDFa node occurrence | 4 | 0 |
Weight of SPARQL node occurrence | 30 | 0 |
- Use "Submit" to queue the experiment
- Watch the received link for experiment results. You can use the "Experiments → Experiment Status" page to check whether the experiment is still running.
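To get a feel for what the two parameter sets in the table above imply, the following sketch turns the node occurrence weights into an expected number of nodes per type. It assumes that the weights act as relative proportions over the configured number of nodes; the way ORCA actually samples node types may differ, so the result is only a rough reading of the table. The names `WEIGHTS`, `NUMBER_OF_NODES` and `expected_nodes` are hypothetical.

```python
# Values copied from the parameter table above.
WEIGHTS = {
    "Effectiveness": {
        "CKAN": 5,
        "dereferencing HTTP": 21,
        "HTTP dump file": 40,
        "RDFa": 4,
        "SPARQL": 30,
    },
    "Efficiency": {
        "CKAN": 0,
        "dereferencing HTTP": 100,
        "HTTP dump file": 0,
        "RDFa": 0,
        "SPARQL": 0,
    },
}
NUMBER_OF_NODES = {"Effectiveness": 100, "Efficiency": 200}


def expected_nodes(weights, number_of_nodes):
    """Expected node count per type.

    Assumption: each weight is a relative proportion of all nodes.
    """
    total = sum(weights.values())
    return {
        node_type: number_of_nodes * weight / total
        for node_type, weight in weights.items()
    }


expected = {
    name: expected_nodes(WEIGHTS[name], NUMBER_OF_NODES[name])
    for name in WEIGHTS
}
for name, counts in expected.items():
    print(name, counts)
```

Under this reading, the Effectiveness setup mixes all five node types, while the Efficiency setup consists only of dereferencing HTTP nodes.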
It is also possible to deploy your own HOBBIT platform. Refer to the HOBBIT platform manual: https://hobbit-project.github.io/. In this case, you may also need system adapters for ORCA: https://github.com/topics/orca-system-adapter.