Merge pull request #1 from psiotwo/robot-on-optimized-model-conversion
Robot on optimized model conversion
psiotwo authored Apr 10, 2022
2 parents 930f156 + d65f6e1 commit b7431e1
Showing 4 changed files with 42 additions and 8 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Changed
- Optimize memory usage for update queries using `--temporary-file` switch [#978]
- Sort [`report`] violations by rule name within level [#955]

### Fixed
18 changes: 13 additions & 5 deletions docs/query.md
@@ -39,12 +39,12 @@ Instead of specifying one or more pairs (query file, output file), you can speci

## Handling Imports

By default, `query` ignores import statements. To include all imports as named graphs, add `--use-graphs true`.

robot query --input imports.owl \
--use-graphs true --catalog catalog.xml \
--query named_graph.sparql results/named_graph.csv

The example above also uses the [global](/global) `--catalog` option to specify the catalog file for the import mapping. The default graph is the union of all graphs, which allows querying over an ontology and all its imports.

The names of the graphs correspond to the ontology IRIs of the imports. If the import does not have an ontology IRI, one will be automatically generated. Running `query` with the `-vv` flag will print the names of all graphs as they are added.
@@ -70,13 +70,21 @@ The `--update` option only updates the ontology itself, not any of the imports.

**Warning:** The output of SPARQL updates will not include `xsd:string` datatypes, because `xsd:string` is considered implicit in RDF version 1.1. This behaviour differs from other ROBOT commands, where `xsd:string` datatypes from the input are maintained in the output.

### Storing Intermediate Results on Disk

For very large ontologies, it may be worth trading speed for lower heap usage. When running `--update`, add `--temporary-file true` to write the intermediate results to a temporary file instead of holding them in memory. This reduces heap usage but makes execution slower.

robot query --input nucleus.owl \
--update update.ru \
--temporary-file true \
--output results/nucleus_update.owl

## Executing on Disk

For very large ontologies, it may be beneficial to load the ontology to a mapping file on disk rather than loading it into memory. This is supported by [Jena TDB Datasets](http://jena.apache.org/documentation/tdb/datasets.html). To execute a query with TDB, use `--tdb true`:

robot query --input nucleus.ttl --tdb true \
--query cell_part.sparql results/cell_part.csv

Please note that this will only work with ontologies in RDF/XML or Turtle syntax, and not with Manchester Syntax. Attempting to load an ontology in a different syntax will result in a [Syntax Error](errors#syntax-error). ROBOT will create a directory to store the ontology as a dataset, which defaults to `.tdb`. You can change the location of the TDB directory by using `--tdb-directory <directory>`. If a `--tdb-directory` is specified, you do not need to include `--tdb true`. If you've already created a TDB directory, you can query from the TDB dataset without needing to specify an `--input` - just include the `--tdb-directory`.

Once the query operation is complete, ROBOT will remove the TDB directory. If you are performing many query commands on one ontology, you can include `--keep-tdb-mappings true` to prevent ROBOT from removing the TDB directory. This will greatly reduce the execution time of subsequent queries.
@@ -103,7 +111,7 @@ The file provided for `--update` does not exist. Check the path and try again.

### Missing Output Error

The `--query`, `--select`, and `--construct` options require two arguments: a query file and an output file (`--query <query> <output>`).

### Missing Query Error

@@ -55,6 +55,7 @@ public QueryCommand() {
o.addOption("O", "output-dir", true, "Directory for output");
o.addOption("g", "use-graphs", true, "if true, load imports as named graphs");
o.addOption("u", "update", true, "run a SPARQL UPDATE");
o.addOption("y", "temporary-file", true, "if true (with --update only), store intermediate results in a temporary file to reduce heap usage at the cost of slower execution");
o.addOption("t", "tdb", true, "if true, load RDF/XML or TTL onto disk");
o.addOption("C", "create-tdb", true, "if true, create a TDB directory without querying");
o.addOption("k", "keep-tdb-mappings", true, "if true, do not remove the TDB directory");
@@ -155,7 +156,8 @@ public CommandState execute(CommandState state, String[] args) throws Exception
state = CommandLineHelper.updateInputOntology(ioHelper, state, line);
OWLOntology inputOntology = state.getOntology();

- OWLOntology outputOntology = executeUpdate(state, inputOntology, ioHelper, updatePaths);
+ final boolean useTemporaryFile = CommandLineHelper.getBooleanValue(line, "temporary-file", false);
+ OWLOntology outputOntology = executeUpdate(state, inputOntology, ioHelper, updatePaths, useTemporaryFile);
CommandLineHelper.maybeSaveOutput(line, outputOntology);
state.setOntology(outputOntology);
return state;
@@ -274,11 +276,13 @@ private static void executeOnDisk(CommandLine line, List<List<String>> queries)
* @param inputOntology the ontology to update
* @param ioHelper IOHelper to handle loading OWLOntology objects
* @param updatePaths paths to update queries
* @param useTemporaryFile if true, store intermediate results in a temporary file to reduce
* heap usage
* @return updated OWLOntology
* @throws Exception on file or ontology loading issues
*/
private static OWLOntology executeUpdate(
- CommandState state, OWLOntology inputOntology, IOHelper ioHelper, List<String> updatePaths)
+ CommandState state, OWLOntology inputOntology, IOHelper ioHelper, List<String> updatePaths, boolean useTemporaryFile)
throws Exception {
Map<String, String> updates = new LinkedHashMap<>();
for (String updatePath : updatePaths) {
@@ -324,7 +328,9 @@ private static OWLOntology executeUpdate(
catalogPath = null;
}
}
- return QueryOperation.convertModel(model, ioHelper, catalogPath);
+ return useTemporaryFile
+ ? QueryOperation.convertModelThroughTemporaryFile(model, ioHelper, catalogPath)
+ : QueryOperation.convertModel(model, ioHelper, catalogPath);
}

/**
19 changes: 19 additions & 0 deletions robot-core/src/main/java/org/obolibrary/robot/QueryOperation.java
@@ -254,6 +254,25 @@ public static OWLOntology convertModel(Model model, IOHelper ioHelper, String ca
return ioHelper.loadOntology(new ByteArrayInputStream(os.toByteArray()), catalogPath);
}

/**
* Given a Model, an IOHelper, and a path to an XML catalog, convert the model to an OWLOntology
* object by writing it to a temporary file and loading the ontology from that file. This uses
* less heap memory than {@code convertModel}, at the cost of slower execution.
*
* @param model Model to convert to OWLOntology
* @param ioHelper IOHelper to load ontology
* @param catalogPath String path to XML catalog
* @return OWLOntology object version of model
* @throws IOException on issue writing the temporary file or loading ontology
*/
public static OWLOntology convertModelThroughTemporaryFile(Model model, IOHelper ioHelper, String catalogPath) throws IOException {
final File tempFile = File.createTempFile("robot", ".owl");
tempFile.deleteOnExit();
try (final BufferedOutputStream os = new BufferedOutputStream(new FileOutputStream(tempFile))) {
RDFDataMgr.write(os, model, Lang.TTL);
}
// Close the input stream once the ontology is loaded so the file handle is not leaked.
try (final BufferedInputStream is = new BufferedInputStream(new FileInputStream(tempFile))) {
return ioHelper.loadOntology(is, catalogPath);
}
}
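
The trade-off between the in-memory conversion and the temporary-file conversion can be illustrated without Jena or the OWL API. The following is a minimal sketch under stated assumptions, not ROBOT's implementation: `RoundTripSketch`, `Serializer`, and `Loader` are hypothetical names, where `Serializer` stands in for writing the model to a stream and `Loader` stands in for loading an ontology from a stream.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; none of these names exist in ROBOT.
final class RoundTripSketch {

  /** Hypothetical stand-in for "serialize the model to a stream". */
  interface Serializer {
    void writeTo(OutputStream out) throws IOException;
  }

  /** Hypothetical stand-in for "load an ontology from a stream". */
  interface Loader<T> {
    T readFrom(InputStream in) throws IOException;
  }

  /** In-memory variant: the whole serialization is buffered in a heap byte array. */
  static <T> T viaMemory(Serializer serializer, Loader<T> loader) throws IOException {
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    serializer.writeTo(os);
    // toByteArray() copies the buffer, so the serialization briefly exists twice on the heap.
    return loader.readFrom(new ByteArrayInputStream(os.toByteArray()));
  }

  /** Temporary-file variant: the serialization is written to disk and streamed back. */
  static <T> T viaTempFile(Serializer serializer, Loader<T> loader) throws IOException {
    Path tempFile = Files.createTempFile("sketch", ".tmp");
    try {
      try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(tempFile))) {
        serializer.writeTo(os);
      }
      try (InputStream is = new BufferedInputStream(Files.newInputStream(tempFile))) {
        return loader.readFrom(is);
      }
    } finally {
      // Delete the file as soon as loading finishes instead of waiting for JVM exit.
      Files.deleteIfExists(tempFile);
    }
  }
}
```

Both variants return the same result for the same serializer and loader. The temporary-file variant avoids holding the full serialization on the heap and pays for that with disk I/O, which matches the slower execution noted for `--temporary-file true`. Deleting the file in a `finally` block is a design choice of this sketch; the committed method relies on `deleteOnExit()` instead.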

/**
* Count results.
*