ondisk and slicing #295

Closed
luigi-asprino opened this issue Aug 27, 2022 · 4 comments
Labels
Bug Something isn't working

@luigi-asprino
Member

luigi-asprino commented Aug 27, 2022

Enabling slicing together with ondisk makes the engine create a new TDB for each slice, so it ends up with too many open files and the execution fails.

Exception in thread "main" org.apache.jena.dboe.base.file.FileException: Failed to open: /Users/lgu/Desktop/NOTime/SA_experiment_temp/7900157629280057590/Data-0001/prefixes.dat (mode=rw)
	at org.apache.jena.dboe.base.file.ChannelManager.open$(ChannelManager.java:75)
	at org.apache.jena.dboe.base.file.ChannelManager.openref$(ChannelManager.java:51)
	at org.apache.jena.dboe.base.file.ChannelManager.acquire(ChannelManager.java:42)
	at org.apache.jena.dboe.sys.FileLib.openManaged(FileLib.java:61)
	at org.apache.jena.dboe.sys.FileLib.openManaged(FileLib.java:56)
	at org.apache.jena.dboe.base.file.BlockAccessBase.<init>(BlockAccessBase.java:50)
	at org.apache.jena.dboe.base.file.BlockAccessMapped.<init>(BlockAccessMapped.java:61)
	at org.apache.jena.dboe.base.block.BlockMgrFactory.createMMapFile(BlockMgrFactory.java:97)
	at org.apache.jena.dboe.base.block.BlockMgrFactory.createFile(BlockMgrFactory.java:88)
	at org.apache.jena.dboe.base.block.BlockMgrFactory.create(BlockMgrFactory.java:62)
	at org.apache.jena.dboe.base.block.BlockMgrFactory.create(BlockMgrFactory.java:54)
	at org.apache.jena.dboe.trans.bplustree.BPlusTreeFactory.createBPTree(BPlusTreeFactory.java:150)
	at org.apache.jena.dboe.trans.bplustree.BPlusTreeFactory.createBPTreeByBlockSize(BPlusTreeFactory.java:110)
	at org.apache.jena.dboe.trans.bplustree.BPlusTreeFactory.createBPTree(BPlusTreeFactory.java:102)
	at org.apache.jena.tdb2.store.TDB2StorageBuilder.makeRangeIndex(TDB2StorageBuilder.java:297)
	at org.apache.jena.tdb2.store.TDB2StorageBuilder.buildBaseNodeTable(TDB2StorageBuilder.java:319)
	at org.apache.jena.tdb2.store.TDB2StorageBuilder.buildNodeTable(TDB2StorageBuilder.java:303)
	at org.apache.jena.tdb2.store.TDB2StorageBuilder.buildPrefixes(TDB2StorageBuilder.java:215)
	at org.apache.jena.tdb2.store.TDB2StorageBuilder.build(TDB2StorageBuilder.java:105)
	at org.apache.jena.tdb2.sys.StoreConnection.make(StoreConnection.java:93)
	at org.apache.jena.tdb2.sys.StoreConnection.connectCreate(StoreConnection.java:61)
	at org.apache.jena.tdb2.sys.DatabaseOps.createSwitchable(DatabaseOps.java:98)
	at org.apache.jena.tdb2.sys.DatabaseOps.create(DatabaseOps.java:79)
	at org.apache.jena.tdb2.sys.DatabaseConnection.build(DatabaseConnection.java:103)
	at org.apache.jena.tdb2.sys.DatabaseConnection.lambda$make$0(DatabaseConnection.java:74)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at org.apache.jena.tdb2.sys.DatabaseConnection.make(DatabaseConnection.java:74)
	at org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:63)
	at org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:54)
	at org.apache.jena.tdb2.DatabaseMgr.DB_ConnectCreate(DatabaseMgr.java:41)
	at org.apache.jena.tdb2.DatabaseMgr.connectDatasetGraph(DatabaseMgr.java:46)
	at org.apache.jena.tdb2.TDB2Factory.connectDataset(TDB2Factory.java:40)
	at org.apache.jena.tdb2.TDB2Factory.connectDataset(TDB2Factory.java:46)
	at com.github.sparqlanything.model.BaseFacadeXGraphBuilder.getDatasetGraph(BaseFacadeXGraphBuilder.java:80)
	at com.github.sparqlanything.model.BaseFacadeXGraphBuilder.<init>(BaseFacadeXGraphBuilder.java:98)
	at com.github.sparqlanything.model.BaseFacadeXGraphBuilder.<init>(BaseFacadeXGraphBuilder.java:47)
	at com.github.sparqlanything.engine.QueryIterSlicer.hasNextBinding(QueryIterSlicer.java:86)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
	at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38)
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
	at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:64)
	at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:55)
	at org.apache.jena.riot.resultset.rw.ResultSetWriterCSV.output(ResultSetWriterCSV.java:94)
	at org.apache.jena.riot.resultset.rw.ResultSetWriterCSV.write(ResultSetWriterCSV.java:52)
	at org.apache.jena.riot.resultset.rw.ResultsWriter.write(ResultsWriter.java:156)
	at org.apache.jena.riot.resultset.rw.ResultsWriter.write(ResultsWriter.java:126)
	at org.apache.jena.riot.resultset.rw.ResultsWriter$Builder.write(ResultsWriter.java:90)
	at org.apache.jena.query.ResultSetFormatter.output(ResultSetFormatter.java:308)
	at org.apache.jena.query.ResultSetFormatter.outputAsCSV(ResultSetFormatter.java:631)
	at com.github.sparqlanything.cli.SPARQLAnything.executeQuery(SPARQLAnything.java:94)
	at com.github.sparqlanything.cli.SPARQLAnything.main(SPARQLAnything.java:510)
Caused by: java.io.FileNotFoundException: /Users/lgu/Desktop/NOTime/SA_experiment_temp/7900157629280057590/Data-0001/prefixes.dat (Too many open files)
	at java.base/java.io.RandomAccessFile.open0(Native Method)
	at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:346)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:260)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:215)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:128)
	at org.apache.jena.dboe.base.file.ChannelManager.open$(ChannelManager.java:72)
	... 54 more

I'm not sure how to solve this. Maybe we could just document this behaviour and suggest setting ondisk.reuse to "true" for sliced execution.
Alternatively, we could enforce ondisk.reuse=true by default when slicing is enabled.
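
As a rough sketch of the first option, a sliced query could pass ondisk.reuse alongside slice and ondisk. The location URL and the /tmp path are only placeholders, and writing the option as `ondisk.reuse=true` inside the SERVICE IRI is an assumption based on the option name above; whether it actually avoids the failure is untested here.

```sparql
PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>

SELECT  ?name ?surname ?movie
WHERE
  { # Assumption: ondisk.reuse=true makes the slices share one on-disk TDB
    # instead of a new TDB being created for each slice.
    SERVICE <x-sparql-anything:location=https://sparql-anything.cc/examples/simpleArray.json,slice=true,ondisk=/tmp,ondisk.reuse=true>
      { ?p  xyz:name     ?name ;
            xyz:surname  ?surname ;
            xyz:movie    ?movie
      }
  }
```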

luigi-asprino added a commit that referenced this issue Aug 27, 2022
luigi-asprino added a commit that referenced this issue Aug 27, 2022
@luigi-asprino
Member Author

luigi-asprino commented Aug 27, 2022

Actually, slicing and ondisk don't work as expected.

The query

PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT  ?name ?surname ?movie
WHERE
  { SERVICE <x-sparql-anything:location=https://sparql-anything.cc/examples/simpleArray.json,slice=true,ondisk=/tmp>
      { ?p  xyz:name     ?name ;
            xyz:surname  ?surname ;
            xyz:movie    ?movie
      }
  }

gives

--------------------------
| name | surname | movie |
==========================
--------------------------

instead of

------------------------------------------
| name        | surname | movie          |
==========================================
| "Vincent"   | "Vega"  | "Pulp fiction" |
| "Winnfield" | "Vega"  | "Pulp fiction" |
| "Beatrix"   | "Kiddo" | "Kill Bill"    |
------------------------------------------

luigi-asprino added the Bug (Something isn't working) label Aug 27, 2022
luigi-asprino added this to the v0.8.0 milestone Aug 27, 2022
@justin2004
Contributor

What if you add graph ?g -- does that get bindings?

@luigi-asprino
Member Author

luigi-asprino commented Aug 30, 2022

Yes, I can get the bindings by adding the graph variable, but I would avoid this; otherwise the same query (without the graph) would give different results with ondisk.
This issue also occurred for non-sliced execution (#280).
I was able to address it there by telling the engine to consider the union graph of the TDB as the active graph on which to evaluate the query.
I tried the same with sliced execution, but it doesn't work.
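
For reference, the variant with the graph variable would look roughly like this. It is only a sketch of the workaround discussed above, reusing the earlier example query; ?g is just the variable name suggested in the question, and it is deliberately left out of the SELECT clause.

```sparql
PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>

SELECT  ?name ?surname ?movie
WHERE
  { SERVICE <x-sparql-anything:location=https://sparql-anything.cc/examples/simpleArray.json,slice=true,ondisk=/tmp>
      { # Match the triples inside any named graph of the on-disk dataset
        # rather than in the default graph.
        GRAPH ?g
          { ?p  xyz:name     ?name ;
                xyz:surname  ?surname ;
                xyz:movie    ?movie
          }
      }
  }
```

The drawback is the one already mentioned: the query has to be written differently depending on whether ondisk is set.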

@luigi-asprino
Member Author

I'm closing this issue as it was completed via f1355c0.
