access file in zip: mixup absolute/relative url #335

Closed
VladimirAlexiev opened this issue Jan 26, 2023 · 5 comments

@VladimirAlexiev

VladimirAlexiev commented Jan 26, 2023

This script (same as in #334) opens an archive (https://sparql-anything.readthedocs.io/en/latest/Configuration/#from-archive) and rdfizes files from it:

### rdfize-zip.sparql
# rdfize an archive (zip) of files. parameter values:
# -v zip: zip filename or URL
# -v file: file regexp pattern (default ~.*~)
# sparql-anything -q ../rdfize-zip.sparql -v zip=graphql-2023-01-24.zip > raw-rdf.ttl

prefix bsdd: <http://bsdd.buildingsmart.org/def#>
prefix xyz:  <http://sparql.xyz/facade-x/data/>
prefix fx:   <http://sparql.xyz/facade-x/ns/>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

construct {
  ?s ?p ?o
} where {
  service <x-sparql-anything:> {
    bind(coalesce(?__file,".*") as ?pattern)
    fx:properties fx:location ?_zip.
    fx:properties fx:archive.matches ?pattern.
    [] fx:anySlot ?file
    service <x-sparql-anything:media-type=application/json> {
      bind(str(bsdd:) as ?bsdd)
      fx:properties fx:namespace ?bsdd.
      fx:properties fx:location ?file.
      fx:properties fx:from-archive ?_zip.
      fx:properties fx:use-rdfs-member true.
      ?s ?p ?o
    }
  }
}

fails with the following error:

[main] ERROR com.github.sparqlanything.engine.FacadeXOpExecutor - An error occurred: 
java.io.FileNotFoundException: 
tmp\cd81ed69fe4de812bb3928afd960e85d\http:\sparql.xyz\facade-x\ns\root 
(The filename, directory name, or volume label syntax is incorrect)

[main] ERROR com.github.sparqlanything.cli.SPARQLAnything - Iteration 1 failed with error: 
java.io.IOException: java.io.FileNotFoundException:
 tmp\cd81ed69fe4de812bb3928afd960e85d\http:\sparql.xyz\facade-x\ns\root 
(The filename, directory name, or volume label syntax is incorrect)

I see two problems here:

  • forward slashes in the expanded fx: URL are replaced with backslashes. Maybe that comes from Windows/Cygwin
  • the absolute URL fx:root is appended to tmp\cd81ed69fe4de812bb3928afd960e85d: WHY?
    • Notice that I set fx:namespace, but to a different namespace: to rename xyz -> bsdd (BSDD is the data we're working with)
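
To make the two observations concrete, here is a minimal sketch that reproduces the exact path string from the error message using Python's `ntpath` module (Windows path rules, usable on any platform). It is only an illustration of how such a string can arise when a URL is joined to a folder as if it were a relative file name. It is not SPARQL Anything's code, and the helper name `windows_join` is hypothetical.

```python
import ntpath  # Windows path semantics, importable on any OS

# Both values are copied from the error message above.
EXTRACT_DIR = r"tmp\cd81ed69fe4de812bb3928afd960e85d"
FX_ROOT = "http://sparql.xyz/facade-x/ns/root"


def windows_join(base: str, location: str) -> str:
    """Join `location` onto `base` and normalize using Windows path rules.

    Hypothetical helper for illustration only.
    """
    return ntpath.normpath(ntpath.join(base, location))


# A URL has no drive letter and no leading slash, so it is treated as a
# relative name; normalization then turns "/" into "\" and collapses "//".
observed = windows_join(EXTRACT_DIR, FX_ROOT)

# What one would expect instead: the name of a file inside the archive.
expected = windows_join(EXTRACT_DIR, "countries.json")

print(observed)
print(expected)
```

So Windows-style path normalization alone is enough to explain the backslashes. The real question is why a URL is used as the file location in the first place.
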
@VladimirAlexiev
Author

VladimirAlexiev commented Jan 26, 2023

This command causes the failure:

sparql-anything.bat -q ../scripts/rdfize-zip.sparql -v zip=graphql-2023-01-24.zip -v "file=\w+.json"

A script that merely finds and lists the files works ok:

### list-zip.sparql
prefix xyz:  <http://sparql.xyz/facade-x/data/>
prefix fx:   <http://sparql.xyz/facade-x/ns/>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

select ?file
where {
  service <x-sparql-anything:> {
    bind(coalesce(?__file,".*") as ?pattern)
    fx:properties fx:location ?_zip.
    fx:properties fx:archive.matches ?pattern.
    [] fx:anySlot ?file
  }
}

Invoking it finds the right files:

# sparql-anything.bat -q ../scripts/list-zip.sparql -v zip=graphql-2023-01-24.zip -v "file=\w+.json"
file
countries.json
domains.json
languages.json
properties.json
reference_documents.json
units.json
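
For reference, a small Python sketch of the file selection that `coalesce(?__file,".*")` and the `\w+.json` pattern describe. It assumes that `fx:archive.matches` tests the pattern against the whole entry name, which is an assumption and not confirmed in this thread. The function name `select` is hypothetical.

```python
import re

# File names copied from the listing above.
FILES = [
    "countries.json",
    "domains.json",
    "languages.json",
    "properties.json",
    "reference_documents.json",
    "units.json",
]


def select(files, pattern=None):
    """Mimic coalesce(?__file, ".*"): fall back to ".*" when no pattern is given.

    Assumes whole-name matching (re.fullmatch); hypothetical helper.
    """
    effective = pattern if pattern is not None else ".*"
    return [name for name in files if re.fullmatch(effective, name)]


with_pattern = select(FILES, r"\w+.json")   # pattern from the command line
with_default = select(FILES)                # no -v file=... given

print(with_pattern)
print(with_default)
```

Note that the `.` in `\w+.json` is unescaped, so it matches any character. `\w+\.json` would be the stricter form, though it makes no difference for the six files listed here.
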

@VladimirAlexiev changed the title from "access file in zip fails on cygwin (mixup absolute/relative url)" to "access file in zip: mixup absolute/relative url" Jan 26, 2023
@mihailradkov

This issue seems to be caused by service <x-sparql-anything:media-type=application/json>, more specifically by the media-type option. Removing it resolves the problem immediately.
Not sure why this breaks the processing, as the files are indeed JSON.

@VladimirAlexiev
Author

VladimirAlexiev commented Jan 27, 2023

Which is even more puzzling since this simpler script that RDFizes a single file works ok.
This is the "inner loop" of the more complex script on top:

### rdfize.sparql
# sparql-anything -q rdfize.sparql -v file=../samples/class-IfcWall.ELEMENTEDWALL.json
# prefixes as in rdfize-zip.sparql above
prefix bsdd: <http://bsdd.buildingsmart.org/def#>
prefix fx:   <http://sparql.xyz/facade-x/ns/>

construct {
  ?s ?p ?o
} where {
  SERVICE <x-sparql-anything:media-type=application/json> {
    bind(str(bsdd:) as ?bsdd)
    fx:properties fx:namespace ?bsdd.
    fx:properties fx:location ?_file.
    fx:properties fx:use-rdfs-member true.
    ?s ?p ?o
  }
}

tmp\cd81ed69fe4de812bb3928afd960e85d is a folder where sparql-anything unzips the archive.
But why does it then append http:\sparql.xyz\facade-x\ns\root instead of the file name it found?

PS: I've named the code excerpts for easier reference.

@enridaga
Member

> But why does it then append http:\sparql.xyz\facade-x\ns\root instead of the file name it found?

This looks like a bug

@luigi-asprino
Member

The problem is the same as (or at least related to) #334.
I am closing this issue so we can discuss a common solution in the #334 thread.

luigi-asprino added a commit that referenced this issue Feb 9, 2023
luigi-asprino added a commit that referenced this issue Feb 9, 2023
@enridaga added this to the v0.9.0 milestone Feb 9, 2023
@enridaga added the Bug ("Something isn't working") label Feb 9, 2023
luigi-asprino added a commit that referenced this issue Feb 10, 2023
luigi-asprino added a commit that referenced this issue Feb 10, 2023
@luigi-asprino modified the milestones: v0.9.0, v0.8.2 May 19, 2023