csv option proposal: fx:csv.triple-patterns #347

Open
justin2004 opened this issue Feb 14, 2023 · 8 comments

@justin2004
Contributor

Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.

In order to allow an easy transition (for users) from Tarql to SPARQL Anything, what if we add an option for csv files that would do the following...

justin@parens$ cat proposal.csv 
name,age,dog
bob,32,fido
jane,,sammy

In order to capture the values we currently need to express a triple pattern for each column like:

SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" ;
                  fx:csv.null-string  "" .
        OPTIONAL { ?row xyz:name ?name . }
        OPTIONAL { ?row xyz:age ?age . }
        OPTIONAL { ?row xyz:dog ?dog . }
      }
  }

which yields:

row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido

The proposal is to allow this query:

SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" ;
                  fx:csv.null-string  "" ;
                  fx:csv.triple-patterns  "true" .
      }
  } 

to produce this:

row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido

So that means fx:csv.triple-patterns "true" causes these triple patterns to get inserted implicitly behind the scenes:

        OPTIONAL { ?row xyz:name ?name . }
        OPTIONAL { ?row xyz:age ?age . }
        OPTIONAL { ?row xyz:dog ?dog . }
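
To make the proposed behaviour concrete, here is a minimal Python sketch of how the implicit patterns could be derived from the header row. It is not SPARQL Anything code: the function name `implicit_patterns` is hypothetical, the `xyz:` prefix is simply the one used in the queries above, and it assumes every header is already a valid SPARQL variable name.

```python
import csv
import io


def implicit_patterns(csv_text: str, prefix: str = "xyz:") -> list[str]:
    """Build one OPTIONAL triple pattern per column header (hypothetical helper).

    Assumption: headers need no escaping to be used as property local
    names and as variable names.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return [f"OPTIONAL {{ ?row {prefix}{col} ?{col} . }}" for col in header]


# Same content as proposal.csv above.
sample = "name,age,dog\nbob,32,fido\njane,,sammy\n"
patterns = implicit_patterns(sample)
print("\n".join(patterns))
```

Only the header row is consulted, so the generated patterns are the same whatever the data rows contain.
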
@rjyounes

Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.

@justin2004
Contributor Author

@rjyounes what if, in a single CSV, one column is "state_city" and another is "state city"? How does TARQL handle the collision?

@rjyounes

Good question. I haven't ever encountered it. Possibly some hand-correction is required.

@enridaga
Member

Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.

Indeed. Currently we just make those strings URL-safe, which results in unintuitive %20 sequences appearing. Maybe we can think about adding an option to treat them as web page slugs, but even with that there can be cases where the result is not intuitive (letter case, special characters, etc.).

@rjyounes what if, in a single csv, one column is "state_city" and another is "state city" ? how does TARQL handle the collision?

We already have this problem: sometimes CSVs repeat the same column name. We just add _1 etc. Not great, but intuitive enough.
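
For illustration, the two header treatments discussed here can be sketched in Python. This is an assumption-laden sketch, not the actual SPARQL Anything or TARQL implementation: `url_safe` and `underscore_names` are hypothetical names, and the exact suffix scheme (`_1`, `_2`, ...) is a guess based on the "_1 etc." description above.

```python
from urllib.parse import quote


def url_safe(header: str) -> str:
    """Percent-encode a header, which is where the %20 comes from."""
    return quote(header, safe="")


def underscore_names(headers: list[str]) -> list[str]:
    """Replace spaces with underscores and suffix repeated names with _1, _2, ...

    Assumed scheme: the first occurrence keeps the plain name, later
    occurrences get a counter suffix.
    """
    seen: dict[str, int] = {}
    result = []
    for header in headers:
        base = header.strip().replace(" ", "_")
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else f"{base}_{count}")
    return result


print(url_safe("state city"))
print(underscore_names(["state_city", "state city", "dog"]))
```

With this scheme the "state_city" / "state city" collision is resolved the same way as a literally repeated column name. A generated name could still clash with a header that already ends in `_1`, so a real implementation would need an extra check.
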

@enridaga
Member

Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.

OK, now on the main point. I like the idea of providing a default triple pattern. It's interesting how you would get the same behaviour with the following:

{ fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" .
       [] xyz:name ?name ;
          xyz:age ?age ;
          xyz:dog ?dog .
      }

Without headers, we would need to add a convention for the variable names (?col_1, ?col_2, etc.):

{ fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "false" .
       [] rdf:_1 ?col_1 ;
          rdf:_2 ?col_2 ;
          rdf:_3 ?col_3 .
      }
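
A small Python sketch of that positional convention, for illustration only: `positional_patterns` is a hypothetical helper, and the `?col_N` naming is the convention floated in this comment, not an existing option.

```python
def positional_patterns(n_columns: int) -> str:
    """Emit one blank-node pattern binding rdf:_N to ?col_N for each column."""
    pairs = [f"rdf:_{i} ?col_{i}" for i in range(1, n_columns + 1)]
    return "[] " + " ;\n   ".join(pairs) + " ."


print(positional_patterns(3))
```

The column count would have to come from the file itself, for example from the width of its first row.
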

@justin2004
Contributor Author

@enridaga and we'd need to wrap each of the triple patterns in an OPTIONAL to get the Tarql behavior.

@enridaga
Member

@enridaga and we'd need to wrap each of the triple patterns in an OPTIONAL to get the Tarql behavior.

Even if we remove the null-string option?

@justin2004
Contributor Author

Even if we remove the null-string option?

Oh, if we don't assert the null-string option then that might be the Tarql behavior.

But I do know that my team likes using the null-string option with the SPARQL Anything OPTIONAL triple patterns (as they transition from Tarql to SPARQL Anything).
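
The interaction between the null-string option and OPTIONAL can be modelled with a short Python simulation. This is a simplified sketch, not a SPARQL engine: `rows_as_properties` and `match` are hypothetical helpers, and the model assumes only that a cell equal to the null string produces no triple, so a required pattern on that column cannot match the row.

```python
import csv
import io


def rows_as_properties(csv_text, null_string=None):
    """Model each row as {column: value}, omitting cells equal to null_string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {k: v for k, v in row.items() if null_string is None or v != null_string}
        for row in reader
    ]


def match(rows, columns, optional):
    """Required patterns drop rows missing a column; OPTIONAL leaves it unbound (None)."""
    results = []
    for row in rows:
        if not optional and any(c not in row for c in columns):
            continue
        results.append({c: row.get(c) for c in columns})
    return results


sample = "name,age,dog\nbob,32,fido\njane,,sammy\n"
cols = ["name", "age", "dog"]

with_null = rows_as_properties(sample, null_string="")
required = match(with_null, cols, optional=False)  # jane's row is dropped
optional = match(with_null, cols, optional=True)   # jane's age is unbound
no_null = match(rows_as_properties(sample), cols, optional=False)  # age is ""

for label, result in [("required", required), ("optional", optional), ("no null-string", no_null)]:
    print(label, result)
```

In this model, plain patterns keep every row only when the null-string option is absent, and the empty age then appears as an empty string rather than as an unbound variable.
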
