Counting, filtering and additional features #123

jacoscaz · 2017-11-05T21:20:48Z

Small premise... Although these are three different issues, they are highly correlated and I would expect the conversation around them to easily go from one to the other. Hence why I've decided to group them together.

Counting (Source interface)

At the moment there is no standardized way to anticipate how many quads would be returned by any given .match() invocation. This is particularly troublesome for query planning. What about a .count(subject, predicate, object, graph) method?
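To make the idea concrete, here is a minimal sketch of what a `.count()` next to `.match()` could look like. Everything in it is an assumption for illustration: `ArraySource` is a hypothetical in-memory source, quads are plain objects with string values rather than RDF/JS terms, and `match()` returns an array where a real Source would return a stream.

```javascript
// Hypothetical in-memory source; quads are plain objects, not RDF/JS terms.
class ArraySource {
  constructor(quads) {
    this.quads = quads;
  }

  // Returns the matching quads as an array (a real Source returns a stream).
  // null or undefined in any position acts as a wildcard.
  match(subject, predicate, object, graph) {
    const pattern = { subject, predicate, object, graph };
    return this.quads.filter((quad) =>
      Object.keys(pattern).every(
        (key) => pattern[key] == null || pattern[key] === quad[key]
      )
    );
  }

  // Proposed method: how many quads the same match() call would return.
  // A real store would answer from its indexes instead of materializing quads.
  count(subject, predicate, object, graph) {
    return this.match(subject, predicate, object, graph).length;
  }
}

const people = new ArraySource([
  { subject: 'ex:alice', predicate: 'ex:knows', object: 'ex:bob', graph: '' },
  { subject: 'ex:alice', predicate: 'ex:knows', object: 'ex:carol', graph: '' },
  { subject: 'ex:bob', predicate: 'ex:knows', object: 'ex:carol', graph: '' },
]);

const allKnows = people.count(null, 'ex:knows', null, null);
const bobKnows = people.count('ex:bob', 'ex:knows', null, null);
console.log(allKnows, bobKnows);
```

A query planner could compare the two numbers and evaluate the more selective pattern first.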

Filtering (Source interface)

Particularly when dealing with time series and highly selective filters on large datasets, having to filter quads in-memory can waste significant resources. To implement more advanced filtering whilst maintaining API compatibility with the current .match() implementation, something like this could work:

.match(<NamedNode>, <NamedNode>, [ {lt: <Literal>, gt: <Literal>} ])

The basic idea is to allow implementors to optionally accept arrays of filters in place of Terms or RegExps.
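As a rough sketch of how an implementation might interpret such an array, assuming each filter is an object with optional `lt` and `gt` keys. The helper names (`passesFilters`, `matchesPosition`, `matchQuads`) are hypothetical, and values are plain strings and numbers here, whereas the proposal uses Literal terms.

```javascript
// Hypothetical helper: does a value satisfy every filter in the array?
// Each filter looks like { gt: 15, lt: 20 }; both keys are optional.
function passesFilters(value, filters) {
  return filters.every((filter) =>
    (filter.gt === undefined || value > filter.gt) &&
    (filter.lt === undefined || value < filter.lt)
  );
}

// One position of a match() call: null/undefined is a wildcard, an array is
// a list of filters, anything else is compared for exact equality.
function matchesPosition(value, criterion) {
  if (criterion == null) return true;
  if (Array.isArray(criterion)) return passesFilters(value, criterion);
  return value === criterion;
}

// Simplified match() over an array of quads (graph position omitted).
function matchQuads(quads, subject, predicate, object) {
  return quads.filter((quad) =>
    matchesPosition(quad.subject, subject) &&
    matchesPosition(quad.predicate, predicate) &&
    matchesPosition(quad.object, object)
  );
}

const readings = [
  { subject: 'ex:sensor', predicate: 'ex:temperature', object: 12 },
  { subject: 'ex:sensor', predicate: 'ex:temperature', object: 18 },
  { subject: 'ex:sensor', predicate: 'ex:temperature', object: 25 },
];

const inRange = matchQuads(readings, 'ex:sensor', 'ex:temperature', [{ gt: 15, lt: 20 }]);
console.log(inRange.map((quad) => quad.object));
```

Calls that pass only exact values or wildcards behave as before, which is the compatibility property the proposal is after.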

Custom features

Not everything can be standardized. Any feature that is relevant to only a few use cases, such as the advanced filtering mentioned above, might not be everyone's cup of tea. However, it would be nice to have a standardized way to advertise such features, so that other components using RDF/JS interfaces would be able to make use of these non-standard features when possible while normally defaulting to standard expectations. Something like the following could work:

source.supportedFeatures = ['source-filters']
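A consumer could then check the advertised list up front and fall back to standard behaviour otherwise. The sketch below is only illustrative: `supports` and `selectGreaterThan` are hypothetical names, both sources are stubs, and `match()` returns arrays of plain objects rather than streams of quads.

```javascript
// Hypothetical check: does the source advertise the given feature?
function supports(source, feature) {
  return Array.isArray(source.supportedFeatures) &&
    source.supportedFeatures.includes(feature);
}

// Push the filter down to the source when it is supported,
// otherwise fetch with a plain match() and filter in memory.
function selectGreaterThan(source, predicate, threshold) {
  if (supports(source, 'source-filters')) {
    return source.match(null, predicate, [{ gt: threshold }]);
  }
  return source.match(null, predicate, null)
    .filter((quad) => quad.object > threshold);
}

const data = [
  { subject: 'ex:a', predicate: 'ex:value', object: 1 },
  { subject: 'ex:b', predicate: 'ex:value', object: 5 },
];

// Stub without the optional feature: matches on the predicate only.
const plainSource = {
  match: (subject, predicate, object) =>
    data.filter((quad) => quad.predicate === predicate),
};

// Stub that advertises and implements 'source-filters' for the object position.
const filteringSource = {
  supportedFeatures: ['source-filters'],
  match: (subject, predicate, object) =>
    data.filter((quad) =>
      quad.predicate === predicate &&
      (!Array.isArray(object) || object.every((filter) => quad.object > filter.gt))
    ),
};

const fromPlain = selectGreaterThan(plainSource, 'ex:value', 3);
const fromFiltering = selectGreaterThan(filteringSource, 'ex:value', 3);
console.log(fromPlain.length, fromFiltering.length);
```

Both paths produce the same result; the difference is where the filtering work happens.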

elf-pavlik · 2017-11-06T03:27:42Z

> What about a .count(subject, predicate, object, graph) method?

👍 As I understand it, such counts work as estimates, so perhaps the method name should reflect that, e.g. countEstimate() - https://wiki.postgresql.org/wiki/Count_estimate

> However, it would be nice to have a standardized way to advertise such features, so that other components using RDF/JS interfaces would be able to make use of these non-standard features when possible while normally defaulting to standard expectations.

👍 Feature detection: in the current spec I think we have

> variable() returns a new instance of Variable. This method is optional.

support for which one can rather easily detect.

We could maybe be more specific about errors, so that some features could be detected by

try {
  // use source-filters
} catch (err) {
  // err - source-filters not supported ?
}

jacoscaz · 2017-11-06T08:30:36Z

@elf-pavlik yes to .countEstimate() rather than .count(). I think this could be made optional, like .variable(), or it could return undefined if count is not supported or not available for any other reason. On feature detection, I feel some features would be better served by being detectable before a specific implementation of the interface is used. In my specific use case, the query engine has to process FILTER clauses differently when a Source supports filters.
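For what it's worth, a caller could cope with both cases (method missing, or method returning undefined) along these lines. This is only a sketch under assumptions: `estimateOrUndefined` and `planOrder` are hypothetical helpers, and `estimatingSource` is a stub whose estimates are made-up numbers.

```javascript
// Returns a numeric estimate, or undefined when the source has no
// countEstimate() method or cannot provide an estimate for this pattern.
function estimateOrUndefined(source, pattern) {
  if (typeof source.countEstimate !== 'function') return undefined;
  return source.countEstimate(...pattern);
}

// Orders patterns from most to least selective. Patterns without an
// estimate are treated as the most expensive and end up last.
function planOrder(source, patterns) {
  const cost = (pattern) => {
    const estimate = estimateOrUndefined(source, pattern);
    return estimate === undefined ? Infinity : estimate;
  };
  return [...patterns].sort((a, b) => {
    const costA = cost(a);
    const costB = cost(b);
    if (costA === costB) return 0;
    return costA < costB ? -1 : 1;
  });
}

// Stub source: knows estimates for two subjects only (made-up numbers).
const estimatingSource = {
  countEstimate: (subject) => {
    const known = { 'ex:alice': 2, 'ex:bob': 40 };
    return known[subject]; // undefined for any other subject
  },
};

const plan = planOrder(estimatingSource, [
  ['ex:zoe', null, null, null],
  ['ex:bob', null, null, null],
  ['ex:alice', null, null, null],
]);
console.log(plan.map((pattern) => pattern[0]));
```

With a source that lacks the method entirely, every cost is equal, so the planner simply keeps the original order.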

jacoscaz · 2017-11-06T08:41:00Z

@l00mi I don't think these would belong to the high level API. Quoting #87

> building on the low level api primitives

These issues do not build upon other primitives. They provide additional - and in my case fundamental, particularly when talking about count estimates - primitives upon which to build. I will put some time into researching the state of the high level API (I haven't had the time to track that so far) and comment in the other issue.

l00mi · 2017-11-06T08:50:53Z

Hmm, I see your point. Then again, one of the main points of the low-level API is, as stated, to create:

> This definition strives to provide the minimal necessary interface to enable interoperability of libraries such as serializers, parsers and higher level accessors and manipulators.

This originates from the idea that it should be easy to integrate this minimal spec to make libraries interoperable (full stop for the low-level API). Adding more primitives can make it too steep for existing libraries to adopt (or for new ones to follow).

We might discuss "optional" methods again (like a guideline)? Then again, this kind of renders a spec obsolete.

I guess the high-level API can extend the low-level API @bergos, @RubenVerborgh ?

For another issue: we might also start to define different levels of API interoperability: minimal, basic, comfort? But this might make things too complex?

jacoscaz · 2017-11-06T14:12:05Z

@l00mi I understand your point, as well. I think one way to strike a good balance would be through the semantic difference between a Store instance and a Source instance. A Store instance is assumed to persist quads over time and, as such, I feel that an additional .countEstimate() method would only reflect its nature as a storage medium, just as the .remove(), .removeMatches() and .deleteGraph() methods do.

For filtering, perhaps the spec could be extended to <Term> | <RegExp> | <Mixed> while making it optional to support <Mixed>? This would allow implementors to still be spec-compliant while supporting arrays of filters (in my case) or other ways to define matching criteria.

jacoscaz · 2017-11-18T21:12:03Z

A semi-working basic implementation of filtering in quadstore: https://github.com/beautifulinteractions/node-quadstore/blob/8961b4656994a9eccd7cb16bf621afb3483d3156/test/rdfstore.prototype.match.js#L202-L204 . Still very much WIP.

jacoscaz · 2018-09-10T15:47:42Z

@RubenVerborgh has already done some work related to this with TPF: https://ruben.verborgh.org/publications/vanherwegen_iswc_2015/

bergos · 2019-01-25T14:24:46Z

Closed based on the resolution in #136

l00mi added the HighLevel API label Nov 6, 2017

l00mi mentioned this issue Nov 6, 2017

Initial High Level spec including Dataset interface #95

Closed

l00mi removed the HighLevel API label Nov 6, 2017

jacoscaz mentioned this issue Sep 26, 2018

Store matching RegExps #128

Closed

bergos mentioned this issue Jan 14, 2019

Scope of the spec and this repository #136

Closed

bergos mentioned this issue Jan 21, 2019

feature detection #141

Open

bergos closed this as completed Jan 25, 2019
