Counting, filtering and additional features #123

Closed
jacoscaz opened this issue Nov 5, 2017 · 8 comments

Comments

@jacoscaz
Contributor

jacoscaz commented Nov 5, 2017

Small premise... Although these are three different issues, they are highly correlated and I would expect the conversation around them to easily go from one to the other. Hence why I've decided to group them together.

Counting (Source interface)

At the moment there is no standardized way to anticipate how many quads would be returned by any given .match() invocation. This is particularly troublesome for query planning. What about a .count(subject, predicate, object, graph) method?
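
A minimal sketch of what such a method could look like, assuming a hypothetical in-memory source that keeps its quads in an array. `ArraySource`, `matchesPattern` and the plain-object terms are illustrative stand-ins, not part of the RDF/JS spec. The count here is exact; a real store would more likely return an estimate derived from its indexes.

```javascript
// Hypothetical helpers: terms are plain { value } objects, not real RDF/JS Terms.
const term = (value) => ({ value });
const quad = (s, p, o, g = '') => ({
  subject: term(s),
  predicate: term(p),
  object: term(o),
  graph: term(g),
});

// A null/undefined position in the pattern acts as a wildcard.
const matchesPattern = (q, pattern) =>
  ['subject', 'predicate', 'object', 'graph'].every(
    (key) => pattern[key] == null || pattern[key].value === q[key].value
  );

class ArraySource {
  constructor(quads) {
    this.quads = quads;
  }

  // Proposed (non-standard) method: how many quads match() would return.
  count(subject, predicate, object, graph) {
    const pattern = { subject, predicate, object, graph };
    return this.quads.filter((q) => matchesPattern(q, pattern)).length;
  }
}

const source = new ArraySource([
  quad('ex:a', 'ex:knows', 'ex:b'),
  quad('ex:a', 'ex:knows', 'ex:c'),
  quad('ex:b', 'ex:name', 'Bob'),
]);

const knowsCount = source.count(null, term('ex:knows'));
const nameCount = source.count(null, term('ex:name'));
```

A query planner could compare the two counts and evaluate the more selective pattern first.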

Filtering (Source interface)

Particularly when dealing with timeseries and highly selective filters on large datasets, having to filter quads in-memory can lead to significant waste of resources. To implement more advanced filtering whilst maintaining API compatibility with the current .match() implementation, something like this could work:

.match(<NamedNode>, <NamedNode>, [ {lt: <Literal>, gt: <Literal>} ])

The basic idea is to allow implementors to optionally pass arrays of filters instead of Terms or RegExps.
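
To make the idea concrete, here is a rough sketch of how an implementor might evaluate such an argument for the object position. Everything in it is an assumption for illustration: `satisfiesFilter`, `matchesObject` and the `literal` helper are hypothetical, literals are plain objects holding numeric strings, and requiring every filter in the array to hold is just one possible reading of the proposal.

```javascript
// Hypothetical literal factory; real RDF/JS Literals also carry a datatype.
const literal = (value) => ({ termType: 'Literal', value: String(value) });

// Checks one { lt, gt } filter against a literal, comparing numerically.
const satisfiesFilter = (candidate, filter) => {
  const n = Number(candidate.value);
  if (filter.lt !== undefined && !(n < Number(filter.lt.value))) return false;
  if (filter.gt !== undefined && !(n > Number(filter.gt.value))) return false;
  return true;
};

// Accepts a wildcard, a plain term, or an array of filters.
const matchesObject = (object, criterion) => {
  if (criterion == null) return true; // wildcard
  if (Array.isArray(criterion)) {
    // Assumption: all filters in the array must hold.
    return criterion.every((filter) => satisfiesFilter(object, filter));
  }
  return criterion.value === object.value; // plain term
};

const readings = [literal(5), literal(12), literal(20), literal(31)];
const range = [{ gt: literal(10), lt: literal(30) }];

const inRange = readings
  .filter((reading) => matchesObject(reading, range))
  .map((reading) => reading.value);
```

A store backed by a sorted index could translate the same `{ lt, gt }` pair into a range scan instead of testing each quad.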

Custom features

Not everything can be standardized. Any feature that is relevant to only a few use cases, such as the advanced filtering mentioned above, might not be everyone's cup of tea. However, it would be nice to have a standardized way to advertise such features, so that other components using RDF/JS interfaces would be able to make use of these non-standard features when possible while normally defaulting to standard expectations. Something like the following could work:

source.supportedFeatures = ['source-filters']
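
A consumer could then check for a feature before deciding how to use the source. The sketch below assumes the `supportedFeatures` array and the 'source-filters' name from the line above; `supports` and `planFiltering` are hypothetical helpers, not part of any spec.

```javascript
// Hypothetical check: a source without the property supports no extra features.
const supports = (source, feature) =>
  Array.isArray(source.supportedFeatures) &&
  source.supportedFeatures.includes(feature);

// Decide up front whether filters go to the source or are applied in memory.
const planFiltering = (source) =>
  supports(source, 'source-filters') ? 'push-down' : 'in-memory';

const filteringSource = { supportedFeatures: ['source-filters'] };
const plainSource = {}; // a standard source that advertises nothing

const plans = [planFiltering(filteringSource), planFiltering(plainSource)];
```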

@elf-pavlik
Member

elf-pavlik commented Nov 6, 2017

What about a .count(subject, predicate, object, graph) method?

👍 As I understand it, such counts work as estimates, so the method name should possibly reflect that, e.g. countEstimate() - https://wiki.postgresql.org/wiki/Count_estimate

However, it would be nice to have a standardized way to advertise such features, so that other components using RDF/JS interfaces would be able to make use of these non-standard features when possible while normally defaulting to standard expectations.

👍 feature detection, in current spec I think we have

variable() returns a new instance of Variable. This method is optional.
support of which one can rather easily detect.

We could maybe stay more specific about errors and some features could get detected by

try {
  // use source-filters
} catch (err) {
  // err - source-filters not supported ?
}

@jacoscaz
Contributor Author

jacoscaz commented Nov 6, 2017

@elf-pavlik yes to .countEstimate() rather than .count(). I think this could be made optional, like .variable(), or it could return undefined if count is not supported or not available for any other reason. On feature detection, I feel some features would be better served by being able to detect them prior to usage of the specific interface implementor. In my specific use case, the query engine has to process FILTER clauses differently when a Source supports filters.
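
As a rough illustration of the optional variant, a planner could treat both a missing method and an undefined result as "unknown". The names `estimate`, `orderPatterns` and `stubSource`, and the numbers in the lookup table, are made up for this sketch.

```javascript
// Unknown estimates sort last; a finite sentinel keeps the comparator well-defined.
const UNKNOWN = Number.MAX_SAFE_INTEGER;

const estimate = (source, pattern) => {
  if (typeof source.countEstimate !== 'function') return UNKNOWN;
  const n = source.countEstimate(
    pattern.subject,
    pattern.predicate,
    pattern.object,
    pattern.graph
  );
  return n === undefined ? UNKNOWN : n;
};

// Order patterns from most to least selective.
const orderPatterns = (source, patterns) =>
  patterns
    .map((pattern) => ({ pattern, cost: estimate(source, pattern) }))
    .sort((a, b) => a.cost - b.cost)
    .map(({ pattern }) => pattern);

// Hypothetical source whose estimates come from a fixed lookup table.
const stubSource = {
  countEstimate(subject, predicate) {
    const table = { 'ex:type': 5000, 'ex:email': 40 };
    return predicate ? table[predicate.value] : undefined;
  },
};

const patterns = [
  { id: 'type', predicate: { value: 'ex:type' } },
  { id: 'other', predicate: { value: 'ex:other' } },
  { id: 'email', predicate: { value: 'ex:email' } },
];

const order = orderPatterns(stubSource, patterns).map((p) => p.id);
const orderWithoutSupport = orderPatterns({}, patterns).map((p) => p.id);
```

With a source that lacks the method, all costs are equal and the patterns keep their original order, since Array.prototype.sort is stable in current engines.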

@jacoscaz
Contributor Author

jacoscaz commented Nov 6, 2017

@l00mi I don't think these would belong to the high level API. Quoting #87

building on the low level api primitives

These issues do not build upon other primitives. They provide additional - and in my case fundamental, particularly when talking about count estimates - primitives upon which to build. I will put some time into researching the state of the high level API (I haven't had the time to track that so far) and comment in the other issue.

@l00mi
Member

l00mi commented Nov 6, 2017

Hmm, I see your point. Then again one of the main points of the low-level api is as stated to create:

This definition strives to provide the minimal necessary interface to enable interoperability of libraries such as serializers, parsers and higher level accessors and manipulators.

This originates from the idea that it should be easy to integrate this minimal spec to make libraries interoperable (full stop for the low-level API). Adding more primitives can make it too steep for libraries to adopt (or, in the case of new libraries, to follow).

We might discuss "optional" methods again (like a guideline)? Then again, this kind of renders a spec obsolete.

I guess the high-level API can extend the low-level API @bergos, @RubenVerborgh ?

For another issue: we might also start to define different levels of API interoperability: minimal, basic, comfort? But this might make things too complex?

@jacoscaz
Contributor Author

jacoscaz commented Nov 6, 2017

@l00mi I understand your point, as well. I think one way to strike a good balance would be through the semantic difference between a Store instance and a Source instance. A Store instance is assumed to persist quads over time and, as such, I feel that an additional .countEstimate() method would only reflect its nature of being a storage medium, just as the .remove(), .removeMatches() and .deleteGraph() methods.

For filtering, perhaps the spec could be extended to <Term> | <RegExp> | <Mixed> while making it optional to support <Mixed>? This would allow implementors to still be spec-compliant while supporting arrays of filters (in my case) or other ways to define matching criteria.

@jacoscaz
Contributor Author

A semi-working basic implementation of filtering in quadstore: https://github.com/beautifulinteractions/node-quadstore/blob/8961b4656994a9eccd7cb16bf621afb3483d3156/test/rdfstore.prototype.match.js#L202-L204 . Still very much WIP.

@jacoscaz
Contributor Author

@RubenVerborgh has already done some work related to this with TPF: https://ruben.verborgh.org/publications/vanherwegen_iswc_2015/

@bergos
Member

bergos commented Jan 25, 2019

Closed based on the resolution in #136

@bergos bergos closed this as completed Jan 25, 2019

4 participants