Add query clauses for grouping and aggregation #441
Comments
Also need to consider how this will work with event streaming #440. |
@jclark |
This is the relevant section of the doc you mentioned (we should avoid links to non-public docs here):
Groups give rise to frames that have indexed variable bindings and a special binding for the number of incoming frames in the group. An indexed variable binding is a variable that has a separate value for each index in the group. Variables bound by preceding clauses other than those used as grouping keys become indexed variables. The syntax
is short for
A group by clause is executed as follows
An indexed variable binding can only be referenced in an indexed context. The expression in an aggregate-function-call-expr or aggregate-list-constructor-expr is an indexed context. An aggregate-function-call-expr is evaluated as follows:
Example
Note that you can use a where clause after group by, which provides functionality similar to an SQL HAVING clause.
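As a rough illustration of the frame semantics described above, here is a minimal Python sketch (not Ballerina syntax, and not the proposed design itself). Frames are modelled as dicts, grouping keys stay scalar, every other variable becomes an indexed binding (a list with one value per incoming frame), and the group size is stored under a `count` key. The function name `group_by`, the `count` key, and the sample data are all assumptions made for this example.

```python
def group_by(frames, keys):
    """Group frames (dicts of variable bindings) by the given key variables."""
    groups = {}
    for frame in frames:
        key = tuple(frame[k] for k in keys)
        groups.setdefault(key, []).append(frame)

    result = []
    for key, members in groups.items():
        out = dict(zip(keys, key))  # grouping keys remain ordinary bindings
        for var in members[0]:
            if var not in keys:
                # non-key variables become indexed: one value per incoming frame
                out[var] = [m[var] for m in members]
        out["count"] = len(members)  # hypothetical name for the special binding
        result.append(out)
    return result


# Hypothetical sample data.
frames = [
    {"product": "pen", "price": 2},
    {"product": "ink", "price": 5},
    {"product": "pen", "price": 3},
]
grouped = group_by(frames, ["product"])

# A filter applied after grouping plays the role of SQL HAVING.
totals = [
    {"product": g["product"], "total": sum(g["price"])}
    for g in grouped
    if g["count"] > 1
]
```

In this sketch the indexed binding `price` is only consumed inside `sum(...)`, which loosely mirrors the rule that an indexed binding may only be referenced in an indexed context.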
|
This design above isn't final yet, because I haven't yet worked through how it would be extended to the unbounded stream case. Some sort of window concept might be needed, and that might affect the design in the bounded case. |
Google Charts query language has a pivot clause, which looks quite useful. It has similar functionality to pivot tables, which are a key feature in the spreadsheet world. We should consider whether this can fit into our approach to aggregation. |
The design in #441 (comment) requires The
For point 1, we are now predeclaring int, float, and decimal prefixes, so I think it is easy enough to say e.g.
We can make
expand into For point 2, we can instead put Still to do: find a clean way to do |
See https://blog.brownplt.org/2021/11/21/b2t2.html for some interesting design benchmarks. |
We need to make sure we can aggregate over the whole table. Instead of ending with |
I now think it's better to have an explicit syntax for an aggregated call expression. Then you can aggregate over the whole table just by using an aggregated function call expression in the
Here the This is designed to be similar to how in SQL you would say A more complicated example:
We could provide extensibility by allowing a qualified name for the aggregation function. This would refer to a function defined in a module with a The syntax has a |
Notes on specific aggregation functions
|
We could simplify things to start with by allowing an aggregated call only for the entire select expression, but the semantics make sense without this. For example, if you didn't have an
and we will need this with |
The other way to approach this is to use an alternative keyword/operator instead of select, which results in the frame variables being bound to lists e.g.
In this case, This is semantically simpler and cleaner, but the user has to explicitly specify a qualified function name, which would be particularly awkward for It also doesn't handle expressions as nicely:
So maybe the way to do things is to desugar into something like this. |
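To make the trade-off concrete, here is a small Python sketch of the "bind frame variables to lists" approach, under the assumption that frames are dicts and that aggregation is then done with ordinary functions over lists. The name `collect_lists` and the sample data are hypothetical and only serve this illustration.

```python
def collect_lists(frames):
    """Bind every frame variable to the list of its values across all frames."""
    variables = list(frames[0].keys()) if frames else []
    return {v: [f[v] for f in frames] for v in variables}


# Hypothetical sample data.
frames = [{"price": 2, "qty": 3}, {"price": 5, "qty": 1}]
bound = collect_lists(frames)

# Aggregating a single variable is a plain function call on a list.
total_price = sum(bound["price"])

# Aggregating an expression over several variables needs explicit zipping,
# which is the ergonomic cost mentioned above.
revenue = sum(p * q for p, q in zip(bound["price"], bound["qty"]))
```

The single-variable case stays simple, while the expression case shows why a desugaring step from a nicer surface syntax might be attractive.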
The two basic approaches are:
The first is simpler and so preferable, other things being equal. The challenge is to make the ergonomics of the first approach good. Another idea for the first approach is to leverage streams. The idea is that
So this would look something like:
In this example, use of One challenge is that langlib typing isn't currently strong enough to make |
My current preferred solution is as follows:
So a simple example would look like:
More complex example:
From a user perspective, the effect is that the last argument is repeated once for each variable binding. This can be generalized to work with
The
|
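The "last argument is repeated once for each variable binding" effect can be sketched in Python with argument unpacking. This is only an analogy under stated assumptions: `aggregated_call`, `total`, `join`, and the sample frames are hypothetical names invented for the example, and the real proposal concerns Ballerina function calls, not Python ones.

```python
def aggregated_call(func, frames, expr, *leading_args):
    """Call func with leading_args, then expr(frame) once per frame."""
    values = [expr(frame) for frame in frames]
    return func(*leading_args, *values)


def total(*ns):
    # Accepts any number of trailing arguments, like a rest parameter.
    return sum(ns)


def join(sep, *parts):
    # One ordinary leading argument, followed by the repeated ones.
    return sep.join(parts)


# Hypothetical sample data.
frames = [{"name": "a", "price": 2}, {"name": "b", "price": 5}]

price_total = aggregated_call(total, frames, lambda f: f["price"])
names = aggregated_call(join, frames, lambda f: f["name"], ", ")
```

Here `total` behaves as if it had been called as `total(2, 5)` and `join` as `join(", ", "a", "b")`, so existing functions with a rest parameter can act as aggregators without a separate list-taking variant.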
Note that we could potentially do this without any incompatibilities (and so in Swan Lake Updates) by being clever about how we recognize "group by". |
I think |
I agree. I updated #441 (comment) to use As you observe, with this We could allow |
I think a better way to explain "sequenced-binding" or "indexed-binding" is to say that
This connects to #52. |
We can split this up into a number of separate points:
|
All done now. |
Similar to group by in SQL.
We can split this into the following issues as described in #441 (comment):
group by clause #1134
collect query clause for doing aggregation #1137