Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow query frames to create aggregated variables #1144

Closed
jclark opened this issue Aug 1, 2022 · 13 comments
Closed

Allow query frames to create aggregated variables #1144

jclark opened this issue Aug 1, 2022 · 13 comments
Labels
Area/Lang Relates to the Ballerina language specification
Milestone

Comments

@jclark
Copy link
Collaborator

jclark commented Aug 1, 2022

This is part of #441.

The new group by #1134 and collect #1137 query clauses can cause frames to have aggregated variables. When the binding of a variable x is aggregated, the value of x is an list of values, but references to x are treated specially. Each value of the variable in the list comes from a frame that has been aggregated with other frames.

When a variable x is aggregated, it is similar to each reference to x being treated as ...x, except that:

  • when the reference occurs in a function call and the function name is unqualified, the function is looked up in langlib using the type of the variable, similarly to a method call
  • when an argument to a function call contains a reference to such a variable, the entire argument expression is repeated.

For example:

decimal wageBill =
  from var { salary, bonus } in persons
  collect sum(salary + bonus);

Here sum will be resolved to decimal:sum.

The expression salary + bonus will be evaluated once for each index (they are guaranteed to have the same length). So it is equivalent to:

decimal wageBill =
  from var { salary, bonus } in persons
  let decimal total = salary + bonus
  collect sum(total);

which is in turn equivalent to

decimal[] totals =
  from var { salary, bonus } in persons
  let decimal total = salary + bonus
  select total;
decimal wageBill = decimal:sum(...totals);

If you want to get at the value of an aggregated variable as a list, you can do [x], which will work the same as [...x] when x is array.

Originally posted by @jclark in #1134 (comment)

@jclark
Copy link
Collaborator Author

jclark commented Aug 1, 2022

Note that a collect clause can result in an aggregated variable whose value is the empty list, whereas a group by clause cannot.

@jclark jclark added this to the Swan Lake Update 3 milestone Aug 1, 2022
@jclark
Copy link
Collaborator Author

jclark commented Aug 2, 2022

Need to think about what happens when you try to aggregate an already aggregated variable, which can happen when there is more than one group by or collect clause.

@jclark
Copy link
Collaborator Author

jclark commented Aug 4, 2022

This part

when an argument to a function call contains a reference to such a variable, the entire argument expression is repeated

is problematic. Not clear whether:

from var x in stuff [1, 2] collect m:foo(m:bar(x))

evaluates to

m:foo(m:bar(1), m:bar(2))

or

m:foo(m:bar(1, 2))

?

@jclark
Copy link
Collaborator Author

jclark commented Aug 5, 2022

The only way I see so far to make this work coherently is to allow a reference to an aggregated variable only as the entire expression in an arg-list or list-member-list (i.e. in a place where ...x would be allowed for a list x).

So to get the first case in the previous comment, you would have to write

from var x in stuff [1, 2]
let var b = m:bar(x)
collect m:foo(b)

@jclark
Copy link
Collaborator Author

jclark commented Aug 5, 2022

One could maybe allow something like

from var x in stuff [1, 2] collect m:foo(...m:bar(x))

I think that is going to become too hard to understand.

@jclark
Copy link
Collaborator Author

jclark commented Aug 5, 2022

If we completely get rid of the aggregated variable magic, then simple things become harder e.g. instead of

decimal salaryBill =
  from var { salary } in persons collect sum(salary);

users would have to do

decimal salaryBill =
  from var { salary } in persons collect decimal:sum(...salary);

which isn't much different from

decimal salaryBill =
  decimal:sum(...from var { salary } in persons select salary);

Or for group by, currently we have

from var { country, salary } in persons 
group by country
select { country, totalSalary: sum(salary) };

which would turn into

from var { country, salary } in persons 
group by country
select { country, totalSalary: decimal:sum(...salary) };

So I think our current magic is still worthwhile.

@jclark jclark changed the title Allow query frames to have aggregated variables Allow query frames to create aggregated variables Oct 18, 2022
@jclark jclark added the Area/Lang Relates to the Ballerina language specification label Dec 20, 2022
@KavinduZoysa
Copy link
Contributor

@jclark, should we allow the following cases,
1.

const ALLOWANCE = 12.1;
decimal wage = from var { salary } in persons
                          collect sum(salary + ALLOWANCE);

In this case, a constant is added to the aggregated variable.

decimal maxWage = from var { salary, bonus } in persons
                                 collect sum(salary + max(bonus));

In this case, a result of an aggregated function is added to the aggregated variable.

@jclark jclark modified the milestones: 2023R1, 2013R2 Apr 25, 2023
@KavinduZoysa
Copy link
Contributor

@jclark, in the parser level we have to represent both select-clause and collect-clause using common name. We came up with two options.

  1. projection-clause
  2. output-clause
    What is the best option? Or if you have any other option please suggest.

@hasithaa
Copy link
Contributor

In the current version of the specification, the semantics of the select clause are explained as "... returning output values". The new collect clause carries a similar meaning. Therefore, referring to them as output-clauses seems fitting.

Additionally, there was a point when the select clause was referred to as the final clause. Hence, another alternative could be to label both of them as final clauses. But I personally prefer the first one.

@jclark
Copy link
Collaborator Author

jclark commented May 30, 2023

The spec currently uses the terminology intermediate-clause for the clauses between the from-clause and the select-clause.

I'm not all that keen on output-clause. Although the select-clause can be thought of as outputting values, I don't think that's the right model for something when you add collect clauses; the query-expression as a whole does not have output; it has a result (like other expressions). So we should think of a select/collect clause as constructing the query-expression result from the sequence of values emitted by the pipeline.

I would therefore suggest result-clause.

jclark added a commit that referenced this issue Jun 14, 2023
Major rework of query expressions.
Also affects spreads.
Fixes #1134,  #1137, #1144, #1252.
@jclark
Copy link
Collaborator Author

jclark commented Jun 14, 2023

Fixed by 5e76414

@jclark jclark closed this as completed Jun 14, 2023
@jclark
Copy link
Collaborator Author

jclark commented Jun 14, 2023

It turned out that I didn't need a production for result-clause. Since a query-construct-type and an on-conflict clause make sense with select but not with collect, I made query-expr be query-select-expr|query-collect-expr, instead of introducing a result-clause as select-clause|collect-clause.

@jclark
Copy link
Collaborator Author

jclark commented Jun 14, 2023

@KavinduZoysa @hasithaa Please review the changes in 5e76414. These should deal with everything in #441 apart from #1135 and #1136.

jclark added a commit that referenced this issue Jun 15, 2023
Make it more similar to method calls, so it works properly with `count`.
Part of #1135 and #1144.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area/Lang Relates to the Ballerina language specification
Projects
None yet
Development

No branches or pull requests

3 participants