Add langlib functions for aggregation #1135

Closed
jclark opened this issue Jul 5, 2022 · 8 comments
Labels
Area/LangLib Relates to lang.* libraries sl-update-priority Priority for Swan Lake Updates

Comments

@jclark
Collaborator

jclark commented Jul 5, 2022

Part of #441.

At least:

  • avg (at least to lang.float and lang.decimal); probably int:avg should return a decimal (as in XQuery)
  • count (probably to lang.value)
  • some, every (to lang.boolean)

Notes here #441 (comment)

@jclark jclark added the Area/LangLib Relates to lang.* libraries label Jul 5, 2022
@jclark jclark added this to the Swan Lake Update 3 milestone Jul 5, 2022
@jclark
Collaborator Author

jclark commented Aug 1, 2022

More details. All the functions here are public isolated and have an external body (= external).

// lang.int
# Returns the average of its arguments.
# The result is returned as a decimal.
function avg(int n, int... ns) returns decimal;

// lang.decimal
# Returns the average of its arguments.
function avg(decimal x, decimal... xs) returns decimal;

// lang.float
# Returns the average of its arguments.
# Returns NaN if there are no arguments.
// As with min/max, the float version can return NaN, so it can accept no arguments
function avg(float... xs) returns float;

// lang.boolean
# Returns true if one or more of its arguments are true and false otherwise.
# In particular, it returns false if there are no arguments.
function some(boolean... bs) returns boolean;
# Returns true if all of its arguments are true and false otherwise.
# In particular, it returns true if there are no arguments.
function every(boolean... bs) returns boolean;

// lang.value
# Returns the number of arguments.
function count(any|error... vs) returns int;
@typeParam
type Type any|error;
# Returns the first argument.
function first(Type v, any|error... vs) returns Type;
# Returns the last argument.
function last(Type v, Type... vs) returns Type;
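To make the proposed contracts concrete (including the zero-argument cases), here is a rough Python model of the signatures above. It is only a sketch: the function names (int_avg, boolean_some, and so on) are hypothetical, Python's default Decimal context (28 digits) stands in for Ballerina's decimal, and math.fsum is an arbitrary choice of summation for the float case.

```python
import math
from decimal import Decimal

# Hypothetical Python model of the proposed langlib signatures; not the
# actual langlib implementation.


def int_avg(n: int, *ns: int) -> Decimal:
    """int:avg needs at least one argument and returns a decimal."""
    values = (n, *ns)
    return Decimal(sum(values)) / len(values)


def decimal_avg(x: Decimal, *xs: Decimal) -> Decimal:
    """decimal:avg needs at least one argument."""
    values = (x, *xs)
    return sum(values, Decimal(0)) / len(values)


def float_avg(*xs: float) -> float:
    """float:avg accepts zero arguments and then returns NaN."""
    if not xs:
        return math.nan
    return math.fsum(xs) / len(xs)


def boolean_some(*bs: bool) -> bool:
    """True if any argument is true; False for no arguments."""
    return any(bs)


def boolean_every(*bs: bool) -> bool:
    """True if all arguments are true; True for no arguments."""
    return all(bs)


def value_count(*vs: object) -> int:
    """Number of arguments, whatever their types."""
    return len(vs)


def value_first(v: object, *vs: object) -> object:
    """First argument."""
    return v


def value_last(v: object, *vs: object) -> object:
    """Last argument."""
    return vs[-1] if vs else v
```

For example, int_avg(1, 2) gives Decimal("1.5"), matching the idea that the int average is returned as a decimal rather than truncated.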

@jclark
Collaborator Author

jclark commented Aug 1, 2022

When min/max/first/last/avg are called with no arguments, we want them to return (). We can handle this as part of the special semantics of function calls whose arguments are aggregated variables #1134 (comment).
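One way to read this rule is that the aggregated variable's list is spread into the argument list, and the () result applies only when the call ends up with no arguments at all. That reading is consistent with the later remark that min(0, price) gives 0. The Python sketch below models that reading; call_with_aggregated and int_min are hypothetical names, and None stands in for Ballerina's ().

```python
from typing import Any, Callable, Optional, Sequence

# Functions that should yield () when they end up with no arguments.
NIL_WHEN_EMPTY = {"min", "max", "first", "last", "avg"}


def int_min(*xs: int) -> int:
    """Stand-in for int:min (requires at least one argument)."""
    return min(xs)


def call_with_aggregated(name: str, fn: Callable[..., Any],
                         fixed_args: Sequence[Any],
                         aggregated: Sequence[Any]) -> Optional[Any]:
    """Hypothetical model: the aggregated variable's list is spread into the
    argument list; with no arguments at all, min/max/first/last/avg give
    None (standing in for ()) instead of being called."""
    args = [*fixed_args, *aggregated]
    if not args and name in NIL_WHEN_EMPTY:
        return None
    return fn(*args)
```

In this model, min over an empty aggregated list gives None, min(0, price) over an empty list gives 0, and a function such as sum is simply called with no arguments.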

@jclark
Collaborator Author

jclark commented Apr 25, 2023

Question: what should decimal:avg do on overflow?

Possibilities:

  1. panic
  2. max or min possible decimal value
  3. 0
  4. nil
  5. float +/- infinity

Given that decimal addition panics on overflow, the consensus is option 1. People can trap the panic if they want. Note that decimals can represent values up to about 10^6144, so overflow is unlikely in practice.
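To illustrate option 1, the sketch below models Ballerina's decimal as IEEE 754-2008 decimal128 (34 digits, Emax 6144) using Python's decimal module. A trapped decimal.Overflow stands in for a panic. The context setup and the sum-then-divide strategy are assumptions made for illustration, not the langlib implementation.

```python
import decimal
from decimal import Decimal

# Assumption: decimal128-like limits; Overflow is trapped, which models a panic.
DECIMAL128 = decimal.Context(
    prec=34, Emax=6144, Emin=-6143,
    traps=[decimal.Overflow, decimal.InvalidOperation, decimal.DivisionByZero],
)


def decimal_avg(x: Decimal, *xs: Decimal) -> Decimal:
    """Hypothetical decimal:avg that sums first and then divides."""
    values = (x, *xs)
    with decimal.localcontext(DECIMAL128):
        total = Decimal(0)
        for v in values:
            total += v  # raises decimal.Overflow if the sum exceeds the range
        return total / len(values)
```

In this model the overflow comes from the intermediate sum, so averaging two values near 9e6144 fails even though their true average is representable. The thread does not say whether the real implementation sums this way.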

@jclark jclark modified the milestones: 2023R1, 2023R2 Apr 25, 2023
@KavinduZoysa
Contributor

When min/max/first/last/avg have no args, then we want them to return (). We can handle this as part of the special semantics of function calls with arguments that are aggregated variables #1134 (comment).

@jclark, do we need to return () from the aggregation function or from the collect clause? For example,

    var input = [{name: "Saman", price: 1}, {name: "Amal", price: 2}, {name: "Saman", price: 3}];
    var x = from var {name, price} in input
                where name == "No name"
                collect min(price);

Is the value of x ()?

If yes, the output of the following query expression should be ().

    var x = from var {name, price} in input
                where name == "No name"
                collect avg(price1) + avg(price2);

@jclark
Collaborator Author

jclark commented May 9, 2023

The answer to both questions is yes.

But this doesn't change the definition of the langlib functions. Rather it's part of the process by which min(price) gets handled when price is an aggregated variable: if price refers to an empty list, then the result is nil, otherwise int:min is called.

For the second case, the static type of avg(price1) and of avg(price2) is int?, so because of nil lifting https://ballerina.io/spec/lang/2022R4/#nil_lifting the type of avg(price1) + avg(price2) is also int?.
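A small Python model of the nil lifting described here: an aggregated avg yields None (standing in for ()) for an empty list, and a lifted + yields None whenever either operand is None. The names avg_or_nil and lifted_add are hypothetical. Averages are modeled as decimals because int:avg is specified earlier in the thread to return decimal, but the lifting rule itself does not depend on the operand type.

```python
from decimal import Decimal
from typing import Optional, Sequence


def avg_or_nil(values: Sequence[int]) -> Optional[Decimal]:
    """Aggregated avg: None for an empty list, else the decimal average."""
    if not values:
        return None
    return Decimal(sum(values)) / len(values)


def lifted_add(a: Optional[Decimal], b: Optional[Decimal]) -> Optional[Decimal]:
    """Nil lifting for +: if either operand is None, the result is None."""
    if a is None or b is None:
        return None
    return a + b
```

With no input frames, lifted_add(avg_or_nil([]), avg_or_nil([])) is None, which matches the expected () for the second query.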

@KavinduZoysa
Contributor

@jclark, based on the above comment, please help me to understand the following points.

  1. If a query-expr ends with a collect clause and the static type of the expression after collect is T, can we say that the static type of the query-expr is T? (that is, T|())?
  2. Please consider the following example.
    var x = from var {name, price} in input
                where name == "No name"
                collect {average: avg(price)};

Is the value of x () or {average: ()}?
Also, for the example below, we get an empty array as the result.

type Price record {|
    decimal average?;
|};

    Price[] x = from var {name, price} in [{name: "", price: 2}]
                       where name == "No name"
                       group by var _ = true
                       select {average: avg(price)};

@jclark
Collaborator Author

jclark commented May 10, 2023

  1. If query-expr ends with collect expression and if the static type of expression is T can we say that the static type of query-expr is T? ?

No. The static type is T. The expression following collect is evaluated even if there are no input frames.

    var x = from var {name, price} in input
                where name == "No name"
                collect {average: avg(price)};

The output is {average: ()}. Some of the aggregation functions work just fine with 0 args (e.g. sum).

Note that you can also do something like min(0, price) to get 0 if there are no args.

type Price record {|
    decimal average?;
|};

    Price[] x = from var {name, price} in [{name: "", price: 2}]
                       where name == "No name"
                       group by var _ = true
                       select {average: avg(price)};

we are getting an empty array as the result.

That's correct. With group by, you don't create empty groups. The possibility that an aggregated variable may be an empty list can happen only with collect.
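The difference between the two cases can be modeled in Python. collect evaluates its expression exactly once, even when no frames survive the where clause. group by creates groups only from frames, so with no frames there are no groups and select yields an empty list. All names here (query_collect, query_group_by_select, avg_or_nil) are hypothetical, and None stands in for ().

```python
from decimal import Decimal
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def avg_or_nil(values: List[int]) -> Optional[Decimal]:
    """Aggregated avg: None for an empty list, else the decimal average."""
    return None if not values else Decimal(sum(values)) / len(values)


def query_collect(rows: List[Row], where: Callable[[Row], bool], field: str,
                  make: Callable[[List[Any]], Any]) -> Any:
    """collect: the expression is evaluated once, even with no frames."""
    values = [r[field] for r in rows if where(r)]
    return make(values)


def query_group_by_select(rows: List[Row], where: Callable[[Row], bool],
                          key: Callable[[Row], Any], field: str,
                          make: Callable[[List[Any]], Any]) -> List[Any]:
    """group by + select: one result per group; groups are only created
    from frames, so no frames means an empty list."""
    groups: Dict[Any, List[Any]] = {}
    for r in rows:
        if where(r):
            groups.setdefault(key(r), []).append(r[field])
    return [make(values) for values in groups.values()]
```

With the input [{"name": "", "price": 2}] and a filter on name == "No name", the collect version gives {"average": None} and the group-by version gives [].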

jclark added a commit that referenced this issue Jun 14, 2023
@jclark
Collaborator Author

jclark commented Jun 14, 2023

My fix in 5e76414 doesn't handle count correctly. It needs to fall back to lang.value, as method calls do.
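The fallback can be pictured as a two-step name lookup: first the langlib module for the aggregated variable's type, then lang.value. The Python sketch below models only that lookup order. The tables LANG_INT and LANG_VALUE, and the functions resolve and call_aggregated, are hypothetical stand-ins holding a few functions each, not the real langlib contents or compiler logic.

```python
from typing import Any, Callable, Dict, List

Fn = Callable[..., Any]

# Hypothetical tables standing in for langlib modules (only a few entries).
LANG_INT: Dict[str, Fn] = {
    "min": lambda *xs: min(xs),
    "max": lambda *xs: max(xs),
    "sum": lambda *xs: sum(xs),
}
LANG_VALUE: Dict[str, Fn] = {
    "count": lambda *vs: len(vs),
    "first": lambda v, *vs: v,
    "last": lambda v, *vs: vs[-1] if vs else v,
}


def resolve(name: str, type_module: Dict[str, Fn]) -> Fn:
    """Look in the type's own module first, then fall back to lang.value."""
    if name in type_module:
        return type_module[name]
    if name in LANG_VALUE:
        return LANG_VALUE[name]
    raise NameError(f"no langlib function named {name!r}")


def call_aggregated(name: str, type_module: Dict[str, Fn], values: List[Any]) -> Any:
    """Spread the aggregated values into the resolved function."""
    return resolve(name, type_module)(*values)
```

Here count over an aggregated int variable is found in lang.value rather than lang.int, and it gives 0 for an empty list instead of ().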

jclark added a commit that referenced this issue Jun 14, 2023
jclark added a commit that referenced this issue Jun 15, 2023
Make it more similar to method calls, so it works properly with `count`.
Part of #1135 and #1144.
jclark added a commit that referenced this issue Jun 15, 2023
@jclark jclark closed this as completed in b74d147 Jun 15, 2023
2 participants