Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a construct to distribute a total over of collection of coefficients #529

Closed
AltGr opened this issue Nov 9, 2023 · 12 comments
Closed
Assignees
Labels
🔧 compiler Issue concerns the compiler ✨ enhancement New feature or request
Milestone

Comments

@AltGr
Copy link
Contributor

AltGr commented Nov 9, 2023

This is needed in the French IR computation when applying a prorata ; the difficulty is that there may be rounding errors, and we want to keep the sum of the resulting collection equal to the total: to achieve this, the usual approach is to adjust for the rounding errors in the last element of the list.

One way of handling this could be an operator of type (weights: collection(decimal), total: money) -> collection(money), with a postcondition that
$$\text{total} \times \sum{\text{weights}} = \sum{\text{results}}$$

The resulting collection would be computed as
$$\left[w_1 \times \text{total}; ...; w_{n-1} \times \text{total}; \text{total} \times \sum{\text{weights}} - \sum_{i =1}^{n-1}{w_i \times \text{total}}\right]$$

A typical application would first compute the weights as ratios of a given collection of quantities over their sum (so that $\sum{\text{weights}} = 1$, and $\sum{\text{results}} = \text{total}$)

This could be implemented as an external utility function, but if it's of general utility it could even be an overload for the operator * over types collection(decimal) and money.


Another completely different idea would be to add two much more general operations over collections, from which the above could be derived as pure Catala functions:

  • remove_last: drops the last element of a collection
  • append: adds a given value at the end of a collection

To compute the above, you would

  1. if in the typical application described above, compute the sum of the complete collection and corresponding ratios
  2. remove the last element and compute the products of the ratios with the total
  3. compute the sum of that collection using the existing operators, and substract that from the total
  4. append the value from 3. at the end of the collection from 2.
@denismerigoux denismerigoux added ✨ enhancement New feature or request 🔧 compiler Issue concerns the compiler labels Nov 9, 2023
@AltGr
Copy link
Contributor Author

AltGr commented Nov 9, 2023

Here are some possible options in catala syntax:

Distribution of a total over a collection with prorata.

Pure prorata (with rounding errors) over a collection of money values

declaration prorata0a content collection money
  depends on values content collection money,
             total content money
  equals
  let s equals sum money of values in
  v * s / total for v among values

Pure prorata (with rounding errors) over a collection of structured values

declaration structure S:
  data base_value content money
  data scaled_value content money

declaration prorata0b content collection S
  depends on values content collection S,
             total content money
  equals
  let s equals sum money of values in
    { -- base_value: v.base_value
      -- scaled_value: v.base_value * s / total }
    for v among values

Implementation over a collection of money values

We suppose the existence of * over collection decimal and money defined as
in #529

declaration prorata1 content collection money
  depends on values content collection money,
             total content money
  equals
  let s equals sum money of values in
  let weights equals v / s for v among values in
  total * weights

Implementation over a collection of structured values (with map2)

We suppose the existence of a map2 function on collections

declaration set_scaled content S
  depends on v content S,
             scaled_value content money
  equals
  S { -- base_value: v.base_value
      -- scaled_value: scaled_value }

declaration prorata2 content collection S
  depends on values content collection S,
             total content money
  equals
  let s equals sum money of values in
  let weights equals v.base_value / s for v among values in
  let scaled equals total * weights in
  map2 of set_scaled, values, scaled

Or with specific syntax:

declaration prorata3 content collection S
  depends on values content collection S,
             total content money
  equals
  let s equals sum money of values in
  let weights equals v.base_value / s for v among values in
  let scaled equals total * weights in
  S { -- base_value: v.base_value
      -- scaled_value: scaled_value }
  for (v, scaled_values) among (values, scaled)

Implementation over a collection of structured values (with specific construct)

declaration prorata4 content collection S
  depends on values content collection S,
             total content money
  equals
  dispatch total as r over v.base_value for v among values using
    S { -- base_value: v.base_value
        -- scaled_value: r }

Ok this is very concise but also both very specific (all the computed sums and
divisions are implicit) and quite involved: internally this would apply an
operator over total, values, a getter function fun v -> v.base_value, and
a setter function fun (v, r) -> S { ... }

(also this is assuming it's doable to integrate something like this in the
syntax, it often turns out being pretty tricky)

@AltGr
Copy link
Contributor Author

AltGr commented Nov 10, 2023

Assuming we add tuples to the language

This is similar to the map2 approach, but instead we suppose generic support
for tuples:

  • (a, b) is allowed as an expression and is the tuple value containing both a and b
  • (a, b) is allowed in patterns, and instead of a variable in binding positions (let <here> equals ..., for <here> among ..., depends on <here> content ..., ...)

(note: adding this to the language will represent some work)

In addition, we suppose a recombination built-in function that takes two same-length collections and returns a collection of pairs (since ordering on collections is not specified at the moment, we may reinforce the same-length constraint into a "same origin" constraint --- checked at runtime based on a unique identifier for collections that is only preserved through the map and recombination operators)

declaration prorata5 content collection S
  depends on values content collection S,
             total content money
  equals
  let s equals sum money of values in
  let weights equals v.base_value / s for v among values in
  let scaled equals total * weights in
  S { -- base_value: v.base_value
      -- scaled_value: scaled_value }
  for (v, scaled_values) among recombination of values, scaled

@AltGr
Copy link
Contributor Author

AltGr commented Nov 10, 2023

Note that I used lets but the inline version may not be much harder to understand, I am not sure. For example, this last example could be rewritten as:

declaration prorata5 content collection S
  depends on values content collection S,
             total content money
  equals
  let s equals sum money of values in
  S { -- base_value: v.base_value
      -- scaled_value: scaled_value }
    for (v, scaled_values)
    among recombination of
            values,
            total * (v.base_value / s for v among values)

@AltGr
Copy link
Contributor Author

AltGr commented Nov 10, 2023

Another completely random idea, which at first seems more convoluted but might in the end prove easier to grasp: lenses

Assuming we already have the prorata (here written as $\text{total} \times \text{collection}$) operator, what we want here in practice is to work within a certain field of a structure. The above use of map2 and consorts is just to deconstruct and reconstruct the structure, for which we need deconstructor and constructor functions. We have in practice here be proposing various syntaxes here to specify these two functions, more or less implicitely.

Lenses are just about combining these constructor/destructor functions for focusing inside a structure.

Not entering into details here, the implementation could be complex, but it could look something like:

declaration prorata6 content collection S
  depends on values content collection S,
             total content money
  equals
  let s equals sum money of values in
  let coll equals # Just copy base_value to scaled_value
    S { -- base_value: v.base_value
        -- scaled_value: v.base_value }
      for v among values
  in
  total * (values with focus on field scaled_value)

with focus on field constructs a lens: the following operations will be applied on the given field, but the structure rebuilt afterwards (preserving other fields)

@AltGr
Copy link
Contributor Author

AltGr commented Nov 10, 2023

One more idea 😅

this could be made a tad lighter with a construction set_column[field] that would be a specialised (less expressive) version of map2:

declaration prorata7 content collection S
  depends on values content collection S,
             total content money
  equals
  let s equals sum money of values in
  set_column[scaled_value](values, total * (v.base_value / s for v among values))

(note that we could achieve a similar result by defining a set_column_scaled_value function, which can be done using map2 above. You have to write some boilerplate though.)

@AltGr
Copy link
Contributor Author

AltGr commented Nov 17, 2023

Yet another not so unreasonable possibility: if we add maps (a.k.a. dictionaries) to the language, maybe it gives an angle for understanding the above that is easier to grasp ?

The downside would be that we need ids under which the different members are registered, but the upside is that we still don't need to specify an ordering, and it becomes transparent that the correct scaled value is applied to the correct member.

We could use simple integers denoting the indexes of the entries in the original collection as keys, to begin with.

We now assume that we have prorata which works on the values of a map (instead of a collection), and the syntax some_map[some_key] to access the value corresponding to some_key within the map some_map.

declaration prorata7 content dictionary S
  depends on values content dictionary S,
             total content money
  equals
  let s equals sum money of values in # should work just like on collections
  let scaled_values equals prorata of
    total,
    (s.base_value / s) for (id, s) among values
  in
  S { -- base_value: v.base_value
      -- scaled_value: scaled_values[id] }
    for (id, s) among values

@denismerigoux denismerigoux added this to the Catala v1.0 milestone Nov 28, 2023
@denismerigoux
Copy link
Contributor

So we had a meeting today with the syntax committee (Sarah and Liane) to refine our decision among the last two competing proposals :

Proposition 1 : integrated syntax

declaration structure HouseholdMember:
  data birthdate content date
  data revenue content money

declaration structure HouseholdMemberTaxed:
  data member content HouseholdMember
  data tax content money

declaration individual_tax_amount content collection HouseholdMemberTaxed
  depends on members content collection HouseholdMember,
             tax_to_distribute content money
  equals
    dispatch
      tax_to_distribute
    as
      tax_amount
    over
      member.revenue
    for
      member
    among
      members
    using
      Household_member_taxed {
        -- member: member
        -- tax: tax_amount
      }

Proposition 2 : a pro-rata function returning a list of pairs

declaration structure HouseholdMember:
  data birthdate content date
  data revenue content money

declaration structure HouseholdMemberTaxed:
  data member content HouseholdMember
  data tax content money

declaration individual_tax_amount collection HouseholdMemberTaxed
  depends on members collection HouseholdMember,
             tax_to_distribute content money
  equals
  let total_revenue equals sum money of (member.revenue for member among members) in
  let members_with_tax_amounts equals
    distribute_amount_among_list_with_proratas of
      tax_to_distribute,
      members,
      (member.revenue / total_revenue for member among members),
  in
  HouseholdMemberTaxed {
    -- member: member
    -- tax: tax_amount
  }
  for (member, tax_amount)
  among members_with_tax_amounts

Decision !

We prefer option 2. That involves having pairs in the language and a pattern destructuring available at least the map operation. Careful : the distribute_amount_among_list_with_proratas takes two lists that need to be in the same order. So map has to preserve order when creating a new list. This is confusing with respect to the implicit meaning of the collection keyword. So we need to take a new decision (#534).

@AltGr
Copy link
Contributor Author

AltGr commented Jan 25, 2024

Update: there were some issues (implementation-wise) with the selected proposal, but I could come up with something reasonably close, for which I now have a working prototype.

Hopefully that remains readable (the type definitions are the same as above):

declaration individual_tax_amount content list of HouseholdMemberTaxed
  depends on members content list of HouseholdMember,
             tax_to_distribute content money
  equals
  let revenues equals member.revenue for member among members in
  let distributed_tax equals Ext.prorata of tax_to_distribute, revenues in
  HouseholdMemberTaxed {
    -- member: member
    -- tax: tax_amount
  }
  for (member, tax_amount) among (members, distributed_tax)

There are two important elements here:

  • Ext.prorata is defined in an "external module" called Ext (written in OCaml in this case), with the following declaration:
    declaration prorata content list of money
      depends on amount content money,
                        weights content list of money
    
  • This works on lists of money directly, so the second trick to reconstruct a list of HouseHoldMemberTaxed, is to use a new extended syntax for among that allows, in place of a single collection, a tuple of collections to be specified. Here we will first process the first element of members together with the first element of distributed_tax, then the two second elements, and so on — raising an error in case the lists don't have the same length. Internally, this tuple of lists is transformed into a list of tuples, allowing the traversal of both lists at the same time.

This remains to be evaluated by our Syntax Committee :)

@denismerigoux
Copy link
Contributor

Louis' new version is validated by the syntax committee (meeting of Jan 30th, 2024).

@denismerigoux
Copy link
Contributor

@AltGr I confirm the usefulness of your zip mapping syntax, I'm already using it in another place : https://gitlab.adullact.net/dgfip/ir-catala/-/blob/deficit_revenus_quotient/sources/cgi_revenus.catala_fr#L2070-2073

@AltGr
Copy link
Contributor Author

AltGr commented Mar 12, 2024

Now working OK through externals, see the example in the tests:

The part relevant to the syntax discussion above is in the 3rd file, where the piece missing is handled by the following syntax (same as proposed 3 comments up):

  HouseholdMemberTaxed {
    -- member: member
    -- tax: tax_amount
  }
  for (member, tax_amount) among (members, distributed_tax)

which is in fact a map2 allowing to construct the HouseholdMemberTaxed element from pairs of values taken by walking a pair of lists in parallel.

I think this issue can be closed (in favor of more general discussion on externals) ?

@denismerigoux
Copy link
Contributor

Thanks @AltGr !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔧 compiler Issue concerns the compiler ✨ enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants