Acir size increases exponentially as loop iterations increase, instead of linearly #4629

jfecher · 2024-03-25T15:57:03Z

Aim

Testing the code in https://github.com/zac-williamson/noir-poseidon1-array-bug.

Expected Behavior

Expected the acir opcode count and backend circuit size to increase by a constant 6 for each additional iteration of the loop in poseidon::bn254::permute.

Bug

Actual change in acir and backend circuit size for iteration total N is:

// N | Acir size | Acir change | Backend Circuit Size | circuit change
// 1 | 10        |             | 15                   |
// 2 | 16        | 6           | 21                   | 6
// 3 | 23        | 7           | 28                   | 7
// 4 | 31        | 8           | 36                   | 8
// 5 | 40        | 9           | 45                   | 9
// 6 | 50        | 10          | 55                   | 10
// 7 | 61        | 11          | 66                   | 11
// 8 | 73        | 12          | 78                   | 12
// 9 | 86        | 13          | 91                   | 13
// 10| 100       | 14          | 105                  | 14
// 11| 115       | 15          | 120                  | 15
// 12| 131       | 16          | 136                  | 16
// 13| 148       | 17          | 153                  | 17

To Reproduce

Project Impact

None

Impact Context

No response

Workaround

None

Workaround Description

No response

Additional Context

No response

Installation Method

None

Nargo Version

No response

NoirJS Version

No response

Would you like to submit a PR for this Issue?

None

Support Needs

No response

The text was updated successfully, but these errors were encountered:

jfecher · 2024-03-25T16:36:01Z

I've confirmed that SSA increases linearly by 14 extra instructions each time. The extra instructions are (in order):

2x add
4x mul
4x add then mul

(The last iteration of the loop only has 2x add then mul since the last two modify array indices 1 & 2 which are unused in main and are thus eliminated by DIE)

So this looks like an Acir-gen issue.

jfecher · 2024-03-25T16:48:13Z

I've confirmed this is unrelated to array copying in Acir. This makes sense since no arrays are modified.

jfecher · 2024-03-25T16:51:25Z

For reference, the SSA for a run with 3 iterations looks like this:

acir fn main f0 {
  b0(v0: Field, v1: Field):
    // permute iteration 1
    v770 = add v0, v1
    v775 = add v0, Field -5512639236676924786014155380588025764096985869754559557744523208552434701087
    v776 = mul v775, v775
    v777 = mul v776, v776
    v778 = mul v775, v777
    v779 = mul v778, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v780 = add v778, v779
    v781 = mul v1, Field 1781874611967874592137274483616240894881315449294815307306613366069350853425
    v782 = add v780, v781
    v783 = mul v770, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
    v784 = add v782, v783
    v785 = mul v784, Field 9676220459425127104563807626505378474104527268335041816433595157913150665495
    v786 = add v1, v785
    v787 = mul v784, Field 8364259238812534287689210722577399963878179320345509803468849104367466297989

    // iteration 2
    v788 = add v770, v787
    v789 = add v784, Field -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127
    v790 = mul v789, v789
    v791 = mul v790, v790
    v792 = mul v789, v791
    v793 = mul v792, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v794 = add v792, v793
    v795 = mul v786, Field -6684379154708237978759272567577041337887670303253204317176013706256788968730
    v796 = add v794, v795
    v797 = mul v788, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
    v798 = add v796, v797
    v799 = mul v798, Field 1645017323598148583308153743253948043010266295265950623794066679542803673813
    v800 = add v786, v799
    v801 = mul v798, Field -6902316737387657021175622823110739310551009794185512225013048131011375572021

    // iteration 3
    v802 = add v788, v801
    v803 = add v798, Field 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814
    v804 = mul v803, v803
    v805 = mul v804, v804
    v806 = mul v803, v805
    v807 = mul v806, Field 7511745149465107256748700652201246547602992235352608707588321460060273774987
    v808 = add v806, v807
    v809 = mul v800, Field -3778477114939312735135329793763823326274743295264527892924859844466140293618
    v810 = add v808, v809
    v811 = mul v802, Field 8034324828084400593020431506480243533881627849088152439427470035355284392177
    v812 = add v810, v811

    // main
    v817 = eq v812, v1
    constrain v817 == u1 0
    return 
}

TomAFrench · 2024-03-25T17:17:20Z

~~The issue results from us not reusing intermediate variables optimally during circuit transformation:~~

Doesn't seem like this is the case. Keeping this here for reference however.

// Untransformed
Compiled ACIR for main (unoptimized):
func 0
current witness index : 16
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _5) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (10302764019366810656324304069659347464634037313761163346014007424803634884116, _2, _4) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _8) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (2056738702329958338528069809778020848023814166851283194299370842509476243522, _2, _4) (692246069124406706574478606975526033386351398548558144542515845293232943739, _5, _7) (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _11) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (8244951158953924849441820526537414617456405719694233716653789536929031657217, _2, _4) (1245000739171333043151662300492364762247881201867951700385117535552263642718, _5, _7) (-8866078365483905008300628909657660427368986553774165888273494658969433806135, _8, _10) (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _14) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]

EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]

// transformed

current witness index : 44
public parameters indices : [1]
return value indices : []
EXPR [ (1, _0) (-1, _2) -5512639236676924786014155380588025764096985869754559557744523208552434701087 ]
EXPR [ (1, _2, _2) (-1, _3) 0 ]
EXPR [ (1, _3, _3) (-1, _4) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _2, _4) (-1, _17) 0 ]
EXPR [ (9676220459425127104563807626505378474104527268335041816433595157913150665495, _0) (11458095071393001696701082110121619368985842717629857123740208523982501518920, _1) (-1, _18) 0 ]
EXPR [ (-1, _5) (1, _17) (1, _18) -2⁴×387493572659975021470226716906358013055803406560817195243522295635460328127 ]
EXPR [ (1, _5, _5) (-1, _6) 0 ]
EXPR [ (1, _6, _6) (-1, _7) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _5, _7) (-1, _21) 0 ]
EXPR [ (6604154496063676802804756384484347298797004655271010100320031032419788549325, _0) (4105605326965746781005342339392760270064105286305864323348731164214007389462, _1) (-1, _22) 0 ]
EXPR [ (11478095976109321424005083581332697351833154779671800717176968483932168473324, _17) (1, _21) (-1, _23) 0 ]
EXPR [ (-1, _8) (1, _22) (1, _23) 2⁴×593199705942557006050540024999671509991229882396447677974095356964325900814 ]
EXPR [ (1, _8, _8) (-1, _9) 0 ]
EXPR [ (1, _9, _9) (-1, _10) 0 ]
EXPR [ (7511745149465107256748700652201246547602992235352608707588321460060273774988, _8, _10) (-1, _27) 0 ]
EXPR [ (-4945605217586328132698244186628229593098542664068970726496460897751429780929, _0) (6377772023842086820811309884692226915529324425293982843669511452955944766649, _1) (-1, _28) 0 ]
EXPR [ (-8171050935427428633424824527321522461511072821809794285567532217425125208927, _17) (-5514733471089785794374270767163942360219071296880559339102618199968142537142, _21) (-1, _29) 0 ]
EXPR [ (1, _27) (1, _28) (-1, _30) 0 ]
EXPR [ (-1, _11) (1, _29) (1, _30) 9797453712978351739894993124526343599910864939600507506817907398049628087845 ]
EXPR [ (1, _11, _11) (-1, _12) 0 ]
EXPR [ (1, _12, _12) (-1, _13) 0 ]
EXPR [ (-7511745149465107256748700652201246547602992235352608707588321460060273774988, _11, _13) (-1, _35) 0 ]
EXPR [ (7538040645284361082572458070655519332106145662083850722888221131624634527860, _0) (4089505136743306378354241962475825377338462365227913938936236421229037646862, _1) (-1, _36) 0 ]
EXPR [ (-6379787701423272764930233206555704103213006823652545026586546074778741051842, _17) (-6528941988757818671164751416148170116685212390218786282564565476950079182897, _21) (-1, _37) 0 ]
EXPR [ (-4913523097929112444230457750999582340681777120025624541356118512330689264044, _27) (1, _35) (-1, _38) 0 ]
EXPR [ (1, _36) (1, _37) (-1, _39) 0 ]
EXPR [ (-1, _14) (1, _38) (1, _39) 0 ]
BRILLIG: inputs: [Single(Expression { mul_terms: [], linear_combinations: [(1, Witness(14))], q_c: 0 })]
outputs: [Simple(Witness(15))]
[CalldataCopy { destination_address: MemoryAddress(0), size: 1, offset: 0 }, JumpIfNot { condition: MemoryAddress(0), location: 3 }, Const { destination: MemoryAddress(1), bit_size: 254, value: Value { inner: 1 } }, BinaryFieldOp { destination: MemoryAddress(0), op: Div, lhs: MemoryAddress(1), rhs: MemoryAddress(0) }, Stop { return_data_offset: 0, return_data_size: 1 }]

EXPR [ (1, _14, _15) (1, _16) -1 ]
EXPR [ (1, _14, _16) 0 ]
EXPR [ (1, _16) 0 ]

+---------------+----------------------+--------------+----------------------+
| Package       | Expression Width     | ACIR Opcodes | Backend Circuit Size |
+---------------+----------------------+--------------+----------------------+
| test_poseidon | Bounded { width: 3 } | 31           | 36                   |
+---------------+----------------------+--------------+----------------------+

TomAFrench · 2024-03-25T18:35:29Z

In the happy case of the initial state being comprised on witnesses only then the number of opcodes required for each iteration should be something like this:

let t = s0+const; // 1 opcode
let tttt = t^4; // 2 opcodes (one for each square)
let a = s0 * tttt // 1 opcode
let new_s0 = a + s1*const + s2*const // 1 opcode
let new_s1 = s1 + new_s0 * const // 1 opcode
let new_s2 = s1 + new_s0 * const // 1 opcode

We should have a constant 7 opcodes per iteration

My gut feeling is that the issue is due how we're laying down constraints lazily we're accumulating more and more quadratic terms inside s1 and s2 and we're not recognising that we have a common value of new_s0 being added into it which should be simplified. We're then doing a lot of recalculation because of this.

This can be seen as if we use the loop below (note the large number of iterations plus the range opcodes being used to enforce that the state is always represented as witnesses) I get a circuit size of 556. Commenting out these range opcodes causes the circuit to grow to 1480.

    for _r in 0..50 {
        s0 += C[((rf as u64 / 2)+1)*state.len() + _r as u64];
        let t = s0;
        let tt = t * t;
        let tttt = tt * tt;
        s0 *= tttt;
        s0 += s0 * S[(5)*_r ];
        s0 += s1 * S[(5)*_r +1];
        s0 += s2 * S[(5)*_r +2];
        s1 += s0 * S[5 * _r +2];
        s2 += s0 * S[5*_r  + 3];
        s1.assert_max_bit_size(253);
        s2.assert_max_bit_size(253);
    }

I also get a constant difference of 11 opcodes if I change the number of loop iterations in this case. Removing 2 for the range opcodes, this doesn't quite line up with the expected 7 but 🤷

jfecher · 2024-03-25T18:44:08Z

@TomAFrench Going from 1 to 2 iterations in the original code I see a change of 6 opcodes, which lines up with Zac's original estimate.

I also just finished testing acir-gen/Ssa::into_acir and found that that does increase by a constant 3 additional opcodes each iteration when going from 1 iteration to 2 to 3. So acir-gen seems fine here.

Sounds like we're missing a way to factor out common instances of new_s0 then? Or maybe we just can't lay down constraints lazily anymore.

TomAFrench · 2024-03-25T18:58:00Z

Sounds like we're missing a way to factor out common instances of new_s0 then?

It's kinda yucky as it exposes knowledge about circuits to the end user, but yeah it does seem that a force_witness(expr) would be useful for some of these algorithms.

Or maybe we just can't lay down constraints lazily anymore.

I don't think so, we're always going to need some laziness as emitting an acir opcode per SSA instruction will mean that we'll never fill up a plonk gate. I'd need to think on if we can identify variables such as s1 and s2 automatically.

jfecher · 2024-03-25T21:05:54Z

I've confirmed the extra constraints are being pushed here: https://github.com/noir-lang/noir/blob/master/acvm-repo/acvm/src/compiler/transformers/mod.rs#L85-L96, 3 for the first iter, 4 for the second, then 5, etc.

…ss to be equal to a variable (#4641) # Description ## Problem\* Quick fix to #4629 ## Summary\* This PR adds a `as_witness` function which takes a `Field` and instructs the compiler to lay down an opcode which constrains a witness to be equal to this value. This prevents the issue in #4629 where we collect ever growing expressions which need to be broken down multiple times while reshaping the circuit. Doing this during acirgen instead allows us to avoid this repetition. ## Additional Context ## Documentation\* Check one: - [ ] No documentation needed. - [ ] Documentation included in this PR. - [x] **[For Experimental Features]** Documentation to be submitted in a separate PR. Feature is experimental # PR Checklist\* - [x] I have tested the changes locally. - [x] I have formatted the changes with [Prettier](https://prettier.io/) and/or `cargo fmt` on default settings.

# Description ## Problem\* Related to #4629 ## Summary\* Do not create a witness when multiplying a witness (or an affine transformation of a witness) with an expression of degree 1. ## Additional Context This PR does NOT fix the issue #4629, but it should help reducing the overall gates number. The arithmetic expressions resulting from the multiplication should not explode in width since I only consider multiplication with a witness (or 'affine witness'). ## Documentation\* Check one: - [X] No documentation needed. - [ ] Documentation included in this PR. - [ ] **[For Experimental Features]** Documentation to be submitted in a separate PR. # PR Checklist\* - [X] I have tested the changes locally. - [X] I have formatted the changes with [Prettier](https://prettier.io/) and/or `cargo fmt` on default settings.

TomAFrench · 2024-06-11T17:22:48Z

@vezenovm do you want to take a look at making ACIR gen aware of the expression width?

vezenovm · 2024-06-11T20:22:02Z

@vezenovm do you want to take a look at making ACIR gen aware of the expression width?

Yeah happy to take this on.

# Description ## Problem\* Resolves #4629 ## Summary\* This PR updates how we add vars in ACIR gen to account for an expression width. This is under a compile time flag `--bounded-codegen`. If the sum of two expressions is going to go over the specified expression width we automatically create witnesses for either the lhs, the rhs, or both, before then generating a sum expression. This new bounded codegen could provide an easy way for developers to tell whether their program can be optimized using `as_witness`. Reference the additional context section below for some numbers. ## Additional Context There are some limitations in this approach as it is pretty naive in how it decides to when to generate a new witness. It doesn't look ahead at all other than for whether the two AcirVar's being added are going to create an expression over the specified ACIR width. For example we can see the following gate counts for `poseidonsponge_x5_254`: ``` No width awareness: +-----------------------+----------+----------------------+--------------+ | Package | Function | Expression Width | ACIR Opcodes | +-----------------------+----------+----------------------+--------------+ | poseidonsponge_x5_254 | main | Bounded { width: 4 } | 3096 | +-----------------------+----------+----------------------+--------------+ No width awareness w/ as_witness (this is the currently optimized poseidon we have in the stdlib): +-----------------------+----------+----------------------+--------------+ | Package | Function | Expression Width | ACIR Opcodes | +-----------------------+----------+----------------------+--------------+ | poseidonsponge_x5_254 | main | Bounded { width: 4 } | 1302 | +-----------------------+----------+----------------------+--------------+ Width awareness: +-----------------------+----------+----------------------+--------------+ | Package | Function | Expression Width | ACIR Opcodes | +-----------------------+----------+----------------------+--------------+ | poseidonsponge_x5_254 | main | Bounded { width: 4 } | 2114 | +-----------------------+----------+----------------------+--------------+ Width awareness w/ as_witness: +-----------------------+----------+----------------------+--------------+ | Package | Function | Expression Width | ACIR Opcodes | +-----------------------+----------+----------------------+--------------+ | poseidonsponge_x5_254 | main | Bounded { width: 4 } | 1792 | +-----------------------+----------+----------------------+--------------+ ``` From the above we can see that we actually have a degradation when using the addition strategy used in this PR with a hand optimized program using `as_witness`. Although this PR still gives an improvement in the default. Another example is the following program: ```rust fn main(x: Field, y: pub Field) { let state = [x, y]; let state = oh_no_not_again(state); // This assert will fail if we execute assert(state[0] + state[1] == 0); } fn oh_no_not_again(mut state: [Field; 2]) -> [Field; 2] { for _ in 0..200 { state[0] = state[0] * state[0] + state[1]; state[1] += state[0]; } state } ``` Without any width awareness we get 1150 ACIR gates. With this PR we will get 399 gates. If we substitute `oh_no_not_again` for the following: ```rust fn oh_no_not_again_as_witness(mut state: [Field; 2]) -> [Field; 2] { for i in 0..200 { state[0] = state[0] * state[0] + state[1]; std::as_witness(state[0]); state[1] += state[0]; if (i & 1 == 1) { std::as_witness(state[1]); } } state } ``` We will get 301 gates if the method above is called instead of `oh_no_not_again`. ## Documentation\* Check one: - [X] No documentation needed. - [ ] Documentation included in this PR. - [ ] **[For Experimental Features]** Documentation to be submitted in a separate PR. # PR Checklist\* - [X] I have tested the changes locally. - [X] I have formatted the changes with [Prettier](https://prettier.io/) and/or `cargo fmt` on default settings. --------- Co-authored-by: Tom French <[email protected]>

jfecher · 2024-07-24T14:52:47Z

Reopening this since the flag introduced by #5493 is not enabled by default

jfecher added the bug Something isn't working label Mar 25, 2024

github-project-automation bot added this to Noir Mar 25, 2024

github-project-automation bot moved this to 📋 Backlog in Noir Mar 25, 2024

TomAFrench mentioned this issue Mar 26, 2024

feat: add as_witness builtin function in order to constrain a witness to be equal to a variable #4641

Merged

5 tasks

jfecher mentioned this issue Mar 26, 2024

chore: Optimize HashMap: bucketed approach #4603

Draft

5 tasks

guipublic mentioned this issue May 24, 2024

chore: Avoid creating witness for simple multiplications #5100

Merged

5 tasks

jfecher mentioned this issue May 29, 2024

Conditional RAM writes scale nonlinearly in loops #5027

Open

TomAFrench assigned vezenovm Jun 11, 2024

vezenovm moved this from 📋 Backlog to 🏗 In progress in Noir Jun 27, 2024

vezenovm mentioned this issue Jul 11, 2024

feat(acir_gen): Width aware ACIR gen addition #5493

Merged

5 tasks

vezenovm closed this as completed in #5493 Jul 24, 2024

github-project-automation bot moved this from 🏗 In progress to ✅ Done in Noir Jul 24, 2024

jfecher reopened this Jul 24, 2024

TomAFrench mentioned this issue Nov 18, 2024

Gate blowup due to optimization bug in loops #6539

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Acir size increases exponentially as loop iterations increase, instead of linearly #4629

Acir size increases exponentially as loop iterations increase, instead of linearly #4629

jfecher commented Mar 25, 2024

jfecher commented Mar 25, 2024 •

edited

Loading

jfecher commented Mar 25, 2024

jfecher commented Mar 25, 2024

TomAFrench commented Mar 25, 2024 •

edited

Loading

TomAFrench commented Mar 25, 2024 •

edited

Loading

jfecher commented Mar 25, 2024

TomAFrench commented Mar 25, 2024 •

edited

Loading

jfecher commented Mar 25, 2024

TomAFrench commented Jun 11, 2024

vezenovm commented Jun 11, 2024

jfecher commented Jul 24, 2024

Acir size increases exponentially as loop iterations increase, instead of linearly #4629

Acir size increases exponentially as loop iterations increase, instead of linearly #4629

Comments

jfecher commented Mar 25, 2024

Aim

Expected Behavior

Bug

To Reproduce

Project Impact

Impact Context

Workaround

Workaround Description

Additional Context

Installation Method

Nargo Version

NoirJS Version

Would you like to submit a PR for this Issue?

Support Needs

jfecher commented Mar 25, 2024 • edited Loading

jfecher commented Mar 25, 2024

jfecher commented Mar 25, 2024

TomAFrench commented Mar 25, 2024 • edited Loading

TomAFrench commented Mar 25, 2024 • edited Loading

jfecher commented Mar 25, 2024

TomAFrench commented Mar 25, 2024 • edited Loading

jfecher commented Mar 25, 2024

TomAFrench commented Jun 11, 2024

vezenovm commented Jun 11, 2024

jfecher commented Jul 24, 2024

jfecher commented Mar 25, 2024 •

edited

Loading

TomAFrench commented Mar 25, 2024 •

edited

Loading

TomAFrench commented Mar 25, 2024 •

edited

Loading

TomAFrench commented Mar 25, 2024 •

edited

Loading