Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: replace boolean ANDs with multiplication #1954

Merged
merged 13 commits into from
Aug 2, 2023
Merged

feat: replace boolean ANDs with multiplication #1954

merged 13 commits into from
Aug 2, 2023

Conversation

TomAFrench
Copy link
Member

Description

Problem*

Resolves

Summary*

The Noir program below

fn main(x: u1) -> pub u1 {
    x & 1
}

previously would output the ACIR

current witness index : 3
public parameters indices : []
return value indices : [3]
BLACKBOX::RANGE [(_1, num_bits: 1)] [ ]
EXPR [ (-1, _2) 1 ]
BLACKBOX::AND [(_1, num_bits: 1), (_2, num_bits: 1)] [ _3]

This is suboptimal for a couple of reasons:

  • we know that x & 1 == x so these gates should be optimised away
  • even if we're performing x & y where neither are comptime then it's cheaper to multiply them together than do an AND.

I've then added a special case for AND between boolean variables which performs a multiplication. This results in the same program being optimized away such that we get the ACIR:

Compiled ACIR for main:
current witness index : 1
public parameters indices : []
return value indices : [1]
BLACKBOX::RANGE [(_1, num_bits: 1)] [ ]

Documentation

  • This PR requires documentation updates when merged.

    • I will submit a noir-lang/docs PR.
    • I will request for and support Dev Rel's help in documenting this PR.

Additional Context

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@TomAFrench
Copy link
Member Author

For instance

fn main(x: u1, y: u1) -> pub u1 {
    x & y + 1
}

requires 2784 gates without this optimisation but afterwards requires just 11.

Copy link
Contributor

@jfecher jfecher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good, but have you considered adding it during ssa-gen instead of acir gen so that further optimizations can potentially be triggered afterward?

@jfecher
Copy link
Contributor

jfecher commented Jul 18, 2023

requires 2784 gates without this optimisation but afterwards requires just 11.

I wonder if we can have an optimization when we convert something to bits to only convert N bits for a uN type.

@TomAFrench
Copy link
Member Author

have you considered adding it during ssa-gen instead of acir gen so that further optimizations can potentially be triggered afterward?

No I didn't but this seems like a good excuse to get more familiar with the internals of that. Seems to me like we want to insert a snippet like

let instruction = match instruction {
    Instruction::Binary(op) if self.type_of_value(op.lhs).is_boolean() => {
        instruction.booleanize()
    }
    instruction => instruction,
};

before L162 in

match instruction.simplify(self, block) {
SimplifyResult::SimplifiedTo(simplification) => SimplifiedTo(simplification),
SimplifyResult::SimplifiedToMultiple(simplification) => {
SimplifiedToMultiple(simplification)
}
SimplifyResult::Remove => InstructionRemoved,
SimplifyResult::None => {
let id = self.make_instruction(instruction, ctrl_typevars);
self.blocks[block].insert_instruction(id);
if let Some(location) = location {
self.locations.insert(id, location);
}
InsertInstructionResult::Results(self.instruction_results(id))
}
}

Does this look sensible to you?

@jfecher
Copy link
Contributor

jfecher commented Jul 18, 2023

I think it should probably go within the simplify call. There should be a case that handles binary operators already, and you can find the case for And and add to that.

@TomAFrench
Copy link
Member Author

TomAFrench commented Jul 18, 2023

Ah, I looked at that but wasn't sure if I could return a non-constant value out. So I then just need to return something like SimplifyResult::SimplifiedTo(dfg.make_value(booleanised_instruction)) from there?

@kevaundray
Copy link
Contributor

For instance

fn main(x: u1, y: u1) -> pub u1 {
    x & y + 1
}

requires 2784 gates without this optimisation but afterwards requires just 11.

Why does this require 2784 gates?

@TomAFrench
Copy link
Member Author

Why does this require 2784 gates?

Barretenberg is using a plookup version of AND/XOR afaik so I guess that has some constant lower bound on the number of gates it uses.

@TomAFrench
Copy link
Member Author

Compilation error comes from this line.

https://github.com/noir-lang/noir/blob/master/noir_stdlib/src/ec/swcurve.nr#L38-L39

Seems like we don't update some values correctly when we swap out the instruction. I need to dig into this more though.

@TomAFrench
Copy link
Member Author

Minimal program for reproducing the panic:

use dep::std::ec::tecurve::affine::Curve as AffineCurve;
use dep::std::ec::tecurve::affine::Point as Gaffine;
use dep::std::ec::tecurve::curvegroup::Curve;
use dep::std::ec::tecurve::curvegroup::Point as G;

use dep::std::ec::swcurve::affine::Point as SWGaffine;
use dep::std::ec::swcurve::curvegroup::Point as SWG;

use dep::std::ec::montcurve::affine::Point as MGaffine;
use dep::std::ec::montcurve::curvegroup::Point as MG;

fn main() {
    // This test only makes sense if Field is the right prime field.
    if 21888242871839275222246405745257275088548364400416034343698204186575808495617 == 0
    {
        // Define Baby Jubjub (ERC-2494) parameters in affine representation
        let bjj_affine = AffineCurve::new(168700, 168696, Gaffine::new(995203441582195749578291179787384436505546430278305826713579947235728471134,5472060717959818805561601436314318772137091100104008585924551046643952123905));

        // Test addition
        let p1_affine = Gaffine::new(17777552123799933955779906779655732241715742912184938656739573121738514868268, 2626589144620713026669568689430873010625803728049924121243784502389097019475);
        let p2_affine = Gaffine::new(16540640123574156134436876038791482806971768689494387082833631921987005038935, 20819045374670962167435360035096875258406992893633759881276124905556507972311);      
        let p3_affine = bjj_affine.add(p1_affine, p2_affine);


        
        // Test CurveGroup equivalents
        let bjj = bjj_affine.into_group(); // Baby Jubjub

        let p1 = p1_affine.into_group();
        let p2 = p2_affine.into_group();
        let p3 = p3_affine.into_group();

        
        // Test SWCurve equivalents of the above
        // First the affine representation
        let bjj_swcurve_affine = bjj_affine.into_swcurve();

        let p1_swcurve_affine = bjj_affine.map_into_swcurve(p1_affine);
        let p2_swcurve_affine = bjj_affine.map_into_swcurve(p2_affine);
        let p3_swcurve_affine = bjj_affine.map_into_swcurve(p3_affine);
        
        // Addition
        assert(
            p3_swcurve_affine.eq(
                bjj_swcurve_affine.add(
                p1_swcurve_affine,
                p2_swcurve_affine
                )
            )
        );

        // Check that these points are on the curve
        assert(
            bjj_swcurve_affine.contains(bjj_swcurve_affine.gen) &
            bjj_swcurve_affine.contains(p1_swcurve_affine) &
            bjj_swcurve_affine.contains(p2_swcurve_affine) &
            bjj_swcurve_affine.contains(p3_swcurve_affine)
        );
        
        // Then the CurveGroup representation
        let bjj_swcurve = bjj.into_swcurve();

        let p1_swcurve = bjj.map_into_swcurve(p1);
        let p2_swcurve = bjj.map_into_swcurve(p2);
        let p3_swcurve = bjj.map_into_swcurve(p3);
        
        // Addition
        assert(p3_swcurve.eq(bjj_swcurve.add(p1_swcurve,p2_swcurve)));        
    }            
}

@TomAFrench
Copy link
Member Author

I'm getting a panic on this line due to one of the inputs to the multiplication instruction being a Value::Param which should be cached.

Value::Instruction { .. } | Value::Param { .. } => {
unreachable!("ICE: Should have been in cache {value_id} {value:?}")
}

I'm not clear on how changing instruction.operation has resulted in this as lhs and rhs are unaffected. @jfecher Do you have any idea how this error might be occurring?

@vezenovm
Copy link
Contributor

Barretenberg is using a plookup version of AND/XOR afaik so I guess that has some constant lower bound on the number of gates it uses.

The plookup versions of different ops are going to have a constant lower bound around 3k gates. Do we know if it is always cheaper to do a multiply rather than an AND? Like if this optimization is used in a larger circuit that is significantly over the 3k lower bound?

@TomAFrench
Copy link
Member Author

If I use the program in new and old SSA I get

fn main(array: [u1; 1000]) -> pub u1 {
    let mut result = 1;
    for element in array {
        result &= element;
    }
    result
}
// Old
...
EXPR [ (1, _996, _996) (-1, _996) 0 ]
EXPR [ (1, _997, _997) (-1, _997) 0 ]
EXPR [ (1, _998, _998) (-1, _998) 0 ]
EXPR [ (1, _999, _999) (-1, _999) 0 ] // Range checks
EXPR [ (1, _1000, _1000) (-1, _1000) 0 ]
EXPR [ (1, _1, _2) (-1, _1001) 0 ]   // AND multiplications
EXPR [ (1, _3, _1001) (-1, _1002) 0 ]
EXPR [ (1, _4, _1002) (-1, _1003) 0 ]
EXPR [ (1, _5, _1003) (-1, _1004) 0 ]
...

Total ACIR opcodes generated for language PLONKCSat { width: 3 }: 1999
Backend circuit size: 2000

// New
....
BLACKBOX::RANGE [(_997, num_bits: 1)] [ ]
BLACKBOX::RANGE [(_998, num_bits: 1)] [ ]
BLACKBOX::RANGE [(_999, num_bits: 1)] [ ]
BLACKBOX::RANGE [(_1000, num_bits: 1)] [ ] // Range checks
EXPR [ (-1, _1001) 1 ] // `result` initialisation
BLACKBOX::AND [(_1001, num_bits: 1), (_1, num_bits: 1)] [ _1002]
BLACKBOX::AND [(_1002, num_bits: 1), (_2, num_bits: 1)] [ _1003]
BLACKBOX::AND [(_1003, num_bits: 1), (_3, num_bits: 1)] [ _1004]
BLACKBOX::AND [(_1004, num_bits: 1), (_4, num_bits: 1)] [ _1005]
....

Total ACIR opcodes generated for language PLONKCSat { width: 3 }: 2001
Backend circuit size: 41991

We can see that we actually have this multiplication optimisation in the old SSA code which causes it to outperform. In case this was an artifact of only having boolean ANDs and we would get better scaling with the plookup AND when we have a mix of boolean and non-boolean types I repeated with

fn main(array: [u1; 1000], byte_array: [u8; 1000]) -> pub u8 {
    let mut result = 1;
    for element in array {
        result &= element;
    };

    let mut byte_result: u8 = 5;
    for element in byte_array {
        byte_result &= element;
    };

    (result as u8) * byte_result
}

which gives the results

// Old
Total ACIR opcodes generated for language PLONKCSat { width: 3 }: 4001
Backend circuit size: 44098

// New
Total ACIR opcodes generated for language PLONKCSat { width: 3 }: 4007
Backend circuit size: 81360

@vezenovm
Copy link
Contributor

If I use the program in new and old SSA I get

Awesome. Thank you for the clear comparison

@jfecher
Copy link
Contributor

jfecher commented Jul 24, 2023

I'm not clear on how changing instruction.operation has resulted in this as lhs and rhs are unaffected. @jfecher Do you have any idea how this error might be occurring?

No, unfortunately. My thought was the same as yours, as long as the arguments of the instruction remained the same it shouldn't require extra handling. Let me know if you want me to try to look into this as well.

* master: (53 commits)
  chore: Update `noir-source-resolver` to v1.1.3 (#1912)
  chore: Document `GeneratedAcir::more_than_eq_comparison` (#2085)
  chore: refresh ACIR test artifacts (#2091)
  feat: Add `deprecated` attribute (#2041)
  chore(ssa refactor): Implement `acir_gen` errors (#2071)
  chore: use witnesses from the generated acir in the ABI (#2095)
  fix: Fix methods not mutating fields (#2087)
  chore(nargo): Use Display impl for InputValue (#1990)
  feat: Make arrays and slices polymorphic over each other (#2070)
  feat: Remove an unnecessary witness in `mul_with_witness` (#2078)
  chore: document truncate (#2082)
  fix: avoid potential panic in `two_complement` (#2081)
  chore: Cleanup integration tests (#2074)
  chore: replace `Type::TypeVariable`, `Type::PolymorphicInteger`, and … (#2065)
  chore!: Require package names in `Nargo.toml` files (#2056)
  fix: Avoid non-determinism in defunctionalization (#2069)
  chore: change 'unnecessary pub' error to a warning (#2064)
  feat!: Update to ACVM 0.21.0 (#2051)
  chore: Rename execute tests for an accurate description (#2063)
  chore: Restore lost integration test (#2062)
  ...
@TomAFrench
Copy link
Member Author

TomAFrench commented Aug 1, 2023

New minimal reproduction.

use dep::std::ec::tecurve::affine::Curve as AffineCurve;
use dep::std::ec::tecurve::affine::Point as Gaffine;
use dep::std::ec::tecurve::curvegroup::Curve;
use dep::std::ec::tecurve::curvegroup::Point as G;

use dep::std::ec::swcurve::affine::Point as SWGaffine;
use dep::std::ec::swcurve::curvegroup::Point as SWG;

use dep::std::ec::montcurve::affine::Point as MGaffine;
use dep::std::ec::montcurve::curvegroup::Point as MG;

fn main() {
    
    // Define Baby Jubjub (ERC-2494) parameters in affine representation
    let bjj_affine = AffineCurve::new(168700, 168696, Gaffine::new(995203441582195749578291179787384436505546430278305826713579947235728471134,5472060717959818805561601436314318772137091100104008585924551046643952123905));

    // Test addition
    let p1_affine = Gaffine::new(17777552123799933955779906779655732241715742912184938656739573121738514868268, 2626589144620713026669568689430873010625803728049924121243784502389097019475);
    let p2_affine = Gaffine::new(16540640123574156134436876038791482806971768689494387082833631921987005038935, 20819045374670962167435360035096875258406992893633759881276124905556507972311);      
    let p3_affine = bjj_affine.add(p1_affine, p2_affine);
    
    // Test SWCurve equivalents of the above
    // First the affine representation
    let bjj_swcurve_affine = bjj_affine.into_swcurve();

    let p1_swcurve_affine = bjj_affine.map_into_swcurve(p1_affine);
    let p2_swcurve_affine = bjj_affine.map_into_swcurve(p2_affine);
    let p3_swcurve_affine = bjj_affine.map_into_swcurve(p3_affine);

    let p3_swcurve_affine_from_add = bjj_swcurve_affine.add(
        p1_swcurve_affine,
        p2_swcurve_affine
    );

    // Check that these points are on the curve
    assert(
        bjj_swcurve_affine.contains(p1_swcurve_affine)
    );            
}

@TomAFrench
Copy link
Member Author

TomAFrench commented Aug 1, 2023

This now panics with

The application panicked (crashed).
Message:  internal error: entered unreachable code: ICE: Should have been in cache v170 Param { block: Id(12), position: 2, typ: Numeric(Unsigned { bit_size: 1 }) }
Location: crates/noirc_evaluator/src/ssa_refactor/acir_gen/mod.rs:722

with the printed SSA being

/* snip */

@jfecher I don't think I can meaningfully debug this to be honest. It seems odd to me that we wouldn't create a numeric constant rather than a param for an function argument if we know it's value at compile time though, should we be rewriting these?

@jfecher
Copy link
Contributor

jfecher commented Aug 1, 2023

@TomAFrench found the error: #2117. Not sure why this PR specifically was triggering it though.

* master:
  feat!: Support workspaces and package selection on every nargo command (#1992)
  chore: Make a more clear error for slices passed to std::println (#2113)
  feat: Implement type aliases (#2112)
  feat: Add `Option<T>` to noir stdlib (#1781)
  feat: Format strings for prints  (#1952)
  feat(acir_gen): RecursiveAggregation opcode and updates to black box func call generation (#2097)
  fix: Mutating a variable no longer mutates its copy (#2057)
  fix: Implement `.len()` in Acir-Gen (#2077)
  chore: clippy fixes (#2101)
* master:
  fix: flattening pass no longer overwrites previously mapped condition values (#2117)
  chore(noirc_driver): Unify crate preparation (#2119)
@TomAFrench TomAFrench requested a review from jfecher August 2, 2023 09:43
@TomAFrench
Copy link
Member Author

TomAFrench commented Aug 2, 2023

@jfecher can you rereview please? Your requested changes should be addressed.

* master:
  feat: Add support for bitshifts by distances known at runtime (#2072)
  feat: Add additional `BinaryOp` simplifications (#2124)
Copy link
Contributor

@jfecher jfecher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to finally merge this in 🙂

@jfecher jfecher enabled auto-merge August 2, 2023 15:50
@jfecher jfecher added this pull request to the merge queue Aug 2, 2023
Merged via the queue into master with commit 435ab35 Aug 2, 2023
4 checks passed
@jfecher jfecher deleted the boolean-and branch August 2, 2023 16:13
TomAFrench added a commit that referenced this pull request Aug 2, 2023
* master: (50 commits)
  fix(globals): Accurately filter literals for resolving globals (#2126)
  feat: Optimize away constant calls to black box functions (#1981)
  fix: Rename `Option::value` to `Option::_value` (#2127)
  feat: replace boolean `AND`s with multiplication (#1954)
  chore: create a `const` to hold the panic message (#2122)
  feat: Add support for bitshifts by distances known at runtime (#2072)
  feat: Add additional `BinaryOp` simplifications (#2124)
  fix: flattening pass no longer overwrites previously mapped condition values (#2117)
  chore(noirc_driver): Unify crate preparation (#2119)
  feat!: Support workspaces and package selection on every nargo command (#1992)
  chore: Make a more clear error for slices passed to std::println (#2113)
  feat: Implement type aliases (#2112)
  feat: Add `Option<T>` to noir stdlib (#1781)
  feat: Format strings for prints  (#1952)
  feat(acir_gen): RecursiveAggregation opcode and updates to black box func call generation (#2097)
  fix: Mutating a variable no longer mutates its copy (#2057)
  fix: Implement `.len()` in Acir-Gen (#2077)
  chore: clippy fixes (#2101)
  chore: Update `noir-source-resolver` to v1.1.3 (#1912)
  chore: Document `GeneratedAcir::more_than_eq_comparison` (#2085)
  ...
TomAFrench added a commit that referenced this pull request Aug 2, 2023
* master:
  chore: rename `ssa_refactor` module to `ssa` (#2129)
  chore: Use `--show-output` flag on execution rather than compilation  (#2116)
  fix(globals): Accurately filter literals for resolving globals (#2126)
  feat: Optimize away constant calls to black box functions (#1981)
  fix: Rename `Option::value` to `Option::_value` (#2127)
  feat: replace boolean `AND`s with multiplication (#1954)
  chore: create a `const` to hold the panic message (#2122)
  feat: Add support for bitshifts by distances known at runtime (#2072)
  feat: Add additional `BinaryOp` simplifications (#2124)
  fix: flattening pass no longer overwrites previously mapped condition values (#2117)
  chore(noirc_driver): Unify crate preparation (#2119)
  feat!: Support workspaces and package selection on every nargo command (#1992)
  chore: Make a more clear error for slices passed to std::println (#2113)
  feat: Implement type aliases (#2112)
  feat: Add `Option<T>` to noir stdlib (#1781)
  feat: Format strings for prints  (#1952)
  feat(acir_gen): RecursiveAggregation opcode and updates to black box func call generation (#2097)
  fix: Mutating a variable no longer mutates its copy (#2057)
  fix: Implement `.len()` in Acir-Gen (#2077)
TomAFrench added a commit that referenced this pull request Aug 2, 2023
* master:
  chore: rename `ssa_refactor` module to `ssa` (#2129)
  chore: Use `--show-output` flag on execution rather than compilation  (#2116)
  fix(globals): Accurately filter literals for resolving globals (#2126)
  feat: Optimize away constant calls to black box functions (#1981)
  fix: Rename `Option::value` to `Option::_value` (#2127)
  feat: replace boolean `AND`s with multiplication (#1954)
  chore: create a `const` to hold the panic message (#2122)
  feat: Add support for bitshifts by distances known at runtime (#2072)
  feat: Add additional `BinaryOp` simplifications (#2124)
  fix: flattening pass no longer overwrites previously mapped condition values (#2117)
  chore(noirc_driver): Unify crate preparation (#2119)
  feat!: Support workspaces and package selection on every nargo command (#1992)
  chore: Make a more clear error for slices passed to std::println (#2113)
  feat: Implement type aliases (#2112)
  feat: Add `Option<T>` to noir stdlib (#1781)
  feat: Format strings for prints  (#1952)
  feat(acir_gen): RecursiveAggregation opcode and updates to black box func call generation (#2097)
  fix: Mutating a variable no longer mutates its copy (#2057)
  fix: Implement `.len()` in Acir-Gen (#2077)
TomAFrench added a commit that referenced this pull request Aug 2, 2023
* master:
  chore: rename `ssa_refactor` module to `ssa` (#2129)
  chore: Use `--show-output` flag on execution rather than compilation  (#2116)
  fix(globals): Accurately filter literals for resolving globals (#2126)
  feat: Optimize away constant calls to black box functions (#1981)
  fix: Rename `Option::value` to `Option::_value` (#2127)
  feat: replace boolean `AND`s with multiplication (#1954)
  chore: create a `const` to hold the panic message (#2122)
  feat: Add support for bitshifts by distances known at runtime (#2072)
  feat: Add additional `BinaryOp` simplifications (#2124)
  fix: flattening pass no longer overwrites previously mapped condition values (#2117)
  chore(noirc_driver): Unify crate preparation (#2119)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants