make overflow arithmetic builtins return a tuple instead of using a pointer parameter and bool return value #10248

andrewrk · 2021-11-30T07:11:12Z

Let's examine integer parsing without @addWithOverflow and @mulWithOverflow:

const maxInt = std.math.maxInt;

fn parseInt(comptime T: type, buf: []const u8, radix: u8) !T {
    const Big = @Type( .{ .Int = .{
        .signedness = false,
        .bits = @typeInfo(T).Int.bits * 2,
    }});
    var x: Big = 0;

    for (buf) |c| {
        const digit = switch (c) {
            '0'...'9' => c - '0',
            'A'...'Z' => c - 'A' + 10,
            'a'...'z' => c - 'a' + 10,
            else => return error.InvalidCharacter,
        };
        x *= radix;
        if (x > maxInt(T)) return error.Overflow;

        x += digit;
        if (x > maxInt(T)) return error.Overflow;
    }

    return @intCast(T, x);
}

With @addWithOverflow and @mulWithOverflow:

fn parseInt(comptime T: type, buf: []const u8, radix: u8) !T {
    var x: T = 0;

    for (buf) |c| {
        const digit = switch (c) {
            '0'...'9' => c - '0',
            'A'...'Z' => c - 'A' + 10,
            'a'...'z' => c - 'a' + 10,
            else => return error.InvalidCharacter,
        };
        if (@mulWithOverflow(T, x, radix, &x)) return error.Overflow;
        if (@addWithOverflow(T, x, digit, &x)) return error.Overflow;
    }

    return x;
}

godbolt link

Observations: Even with -OReleaseFast optimizations on, LLVM is able to generate better code with @addWithOverflow and @mulWithOverflow. The reason for this is that the backend that generates machine code can special-case the result of these functions, and turn struct field accesses followed by jumps into jo rather than actually doing multiplication with more bits.

Certainly in a debug build, this has the potential to generate much better runtime code.

So that's why these builtins exist: they help the programmer avoid integer overflow bugs, while generating efficient code.

There is a problem though, which is that the current function signature sabotages the potential for debug code to be efficient. The status quo function signature is:

@addWithOverflow(comptime T: type, a: T, b: T, result: *T) bool

The problem is the result pointer. As a point of comparison, here is the corresponding LLVM builtin signature:

declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.uadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Now that we have started to get into writing our own backends and not relying exclusively on LLVM, I'm seeing the flaw: writing the result through a pointer parameter makes it too hard to use a special value returned from the builtin and detect the pattern that allows lowering to the efficient code.

For example, in the x86 backend code, the MCValues tagged union can represent a value that is partially in the condition flags and partially in a register, which is exactly what would be the return type of one of these arithmetic overflow functions. However, loading through the result pointer messes this mechanism up.

Likewise in a debug build even with the LLVM backend, it messes up LLVM's ability to do the same. It ends up writing the overflow flag to a stack value, which then gets fetched via the result pointer.

Furthermore, the result pointer does not harmonize with SIMD vectors (related: #6835).

I propose that the arithmetic overflow functions return a tuple, like this (related: #6771, #4335):

@addOverflow(a: T, b: T) tuple {T, u1}
@addOverflow(a: @Vector(T, N), b: @Vector(T, N)) tuple {@Vector(T, N), @Vector(u1, N)}

With #498 we can now look at our parseInt function again:

fn parseInt(comptime T: type, buf: []const u8, radix: u8) !T {
    var x: T = 0;

    for (buf) |c| {
        const digit = switch (c) {
            '0'...'9' => c - '0',
            'A'...'Z' => c - 'A' + 10,
            'a'...'z' => c - 'a' + 10,
            else => return error.InvalidCharacter,
        };
        x, const mul_overflow = @mulWithOverflow(T, x, radix);
        if (mul_overflow != 0) return error.Overflow;

        x, const add_overflow = @addWithOverflow(T, x, digit);
        if (add_overflow != 0) return error.Overflow;
    }

    return x;
}

It's effectively the same code, except we have eliminated the use of a mutable variable, and it is now possible for even simple, unoptimizing backends to quickly lower this to efficient runtime code.

The text was updated successfully, but these errors were encountered:

RogierBrussee · 2021-12-06T11:57:41Z

Any reason to not to have

@addOverflow(a: T, b: T) Overflow!T
or
@addOverflow(a:T, b:T) ?T

which seems to be pretty much made for this kind thing?

andrewrk · 2021-12-06T19:23:42Z

Yes. This builtin returns not only the overflow bit but the other bits as well.

tecanec · 2021-12-13T08:49:27Z

@RogierBrussee Here are a couple of use cases for getting both the overflowed result and the excess carry bit:

/// An incredibly cheap oscillator. Each step is only one instruction on some ISAs.
/// May be useful on some low-end systems, or if you're using a lot of parallel oscillators.
const OverflowOscillator = struct {
    x: u32,

    fn step(self: *OverflowOscillator, freq: u32) bool {
        self.x, const overflow = @addWithOverflow(u32, self.x, freq);
        return overflow;
    }
};

/// An int using an arbitrary number of bytes.
/// This is a very primitive implementation, but more advanced implementations might use the same concept.
/// On some ISAs with low word sizes, a similar technique can be used forr emulating larger ints, such as 
/// doing 32-bit arithmetics on a 8-bit CPU.
const ArbPrecInt = struct {
    val: []u8,

    fn increment(self: *ArbPrecInt) !void {
        var carry: bool = true;
        for (self.val) |*byte| {
            if (!carry) {
                break;
            }
            val.*, carry= @addWithOverflow(u8, val.*, 1);
        }
        if (carry) {
            // Handle resize or return an error
        }
    }

    // Other operations may be similarly implemented using @addWithOverflow and @mulWithOverflow.
};

matu3ba · 2022-02-06T22:06:58Z

I can confirm the findings https://gist.github.com/matu3ba/f848765e3c67ee74781b57858049d692

  './addo_noptr' ran
    1.06 ± 0.03 times faster than './addo_fast'
    1.15 ± 0.04 times faster than './addo_simple'

Since the benchmarks show 1.10x speedup of the Hacker's Delight solution vs the simple overflow test everyone does, I will provide __addo and __subo for compiler_rt.

@operation

change signature of arithmetic operations @addwithOverflow, @subWithOverflow, @mulWithOverflow, shlWithOverflow from @operation(comptime T: type, a: T, b: T, result: *T) bool to @operation(comptime T: type, a: T, b: T) anytype with anytype being a tuple struct { res: T, ov: bool } This removes the pointer store and load for efficiency of codegen. Comptime operation is accordingly kept in sync. closes ziglang#10248

@operation

change signature of arithmetic operations @addwithOverflow, @subWithOverflow, @mulWithOverflow, shlWithOverflow from @operation(comptime T: type, a: T, b: T, result: *T) bool to @operation(comptime T: type, a: T, b: T) anytype with anytype being a tuple struct { res: T, ov: bool } This removes the pointer store and load for efficiency of codegen. Comptime operation is accordingly kept in sync. closes ziglang#10248

joachimschmidt557 · 2022-03-24T22:24:39Z

What is the new signature for the builtin functions? The initial issue comment contains two differing versions in the specification and in the example code. Is it

@addWithOverflow(a: T, b: T) tuple {T, u1}

or

@addWithOverflow(comptime T: type, a: T, b: T) tuple {T, u1}

Asking because I'm working on implementing this change in the self-hosted compiler (on top of #10854).

wooster0 · 2022-08-22T08:38:42Z

I could be missing something but was it ever discussed how the result of @addWithOverflow etc. is actually assigned and used?
What I'm seeing in this issue are these:

        x, const add_overflow = @addWithOverflow(T, x, digit);

        self.x, const overflow = @addWithOverflow(u32, self.x, freq);

            val.*, carry= @addWithOverflow(u8, val.*, 1);

which doesn't look like Zig code at all to me. Surely this is not how it's going to look anyway? Or is this un-Ziggy assignment syntax part of #4335 and I'm missing it?

I don't know what the accepted way was to access the members of a tuple but I think it was @"0" etc., right?
So technically it would look like this, right?

        const op = @addWithOverflow(@as(u8, 128), @as(u8, 64));
        _ = op.@"0";
        _ = op.@"1";

I think this is not optimal in terms of readability.
Why don't we use a tuple with named fields (so a regular struct)? So the signature would be:

@addWithOverflow(a: T, b: T) struct { result: T, overflow: u1 }

and used like this:

        const op = @addWithOverflow(@as(u8, 128), @as(u8, 64));
        _ = op.result;
        _ = op.overflow;

which I think is just much easier to understand. Compared to this using a tuple with unnamed fields could require you to consult the langref.

It should return a a vector of bools for compatibility with scalar operands and stage1 until ziglang#10248 can be implemented.

It should return a a vector of bools for compatibility with scalar operands and stage1 until ziglang#10248 can be implemented. Closes ziglang#13201

likern · 2022-12-22T09:57:16Z

@andrewrk Is there proposal or discussion for this syntax?

x, const mul_overflow = @mulWithOverflow(T, x, radix);

Vexu · 2022-12-22T10:39:00Z

andrewrk Is there proposal or discussion for this syntax?
x, const mul_overflow = @mulWithOverflow(T, x, radix);

#498

andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Nov 30, 2021

andrewrk added this to the 0.10.0 milestone Nov 30, 2021

andrewrk added the accepted This proposal is planned. label Dec 7, 2021

andrewrk mentioned this issue Jan 14, 2022

remove the type parameter from builtins that support vectors #6835

Closed

joachimschmidt557 mentioned this issue Mar 26, 2022

stage2: Change semantics of AIR arithmetic overflow instructions #11311

Merged

Luukdegram mentioned this issue Mar 27, 2022

stage2: wasm - Implement overflow arithmetic #11321

Merged

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

Vexu added a commit to Vexu/zig that referenced this issue Nov 23, 2022

Sema: fix overflow arithmetic with runtime vectors

f21cce8

It should return a a vector of bools for compatibility with scalar operands and stage1 until ziglang#10248 can be implemented.

Vexu added a commit to Vexu/zig that referenced this issue Nov 23, 2022

Sema: fix overflow arithmetic with runtime vectors

4cd7b2d

It should return a a vector of bools for compatibility with scalar operands and stage1 until ziglang#10248 can be implemented. Closes ziglang#13201

Vexu added a commit to Vexu/zig that referenced this issue Nov 25, 2022

Sema: fix overflow arithmetic with runtime vectors

cf6f997

It should return a a vector of bools for compatibility with scalar operands and stage1 until ziglang#10248 can be implemented. Closes ziglang#13201

Vexu added a commit to Vexu/zig that referenced this issue Nov 26, 2022

Sema: fix overflow arithmetic with runtime vectors

72fa8d4

It should return a a vector of bools for compatibility with scalar operands and stage1 until ziglang#10248 can be implemented. Closes ziglang#13201

Vexu mentioned this issue Dec 21, 2022

Make overflow arithmetic builtins return tuples #14024

Merged

andrewrk closed this as completed in #14024 Dec 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make overflow arithmetic builtins return a tuple instead of using a pointer parameter and bool return value #10248

make overflow arithmetic builtins return a tuple instead of using a pointer parameter and bool return value #10248

andrewrk commented Nov 30, 2021

RogierBrussee commented Dec 6, 2021

andrewrk commented Dec 6, 2021

tecanec commented Dec 13, 2021 •

edited

Loading

matu3ba commented Feb 6, 2022 •

edited

Loading

joachimschmidt557 commented Mar 24, 2022

wooster0 commented Aug 22, 2022

likern commented Dec 22, 2022

Vexu commented Dec 22, 2022

make overflow arithmetic builtins return a tuple instead of using a pointer parameter and bool return value #10248

make overflow arithmetic builtins return a tuple instead of using a pointer parameter and bool return value #10248

Comments

andrewrk commented Nov 30, 2021

RogierBrussee commented Dec 6, 2021

andrewrk commented Dec 6, 2021

tecanec commented Dec 13, 2021 • edited Loading

matu3ba commented Feb 6, 2022 • edited Loading

joachimschmidt557 commented Mar 24, 2022

wooster0 commented Aug 22, 2022

likern commented Dec 22, 2022

Vexu commented Dec 22, 2022

tecanec commented Dec 13, 2021 •

edited

Loading

matu3ba commented Feb 6, 2022 •

edited

Loading