Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make overflow arithmetic builtins return a tuple instead of using a pointer parameter and bool return value #10248

Closed
andrewrk opened this issue Nov 30, 2021 · 8 comments · Fixed by #14024
Labels
accepted This proposal is planned. breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Copy link
Member

Let's examine integer parsing without @addWithOverflow and @mulWithOverflow:

const maxInt = std.math.maxInt;

fn parseInt(comptime T: type, buf: []const u8, radix: u8) !T {
    const Big = @Type( .{ .Int = .{
        .signedness = false,
        .bits = @typeInfo(T).Int.bits * 2,
    }});
    var x: Big = 0;

    for (buf) |c| {
        const digit = switch (c) {
            '0'...'9' => c - '0',
            'A'...'Z' => c - 'A' + 10,
            'a'...'z' => c - 'a' + 10,
            else => return error.InvalidCharacter,
        };
        x *= radix;
        if (x > maxInt(T)) return error.Overflow;

        x += digit;
        if (x > maxInt(T)) return error.Overflow;
    }

    return @intCast(T, x);
}

With @addWithOverflow and @mulWithOverflow:

fn parseInt(comptime T: type, buf: []const u8, radix: u8) !T {
    var x: T = 0;

    for (buf) |c| {
        const digit = switch (c) {
            '0'...'9' => c - '0',
            'A'...'Z' => c - 'A' + 10,
            'a'...'z' => c - 'a' + 10,
            else => return error.InvalidCharacter,
        };
        if (@mulWithOverflow(T, x, radix, &x)) return error.Overflow;
        if (@addWithOverflow(T, x, digit, &x)) return error.Overflow;
    }

    return x;
}

godbolt link

Observations: Even with -OReleaseFast optimizations on, LLVM is able to generate better code with @addWithOverflow and @mulWithOverflow. The reason for this is that the backend that generates machine code can special-case the result of these functions, and turn struct field accesses followed by jumps into jo rather than actually doing multiplication with more bits.

Certainly in a debug build, this has the potential to generate much better runtime code.

So that's why these builtins exist: they help the programmer avoid integer overflow bugs, while generating efficient code.

There is a problem though, which is that the current function signature sabotages the potential for debug code to be efficient. The status quo function signature is:

@addWithOverflow(comptime T: type, a: T, b: T, result: *T) bool

The problem is the result pointer. As a point of comparison, here is the corresponding LLVM builtin signature:

declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.uadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Now that we have started to get into writing our own backends and not relying exclusively on LLVM, I'm seeing the flaw: writing the result through a pointer parameter makes it too hard to use a special value returned from the builtin and detect the pattern that allows lowering to the efficient code.

For example, in the x86 backend code, the MCValues tagged union can represent a value that is partially in the condition flags and partially in a register, which is exactly what would be the return type of one of these arithmetic overflow functions. However, loading through the result pointer messes this mechanism up.

Likewise in a debug build even with the LLVM backend, it messes up LLVM's ability to do the same. It ends up writing the overflow flag to a stack value, which then gets fetched via the result pointer.

Furthermore, the result pointer does not harmonize with SIMD vectors (related: #6835).

I propose that the arithmetic overflow functions return a tuple, like this (related: #6771, #4335):

@addOverflow(a: T, b: T) tuple {T, u1}
@addOverflow(a: @Vector(T, N), b: @Vector(T, N)) tuple {@Vector(T, N), @Vector(u1, N)}

With #498 we can now look at our parseInt function again:

fn parseInt(comptime T: type, buf: []const u8, radix: u8) !T {
    var x: T = 0;

    for (buf) |c| {
        const digit = switch (c) {
            '0'...'9' => c - '0',
            'A'...'Z' => c - 'A' + 10,
            'a'...'z' => c - 'a' + 10,
            else => return error.InvalidCharacter,
        };
        x, const mul_overflow = @mulWithOverflow(T, x, radix);
        if (mul_overflow != 0) return error.Overflow;

        x, const add_overflow = @addWithOverflow(T, x, digit);
        if (add_overflow != 0) return error.Overflow;
    }

    return x;
}

It's effectively the same code, except we have eliminated the use of a mutable variable, and it is now possible for even simple, unoptimizing backends to quickly lower this to efficient runtime code.

@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Nov 30, 2021
@andrewrk andrewrk added this to the 0.10.0 milestone Nov 30, 2021
@RogierBrussee
Copy link

Any reason to not to have

@addOverflow(a: T, b: T) Overflow!T
or
@addOverflow(a:T, b:T) ?T

which seems to be pretty much made for this kind thing?

@andrewrk
Copy link
Member Author

andrewrk commented Dec 6, 2021

Yes. This builtin returns not only the overflow bit but the other bits as well.

@andrewrk andrewrk added the accepted This proposal is planned. label Dec 7, 2021
@tecanec
Copy link
Contributor

tecanec commented Dec 13, 2021

@RogierBrussee Here are a couple of use cases for getting both the overflowed result and the excess carry bit:

/// An incredibly cheap oscillator. Each step is only one instruction on some ISAs.
/// May be useful on some low-end systems, or if you're using a lot of parallel oscillators.
const OverflowOscillator = struct {
    x: u32,

    fn step(self: *OverflowOscillator, freq: u32) bool {
        self.x, const overflow = @addWithOverflow(u32, self.x, freq);
        return overflow;
    }
};

/// An int using an arbitrary number of bytes.
/// This is a very primitive implementation, but more advanced implementations might use the same concept.
/// On some ISAs with low word sizes, a similar technique can be used forr emulating larger ints, such as 
/// doing 32-bit arithmetics on a 8-bit CPU.
const ArbPrecInt = struct {
    val: []u8,

    fn increment(self: *ArbPrecInt) !void {
        var carry: bool = true;
        for (self.val) |*byte| {
            if (!carry) {
                break;
            }
            val.*, carry= @addWithOverflow(u8, val.*, 1);
        }
        if (carry) {
            // Handle resize or return an error
        }
    }

    // Other operations may be similarly implemented using @addWithOverflow and @mulWithOverflow.
};

@matu3ba
Copy link
Contributor

matu3ba commented Feb 6, 2022

I can confirm the findings https://gist.github.com/matu3ba/f848765e3c67ee74781b57858049d692

  './addo_noptr' ran
    1.06 ± 0.03 times faster than './addo_fast'
    1.15 ± 0.04 times faster than './addo_simple'

Since the benchmarks show 1.10x speedup of the Hacker's Delight solution vs the simple overflow test everyone does, I will provide __addo and __subo for compiler_rt.

matu3ba added a commit to matu3ba/zig that referenced this issue Feb 10, 2022
change signature of arithmetic operations @addwithOverflow,
@subWithOverflow, @mulWithOverflow, shlWithOverflow from
  @operation(comptime T: type, a: T, b: T, result: *T) bool
to
  @operation(comptime T: type, a: T, b: T) anytype
with anytype being a tuple
  struct { res: T, ov: bool }

This removes the pointer store and load for efficiency of codegen.
Comptime operation is accordingly kept in sync.

closes ziglang#10248
joachimschmidt557 pushed a commit to joachimschmidt557/zig that referenced this issue Mar 24, 2022
change signature of arithmetic operations @addwithOverflow,
@subWithOverflow, @mulWithOverflow, shlWithOverflow from
  @operation(comptime T: type, a: T, b: T, result: *T) bool
to
  @operation(comptime T: type, a: T, b: T) anytype
with anytype being a tuple
  struct { res: T, ov: bool }

This removes the pointer store and load for efficiency of codegen.
Comptime operation is accordingly kept in sync.

closes ziglang#10248
@joachimschmidt557
Copy link
Member

What is the new signature for the builtin functions? The initial issue comment contains two differing versions in the specification and in the example code. Is it

@addWithOverflow(a: T, b: T) tuple {T, u1}

or

@addWithOverflow(comptime T: type, a: T, b: T) tuple {T, u1}

Asking because I'm working on implementing this change in the self-hosted compiler (on top of #10854).

@wooster0
Copy link
Contributor

I could be missing something but was it ever discussed how the result of @addWithOverflow etc. is actually assigned and used?
What I'm seeing in this issue are these:

        x, const add_overflow = @addWithOverflow(T, x, digit);
        self.x, const overflow = @addWithOverflow(u32, self.x, freq);
            val.*, carry= @addWithOverflow(u8, val.*, 1);

which doesn't look like Zig code at all to me. Surely this is not how it's going to look anyway? Or is this un-Ziggy assignment syntax part of #4335 and I'm missing it?

I don't know what the accepted way was to access the members of a tuple but I think it was @"0" etc., right?
So technically it would look like this, right?

        const op = @addWithOverflow(@as(u8, 128), @as(u8, 64));
        _ = op.@"0";
        _ = op.@"1";

I think this is not optimal in terms of readability.
Why don't we use a tuple with named fields (so a regular struct)? So the signature would be:

@addWithOverflow(a: T, b: T) struct { result: T, overflow: u1 }

and used like this:

        const op = @addWithOverflow(@as(u8, 128), @as(u8, 64));
        _ = op.result;
        _ = op.overflow;

which I think is just much easier to understand. Compared to this using a tuple with unnamed fields could require you to consult the langref.

Vexu added a commit to Vexu/zig that referenced this issue Nov 23, 2022
It should return a a vector of bools for compatibility with scalar
operands and stage1 until ziglang#10248 can be implemented.
Vexu added a commit to Vexu/zig that referenced this issue Nov 23, 2022
It should return a a vector of bools for compatibility with scalar
operands and stage1 until ziglang#10248 can be implemented.

Closes ziglang#13201
Vexu added a commit to Vexu/zig that referenced this issue Nov 25, 2022
It should return a a vector of bools for compatibility with scalar
operands and stage1 until ziglang#10248 can be implemented.

Closes ziglang#13201
Vexu added a commit to Vexu/zig that referenced this issue Nov 26, 2022
It should return a a vector of bools for compatibility with scalar
operands and stage1 until ziglang#10248 can be implemented.

Closes ziglang#13201
@likern
Copy link

likern commented Dec 22, 2022

@andrewrk Is there proposal or discussion for this syntax?

x, const mul_overflow = @mulWithOverflow(T, x, radix);

@Vexu
Copy link
Member

Vexu commented Dec 22, 2022

andrewrk Is there proposal or discussion for this syntax?

x, const mul_overflow = @mulWithOverflow(T, x, radix);

#498

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
8 participants