
Consider adding sumproduct() or dotproduct() to the math module #100485

Closed
rhettinger opened this issue Dec 24, 2022 · 14 comments
Labels
3.12 (bugs and security fixes), type-feature (a feature request or enhancement)

Comments

@rhettinger
Contributor

rhettinger commented Dec 24, 2022

I was reviewing the itertools recipes to see whether some were worth promoting to be builtin tools. The dotproduct() recipe was the best candidate. To non-matrix people this is known as sumproduct() and it comes up in many non-vector applications, possibly the most common being sum([price * quantity for price, quantity in zip(prices, quantities)]) and the second most common being weighted averages.

The current version of the recipe is:

    import operator
    from itertools import starmap

    def dotproduct(vec1, vec2):
        "Compute a sum of products."
        return sum(starmap(operator.mul, zip(vec1, vec2, strict=True)))

If we offered this as part of the math module, we could make a higher quality implementation.

For float inputs or mixed int/float inputs, we could multiply and sum in quad precision, making a single rounding at the end. This would make a robust building block to serve as a foundation for users to construct higher level tools. It is also something that is difficult for them to do on their own.
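
For concreteness, here is a minimal sketch of the two use cases above, written against the proposed API (assuming the eventual spelling math.sumprod; the data is purely illustrative):

from math import sumprod

prices = [1.25, 2.50, 0.99]
quantities = [4, 2, 10]
order_total = sumprod(prices, quantities)        # 1.25*4 + 2.50*2 + 0.99*10

weights = [0.2, 0.3, 0.5]
grades = [88.0, 74.0, 91.0]
weighted_average = sumprod(weights, grades) / sum(weights)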

@rhettinger added the type-feature (a feature request or enhancement) and 3.12 (bugs and security fixes) labels on Dec 24, 2022
@PurityLake
Contributor

@rhettinger working on implementing this feature. I'm curious: how would one be expected to use quad precision in this context? Should I just write multiplication and addition functions for a custom quad-precision data structure?

@tim-one
Member

tim-one commented Dec 24, 2022

how would one be expected to use quad precision in this context

Raymond's use of "quad precision" was an active link to an article spelling out what he means. It's a software technique for getting twice the native precision using only native-precision float operations, and for that reason is usually called "doubled precision". Since Python's "float" is IEEE-754 "double precision" on most (all?) platforms, twice that leads to "quad".

If you look at the edit history of Raymond's post, at first it had a prototype implementation in pure Python.
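
For readers unfamiliar with the technique, here is a minimal sketch of the error-free addition step that doubled precision is built from (illustrative only, not Raymond's prototype):

def two_sum(a: float, b: float) -> tuple[float, float]:
    # Knuth's 2Sum: hi is the ordinary rounded sum and lo is the exact
    # rounding error, so hi + lo == a + b exactly, for any ordering of a and b.
    hi = a + b
    a_virtual = hi - b
    b_virtual = hi - a_virtual
    lo = (a - a_virtual) + (b - b_virtual)
    return hi, lo

Carrying (hi, lo) pairs through a computation gives roughly twice the native precision while using only ordinary float operations.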

@rhettinger
Contributor Author

@rhettinger working on implementing this feature

@PurityLake Please leave that for me. I've already done substantial work on it and opened this issue to get buy in before going further.

@tim-one
Member

tim-one commented Dec 24, 2022

to get buy in

Not needed - it would self-evidently be nice to have 😄

@rhettinger
Contributor Author

rhettinger commented Jan 3, 2023

Here's the logic I'm planning to use (and the tests to make sure it is correct). If you all want something different or can suggest improvements, please let me know.

'Pure Python implementation of sumprod() with accuracy enhancements.'

from math import fabs, fsum, frexp, isfinite
from sys import stderr

verbose = False

def accumulate(total, frac, x):
    "Fast2Sum with a magnitude test: fold x into the running (total, frac) pair."
    if fabs(total) < fabs(x):
        x, total = total, x
    t = total + x
    frac += (total - t) + x
    return t, frac

def split(x, VELTKAMP_CONSTANT=float(0x8000001)):
    "Veltkamp splitting: represent x exactly as hi + lo, each with a half-width significand."
    t = x * VELTKAMP_CONSTANT
    hi = t - (t - x)
    lo = x - hi
    assert not isfinite(hi) or (frexp(hi)[0] * 2**26).is_integer()
    assert not isfinite(hi) or (frexp(lo)[0] * 2**26).is_integer()
    assert not isfinite(hi) or hi + lo == x
    return hi, lo

def multiply(p, q):
    "Dekker-style multiplication: four parts that sum exactly to the true product p*q."
    hp, lp = split(p)
    hq, lq = split(q)
    parts = hp*hq, hp*lq, hq*lp, lp*lq
    assert not all(map(isfinite, parts)) or fsum(parts) == p * q
    return parts

def both_int(p, q):
    return {type(p), type(q)} == {int}

def one_float_one_int(p, q):
    return {type(p), type(q)} == {int, float}

def both_float(p, q):
    return {type(p), type(q)} == {float}

def PyLong_AsLongAndOverflow(p):
    p = int(p)
    if p.bit_length() > 63:
        raise OverflowError
    return p

def check_long_mult_overflow(p, q):
    "Multiply longs and raise if too large for a longlong."
    result = p * q
    if result.bit_length() > 63:
        raise OverflowError
    return result

def PyFloat_AsDouble(p):
    assert type(p) in {int, float}
    return float(p)  # This can raise OverflowError

NULL = object()

def sumprod(p, q, verbose=False):
    p_stopped = q_stopped = False
    int_path_enabled = flt_path_enabled = True
    int_total_in_use = flt_total_in_use = False
    int_total = 0
    flt_total = flt_frac = 0.0
    obj_total = 0

    p_it = iter(p)
    q_it = iter(q)
    while True:
        p_i = next(p_it, NULL)
        if p_i is NULL:
            p_stopped = True
        q_i = next(q_it, NULL)
        if q_i is NULL:
            q_stopped = True
        if p_stopped != q_stopped:
            raise ValueError('Inputs are not the same length')
        finished = p_stopped & q_stopped

        if int_path_enabled:

            if not finished and both_int(p_i, q_i):
                try:
                    if verbose: print('I', end='', file=stderr)
                    int_p = PyLong_AsLongAndOverflow(p_i)
                    int_q = PyLong_AsLongAndOverflow(q_i)
                    int_total += check_long_mult_overflow(int_p, int_q)
                    int_total_in_use = True
                    continue
                except OverflowError:
                    pass

            # We're finished, overflowed, or have a non-int
            int_path_enabled = False
            if int_total_in_use:
                obj_total += int(int_total)
                int_total_in_use = False

        if flt_path_enabled:

            if not finished and (both_float(p_i, q_i) or one_float_one_int(p_i, q_i)):
                try:
                    if verbose: print('F', end='', file=stderr)
                    flt_p = PyFloat_AsDouble(p_i)
                    flt_q = PyFloat_AsDouble(q_i)
                    # Accumulate into temporaries and commit only if the result
                    # stays finite, so an overflow here does not discard the
                    # running compensated total.
                    new_total, new_frac = flt_total, flt_frac
                    for part in multiply(flt_p, flt_q):
                        new_total, new_frac = accumulate(new_total, new_frac, part)
                    if isfinite(new_total):
                        flt_total, flt_frac = new_total, new_frac
                        flt_total_in_use = True
                        continue
                    # A non-finite value arose; fall back to the slow path for
                    # this pair and flush the preserved total below.
                except OverflowError:
                    pass

            # We're finished, overflowed, have a non-float, or have a non-finite value
            flt_path_enabled = False
            if flt_total_in_use:
                flt_total += flt_frac
                obj_total += float(flt_total)
                flt_total_in_use = False

        assert not int_total_in_use
        assert not flt_total_in_use

        if finished:
            return obj_total

        if verbose: print('O', end='', file=stderr)
        obj_total += p_i * q_i


if __name__ == '__main__':

    from decimal import Decimal
    from fractions import Fraction
    from itertools import product
    from math import isnan
    from random import randrange

    def baseline_sumprod(p, q):
        "This defines the target behavior including expections and special values."
        total = 0
        for p_i, q_i in zip(p, q, strict=True):
            total += p_i * q_i
        return total

    def run(func, *args):
        "Make comparing functions easier. Returns error status, type, and result."
        try:
            result = func(*args)
        except Exception as e:
            return 'Error', type(e), None
        if isinstance(result, (float, Decimal)) and isnan(result):
            result = 'NaN'
        return None, type(result), result,

    pool = [3.75, 2.5, -1.5, 0.0, 1.0,
            float('inf'), -float('inf'), float('NaN'),
            3, -5, 7, 0, 1, 10*2**64, 10**1000,
            Fraction(7, 8), Fraction(21, 4), Fraction(-41, 2),
            Fraction(0), Fraction(1),
            Decimal('6.125'), Decimal('12.125'), Decimal('-2.125'),
            Decimal(0), Decimal(1),
            Decimal('Inf'), -Decimal('Inf'), Decimal('NaN'),
    ]

    args = ((), ())
    assert run(baseline_sumprod, *args) == run(sumprod, *args)
    print('Done 0')

    for a, b in product(pool, repeat=2):
        args = (a,), (b,)
        assert run(baseline_sumprod, *args) == run(sumprod, *args)
    print('Done 2')

    for m, n in product(range(5), repeat=2):
        args = [randrange(100)/16 for i in range(m)], [randrange(100)/16 for i in range(n)]
        assert run(baseline_sumprod, *args) == run(sumprod, *args)
    print('Done uneven')

    for a, b, c, d in product(pool, repeat=4):
        args = (a, b), (c, d)
        assert run(baseline_sumprod, *args) == run(sumprod, *args)
    print('Done 4')

    alt_pool = (1e200, 2e200, 5e200, 0.0, -1e200, -2e200, -5e200)
    for a, b, c, d, e, f in product(alt_pool, repeat=6):
        args = (a, b, c), (d, e, f)
        assert run(baseline_sumprod, *args) == run(sumprod, *args)
    print('Done overflow 6')

    for a, b, c, d, e, f in product(pool, repeat=6):
        args = (a, b, c), (d, e, f)
        assert run(baseline_sumprod, *args) == run(sumprod, *args)
    print('Done regular 6')
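
Note that the equality assertions above hold because those inputs never lose precision either way; an input where large magnitudes cancel shows what the doubled-precision accumulation buys (illustrative, assumes IEEE-754 doubles):

p = [1e16, 1.0, -1e16]
q = [1.0, 1.0, 1.0]
print(sumprod(p, q))                        # 1.0
print(sum(x * y for x, y in zip(p, q)))     # 0.0 -- the 1.0 is absorbed into 1e16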

@rhettinger
Contributor Author

rhettinger commented Jan 7, 2023

I checked in the core logic so that we could have a baseline for further improvements. The beta release is still far away. Suggestions are welcome. Here are some topics that I'm mulling over:

  1. Determine whether 2SUM would be faster than Fast2Sum with a fabs test. The disassembly from both Intel and M1 Arm builds shows that Fast2Sum is compiled without branches. The x86_64 code selects between the results with cmpnlesd and andnpd. The AArch64 code uses fcmp and fcsel to the same effect.

  2. The API is a bit awkward for some inputs. To compute the sum of squares from an iterator input, you need sumprod(*tee(it)). Inputs consisting of 2-tuples need to be regrouped with sumprod(*zip(*paired)). (See the sketch after this list.) Generator expression inputs work fine but should be formatted on two lines for readability:

sumprod(((xi - xbar) for xi in x),
        ((yi - ybar) for yi in y))
  3. For grins, I'm experimenting with 2-fold Neumaier summation. At a cost of an extra 2Sum call, the error bound becomes proportional to ε³·n²·Σ|aᵢ|. The extra factor of ε would only be relevant in cases where large terms mostly cancel (this is typical for some use cases; for example, summing the moments on a lever).
def twofold(data):
    # Assumes an error-free twosum(a, b) -> (sum, err) helper (2Sum).
    ce = e = b = 0.0
    for n in data:
        bb, ee = twosum(b, n)
        eee, cce = twosum(e, ee)
        b = bb
        e = eee
        ce += cce
    return ce + b + e
  4. It might be worthwhile to support a high accuracy code path for complex numbers.

  5. The MS Excel implementation generalizes sumprod to handle an arbitrary number of input vectors. If there were valid use cases, we could do the same:

def sumprod(*vecs):
    return sum(map(math.prod, zip(*vecs, strict=True)))

>>> sumprod([10, 20, 30, 40], [2, 4, 6, 8], [1, 2, 3, 4])
2000
>>> 10*2*1 + 20*4*2 + 30*6*3 + 40*8*4
2000
  6. The test_sumprod_stress() unittest does too much work. Running test_math now takes 15 seconds.
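
For concreteness, here is a small sketch of the input shapes mentioned in item 2 (illustrative data, using math.sumprod as checked in):

from itertools import tee
from math import sumprod

# Sum of squares from a single iterator: duplicate it with tee().
it = iter([1.5, 2.5, 3.5])
sum_of_squares = sumprod(*tee(it))            # 1.5**2 + 2.5**2 + 3.5**2 == 20.75

# Data stored as 2-tuples has to be regrouped with zip(*...).
paired = [(2.0, 3.0), (4.0, 5.0)]
dot = sumprod(*zip(*paired))                  # 2.0*3.0 + 4.0*5.0 == 26.0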

@rhettinger
Contributor Author

Determine whether 2SUM would be faster than Fast2Sum

2SUM was faster so I put it in.

For grins, I'm experimenting with 2-fold Neumaier summation.

Interestingly, this only makes the code two lines longer. The running time only increases by 12%. Worth it?
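
For readers following along, the two error-free addition variants being compared look roughly like this (a sketch, not the actual C code):

from math import fabs

def fast2sum(a, b):
    # Dekker's Fast2Sum: three flops, but exact only when |a| >= |b|,
    # hence the fabs() test (which the compilers turn into branch-free selects).
    if fabs(a) < fabs(b):
        a, b = b, a
    s = a + b
    err = b - (s - a)
    return s, err

def twosum(a, b):
    # Knuth's 2Sum: six flops, exact for any ordering, no comparison needed.
    s = a + b
    t = s - b
    err = (a - t) + (b - (s - t))
    return s, err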

@gpshead
Member

gpshead commented Jan 8, 2023

Please add @support.requires_resource('cpu') to test_sumprod_stress.

@hugovk
Member

hugovk commented Jan 8, 2023

It would be good to include this in What's New.

https://docs.python.org/3.12/whatsnew/3.12.html

@rhettinger
Contributor Author

@gpshead Done.
@hugovk Done.

@rhettinger
Contributor Author

Fun fact: sumprod() can be used to make a single-rounding fused multiply-add:

def fma(x, y, z):
    'Return x * y + z with a single rounding'
    return sumprod((x, z), (y, 1.0))
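
A quick illustrative check of the single-rounding claim (IEEE-754 doubles assumed): fma() recovers the rounding error of a product, which the two-rounding expression x*y + z cannot.

x = 1.0 + 2.0**-30
print(x * x + (-(x * x)))       # 0.0 -- the low bits of x*x are already rounded away
print(fma(x, x, -(x * x)))      # 8.673617379884035e-19 == 2.0**-60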

@TrangOul
Contributor

If we're going to have an even more efficient method, the dot product example from itertools should be replaced with something else.

@Dutcho

Dutcho commented Mar 12, 2023

@rhettinger
I'd indeed like the generalisation below:

  5. The MS Excel implementation generalizes sumprod to handle an arbitrary number of input vectors. If there were valid use cases, we could do the same:
def sumprod(*vecs):
    return sum(map(math.prod, zip(*vecs, strict=True)))

>>> sumprod([10, 20, 30, 40], [2, 4, 6, 8], [1, 2, 3, 4])
2000
>>> 10*2*1 + 20*4*2 + 30*6*3 + 40*8*4
2000

Example use case (as a variant of the motivating example in the top message of this thread):

sum([price * quantity * (1 - discount) * (1 + tax_rate)
     for price, quantity, discount, tax_rate
     in zip(prices, quantities, discounts, tax_rates)])

which would then become:

sumprod(prices,
        quantities,
        (1 - discount for discount in discounts),
        (1 + tax_rate for tax_rate in tax_rates))

I'd welcome this generalisation, but perhaps my comment is too late because sumprod was added in 3.12a6 and this issue was closed.

@TrangOul
Contributor

TrangOul commented Mar 20, 2023

It should not yet be too late for such an improvement; beta 1 is the point after which no new features are supposed to be added.
@Dutcho, maybe you should open a new issue, though.
