
ak.sum on flat 1-D numpy array: different results between axis=None and axis=-1 #1241

Closed
kreczko opened this issue Jan 21, 2022 · 9 comments

kreczko commented Jan 21, 2022

Version of Awkward Array

1.5.1

Description and code to reproduce

I might be misunderstanding something, but for a non-jagged, 1-D array I would expect ak.sum to return the same result independent of the axis. However, only axis=None agrees with NumPy's result.

The example code

import awkward as ak
import numpy as np

with open('eventweights.npy', 'rb') as f:
    np_array = np.load(f)

np_sum = np.sum(np_array)  # for Awkward Arrays, np.sum dispatches to ak.sum
for axis in [None, -1, 0]:
    ak_sum = ak.sum(np_array, axis=axis)
    print(f"axis = {axis}: np_sum = {np_sum}; ak_sum = {ak_sum}; diff = {np_sum - ak_sum}")

produces

axis = None: np_sum = 75.38936614990234; ak_sum = 75.38936614990234; diff = 0.0
axis =   -1: np_sum = 75.38936614990234; ak_sum = 75.3895034790039; diff = -0.0001373291015625
axis =    0: np_sum = 75.38936614990234; ak_sum = 75.3895034790039; diff = -0.0001373291015625

which is a small, but relevant difference (the example uses event weights).

Input file: eventweights.zip

Note: I added the input file, which contains 1486 entries that are typically < 1. I tried to reproduce the issue with smaller numpy arrays, but those all agree. We've seen issues like these in the past with TH2F vs TH2D, but I am not sure what the culprit is here.

Is that expected or a bug?

kreczko added the bug (unverified) label on Jan 21, 2022

agoose77 commented Jan 21, 2022

I think this is probably expected behaviour. When you call ak.sum with axis=None, Awkward flattens the array and then calls NumPy's sum function. Only if you pass a non-None axis does Awkward use its own kernel, and in that case, unlike NumPy, Awkward does nothing special in the sum routine. We can mimic its behaviour by accumulating into a scalar float32 value.

Consider a naive sum:

def sum_t(x):
    # naive left-to-right accumulation in the array's own dtype
    tot = np.array(0, dtype=x.dtype)
    for y in x:
        tot += y
    return tot

Evaluating this for your array gives me:

>>> sum_t(x) - np.sum(x)
0.0001373291

Now, with a pairwise sum with N=128:

def pairwise_t(x):
    # recursively split the array; sum blocks shorter than 128 naively
    if len(x) < 128:
        return sum_t(x)
    m = len(x) // 2
    return pairwise_t(x[:m]) + pairwise_t(x[m:])

Evaluating this for your array gives me

>>> pairwise_t(x) - np.sum(x)
1.5258789e-05

which is a factor of ten closer. I suspect the rest of the difference comes from NumPy's vectorization, which in effect accumulates in several lanes at once and further shortens each addition chain. Incidentally, @jpivarski, I wonder if we've considered something like pairwise summing for Awkward?
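The lane idea can be imitated in a few lines; this is an illustrative sketch (`lane_sum` and the lane count of 8 are made up here, not NumPy's actual loop), showing why accumulating into several independent float32 partial sums shortens each addition chain:

```python
import numpy as np

def lane_sum(x, lanes=8):
    """Illustrative only: accumulate into `lanes` independent float32
    partial sums (imitating SIMD lanes), then combine them. Each
    accumulation chain is ~lanes times shorter than a naive loop's."""
    x = np.asarray(x, dtype=np.float32)
    acc = np.zeros(lanes, dtype=np.float32)
    n = len(x) - len(x) % lanes
    for i in range(0, n, lanes):
        acc += x[i:i + lanes]          # one float32 add per lane per step
    return np.float32(acc.sum() + x[n:].sum())
```

On long float32 inputs this typically lands noticeably closer to the float64 reference than a single running total does.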


kreczko commented Jan 21, 2022

I see, so this is a speed vs. accuracy trade-off.
I found this old NumPy discussion and a related project, accupy.

I guess from a science perspective it is more important to understand the uncertainty of a given method - accuracy is key (sometimes).

jpivarski added the question label and removed the bug (unverified) label on Jan 21, 2022
@jpivarski

It's not a bug. I'm not 100% sure whether the NumPy discussion about np.math.fsum is relevant here, since this code path uses np.ndarray.sum, and I don't know whether those two are the same. (np.math.fsum is not the same function as np.sum, for instance.)

axis=None and axis=-1 do take completely different code paths:

https://github.com/scikit-hep/awkward-1.0/blob/a1a03dbe95c6b53607d1c8d4b40b5848f007b6b6/src/awkward/operations/reducers.py#L334-L349

Fundamentally, they would never be exactly the same because this is explicitly adding the numbers in a different order and floating point addition is not associative. The axis=None code path takes every separate array buffer within the ak.Array, sums each buffer separately, and then adds together those results. The axis=-1 code path goes linearly from the beginning to the end in the currently-implemented cpu-kernels. The future cuda-kernels would add them in parallel, using this completely different algorithm, which is a bit closer to tree-reduce than left-to-right.

However, there's more going on because your values are all np.float32. The arrays are internally added by NumPy, which uses the available 32 bits of precision, but the sum over arrays is done by Python (reduce and +). This might be adding the numbers as Python float, which is 64-bit. That would definitely change the precision.
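Whether Python's + promotes np.float32 scalars to 64-bit Python floats can be checked directly; a minimal sketch of the reduce-over-partial-results pattern:

```python
import numpy as np
from functools import reduce
import operator

# summing per-buffer partial results with Python's + (the reduce pattern)
parts = [np.float32(0.1), np.float32(0.2), np.float32(0.3)]
total = reduce(operator.add, parts)

print(type(total))  # numpy.float32: Python's + keeps the NumPy scalar dtype
```

So combining the per-buffer partial sums this way stays 32-bit; the precision difference must come from the summation order, not from promotion.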

The Awkward 2.0 implementation of this uses only NumPy:

https://github.com/scikit-hep/awkward-1.0/blob/3af8b016140f3ed02cd2f933dee675c6e115e1b0/src/awkward/_v2/operations/reducers/ak_sum.py#L189-L202

but I just got notified (while writing this comment) that it's about 2× slower than the v1 implementation, so I'll be looking into that. We might revert to using reduce if there's a factor of 2 involved.

@jpivarski

it's about 2× slower than the v1 implementation, so I'll be looking into that. We might revert to using reduce if there's a factor of 2 involved.

Which is now fixed: #1245

So—no performance issues, but the floating point accuracy is unchanged. I replaced the + with explicit np.add, though it turns out that Python was adding them as np.float32 objects, anyway.


agoose77 commented Jan 21, 2022

@jpivarski yes, fsum is a method less vulnerable to loss of significance, and I don't believe it is related to the issue here.

IIRC that reduction is just over the separate parts of the array. In this case, Awkward shouldn't perform a reduction at all, because completely_flatten returns the array itself for a NumPy array. So only the non-None axis paths differ, which is where the kernel reducer is used. The example code above, which performs a naive float32 sum, produces the same result, which makes it clear that this difference between NumPy and Awkward comes from naive summation (I'm pretty sure NumPy's reduction uses float32 too). I checked, and the Awkward kernel also uses float32 until it casts the result to 64 bits.

Evidence for this being a loss of significance problem (to me) is that when one uses a less naive sum, the difference approaches the order of the float precision rather than 1e-4. If we use np.sum but shuffle the order, we get the same result.
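That shuffle check can be reproduced along these lines (a sketch with synthetic float32 data standing in for the attached eventweights file):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(1486).astype(np.float32)  # synthetic stand-in for eventweights

shuffled = x.copy()
rng.shuffle(shuffled)

# np.sum's pairwise summation typically keeps the two orderings within a
# few float32 ulps of each other, whereas a naive left-to-right float32
# loop can drift by ~1e-4 on data like this.
print(np.sum(x), np.sum(shuffled))
```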

@jpivarski

Evidence for this being a loss of significance problem (to me) is that when one uses a less naive sum, the difference approaches the order of the float precision rather than 1e-4.

What do you mean by that? Is there something that we could be doing differently that would handle the precision better?

As an updated pointer, this is now the state of the art (in v2 after #1245):

https://github.com/scikit-hep/awkward-1.0/blob/b20413193caa2dd1b75272753f8680ba4cd9dd92/src/awkward/_v2/operations/reducers/ak_sum.py#L189-L204

where completely_flatten just returns the underlying arrays (only one here), without modification.
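The strategy at that link can be sketched in a few lines; `buffers` here is a hypothetical stand-in for whatever completely_flatten returns:

```python
import numpy as np
from functools import reduce

def sum_axis_none(buffers):
    """Sketch of the axis=None path described above: NumPy-sum each
    flattened buffer, then np.add the partial results, so NumPy's
    pairwise summation does the per-buffer work."""
    partials = [np.sum(b) for b in buffers]
    return reduce(np.add, partials)

# a flat NumPy array is a single buffer, so this reduces to plain np.sum
```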


agoose77 commented Jan 22, 2022

What do you mean by that? Is there something that we could be doing differently that would handle the precision better?

The loss of significance that I'm referring to above is in the kernel, where we are currently just using a for loop over the indices, and accumulating a running total.

// naive running total: add each value into its parent's output slot
for (int64_t i = 0;  i < lenparents;  i++) {
    toptr[parents[i]] += (OUT)fromptr[i];
}

Obviously this is slightly more complicated than a single sum as we perform multiple sums here.

There are methods like Kahan summation whose theoretical worst-case error is largely independent of the number of values being added, but in this particular setting Kahan is noticeably slower than the naive (simple) sum. NumPy uses a middle-of-the-road solution that performs pairwise summation: it divides the array into chunks, with the idea that the error is bounded by the error of summing naively within one chunk:
https://github.com/numpy/numpy/blob/c30876f6411ef0c5365a8e4cf40cc3d4ba41196c/numpy/core/src/umath/loops_utils.h.src#L77-L138

I think that we could reasonably update our sum kernels to do this without much loss of performance.
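For reference, Kahan (compensated) summation looks like this in pure Python over float32 values; a sketch of the textbook algorithm, not the kernel code above:

```python
import numpy as np

def kahan_sum(x):
    """Compensated (Kahan) summation over float32 values: carry the
    low-order bits lost by each addition into the next step."""
    total = np.float32(0.0)
    comp = np.float32(0.0)          # running compensation term
    for v in np.asarray(x, dtype=np.float32):
        y = v - comp                # apply last step's correction
        t = total + y               # low-order bits of y are lost here
        comp = (t - total) - y      # algebraically zero; recovers the loss
        total = t
    return total
```

Keeping every intermediate as an np.float32 scalar keeps the arithmetic 32-bit, which is the point: even so, the result stays very close to a float64 reference.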


kreczko commented Jan 24, 2022

Hi @agoose77 and @jpivarski,

Thank you for the illuminating discussion. I think I understand the discrepancy now.
It seems that whatever the solution, we will need to see how best to attribute an uncertainty to any operation performed.

In the meantime, @jpivarski, you mentioned the v2 - I've seen references to it in the code. Is there an easy way to switch between v1 and v2 implementations?

@jpivarski

@agoose77 Oh, I see! I was just considering the fact that non-associativity of floating point addition implies that different algorithms will give different results, but you were talking about using associativity to get closer to the correct result. I've seen something similar to Kahan summation for computing variances and standard deviations, where the numerical stability is a bigger issue. (Histogrammar uses it.) The reductions that we do in Awkward Array's cpu-kernels,

https://github.com/scikit-hep/awkward-1.0/blob/71b0a16ccf093fadc7892bd59362e43f0979e904/src/cpu-kernels/awkward_reduce_sum.cpp#L7-L21

might be able to benefit from that. The fact that NumPy doesn't do a naive, left-to-right sum would explain why axis=None (which uses NumPy) is different from axis=0 (which uses the above, with parents[i] = 0 for all i). If the fact that this has to be a jagged reducer (general parents) prevents a Kahan version of it, at least axis=0 might be implemented that way, as a special case.
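The parents mechanism can be modeled in a few lines of Python (a toy sketch with made-up data, not the actual kernel): each input value is added into the output slot named by its parent index, so axis=0 over a flat array is just the special case where every parent is 0.

```python
import numpy as np

# hypothetical example data: two sublists, [1, 2] and [3, 4, 5]
fromptr = np.array([1.0, 2.0, 3.0, 4.0, 5.0], dtype=np.float32)
parents = np.array([0, 0, 1, 1, 1])    # output slot for each input value
toptr = np.zeros(2, dtype=np.float32)

for i in range(len(fromptr)):
    toptr[parents[i]] += fromptr[i]    # naive running total per sublist

# toptr now holds the per-sublist sums: 3.0 and 12.0
```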

@kreczko, the central place for talking about v2, so far, is #1151. You can switch between v1 and v2 with

import awkward._v2 as ak

but it's not all there yet. All of the functions for creating arrays exist (so that we can run the tests), and some of the other functions do as well. It's enough for https://github.com/continuumIO/dask-awkward/ to develop against, so far.

P.S. If you create a v2 Array, call .show() on it.
