BUG DDrppi buggy for cross-corr in periodic spaces #210

beckermr · 2020-01-28T16:06:27Z

General information

Corrfunc version: 2.3.2
platform: osx
installation method (pip/source/other?): conda

Issue description

I am computing the pairs by hand and not getting what corrfunc reports.

Expected behavior

It should match the brute force computation.

Actual behavior

It doesn't match it.

What have you tried so far?

A bunch of stuff to try and see if it will miss one particle. It does not, so the bug is more subtle. It does not appear to be a bin edge thing.

Minimal failing example

import numpy as np
import Corrfunc

lbox = 120
zmax = 60
n_pi = int(zmax)
rmin = 40
rmax = 45
n_r = 2
rbins = np.array([0, rmin, rmax])

lbox_2 = lbox / 2

npts = 11
rng = np.random.RandomState(seed=42)
x1 = rng.uniform(low=0, high=lbox, size=npts)
y1 = rng.uniform(low=0, high=lbox, size=npts)
z1 = rng.uniform(low=0, high=lbox, size=npts)
w1 = rng.uniform(size=npts) * 0 + 1

nparts = 100
px1 = rng.uniform(low=0, high=lbox, size=nparts)
py1 = rng.uniform(low=0, high=lbox, size=nparts)
pz1 = rng.uniform(low=0, high=lbox, size=nparts)

print("\n")

dd_cf = Corrfunc.theory.DDrppi(
    False,
    1,
    zmax,
    rbins,
    x1,
    y1,
    z1,
    weights1=w1,
    X2=px1,
    Y2=py1,
    Z2=pz1,
    weights2=np.ones_like(px1),
    periodic=True,
    boxsize=lbox,
    weight_type="pair_product",
    verbose=True,
    isa="fallback",
)
dd_cf = dd_cf["weightavg"].reshape((n_r, n_pi)) * dd_cf["npairs"].reshape(
    (n_r, n_pi)
)

print("\n")
print("testing particles one by one...")

dd = np.zeros(n_r)
for hind, (hx, hy, hz) in enumerate(zip(x1, y1, z1)):

    for px, py, pz in zip(px1, py1, pz1):
        dx = np.abs(hx - px)
        dy = np.abs(hy - py)
        dz = np.abs(hz - pz)

        if dx > lbox_2:
            dx = lbox - dx
        if dy > lbox_2:
            dy = lbox - dy
        if dz > lbox_2:
            dz = lbox - dz

        if dz <= zmax:
            r = np.sqrt(dx * dx + dy * dy)

            if r < rmin:
                dd[0] += w1[hind]
                if dd_cf[0, int(dz)] == 0:
                    print(
                        "    def missed part:",
                        r,
                        dz,
                        int(dz),
                        dd_cf[1, int(dz)],
                    )

            elif r >= rmin and r < rmax:
                dd[1] += w1[hind]
                if dd_cf[1, int(dz)] == 0:
                    print(
                        "    def missed part:",
                        r,
                        dz,
                        int(dz),
                        dd_cf[1, int(dz)],
                    )

print("dd corrfunc:", np.sum(dd_cf, axis=1))
print("dd brute   :", dd)

dd_cf_sum = np.sum(dd_cf, axis=1)
assert np.array_equal(dd_cf_sum, dd)

This script outputs

(anl) clarence:Desktop beckermr$ python corrfunc_test.py 


[Warning] The CPU supports AVX2 but the compiler does not.  Can you try another compiler?
ND1 =           11 [xmin,ymin,zmin] = [2.470139,16.739263,5.574050], [xmax,ymax,zmax] = [114.085717,116.389182,94.221115]
ND2 =          100 [xmin,ymin,zmin] = [0.662654,0.607390,1.727219], [xmax,ymax,zmax] = [118.426432,118.278054,118.806462]
Running with points in [xmin,xmax] = 0.662654,118.426432 with periodic wrapping = 120.000000
Running with points in [ymin,ymax] = 0.607390,118.278054 with periodic wrapping = 120.000000
Running with points in [zmin,zmax] = 1.727219,118.806462 with periodic wrapping = 120.000000
In gridlink_double> Running with [nmesh_x, nmesh_y, nmesh_z]  = 5,5,2.  Time taken =   0.000 sec
countpairs_rp_pi_double> gridlink seems inefficient. nmesh = (5, 5, 2); avg_np = 0.22. Boosting bin refine factor - should lead to better performance
xmin = 0.662654 xmax=118.426432 rpmax = 45.000000
In gridlink_double> Running with [nmesh_x, nmesh_y, nmesh_z]  = 8,8,2.  Time taken =   0.000 sec
In gridlink_double> Running with [nmesh_x, nmesh_y, nmesh_z]  = 8,8,2.  Time taken =   0.000 sec
Using fallback kernel
0%.........10%.........20%.........30%.........40%.........50%.........60%.........70%.........80%.........90%.........100% done. Time taken =  0.001 secs


testing particles one by one...
    def missed part: 41.087356477185 48.94941175529149 48 0.0
dd corrfunc: [362.  93.]
dd brute   : [387.  96.]
Traceback (most recent call last):
  File "corrfunc_test.py", line 99, in <module>
    assert np.array_equal(dd_cf_sum, dd)
AssertionError

beckermr · 2020-01-28T16:07:13Z

@aphearin @manodeep I have been banging my head against this for a week. Here is my most minimal example of something I don't understand. Hopefully, it is my code!

manodeep · 2020-01-28T20:04:52Z

Thanks for the bug report @beckermr - let me check whether I can reproduce it. I did have a couple of questions though - i) Does the bug still appear if you reduce zmax to a value smaller than Lbox/2, say 55 ? and ii) Can you reproduce the bug with Corrfunc installed from source/pip? (Related, I thought that the conda installed Corrfunc was called corrfunc - is that not the case?)

lgarrison · 2020-01-28T20:08:21Z

Just confirming that I can run your script and get the same output; thanks for making the minimal reproducer! Haven't had a chance to track down the bug yet.

…

On Tue, Jan 28, 2020 at 3:04 PM Manodeep Sinha ***@***.***> wrote: Thanks for the bug report @beckermr <https://github.com/beckermr> - let me check whether I can reproduce it. I did have a couple of questions though - i) Does the bug still appear if you reduce zmax to a value smaller than Lbox/2, say 55 ? and ii) Can you reproduce the bug with Corrfunc installed from source/pip? (Related, I thought that the conda installed Corrfunc was called corrfunc - is that not the case?) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#210?email_source=notifications&email_token=ABLA7S6STYSQOINX2HARYGTRACFWLA5CNFSM4KMVBGQKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKEWPHQ#issuecomment-579430302>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABLA7S3JT5UZ7TAWWHTCM4TRACFWLANCNFSM4KMVBGQA> .

-- Lehman Garrison <[email protected]> Flatiron Research Fellow, Cosmology X Data Science Group Center for Computational Astrophysics, Flatiron Institute lgarrison.github.io

beckermr · 2020-01-28T20:18:19Z

@manodeep yes I can reproduce the bug in both of those cases.

manodeep · 2020-01-28T23:45:19Z

I suspect this is connected to the relative size of rmax and zmax to Lbox. For instance, if I change (xbin, ybin, zbin) refine-factors to (1, 1, 1), then we have many mis-matches. And if I increase all the bin-refine-factors to >= 3, then the bug disappears. Since allowing such large relative separations is a new feature introduced with #189 (via commit e53bce3), may be a bug was introduced.

@lgarrison At minimum, we will need a separate test case (from a brute force output) to test these sorts of large separations.

manodeep · 2020-02-01T10:43:17Z

If I disable the duplicate cell-checking in here, then the bug disappears. However, we need to work through the logic and make sure we are not double-counting under any circumstances...

lgarrison · 2020-02-01T14:23:24Z

@manodeep is it possible that we we are missing particles when we do duplicate cell elimination because we are only considering one "wrapping" direction? I think we need to consider both, the question is how to do so without double counting pairs. Probably the answer lies in some restriction llike Rmax < CellSize/2...

aphearin · 2020-02-01T17:19:07Z

Once this is resolved, do y'all have a brute force pair-counter test in the unit-testing to protect against regression? I find it's really hard to fool those kind of tests. Let me know in case that would be useful for me to pass along your way.

manodeep · 2020-02-01T20:13:56Z

@lgarrison That is definitely what is happening here. Consider zmax = Lbox/2, and we have 2 bins. In that case, the top half of iz=0 can be within zmax of the bottom half of the iz=1, and the bottom half of the iz=0 can be within the top half of iz=1 cell after periodic wrapping. However, we will only associate (iz=0, iz=1) once and without periodic wrapping. Essentially the duplicate check has to needs to make sure both directions of the cell-pairs are included. For this specific scenario, no double-counting should occur but I am not sure about all the possible combinations of bin-refs, min-sep-opt, and various zmax/Lbox ratios ranging between [0.3-0.5] (i.e, between 2 <= nzbins <= 3. I am reasonably confident that once nzbins > 3, Corrfunc works correctly. We would need to run the tests with the COMPREHENSIVE_INTEGRATION_TESTS` option enabled.

@aphearin Thanks for the offer - we will absolutely need to add tests for large Rmax/zmax for all the pair-counters.

@lgarrison In the meantime, should we revert the previous change (introduced in v2.3.2) for allowing large ratios of (Rmax, zmax)/Lboox?

lgarrison · 2020-02-01T20:19:14Z

@manodeep Yes, I think since we happen to be making a release right now anyways we should revert this change while we consider the best way to fix this. Let's double-check with @beckermr that it fixes his problem before releasing!

manodeep · 2020-02-01T20:36:32Z

@lgarrison If we revert the change, then Corrfunc will report an error for (nx, ny, nzbins < 3) and @beckermr will not be able to run this example

manodeep · 2023-06-14T11:57:37Z

@beckermr This should be fixed now with #277. As long as Rmax < Lbox/2, Corrfunc should correctly compute the pairs.

Closing the issue but please feel free to reopen if you are still facing any issues

beckermr · 2023-06-14T12:20:56Z

Thank you!

manodeep added a commit that referenced this issue Feb 3, 2020

Reverting change in #189 to prevent incorrect results (see #210)

ac52942

manodeep mentioned this issue Feb 3, 2020

Add a check for duplicate neighbouring cells within the theory pair counters (periodic) #192

Closed

manodeep added this to the v2.4.0 milestone Feb 3, 2020

manodeep modified the milestones: v2.4.0, v2.5.0 Sep 28, 2021

This was referenced Jul 19, 2022

Implement per-axis periodicity and box size #276

Merged

WIP: enable larger Rmax (up to half the box size) #277

Merged

manodeep closed this as completed Jun 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG DDrppi buggy for cross-corr in periodic spaces #210

BUG DDrppi buggy for cross-corr in periodic spaces #210

beckermr commented Jan 28, 2020

beckermr commented Jan 28, 2020

manodeep commented Jan 28, 2020

lgarrison commented Jan 28, 2020 via email

beckermr commented Jan 28, 2020

manodeep commented Jan 28, 2020

manodeep commented Feb 1, 2020

lgarrison commented Feb 1, 2020

aphearin commented Feb 1, 2020

manodeep commented Feb 1, 2020

lgarrison commented Feb 1, 2020

manodeep commented Feb 1, 2020

manodeep commented Jun 14, 2023

beckermr commented Jun 14, 2023

BUG DDrppi buggy for cross-corr in periodic spaces #210

BUG DDrppi buggy for cross-corr in periodic spaces #210

Comments

beckermr commented Jan 28, 2020

General information

Issue description

Expected behavior

Actual behavior

What have you tried so far?

Minimal failing example

beckermr commented Jan 28, 2020

manodeep commented Jan 28, 2020

lgarrison commented Jan 28, 2020 via email

beckermr commented Jan 28, 2020

manodeep commented Jan 28, 2020

manodeep commented Feb 1, 2020

lgarrison commented Feb 1, 2020

aphearin commented Feb 1, 2020

manodeep commented Feb 1, 2020

lgarrison commented Feb 1, 2020

manodeep commented Feb 1, 2020

manodeep commented Jun 14, 2023

beckermr commented Jun 14, 2023