Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG DDrppi buggy for cross-corr in periodic spaces #210

Closed
beckermr opened this issue Jan 28, 2020 · 13 comments
Closed

BUG DDrppi buggy for cross-corr in periodic spaces #210

beckermr opened this issue Jan 28, 2020 · 13 comments
Milestone

Comments

@beckermr
Copy link

General information

  • Corrfunc version: 2.3.2
  • platform: osx
  • installation method (pip/source/other?): conda

Issue description

I am computing the pairs by hand and not getting what corrfunc reports.

Expected behavior

It should match the brute force computation.

Actual behavior

It doesn't match it.

What have you tried so far?

A bunch of stuff to try and see if it will miss one particle. It does not, so the bug is more subtle. It does not appear to be a bin edge thing.

Minimal failing example

import numpy as np
import Corrfunc

lbox = 120
zmax = 60
n_pi = int(zmax)
rmin = 40
rmax = 45
n_r = 2
rbins = np.array([0, rmin, rmax])

lbox_2 = lbox / 2

npts = 11
rng = np.random.RandomState(seed=42)
x1 = rng.uniform(low=0, high=lbox, size=npts)
y1 = rng.uniform(low=0, high=lbox, size=npts)
z1 = rng.uniform(low=0, high=lbox, size=npts)
w1 = rng.uniform(size=npts) * 0 + 1

nparts = 100
px1 = rng.uniform(low=0, high=lbox, size=nparts)
py1 = rng.uniform(low=0, high=lbox, size=nparts)
pz1 = rng.uniform(low=0, high=lbox, size=nparts)

print("\n")

dd_cf = Corrfunc.theory.DDrppi(
    False,
    1,
    zmax,
    rbins,
    x1,
    y1,
    z1,
    weights1=w1,
    X2=px1,
    Y2=py1,
    Z2=pz1,
    weights2=np.ones_like(px1),
    periodic=True,
    boxsize=lbox,
    weight_type="pair_product",
    verbose=True,
    isa="fallback",
)
dd_cf = dd_cf["weightavg"].reshape((n_r, n_pi)) * dd_cf["npairs"].reshape(
    (n_r, n_pi)
)

print("\n")
print("testing particles one by one...")

dd = np.zeros(n_r)
for hind, (hx, hy, hz) in enumerate(zip(x1, y1, z1)):

    for px, py, pz in zip(px1, py1, pz1):
        dx = np.abs(hx - px)
        dy = np.abs(hy - py)
        dz = np.abs(hz - pz)

        if dx > lbox_2:
            dx = lbox - dx
        if dy > lbox_2:
            dy = lbox - dy
        if dz > lbox_2:
            dz = lbox - dz

        if dz <= zmax:
            r = np.sqrt(dx * dx + dy * dy)

            if r < rmin:
                dd[0] += w1[hind]
                if dd_cf[0, int(dz)] == 0:
                    print(
                        "    def missed part:",
                        r,
                        dz,
                        int(dz),
                        dd_cf[1, int(dz)],
                    )

            elif r >= rmin and r < rmax:
                dd[1] += w1[hind]
                if dd_cf[1, int(dz)] == 0:
                    print(
                        "    def missed part:",
                        r,
                        dz,
                        int(dz),
                        dd_cf[1, int(dz)],
                    )

print("dd corrfunc:", np.sum(dd_cf, axis=1))
print("dd brute   :", dd)

dd_cf_sum = np.sum(dd_cf, axis=1)
assert np.array_equal(dd_cf_sum, dd)

This script outputs

(anl) clarence:Desktop beckermr$ python corrfunc_test.py 


[Warning] The CPU supports AVX2 but the compiler does not.  Can you try another compiler?
ND1 =           11 [xmin,ymin,zmin] = [2.470139,16.739263,5.574050], [xmax,ymax,zmax] = [114.085717,116.389182,94.221115]
ND2 =          100 [xmin,ymin,zmin] = [0.662654,0.607390,1.727219], [xmax,ymax,zmax] = [118.426432,118.278054,118.806462]
Running with points in [xmin,xmax] = 0.662654,118.426432 with periodic wrapping = 120.000000
Running with points in [ymin,ymax] = 0.607390,118.278054 with periodic wrapping = 120.000000
Running with points in [zmin,zmax] = 1.727219,118.806462 with periodic wrapping = 120.000000
In gridlink_double> Running with [nmesh_x, nmesh_y, nmesh_z]  = 5,5,2.  Time taken =   0.000 sec
countpairs_rp_pi_double> gridlink seems inefficient. nmesh = (5, 5, 2); avg_np = 0.22. Boosting bin refine factor - should lead to better performance
xmin = 0.662654 xmax=118.426432 rpmax = 45.000000
In gridlink_double> Running with [nmesh_x, nmesh_y, nmesh_z]  = 8,8,2.  Time taken =   0.000 sec
In gridlink_double> Running with [nmesh_x, nmesh_y, nmesh_z]  = 8,8,2.  Time taken =   0.000 sec
Using fallback kernel
0%.........10%.........20%.........30%.........40%.........50%.........60%.........70%.........80%.........90%.........100% done. Time taken =  0.001 secs


testing particles one by one...
    def missed part: 41.087356477185 48.94941175529149 48 0.0
dd corrfunc: [362.  93.]
dd brute   : [387.  96.]
Traceback (most recent call last):
  File "corrfunc_test.py", line 99, in <module>
    assert np.array_equal(dd_cf_sum, dd)
AssertionError
@beckermr
Copy link
Author

@aphearin @manodeep I have been banging my head against this for a week. Here is my most minimal example of something I don't understand. Hopefully, it is my code!

@manodeep
Copy link
Owner

Thanks for the bug report @beckermr - let me check whether I can reproduce it. I did have a couple of questions though - i) Does the bug still appear if you reduce zmax to a value smaller than Lbox/2, say 55 ? and ii) Can you reproduce the bug with Corrfunc installed from source/pip? (Related, I thought that the conda installed Corrfunc was called corrfunc - is that not the case?)

@lgarrison
Copy link
Collaborator

lgarrison commented Jan 28, 2020 via email

@beckermr
Copy link
Author

@manodeep yes I can reproduce the bug in both of those cases.

@manodeep
Copy link
Owner

I suspect this is connected to the relative size of rmax and zmax to Lbox. For instance, if I change (xbin, ybin, zbin) refine-factors to (1, 1, 1), then we have many mis-matches. And if I increase all the bin-refine-factors to >= 3, then the bug disappears. Since allowing such large relative separations is a new feature introduced with #189 (via commit e53bce3), may be a bug was introduced.

@lgarrison At minimum, we will need a separate test case (from a brute force output) to test these sorts of large separations.

@manodeep
Copy link
Owner

manodeep commented Feb 1, 2020

If I disable the duplicate cell-checking in here, then the bug disappears. However, we need to work through the logic and make sure we are not double-counting under any circumstances...

@lgarrison
Copy link
Collaborator

@manodeep is it possible that we we are missing particles when we do duplicate cell elimination because we are only considering one "wrapping" direction? I think we need to consider both, the question is how to do so without double counting pairs. Probably the answer lies in some restriction llike Rmax < CellSize/2...

@aphearin
Copy link
Contributor

aphearin commented Feb 1, 2020

Once this is resolved, do y'all have a brute force pair-counter test in the unit-testing to protect against regression? I find it's really hard to fool those kind of tests. Let me know in case that would be useful for me to pass along your way.

@manodeep
Copy link
Owner

manodeep commented Feb 1, 2020

@lgarrison That is definitely what is happening here. Consider zmax = Lbox/2, and we have 2 bins. In that case, the top half of iz=0 can be within zmax of the bottom half of the iz=1, and the bottom half of the iz=0 can be within the top half of iz=1 cell after periodic wrapping. However, we will only associate (iz=0, iz=1) once and without periodic wrapping. Essentially the duplicate check has to needs to make sure both directions of the cell-pairs are included. For this specific scenario, no double-counting should occur but I am not sure about all the possible combinations of bin-refs, min-sep-opt, and various zmax/Lbox ratios ranging between [0.3-0.5] (i.e, between 2 <= nzbins <= 3. I am reasonably confident that once nzbins > 3, Corrfunc works correctly. We would need to run the tests with the COMPREHENSIVE_INTEGRATION_TESTS` option enabled.

@aphearin Thanks for the offer - we will absolutely need to add tests for large Rmax/zmax for all the pair-counters.

@lgarrison In the meantime, should we revert the previous change (introduced in v2.3.2) for allowing large ratios of (Rmax, zmax)/Lboox?

@lgarrison
Copy link
Collaborator

@manodeep Yes, I think since we happen to be making a release right now anyways we should revert this change while we consider the best way to fix this.  Let's double-check with @beckermr that it fixes his problem before releasing!

@manodeep
Copy link
Owner

manodeep commented Feb 1, 2020

@lgarrison If we revert the change, then Corrfunc will report an error for (nx, ny, nzbins < 3) and @beckermr will not be able to run this example

@manodeep
Copy link
Owner

@beckermr This should be fixed now with #277. As long as Rmax < Lbox/2, Corrfunc should correctly compute the pairs.

Closing the issue but please feel free to reopen if you are still facing any issues

@beckermr
Copy link
Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants