-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unnecessary vpf check? #237
Comments
it's a sanity check to make sure that the total number of spheres requested is sensible. If the total volume exceeds the box volume, then there must be some non-independent data-points, and therefore, the total number of spheres used is no longer a fair representation of the amount of scatter expected. In those cases, it is betterr to use a smaller number of random spheres (which might overlap). Does that make sense? |
I guess I was thinking about it like this. The VPF tells you what the probability is of having 0 (or N) particles in a sphere thrown down at random in a box. So if we throw down as many such random spheres as possible, isn't the fraction with N exactly what we want to know? I agree a single box is only going to be a single sampling of the "true" VPF, but I would think limiting the number of spheres just gives you a noisier estimate of the VPF, not a less biased one. Perhaps the real issue is that one shouldn't trust the large-radius VPF measurements from one box, but maybe one wants to be able to get a low-noise estimate of it, even if it is biased, so one can average over multiple boxes, for example. And at small scales, maybe it's useful to be able to "oversample" to beat down noise some more when it's at a radius that you do trust the measurement. |
I am not sure how repeatedly re-sampling the same volume would give a less noisy estimate of the (large-R) VPF. If anything, the larger number of spheres will make you underestimate the error on the VPF. Of course, the only way to really get the scatter in the VPF is by looking at the scatter in the measurement over many different realisations. Having said all that, I have never tested the bias-variance - seems like an easy enough test to run and check whether limiting the total volume is meaningful. And I can certainly see an use-case for a keyword parameter that allows the user to over-ride the volume check. |
For the large-R VPF, if one is going out to 20% of the box, for example, then that's only 30 spheres. So the granularity of the VPF is only 3%! That's a pretty extreme case, but perhaps we should let the user make this choice. What do you think about changing the error to a warning? That's slightly less overhead than a new parameter, but I'm also fine with that. |
If the user really wants to probe Rmax to 20% of Lbox, then restricting the total number of spheres seems like a good idea! Though I like the idea of changing to a warning, I have found that people just ignore warnings. It would be a little more work to put in a |
Well, even a much more reasonable case, like 5% of the box, is only 2000
spheres, which may not be enough to capture rare small-scale events that
may nonetheless hold interesting information.
If the user wants the vpf across a range of Rmax scales, we might say from
a computational perspective that it makes sense for the user to run vpf
many times, decreasing nspheres as one increases Rmax. But the structure
of the API doesn't really suggest that, since we are providing `nbins` to
allow simultaneously measuring across many scales. So... given that the
vpf is cheap, I am in favor of lifting the restriction on spheres
altogether and making it a warning! What do you think?
…On Tue, Nov 10, 2020 at 5:08 PM Manodeep Sinha ***@***.***> wrote:
If the user really wants to probe Rmax to 20% of Lbox, then restricting
the total number of spheres seems like a good idea!
Though I like the idea of changing to a warning, I have found that people
just ignore warnings. It would be a little more work to put in a
override_volume_check keyword, but at least that will ensure that users
have to consciously make that choice.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#237 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABLA7S77LXC5TQCUKYZPVDLSPG2WTANCNFSM4TNEVXUA>
.
--
Lehman Garrison <[email protected]>
Flatiron Research Fellow, Cosmology X Data Science Group
Center for Computational Astrophysics, Flatiron Institute
lgarrison.github.io
|
Are you concerned with the extra code required to over-ride the default behaviour? To me, it seems like adding a boolean keyword I am not sure how you would capture a rare, small-scale event with the vpf? The measurement (i.e., the mean of |
Yes, it's a combination of not wanting to add more surface area to the API, and a lingering feeling that this check shouldn't be there at all. The condition forces you to miss a lot of sphere configurations! By "rare events", I mean low-amplitude counts-in-cells. I think for any radius (doesn't have to be small), if you go to high enough (or small enough) counts, the probability p(N) will be very low. And then you probably want at least 10/p(N) spheres to get above Poisson noise. But the volume condition doesn't know anything about p(N), so in many cases it won't let you use the appropriate number of spheres! |
Seems like you feel fairly strongly about this :). Changing to a warning falls under the same scope as checking for the ratio - Since the change will alter the default behaviour, should we milestone for a minor release (i.e., |
Sorry! I swear I wasn't planning on dying on this hill... I haven't worked with the VPF before, so I'm just trying to think it through and make sure I've wrapped my head around what's going on. And hopefully doing so will help newcomers to the code, especially since I think many students use corrfunc! So do you think you agree with my assessment? If so, then yes, 2.4.0 sounds good to me. Do you think we could also do the boxsize change (#199) in that release? |
Hehehe - definitely seemed like the hill :). All good - let's milestone this change for Seems like the boxsize PR has some merge conflicts - are you able to fix those? Otherwise, I think that one is good to go but I will take one final look (it's been a while) I am assuming this means that the next release will be |
Yes, sounds good, I'll take a look tomorrow. |
Following the discussion in issue manodeep#237 and after conversations with LGarrison, I've completely removed the check that confirmed that the volume inhabited by the dropped spheres was less than the volume of the sample. This check was unnecessary for the VPF to run, and actively discouraged the oversampling of the sample volume with test spheres that one needs to precisely measure the VPF at a given radius.
Hi @manodeep! I'm Lucia, and I'm the one who brought up this issue to @lgarrison while using CORRFUNC for my current projects. I was throwing CORRFUNC at much smaller volumes that I'm sure you wrote this for, and was struggling with the code stumbling for radii where I knew the VPF could still be trustworthy and measured, and finding it was giving inconsistent results because of the undersampling the check imposed. (So if anyone is dying on this hill, it's me, lol.) My paper that's in press right now is very relevant here, and @lgarrison 's mentioned some things that I brought up to him from my work: https://arxiv.org/abs/2011.03556. Specifically, Section 3.3 discusses some common-sense limits of when the VPF can be measured given a sample's density and volume, which quantifies the concern of when a radius is too large. Still, regardless of whether the radius you're measuring the VPF for is valid or can give accurate results, one should still oversample with the dropped test spheres to at least guarantee that the act of measuring the VPF isn't adding additional uncertainty. I'm about to PR a branch with the updated vpf.py that completely omits the check, though I did draft one earlier that gave folks the option to override the check with a flag. CORRFUNC is awesome and I'm happy to contribute a tiny improvement to it! |
Hi @laperezNYC - my name is Manodeep - nice to e-meet you! Thank you for using Corrfunc and even more thanks for contributing an improvement. |
* Removed independent volume check in vpf.py Following the discussion in issue #237 and after conversations with LGarrison, I've completely removed the check that confirmed that the volume inhabited by the dropped spheres was less than the volume of the sample. This check was unnecessary for the VPF to run, and actively discouraged the oversampling of the sample volume with test spheres that one needs to precisely measure the VPF at a given radius. * Update vpf.py Without the check of independent volumes, there's no need to overtly calculate the volume the samples inhabit. * Update CHANGES.rst Updated 2.4.0 with this new enhancement to theory.vpf * Update CHANGES.rst I promise I can type! * Tweaked wording @laperezNYC Tweaked the wording a bit. If this looks acceptable to you, then I will merge in the PR Co-authored-by: Manodeep Sinha <[email protected]>
There's a check in vpf to make sure the total volume in spheres doesn't exceed the box volume: https://github.com/manodeep/Corrfunc/blob/master/Corrfunc/theory/vpf.py#L207
But I'm not sure that condition is necessary? Spheres are allowed to overlap in vpf, right?
@manodeep do you know why this is there?
The text was updated successfully, but these errors were encountered: