WL hierarchical inference theory #23

Closed
4 of 5 tasks
drphilmarshall opened this issue Mar 11, 2015 · 7 comments

@drphilmarshall
Owner

We have a simple PGM for this problem, but now we need to:

  • Expand that into maths
  • Discuss!
  • Translate that into a LaTeX document that describes how each PDF might be handled (samples, weighting, etc.)
  • Agree on this
  • Make some simple feasibility calculations to see how computationally intensive this inference will be, and how it could be made tractable (e.g. which parts can be parallelised and how, and whether there are any speed-ups to be made).

@beckermr and @tcollett, let's do this together! Tom suggests we each work this through independently and then compare; what do you think?

@drphilmarshall drphilmarshall self-assigned this Mar 11, 2015
@drphilmarshall drphilmarshall added this to the Start of Summer 2015 milestone Mar 11, 2015
@drphilmarshall
Owner Author

Hi!

I have worked through the probability theory for our enormous inference: no galaxy clustering or clusters yet, "just" stellar masses, photo-zs and weak lensing, but still, it's a start. Here are my notes, hopefully both legible and interesting!

What do you think?

@beckermr

@tcollett Some discussion of this happened on the GFC KIPAC slack. Changes coming...

@drphilmarshall
Owner Author

OK, the link is replaced with a better PDF. The intuition that we need a sum over sample catalogs is still good, but now the maths actually backs that up. Good catch, @beckermr!
In these new notes you will find the following amusing sentence:

"We replace this 5 billion dimensional integral with a sum over 12 sample catalogs" - I guess we'll just have to see how that works out :-)

@drphilmarshall
Owner Author

@tcollett Here are some more notes, from an offline discussion with @beckermr:

  • The PGM with lensing looks funny, because the j subscript somehow implies independence when you have a single box representing each member of the set of galaxies. We probably want to use vector notation to signify "all halos" and then discuss in words the identification of a particular subset of foreground halos for each source. I'm not sure if we then dispense with the plate... I like it because it delineates the individual object parameters from the global hyper-parameters, but let's check the PGM rulebook.
  • A significant concern is the usual importance sampling failure mode, i.e. running out of samples. We are drawing halos from the interim prior (or maybe "proposal distribution" is more descriptive here?) and grading each configuration: this approach typically starts to fail as the data get very informative and the samples get down-weighted to very low probabilities. And as a technical point, none of the 10^8 importance weights can be allowed to underflow to exactly zero; in other words, we must be watchful of numerical issues (see the sketch after this list). In practice, these considerations mean that every sample catalog needs to be a plausible universe, which sounds tough, but a) the WL likelihood is very forgiving, and b) this statement can be recast as a need to draw Mh samples very efficiently. The trick is going to be designing Pr(Mh|U); this is a good place for us to focus some attention.
  • Hogg and Foreman-Mackey pointed out to me that with 10 sample catalogs of K galaxies, we actually have something like 10^K sample realizations of the mass distribution, because we can mix and match the samples (a consequence of the galaxies being independently measured). We could try first with just 10 catalogs, and then draw some alternate permutations to investigate the robustness of the results as the number of permutations increases.
  • The lensing signal applied to a given galaxy only depends on the stuff in front of it. Suppose we had 10 disjoint patches, separated by some large angle. Then the likelihood could be computed independently for each patch, to a good approximation. What this means is that the integral over halo masses could be factorized at some level, but the patch boundaries are "fuzzy" and so would have to be treated carefully. In practice, of course, we'll have to start with some sky binning, just to make anything work!
  • The galaxy clustering signals are very constraining, which should sound a warning bell when thinking about importance sampling. However, in Pangloss the galaxy positions are assumed fixed, so even if the halos are allowed to move around a little bit (as various mis-centring studies show that they need to be), the correlation function of the model halos is never going to be very far from the observed galaxy one, by construction. Now, suppose we tried to take a shortcut and implemented galaxy clustering via a summary statistic like some sort of 2-point halo-mass marked/weighted correlation function (which I suppose will have a good prediction from theory): if we don't draw our halo masses from something sensible, I agree we might fail to fit this summary well. So the answer will probably still be to draw the halo masses intelligently, from some joint PDF that ensures that the model halo correlation function always stays close to the prediction... It'll be fun to think about recipes for doing this!
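
To make the numerical bookkeeping concrete, here is a minimal, purely illustrative Python sketch of the importance-sampling step: log-space weights (so none of them underflows to exactly zero) and a Kish effective-sample-size diagnostic for the running-out-of-samples failure mode. All the sizes and numbers are made up, and it assumes the per-galaxy likelihoods factorize, as in the mix-and-match point above; it is a sketch of the idea, not of the actual Pangloss machinery.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(42)

# Hypothetical sizes: the real problem has ~10^8 halos, not 10^3 galaxies.
n_cat, n_gal = 12, 1000  # sample catalogs drawn from Pr(Mh|U), and galaxies

# Stand-ins for what each catalog k would supply for every galaxy j:
# log Pr(d_j | Mh_jk) and log Pr(Mh_jk | U), faked here with random numbers.
log_like = rng.normal(-1.0, 0.5, size=(n_cat, n_gal))
log_interim = rng.normal(-2.0, 0.3, size=(n_cat, n_gal))

def log_marginal_likelihood(log_prior_theta):
    """Importance-sampled log Pr(d | theta). Averaging over catalogs
    independently for each galaxy, then summing the per-galaxy logs, is
    exactly equivalent to summing over all n_cat**n_gal mix-and-match
    catalog permutations."""
    # Log importance weights; working in log space keeps tiny weights
    # from underflowing to exactly zero.
    log_w = log_like + log_prior_theta - log_interim    # (n_cat, n_gal)
    per_gal = logsumexp(log_w, axis=0) - np.log(n_cat)  # average over catalogs
    return per_gal.sum()                                # product over galaxies

def effective_sample_size(log_prior_theta):
    """Kish ESS per galaxy: values near 1 mean a single catalog carries
    essentially all the weight, i.e. the importance sampling has failed."""
    log_w = log_like + log_prior_theta - log_interim
    log_w = log_w - logsumexp(log_w, axis=0)            # normalize per galaxy
    return np.exp(-logsumexp(2.0 * log_w, axis=0))      # 1 / sum(w^2)

# Evaluate one hyper-parameter setting theta (its log prior density per
# sampled halo configuration is faked as well).
log_prior_theta = rng.normal(-2.0, 0.3, size=(n_cat, n_gal))
print(log_marginal_likelihood(log_prior_theta))
print(effective_sample_size(log_prior_theta).min())
```

The per-patch factorization in the third bullet would just add another independent axis to these arrays, with the fuzzy patch boundaries as the complication.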

@tcollett
Collaborator

Ok, I have been thinking about this; there are too many things to write in an email. I'll talk to Phil next week.

It seems to me that the probability theory isn't too bad, but the computational issues are potentially very hard. It seems vital to me that we build a highly simplified toy model with which to test the inference problem, and then see how the feasibility scales as we make the parameters of the toy model more complicated and increase the number of halos and sources that the toy has to do inference with.
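
For concreteness, here is one way such a toy might look; everything in it is made up for illustration (an exponential halo mass function with hyper-parameter alpha, a random binary foreground matrix, unit Gaussian noise, and an interim prior fixed at alpha = 1). Timing it while growing n_halo gives a crude first feasibility scaling, and it should also expose the running-out-of-samples problem as the data become more informative.

```python
import time
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)

def run_toy(n_halo, n_src, n_cat=12, alpha_true=1.5):
    """One toy inference: halo masses ~ Exponential(alpha_true); each source
    sees the noisy sum of the masses of a random subset of 'foreground'
    halos; we scan alpha on a grid using n_cat importance-sample catalogs
    drawn from the interim prior (alpha = 1)."""
    m_true = rng.exponential(alpha_true, size=n_halo)
    front = (rng.random((n_src, n_halo)) < 0.5).astype(float)
    data = front @ m_true + rng.normal(0.0, 1.0, size=n_src)

    # Sample catalogs from the interim prior Pr(Mh|U).
    m_samp = rng.exponential(1.0, size=(n_cat, n_halo))
    pred = m_samp @ front.T                              # (n_cat, n_src)
    log_like = -0.5 * ((pred - data) ** 2).sum(axis=1)   # (n_cat,)

    alphas = np.linspace(0.5, 3.0, 26)
    logpost = np.empty_like(alphas)
    for i, a in enumerate(alphas):
        # log Pr(Mh|alpha) - log Pr(Mh|U), summed over halos per catalog.
        log_ratio = (-m_samp / a - np.log(a) + m_samp).sum(axis=1)
        logpost[i] = logsumexp(log_like + log_ratio) - np.log(n_cat)
    return alphas[np.argmax(logpost)]

for n_halo in (100, 1000, 10000):
    t0 = time.perf_counter()
    alpha_hat = run_toy(n_halo, n_src=n_halo // 2)
    dt = time.perf_counter() - t0
    print(f"n_halo={n_halo}: alpha_hat={alpha_hat:.2f} in {dt:.2f}s")
```

With only 12 catalogs the weights degenerate quickly as n_halo grows, so alpha_hat will wander; that degeneracy is precisely what the toy should measure before we pay for the full-scale version.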


@mrbecker

please remove me!


@drphilmarshall
Owner Author

Sorry @mrbecker! I removed you from the repo, but I now see that you are still on this thread - and I can't see how to take you off. So, everyone: I am going to lock this issue, and start some new ones based on the discussion so far. Thanks for all your input!

Repository owner locked and limited conversation to collaborators Apr 23, 2015