
missing features and todos for score estimation #1226

Open · janfb opened this issue Aug 19, 2024 · 4 comments
Labels: enhancement (New feature or request)

janfb (Contributor) commented Aug 19, 2024

There are a couple of unsolved problems and possible enhancements for NPSE:

MAP

MAP estimation uses the score directly to do gradient ascent on the posterior and find the MAP. This is currently not working accurately.

IID sampling

  • iid sampling is implemented as proposed in Geffner et al., i.e., using the iid_bridge and accumulating the individual scores over a batch of iid samples (see the sketch after the snippet below). However, this is not working accurately either. I have not found the source of the error yet; I added a couple of TODOs, e.g., here:
    # TODO: for iid setting, self.batch_shape.numel() will be the iid-batch. But we
    # don't want to generate num_obs samples, but only one sample given the iid
    # batch.
    # TODO: the solution will probably be to distinguish between the iid setting and
    # the batched sampling setting with a flag.
    # TODO: this fixes the iid setting shape problems, but iid inference via
    # iid_bridge is not accurate.
    # num_batch = self.batch_shape.numel()
    # init_shape = (num_batch, num_samples) + self.input_shape
    init_shape = (num_samples,) + self.input_shape  # just use num_samples, not num_batch
    # NOTE: for the IID setting we might need to scale the noise with the iid batch
    # size, as in equation (7) in the paper.
    eps = torch.randn(init_shape, device=self.device)
    mean, std, eps = torch.broadcast_tensors(self.init_mean, self.init_std, eps)
    return mean + std * eps
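
For reference, here is a minimal sketch of the score accumulation that the iid_bridge is built on, assuming a trained score_estimator with the (input, condition, time) call signature used elsewhere in this issue and a prior_score_fn for the prior score; both names are placeholders, and the time-dependent bridge correction from the paper is omitted:

    import torch

    def accumulated_score(theta, x_iid, time, score_estimator, prior_score_fn):
        """Factorized posterior score for num_obs iid observations (sketch).

        sum_i grad log p_t(theta | x_i) - (num_obs - 1) * grad log p_t(theta)
        """
        per_obs_scores = torch.stack(
            [score_estimator(input=theta, condition=x_i, time=time) for x_i in x_iid]
        )
        num_obs = x_iid.shape[0]
        return per_obs_scores.sum(dim=0) - (num_obs - 1) * prior_score_fn(theta, time)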

Log prob and sampling via CNF

Once trained, we can use the score_estimator to define a probability-flow ODE, e.g., a CNF via zuko, and directly call log_prob and sample on it. At the moment this already happens when constructing the ScorePosterior with sample_with="ode". However, it is a bit all over the place: e.g., log_prob comes from the potential via zuko anyway, and for sampling we construct a new flow on each call. A possible solution to make things clearer is creating an ODEPosterior that could be used by flow matching as well.
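
A rough sketch of what such an ODEPosterior could look like: build the probability-flow ODE once via zuko and reuse it for both log_prob and sample. The drift_fn/diffusion_fn calls and the t_min/t_max attributes are assumptions about the score estimator's interface, not existing sbi API:

    import torch
    import zuko

    def build_ode_flow(score_estimator, x_o, theta_dim):
        """Wrap the probability-flow ODE of a trained score estimator in a zuko CNF."""

        def ode_drift(t, theta):
            # probability-flow ODE: d theta / dt = f(theta, t) - 0.5 * g(t)^2 * score(theta, t)
            score = score_estimator(input=theta, condition=x_o, time=t)
            return (
                score_estimator.drift_fn(theta, t)
                - 0.5 * score_estimator.diffusion_fn(theta, t) ** 2 * score
            )

        transform = zuko.transforms.FreeFormJacobianTransform(
            f=ode_drift,
            t0=score_estimator.t_min,
            t1=score_estimator.t_max,
            exact=False,  # Hutchinson estimator for the log-det term
        )
        base = zuko.distributions.DiagNormal(torch.zeros(theta_dim), torch.ones(theta_dim))
        return zuko.distributions.NormalizingFlow(transform, base)

    # usage:
    # flow = build_ode_flow(score_estimator, x_o, theta_dim=2)
    # log_probs = flow.log_prob(theta)
    # samples = flow.sample((1000,))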

Allow transforms for potential

See score_estimator_based_potential, which currently asserts enable_transform=False, i.e., transforms of theta are not supported yet.
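
As a sketch of what supporting a transform could look like (plain change of variables; transform is assumed to be a torch.distributions.Transform mapping theta to an unconstrained phi, and potential_fn(theta) to return the log-potential — this is not existing sbi code):

    def transformed_potential(potential_fn, transform, phi):
        # evaluate the potential in the unconstrained space phi = transform(theta)
        theta = transform.inv(phi)
        # change of variables: log p(phi) = log p(theta) + log |det d theta / d phi|
        log_abs_det = transform.inv.log_abs_det_jacobian(phi, theta)
        return potential_fn(theta) + log_abs_det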

Better converged checks

Unlike the ._converged method in base.py, this method does not reset to the best model; we noticed that this improves performance. Deleting this method would make the C2ST tests fail, because the loss is very stochastic and resetting might restore an underfitted model. Ideally, we would write a custom ._converged() method that checks whether the loss is still going down for all t.
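
One possible shape for such a check, sketched under the assumption that we track validation losses separately per diffusion-time bin (the val_losses_per_t bookkeeping does not exist yet):

    def converged(val_losses_per_t: dict, patience: int = 20) -> bool:
        """Stop only once no time bin has improved within the last `patience` epochs."""
        for losses in val_losses_per_t.values():
            if len(losses) <= patience:
                return False
            best_epoch = min(range(len(losses)), key=losses.__getitem__)
            if best_epoch >= len(losses) - patience:
                return False  # this time bin improved recently -> keep training
        return True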

janfb added the enhancement label on Aug 19, 2024
gmoss13 self-assigned this on Sep 6, 2024
gmoss13 (Contributor) commented Sep 6, 2024

MAP

MAP estimation uses the score directly to do gradient ascent on the posterior and find the MAP. This is currently not working accurately.

Is it the case that it does not work accurately, or that this doesn't run at all? Trying to find the MAP with gradient ascent requires differentiating through the backward method of zuko CNFs, which causes autograd errors for me.

Regardless, even if we can backprop through log_prob as constructed with CNFs, this would be incredibly slow, as evaluating the log prob in this way requires an ODE solve. I am wondering if we could instead find an approximate MAP by using a variational lower bound of the log prob, e.g., Eq. 11 of "Maximum Likelihood Training of Score-Based Diffusion Models". This way we don't need to compute a lot of ODE solves. @manuelgloeckler, do you have any thoughts on this?

gmoss13 (Contributor) commented Sep 11, 2024

Update after talking to @manuelgloeckler: the easiest way to calculate the MAP here would be to use the score directly at a time t = epsilon, instead of calculating and taking gradients through the exact log_prob, which, as stated above, would be really inefficient. I will implement this soon.
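
Something along these lines (a sketch only; score_estimator, x_o, the step size, and the initialization are placeholders):

    import torch

    theta = initial_theta.clone()  # e.g. a posterior sample or the prior mean
    t_eps = torch.tensor([score_estimator.t_min])  # "t = epsilon"
    for _ in range(1_000):
        # the score at t ~ 0 approximates grad_theta log p(theta | x_o), so plain
        # gradient ascent on it moves theta toward a posterior mode without any ODE solve
        score = score_estimator(input=theta, condition=x_o, time=t_eps)
        theta = theta + 1e-2 * score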

janfb (Contributor, Author) commented Sep 11, 2024

Update after talking to @manuelgloeckler: the easiest way to calculate the MAP here would be to use the score directly at a time t = epsilon, instead of calculating and taking gradients through the exact log_prob, which, as stated above, would be really inefficient. I will implement this soon.

I think that's actually what @michaeldeistler had implemented already. It's in the backup branch, here:

sbi/sbi/utils/sbiutils.py, lines 946 to 954 in 2b233ce:

try:  # (surrounding context, not part of lines 946 to 954)
    optimize_inits.requires_grad_(False)  # type: ignore
    gradient = potential_fn.gradient(optimize_inits)
except (NotImplementedError, AttributeError):
    optimize_inits.requires_grad_(True)  # type: ignore
    probs = potential_fn(optimize_inits).squeeze()
    loss = probs.sum()
    loss.backward()
    gradient = optimize_inits.grad
assert isinstance(gradient, Tensor), "Gradient must be a tensor."

and then, in the case of the score-based potential, it would just use the gradient directly from here:

def gradient(
    self, theta: Tensor, time: Optional[Tensor] = None, track_gradients: bool = True
) -> Tensor:
    r"""Returns the potential function gradient for score-based methods.

    Args:
        theta: The parameters at which to evaluate the potential.
        time: The diffusion time. If None, then `t_min` of the
            self.score_estimator is used (i.e. we evaluate the gradient of the
            actual data distribution).
        track_gradients: Whether to track gradients.

    Returns:
        The gradient of the potential function.
    """
    if time is None:
        time = torch.tensor([self.score_estimator.t_min])

    if self._x_o is None:
        raise ValueError(
            "No observed data x_o is available. Please reinitialize \
            the potential or manually set self._x_o."
        )

    with torch.set_grad_enabled(track_gradients):
        if not self.x_is_iid or self._x_o.shape[0] == 1:
            score = self.score_estimator.forward(
                input=theta, condition=self.x_o, time=time
            )

Or are you referring to yet a different approach?

gmoss13 (Contributor) commented Dec 3, 2024

@manuelgloeckler ping re: IID sampling
